- 21 Aug, 2020 4 commits
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
- 12 Aug, 2020 1 commit
van den Berg authored
The directory output of the fastqc tasks causes issues on the shared file system of the cluster, since Snakemake cannot properly determine the age of the folder. As a result, the fastqc tasks are re-run every time the workflow is restarted, regardless of whether they have already completed. To prevent this, a single dummy output file, '.done', has been added to the fastqc tasks; it is written only when fastqc exits successfully.
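A minimal sketch of the '.done' marker pattern, assuming the marker is written by a small wrapper around the tool. The helper name `run_with_done_marker` is hypothetical and not taken from Hutspot, where this is handled inside the Snakemake rule. The point is that the workflow engine only has to check the timestamp of one regular file instead of a directory.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def run_with_done_marker(cmd: list[str], outdir: Path) -> Path | None:
    """Run cmd and create outdir/.done only if it exits with status 0.

    Hypothetical helper: the workflow can declare the single '.done' file
    as its output instead of relying on the age of the whole directory.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    marker = outdir / ".done"
    # Remove a stale marker first, so a failed re-run is never
    # mistaken for a successful one.
    marker.unlink(missing_ok=True)
    result = subprocess.run(cmd)
    if result.returncode != 0:
        return None
    # Written only after the tool has exited successfully.
    marker.touch()
    return marker
```

For fastqc, `cmd` would be something like `["fastqc", "--outdir", str(outdir), "sample_R1.fastq.gz"]`, where the file name is a placeholder.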
-
- 07 Aug, 2020 3 commits
van den Berg authored
The base recalibration (BQSR) step of the pipeline can take up to 7 hours for WGS samples, which is a significant part of the total run time. The developers of GATK state that BQSR requires at least 100M bases per read group: "We usually expect to see more than 100M bases per read group; as a rule of thumb, larger numbers will work better." A human WGS sample with an average read depth of 43x has almost 1300 times that number of bases. Restricting BQSR to a single chromosome would therefore greatly speed up the analysis of these samples.
-
van den Berg authored
The base recalibration step of the pipeline can take up to 7 hours for WGS samples, which is a significant part of the total run time. The developers of GATK state that BQSR requires at least 100M bases per read group: "We usually expect to see more than 100M bases per read group; as a rule of thumb, larger numbers will work better." A human WGS sample with an average read depth of 43x has almost 1300 times that number of bases. Restricting BQSR to a single chromosome would therefore greatly speed up the analysis of these samples.
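The "almost 1300 times" figure can be reproduced with a back-of-the-envelope calculation. The genome size (~3.0 Gbp) and the chr1 length (~249 Mbp) are assumptions not stated in the commit message, and the sketch treats the whole sample as a single read group. With several read groups the margin per read group shrinks proportionally. Even a single chromosome stays well above the 100M threshold.

```python
# Values from the commit message.
DEPTH = 43                # average read depth of the WGS sample
BQSR_MIN_BASES = 100e6    # GATK's suggested minimum per read group

# Assumptions, not from the commit message.
GENOME_BP = 3.0e9         # approximate human genome size
CHR1_BP = 249e6           # approximate length of chr1

whole_genome_bases = DEPTH * GENOME_BP
chr1_bases = DEPTH * CHR1_BP

# Ratios to the minimum, assuming a single read group per sample.
print(f"WGS: {whole_genome_bases / BQSR_MIN_BASES:.0f}x the minimum")
print(f"chr1 only: {chr1_bases / BQSR_MIN_BASES:.0f}x the minimum")
```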
-
van den Berg authored
Use 8 cores instead of 1; according to the documentation, speed should scale almost linearly with the number of cores. Also reduce the compression level of the output file, since cutadapt spends most of its time re-compressing the data after trimming the adapters.
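A sketch of how such a cutadapt invocation might be assembled. The helper `cutadapt_command` is hypothetical, the compression level of 1 is an assumption (the commit does not say which level was chosen), and the `--cores` and `--compression-level` options should be checked against `cutadapt --help` for the installed version.

```python
import shlex


def cutadapt_command(adapter: str, fq_in: str, fq_out: str,
                     cores: int = 8, compression_level: int = 1) -> list[str]:
    """Build a single-end cutadapt argument list (hypothetical helper).

    More cores spread the work over several processes; a lower gzip
    compression level produces larger output files but writes them faster.
    """
    return [
        "cutadapt",
        "--cores", str(cores),
        "--compression-level", str(compression_level),
        "-a", adapter,   # 3' adapter to trim
        "-o", fq_out,    # trimmed output
        fq_in,
    ]


print(shlex.join(cutadapt_command("AGATCGGAAGAGC", "in.fq.gz", "out.fq.gz")))
```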
-
- 29 Jul, 2020 2 commits
van den Berg authored
-
van den Berg authored
-
- 28 Jul, 2020 5 commits
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
- 24 Jul, 2020 4 commits
van den Berg authored
-
van den Berg authored
Revert "Remove explicit tmp folder" See merge request !17
-
van den Berg authored
This reverts commit 1b7d807f
-
van den Berg authored
-
- 22 Jul, 2020 5 commits
van den Berg authored
This should no longer be needed on the slurm cluster, where each task can explicitly request the amount of tmp space it requires. Moving the tmp folder from the shared filesystem back onto the host running the analysis should also improve the performance of the pipeline.
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
This way, the same version of picard is used in all tasks in the pipeline. This fixes issue #38.
-
van den Berg authored
-
- 26 Jun, 2020 2 commits
van den Berg authored
Add optional bed coverage output files See merge request !16
-
van den Berg authored
-
- 25 Jun, 2020 2 commits
van den Berg authored
-
van den Berg authored
-
- 24 Jun, 2020 3 commits
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
- 23 Jun, 2020 1 commit
van den Berg authored
Unfortunately, this adds a limitation on the sample names that can be used with Hutspot, since the naming of the samples in the MultiQC-parsed output of picard MarkDuplicates is partly ambiguous. This limitation has been documented in the readme, and a check has been added to the pipeline snakefile that throws an error when overlapping sample names are detected.
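The commit message does not define "overlapping". The sketch below assumes it means that one sample name is a prefix of another (for example `s1` and `s10`), which is one plausible source of ambiguity when parsed output is matched back to sample names. Both function names are hypothetical.

```python
from itertools import permutations


def find_overlapping_names(samples: list[str]) -> list[tuple[str, str]]:
    """Return sorted (shorter, longer) pairs where one name prefixes another.

    Assumption: "overlapping" means one sample name is a proper prefix of
    another sample name.
    """
    return sorted(
        (a, b) for a, b in permutations(set(samples), 2) if b.startswith(a)
    )


def check_sample_names(samples: list[str]) -> None:
    """Raise ValueError if any pair of sample names overlaps."""
    overlaps = find_overlapping_names(samples)
    if overlaps:
        pairs = ", ".join(f"{a} / {b}" for a, b in overlaps)
        raise ValueError(f"Overlapping sample names detected: {pairs}")
```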
-
- 22 Jun, 2020 1 commit
van den Berg authored
Previously, Hutspot supported multiple bed files to calculate coverage against. Because of this, the stats.json file had a nested structure in which the coverage for each bed file was stored, together with the name of the bed file and the gender as determined from that bed file. Since the current version of Hutspot only supports a single bed file, this structure has been simplified. All coverage statistics are now stored directly under 'coverage' for each sample, and 'gender' has been moved out of the 'coverage' statistics to the sample level.
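The change in layout can be illustrated with a small converter. The key names other than 'coverage' and 'gender' (such as the bed file name and `mean`) are hypothetical placeholders, since the exact contents of stats.json are not shown here.

```python
def flatten_sample_stats(sample: dict) -> dict:
    """Convert a (hypothetical) nested per-sample layout to the flat one.

    Old: {"coverage": {"<bed name>": {"gender": ..., <stats>...}}, ...}
    New: {"coverage": {<stats>...}, "gender": ..., ...}
    """
    nested = sample["coverage"]
    if len(nested) != 1:
        raise ValueError("expected coverage for exactly one bed file")
    (per_bed,) = nested.values()
    # Coverage statistics without the gender entry.
    coverage = {k: v for k, v in per_bed.items() if k != "gender"}
    # Keep all other sample-level fields unchanged.
    flat = {k: v for k, v in sample.items() if k != "coverage"}
    flat["coverage"] = coverage
    # Gender moves from the per-bed statistics to the sample level.
    flat["gender"] = per_bed.get("gender")
    return flat
```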
-
- 17 Jun, 2020 2 commits
van den Berg authored
-
van den Berg authored
-
- 03 Jun, 2020 5 commits
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
Hutspot now supports two bed files to calculate coverage. One is the `targetsfile`, which was previously called `bedfile` and holds the targets of the capture kit. The other is the `baitsfile`, which holds the bait locations of the capture kit. It is possible to specify only the `targetsfile`, but if you specify the `baitsfile`, the `targetsfile` must be specified as well, since picard HsMetrics requires both. A test for invalid configuration files has also been added, and the jsonschema validation error has been shortened to show only the human-readable message.
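The rule "`baitsfile` requires `targetsfile`" maps onto the JSON Schema `dependencies` keyword (drafts 4 to 7; newer drafts call it `dependentRequired`). The fragment below is an illustrative assumption, not Hutspot's actual schema, and the function is a plain-Python equivalent of the same check so the sketch runs without the jsonschema package.

```python
# Illustrative schema fragment (assumption, not copied from Hutspot).
SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "targetsfile": {"type": "string"},
        "baitsfile": {"type": "string"},
    },
    # If 'baitsfile' is present, 'targetsfile' must be present too.
    "dependencies": {"baitsfile": ["targetsfile"]},
}


def check_capture_files(config: dict) -> None:
    """Plain-Python equivalent of the dependency rule in SCHEMA_FRAGMENT."""
    if "baitsfile" in config and "targetsfile" not in config:
        raise ValueError("'baitsfile' requires 'targetsfile' to be set")
```

Only `targetsfile`, both files, or neither passes; `baitsfile` on its own is rejected.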
-