- Mar 10, 2021
van den Berg authored
- Mar 02, 2021
van den Berg authored
- Jan 07, 2021
van den Berg authored
- Dec 10, 2020
van den Berg authored
MultiQC can run out of /tmp space on the execution nodes of shark, leading to incorrect results or crashes. This commit fixes the problem by setting a default location for the Python temporary files that MultiQC uses, via the environment variable TMPFILE in the shell block of the MultiQC rule.
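As a rough illustration of how an environment variable can redirect Python temporary files, here is a minimal sketch. It rests on one assumption: the commit message names TMPFILE, whereas the variables that Python's `tempfile` module documents are `TMPDIR`, `TEMP` and `TMP`, so the sketch uses `TMPDIR`. The helper name `tempdir_with_override` is hypothetical and not part of the pipeline.

```python
import os
import tempfile


def tempdir_with_override(directory):
    """Return the directory `tempfile` picks when TMPDIR points at `directory`.

    Hypothetical helper for illustration only. TMPDIR is an assumption:
    it is the variable Python's tempfile module documents, not necessarily
    the one the pipeline sets.
    """
    previous = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = directory
    # tempfile caches its choice, so clear the cache to re-read the environment.
    tempfile.tempdir = None
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the original environment and clear the cache again.
        if previous is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = previous
        tempfile.tempdir = None


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()  # stands in for a larger scratch location
    print(tempdir_with_override(scratch))
```

In a shell block the same effect would come from exporting the variable before the tool starts, since the directory is chosen the first time the tool asks for a temporary file.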
- Aug 28, 2020
van den Berg authored
This is reported to speed up MarkDuplicates by up to 16%; see https://github.com/broadinstitute/picard/issues/902 for details.
- Aug 27, 2020
van den Berg authored
van den Berg authored
This reverts commit 0cf92f4b. bwa-mem2 can crash when multiple processes are running on the same system. See https://github.com/bwa-mem2/bwa-mem2/issues/88 for details.
- Aug 25, 2020
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
- Aug 24, 2020
van den Berg authored
van den Berg authored
van den Berg authored
- Aug 21, 2020
van den Berg authored
van den Berg authored
  - Start every entry on a new indented line
  - Make the order of the entries consistent in every rule
  - Ensure equal signs are surrounded by spaces
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
Add test to make sure the markdup and baserecal groups receive the correct inputs when a sample has multiple readgroups.
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
van den Berg authored
- Aug 20, 2020
van den Berg authored
- Aug 12, 2020
van den Berg authored
The directory output of the fastqc tasks is causing issues on the shared file system of the cluster, since the age of the folder cannot be determined properly. As a result, the fastqc tasks are re-run every time a workflow is restarted, regardless of whether they have already completed. To prevent this, a single dummy output file '.done' has been added to the fastqc tasks, which is written when fastqc exits successfully.
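The idea of a '.done' marker can be sketched outside the workflow engine as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual rule: the function name `run_with_done_marker` is hypothetical, and in the pipeline it is the workflow engine that checks the marker file, not hand-written code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_done_marker(command, outdir):
    """Run `command` unless `outdir/.done` already exists.

    Hypothetical helper. Returns True if the command was run,
    False if it was skipped because the marker was present.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    marker = outdir / ".done"
    if marker.exists():
        return False
    # check=True raises CalledProcessError on a non-zero exit status,
    # so the marker is only written after a successful run.
    subprocess.run(command, check=True)
    marker.touch()
    return True


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    placeholder = [sys.executable, "-c", "pass"]  # stands in for the real tool
    print(run_with_done_marker(placeholder, out))  # runs the command
    print(run_with_done_marker(placeholder, out))  # skipped, marker exists
```

Checking a single file avoids relying on the modification time of a directory, which is the part the shared file system reports unreliably.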
- Aug 07, 2020
van den Berg authored
The base recalibration step of the pipeline can take up to 7 hours for WGS samples, which is a significant part of the total run time. The developers of GATK state that BQSR requires at least 100M bases per read group: "We usually expect to see more than 100M bases per read group; as a rule of thumb, larger numbers will work better." A human WGS sample with an average read depth of 43x has almost 1300 times that number of bases. Restricting BQSR to a single chromosome would greatly speed up the analysis of these samples.
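The "almost 1300 times" figure can be checked with a short calculation. The sketch below assumes a human genome of roughly 3.0 billion bases and a single read group per sample; neither number is stated in the commit message. The chromosome length used in the second call (chromosome 1 of GRCh38, 248,956,422 bases) is only an illustration, since the message does not say which chromosome is used.

```python
GATK_MINIMUM_BASES = 100e6  # "more than 100M bases per read group"


def bases_over_minimum(region_length, mean_depth, minimum=GATK_MINIMUM_BASES):
    """Return how many times the sequenced bases exceed the GATK guideline.

    Hypothetical helper: sequenced bases are estimated as
    region length multiplied by mean read depth.
    """
    return region_length * mean_depth / minimum


if __name__ == "__main__":
    # Assumption: whole human genome of about 3.0e9 bases at 43x.
    print(bases_over_minimum(3.0e9, 43))
    # Illustration only: chromosome 1 (GRCh38) at the same depth.
    print(bases_over_minimum(248_956_422, 43))
```

Under these assumptions even a single large chromosome still supplies on the order of a hundred times the recommended minimum.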
van den Berg authored
Use 8 cores instead of just 1; according to the documentation, speed should scale almost linearly with the number of cores provided. Also reduce the compression level of the output file, since cutadapt spends most of its time re-compressing the data after trimming the adapters.
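The trade-off behind the compression change can be explored with Python's standard `gzip` module. This is a sketch on synthetic FASTQ-like data, not a measurement of cutadapt itself; the helper names and the random reads are made up for illustration, and actual timings will depend on the machine and the data.

```python
import gzip
import random
import time


def make_fastq_like(n_reads, read_length=100, seed=0):
    """Build synthetic FASTQ-like records (random bases and qualities)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_length))
        qual = "".join(rng.choice("FF:,#") for _ in range(read_length))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def compress_at_level(data, level):
    """Compress `data` with gzip at `level`; return (bytes, seconds taken)."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    return compressed, time.perf_counter() - start


if __name__ == "__main__":
    data = make_fastq_like(20000)
    for level in (1, 9):
        compressed, seconds = compress_at_level(data, level)
        print(f"level {level}: {len(compressed)} bytes in {seconds:.3f} s")
```

A lower level trades a somewhat larger output file for less CPU time, which matters when compression dominates the run time of the step.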
- Jul 29, 2020
van den Berg authored
- Jul 28, 2020
van den Berg authored
- Jul 24, 2020
van den Berg authored
This reverts commit 1b7d807f.
van den Berg authored
- Jul 22, 2020
van den Berg authored
This should no longer be needed on the slurm cluster, where each task can explicitly request the amount of tmp space it requires. Moving the tmp directory from the shared filesystem back onto the host running the analysis should also improve the performance of the pipeline.
van den Berg authored