Snakefile · 8670b32abefaad36e9e5ddbb1033b42fd3f1f648 · Klinische Genetica / capture-lumc / hutspot

van den Berg authored Apr 18, 2020

Instead of merging fastq files as the first step of the pipeline, merge
as late as possible to make better use of parallelism, and to prevent
unnecessary reading/writing of all data. Reads are now trimmed and
mapped per read group, and are merged in the picard MarkDuplicates
step. Samples are therefore merged as a side effect of a step that
had to run anyway.
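
As a rough illustration of the late merge, the sketch below builds a
picard MarkDuplicates command with one INPUT= argument per read group.
It is not the rule from the Snakefile: the helper name, the file paths
and the presence of a `picard` wrapper on the PATH are all assumptions.

```python
from typing import List


def mark_duplicates_command(bams: List[str], out_bam: str, metrics: str) -> List[str]:
    """Build a picard MarkDuplicates command for one sample (hypothetical helper)."""
    if not bams:
        raise ValueError("at least one input bam is required")
    # Assumes a `picard` wrapper executable is available on the PATH.
    cmd = ["picard", "MarkDuplicates"]
    # One INPUT= argument per read group bam; the single OUTPUT bam
    # then contains the reads of all read groups of the sample.
    cmd += [f"INPUT={bam}" for bam in bams]
    cmd += [f"OUTPUT={out_bam}", f"METRICS_FILE={metrics}"]
    return cmd


# Hypothetical paths, for illustration only.
command = mark_duplicates_command(
    ["sample1/rg1.sorted.bam", "sample1/rg2.sorted.bam"],
    "sample1/sample1.markdup.bam",
    "sample1/sample1.markdup.metrics",
)
print(" ".join(command))
```

Because the command is assembled per sample, a sample with a single
read group goes through the same step with a single INPUT= argument.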

Additionally, fastq processing is now done in a single step using
cutadapt, instead of using both sickle and cutadapt sequentially.
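
A minimal sketch of what such a single cutadapt call could look like
for a paired-end read group. The adapter sequence, quality cutoff,
minimum length and file names are placeholder assumptions, not the
values used in the pipeline.

```python
from typing import List


def cutadapt_command(
    r1: str,
    r2: str,
    out_r1: str,
    out_r2: str,
    adapter: str = "AGATCGGAAGAG",  # assumed adapter, not taken from the pipeline
    quality: int = 20,              # assumed quality cutoff
    min_length: int = 1,            # assumed minimum read length
) -> List[str]:
    """Build one cutadapt call that trims adapters and low quality bases."""
    return [
        "cutadapt",
        "-a", adapter,                      # 3' adapter on the forward read
        "-A", adapter,                      # 3' adapter on the reverse read
        "-q", str(quality),                 # quality trimming cutoff
        "--minimum-length", str(min_length),
        "-o", out_r1,                       # trimmed forward reads
        "-p", out_r2,                       # trimmed reverse reads
        r1,
        r2,
    ]


# Hypothetical file names, for illustration only.
trim = cutadapt_command(
    "rg1_R1.fastq.gz", "rg1_R2.fastq.gz",
    "rg1_R1.trimmed.fastq.gz", "rg1_R2.trimmed.fastq.gz",
)
print(" ".join(trim))
```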

This includes the following changes:
 - Use cutadapt to trim both adapters and low quality reads
 - Run bwa align on each read group independently
 - Run fastqc on each read group independently
 - Pass multiple bam files to picard MarkDuplicates
 - Remove safe_fastqc.sh script
 - Remove fastqc_stats
 - Remove fastqc coverage from covstats
 - Update test data for slight differences in output vcf files
 - Add tests for fastqc zip files
