- 23 Apr, 2020 4 commits
  - Ruben Vorderman authored
  - Ruben Vorderman authored
  - van den Berg authored
  - van den Berg authored
- 22 Apr, 2020 11 commits
  - Ruben Vorderman authored
  - Ruben Vorderman authored
    This reverts commit a42d185f.
  - Ruben Vorderman authored
  - Ruben Vorderman authored
  - van den Berg authored
  - van den Berg authored
  - Ruben Vorderman authored
  - van den Berg authored
  - van den Berg authored
Base recalibration takes a long time to run. By running the base recalibration on the separate per-readgroup bam files, instead of on the output of the markduplicates step, we can run these tasks earlier in the pipeline, and in parallel with the markduplicates step. This reduces the total runtime of the pipeline. This commit also adds a test to make sure that statistics for both read groups are present in the base recalibration output file.
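The per-readgroup flow described above could be sketched as below. This is an illustrative Python sketch, not the pipeline's actual code: the readgroup names, bam paths, reference, and known-sites files are all hypothetical, and the flags follow GATK4's `BaseRecalibrator` and `GatherBQSRReports` conventions.

```python
# Hypothetical sketch: build one BaseRecalibrator command per readgroup bam
# (so they can run in parallel with MarkDuplicates), then gather the
# per-readgroup reports into a single recalibration table.
READGROUPS = ["rg1", "rg2"]

def bqsr_commands(readgroups, reference="ref.fasta", known_sites="dbsnp.vcf.gz"):
    commands = []
    for rg in readgroups:
        commands.append([
            "gatk", "BaseRecalibrator",
            "-I", f"{rg}.bam",          # per-readgroup bam, not the markdup output
            "-R", reference,
            "--known-sites", known_sites,
            "-O", f"{rg}.recal.table",
        ])
    # Merge the per-readgroup reports; the merged table should contain
    # statistics for every readgroup, which is what the new test checks.
    gather = ["gatk", "GatherBQSRReports", "-O", "merged.recal.table"]
    for rg in readgroups:
        gather += ["-I", f"{rg}.recal.table"]
    commands.append(gather)
    return commands

cmds = bqsr_commands(READGROUPS)
```

Because each per-readgroup command only depends on that readgroup's bam, none of them has to wait for MarkDuplicates to finish.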
  - van den Berg authored
  - van den Berg authored
- 21 Apr, 2020 11 commits
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
- 20 Apr, 2020 5 commits
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
This script currently only collects data from the log file of cutadapt, which has details on the number of reads and bases before and after trimming.
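A minimal sketch of such a log parser, assuming cutadapt's summary lines look like the excerpt below; the excerpt is hypothetical and the exact labels vary between cutadapt versions, so this is an illustration of the approach rather than the script itself.

```python
import re

# Hypothetical excerpt of a cutadapt log; real output may differ slightly.
LOG = """\
Total reads processed:              1,000,000
Reads written (passing filters):      950,000
Total basepairs processed:    150,000,000 bp
Total written (filtered):     140,000,000 bp
"""

def parse_cutadapt_log(text):
    """Pull read/base counts before and after trimming from a cutadapt log."""
    patterns = {
        "reads_in": r"Total reads processed:\s+([\d,]+)",
        "reads_out": r"Reads written \(passing filters\):\s+([\d,]+)",
        "bases_in": r"Total basepairs processed:\s+([\d,]+)",
        "bases_out": r"Total written \(filtered\):\s+([\d,]+)",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            # Strip the thousands separators before converting.
            stats[key] = int(match.group(1).replace(",", ""))
    return stats

stats = parse_cutadapt_log(LOG)
```

Reading these counts out of the log means the statistics come for free from the trimming step, with no extra pass over the fastq files.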
  - van den Berg authored
- 18 Apr, 2020 2 commits
  - van den Berg authored
The cutadapt summary file contains statistics we want to report, such as the number of reads and bases before and after trimming. This way, we do not need to compute these statistics after the fact, which would require parsing the large fastq files an additional time.
  - van den Berg authored
Instead of merging fastq files as the first step of the pipeline, merge as late as possible to make better use of parallelism and to avoid unnecessary reading and writing of all the data. Reads are now trimmed and mapped per read group, and are merged in the picard MarkDuplicates step, so samples are merged as a side effect of a task that was already being performed. Additionally, fastq preprocessing is now done in a single step using cutadapt, instead of running sickle and cutadapt sequentially. As part of these changes:
- Use cutadapt to trim both adapters and low-quality reads
- Run bwa align on each readgroup independently
- Run fastqc on each readgroup independently
- Pass multiple bam files to picard MarkDuplicates
- Remove safe_fastqc.sh script
- Remove fastqc_stats
- Remove fastqc coverage from covstats
- Update test data for slight differences in output vcf files
- Add tests for fastqc zip files
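The two central changes above can be sketched as follows; the file names and adapter sequence are hypothetical, and the flags follow cutadapt's and picard's documented conventions, so treat this as an illustration rather than the pipeline's code.

```python
# Hypothetical sketch: one cutadapt call replaces the sickle + cutadapt pair,
# and MarkDuplicates merges the sample by accepting multiple input bams.

def cutadapt_command(rg, adapter="AGATCGGAAGAG"):
    # -q trims low-quality ends (replacing sickle); -a/-A remove adapters
    # from the forward and reverse reads in the same pass.
    return [
        "cutadapt", "-q", "20",
        "-a", adapter, "-A", adapter,
        "-o", f"{rg}_R1.trimmed.fastq.gz",
        "-p", f"{rg}_R2.trimmed.fastq.gz",
        f"{rg}_R1.fastq.gz", f"{rg}_R2.fastq.gz",
    ]

def markduplicates_command(bams, output="sample.markdup.bam"):
    # Passing every per-readgroup bam merges the sample as a side effect
    # of duplicate marking, so no separate merge step is needed.
    cmd = ["picard", "MarkDuplicates",
           f"OUTPUT={output}", "METRICS_FILE=sample.metrics"]
    cmd += [f"INPUT={bam}" for bam in bams]
    return cmd

trim = cutadapt_command("rg1")
markdup = markduplicates_command(["rg1.markdup-input.bam", "rg2.markdup-input.bam"])
```

Each `cutadapt_command` is independent per readgroup, so trimming and mapping parallelize, and the only merge point left in the pipeline is the MarkDuplicates call at the end.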
- 17 Apr, 2020 2 commits
  - van den Berg authored
  - van den Berg authored
- 16 Apr, 2020 5 commits
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored
  - van den Berg authored