Commit 095305f0 authored by van den Berg

Add an option to restrict BaseRecalibration

The base quality score recalibration (BQSR) step of the pipeline can take
up to 7 hours for WGS samples, which is a significant part of the total
run time.

The GATK developers recommend at least 100M bases per read group: "We
usually expect to see more than 100M bases per read group; as a rule of
thumb, larger numbers will work better."

A human WGS sample with an average read depth of 43x contains almost 1300
times that number of bases. Restricting BQSR to a single chromosome would
therefore speed up the analysis of these samples considerably.
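The estimate above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch, not part of the pipeline. It assumes a haploid genome size of about 3.0 Gb (a figure not given in the commit), uses the GRCh38 length of chr1 only as an illustrative single chromosome, and treats all sequenced bases as belonging to one read group.

```python
# Back-of-the-envelope check of the numbers in the commit message.
# Assumptions (not from the commit): ~3.0e9 bp haploid genome, and all
# sequenced bases of the sample belong to a single read group.
GENOME_SIZE_BP = 3_000_000_000
CHR1_LENGTH_BP = 248_956_422      # GRCh38 chr1, used only as an example
BQSR_MIN_BASES = 100_000_000      # GATK rule of thumb, per read group


def bases_sequenced(length_bp: int, mean_depth: float) -> float:
    """Approximate number of sequenced bases covering `length_bp` at `mean_depth`."""
    return length_bp * mean_depth


def fold_over_minimum(length_bp: int, mean_depth: float) -> float:
    """How many times the 100M-base rule of thumb a region provides."""
    return bases_sequenced(length_bp, mean_depth) / BQSR_MIN_BASES


whole_genome = fold_over_minimum(GENOME_SIZE_BP, 43)
chr1_only = fold_over_minimum(CHR1_LENGTH_BP, 43)
print(f"whole genome: {whole_genome:.0f}x the minimum")  # whole genome: 1290x the minimum
print(f"chr1 only:    {chr1_only:.0f}x the minimum")     # chr1 only:    107x the minimum
```

If a sample is split over several read groups (for example one per lane), each read group holds correspondingly fewer bases, so the margin over the rule of thumb shrinks by that factor.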
parent b585ff80
@@ -156,6 +156,7 @@ The following configuration options are **optional**:
| `female_threshold` | Float between 0 and 1 that signifies the threshold of the ratio between coverage on X/overall coverage that 'calls' a sample as female. Default = 0.6 |
| `scatter_size` | The size of chunks to divide the reference into for parallel execution. Default = 1000000000 |
| `coverage_threshold` | One or more threshold coverage values. For each value, a sample specific bed file will be created that contains the regions where the coverage is above the threshold |
| `restrict_BQSR` | Restrict GATK BaseRecalibrator to a single chromosome, given by name (passed to GATK as `-L`). This is faster, but the recalibration may be less reliable |
## Cluster configuration
@@ -237,11 +237,13 @@ rule baserecal:
        known_sites = " ".join(
            expand("-knownSites {vcf}", vcf=config["known_sites"])
        ),
        region = "-L " + config["restrict_BQSR"] if "restrict_BQSR" in config else "",
        bams = bqsr_bam_input
    container: containers["gatk"]
    shell: "java -XX:ParallelGCThreads=1 -jar /usr/GenomeAnalysisTK.jar -T "
           "BaseRecalibrator {params.bams} -o {output} -nct 8 "
           "-R {input.ref} -cov ReadGroupCovariate -cov QualityScoreCovariate "
           "{params.region} "
           "-cov CycleCovariate -cov ContextCovariate {params.known_sites}"
checkpoint scatterregions:
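To make the effect of the new `region` param concrete, the sketch below reuses its conditional expression in a plain Python function and assembles the same shell string as rule `baserecal`. The BAM, output, reference and known-sites values are hypothetical placeholders; in the pipeline they come from Snakemake's `params`, `output` and `input`.

```python
def region_arg(config: dict) -> str:
    """Same expression as the `region` param: '-L <chrom>' if the option is set, else ''."""
    return "-L " + config["restrict_BQSR"] if "restrict_BQSR" in config else ""


def baserecal_command(config: dict) -> str:
    """Rebuild the BaseRecalibrator command line with placeholder paths."""
    # Hypothetical placeholder values, not taken from the pipeline.
    bams = "-I sample.bam"
    output = "sample.baserecal.grp"
    ref = "reference.fasta"
    known_sites = "-knownSites dbsnp.vcf.gz"
    return (
        "java -XX:ParallelGCThreads=1 -jar /usr/GenomeAnalysisTK.jar -T "
        f"BaseRecalibrator {bams} -o {output} -nct 8 "
        f"-R {ref} -cov ReadGroupCovariate -cov QualityScoreCovariate "
        f"{region_arg(config)} "
        f"-cov CycleCovariate -cov ContextCovariate {known_sites}"
    )


print(baserecal_command({}))                          # no -L: whole genome
print(baserecal_command({"restrict_BQSR": "chrM"}))   # ... QualityScoreCovariate -L chrM -cov ...
```

A conditional expression binds more loosely than `+`, so the concatenation only runs when the key is present and a missing key never raises `KeyError`. When the option is absent, the empty string leaves a harmless double space in the command.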
@@ -14,6 +14,7 @@
"female_threshold",
"bedfile",
"coverage_threshold",
"restrict_BQSR",
"baitsfile"
],
"properties": {
@@ -73,6 +74,10 @@
"type": "array",
"minItems": 1
},
"restrict_BQSR": {
"description": "Restrict BQSR to the listed chromosome",
"type": "string"
},
"refflat": {
"description": "RefFlat file with transcripts",
"type": "string"
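The schema fragment only requires `restrict_BQSR` to be a string, so exactly one region name can be given. Below is a stdlib-only sketch of that single constraint; `check_restrict_bqsr` is a hypothetical helper for illustration and is not a replacement for validating the config against the full schema.

```python
def check_restrict_bqsr(config: dict) -> None:
    """Reject values that the schema's "type": "string" would reject.

    Hypothetical helper: it checks only this one property, not the whole schema.
    """
    if "restrict_BQSR" not in config:
        return  # the README lists the option as optional, so absence is fine
    value = config["restrict_BQSR"]
    if not isinstance(value, str):
        raise TypeError(
            f"restrict_BQSR must be a string, got {type(value).__name__}"
        )


check_restrict_bqsr({})                          # ok: option not set
check_restrict_bqsr({"restrict_BQSR": "chrM"})   # ok: single chromosome
try:
    check_restrict_bqsr({"restrict_BQSR": ["chr1", "chr2"]})
except TypeError as err:
    print(err)  # restrict_BQSR must be a string, got list
```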
@@ -12,6 +12,8 @@
      - (100%) done
    must_not_contain:
      - rror
    must_not_contain_regex:
      - 'BaseRecalibrator.* -L '
  files:
    - path: micro/vcf/micro.vcf.gz
      contains_regex:
@@ -283,3 +285,14 @@
    - path: 'micro/vcf/micro_196.bed'
      contains:
        - "(null)\t0\t0"

- name: test-integration-restrict-BQSR
  tags:
    - integration
  command: >
    snakemake --use-singularity --singularity-args ' --containall --bind /tmp '
    --jobs 1 -w 120 -r -p
    --configfile tests/data/config/sample_config_restrict_BQSR.json
  stderr:
    contains_regex:
      - 'BaseRecalibrator.* -L chrM '
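The two regex checks work as a pair: the existing test must not show `-L` after `BaseRecalibrator`, while the new test must show `-L chrM `. The sketch below applies both patterns with Python's `re.search` to two hand-written, shortened command lines. The lines are hypothetical, and the sketch assumes the patterns are evaluated with `re.search` semantics against individual log lines.

```python
import re

# Patterns copied from the test definitions above.
MUST_NOT_MATCH_DEFAULT = r"BaseRecalibrator.* -L "
MUST_MATCH_RESTRICTED = r"BaseRecalibrator.* -L chrM "

# Hypothetical, shortened command lines as they might appear in the log.
default_line = (
    "java -jar /usr/GenomeAnalysisTK.jar -T BaseRecalibrator -I s.bam "
    "-cov QualityScoreCovariate  -cov CycleCovariate"
)
restricted_line = (
    "java -jar /usr/GenomeAnalysisTK.jar -T BaseRecalibrator -I s.bam "
    "-cov QualityScoreCovariate -L chrM -cov CycleCovariate"
)

default_ok = re.search(MUST_NOT_MATCH_DEFAULT, default_line) is None
restricted_ok = re.search(MUST_MATCH_RESTRICTED, restricted_line) is not None
print(default_ok, restricted_ok)  # True True
```

The trailing space in `'-L chrM '` matters: it stops the pattern from matching a longer contig name such as `chrMT`.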