Klinische Genetica
capture-lumc
hutspot
Commit 7b73fae2, authored Aug 29, 2019 by van den Berg

Remove dependency on external GATK jar file

Parent: 142fa3cc
Changes: 2
README.md
# Hutspot
This is a multisample DNA variant calling pipeline based on Snakemake, bwa and the
GATK HaplotypeCaller.
## Features
* Any number of samples is supported
...
...
@@ -78,15 +78,6 @@ Please see the installation instructions
to do that.
## GATK
For license reasons, conda and singularity cannot fully install the GATK. The JAR
must be registered by running `gatk-register` after the environment is
created, which conflicts with the automated environment/container creation.
For this reason, hutspot **requires** you to manually specify the path to
the GATK executable JAR via `--config GATK=/path/to/gatk.jar`.
## Operating system
Hutspot was tested on Ubuntu 16.04 only.
...
...
@@ -142,7 +133,6 @@ The following configuration values are **required**:
| ------------- | ----------- |
| `REFERENCE` | Absolute path to fasta file |
| `SAMPLE_CONFIG` | Path to config file as described above |
| `GATK` | Path to GATK jar. **Must** be version 3.7 |
| `DBSNP` | Path to dbSNP VCF |
| `ONETHOUSAND` | Path to 1000Genomes VCF |
| `HAPMAP` | Path to HapMap VCF |
...
...
@@ -216,7 +206,6 @@ snakemake -s Snakefile \
--restart-times 2 \
--config SAMPLE_CONFIG=samples.json \
REFERENCE=/path/to/genome.fasta \
GATK=/path/to/GenomeAnalysisTK.jar \
DBSNP=/path/to/dbsnp.vcf.gz \
ONETHOUSAND=/path/to/onekg.vcf \
HAPMAP=/path/to/hapmap.vcf \
...
...
Snakefile
...
...
@@ -14,7 +14,7 @@
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Main Snakefile for the pipeline.
:copyright: (c) 2017-2019 Sander Bollen
:copyright: (c) 2017-2019 Leiden University Medical Center
...
...
@@ -35,12 +35,6 @@ if not Path(REFERENCE).exists():
raise FileNotFoundError("Reference file {0} "
"does not exist.".format(REFERENCE))
GATK = config.get("GATK")
if GATK is None:
raise ValueError("You must set --config GATK=<path>")
if not Path(GATK).exists():
raise FileNotFoundError("{0} does not exist.".format(GATK))
DBSNP = config.get("DBSNP")
if DBSNP is None:
raise ValueError("You must set --config DBSNP=<path>")
...
...
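
The checks in the hunk above repeat the same two steps for each required value: confirm the key is set, then confirm the path exists. A minimal sketch of how that pattern could be factored into one helper is given here. `require_path` is a hypothetical name that does not appear in the hutspot Snakefile, and `config` is assumed to behave like a plain dict.

```python
from pathlib import Path


def require_path(config, key):
    """Return config[key] as a Path; raise if it is unset or missing on disk.

    Hypothetical helper, not part of the hutspot Snakefile.
    """
    value = config.get(key)
    if value is None:
        raise ValueError("You must set --config {0}=<path>".format(key))
    path = Path(value)
    if not path.exists():
        raise FileNotFoundError("{0} does not exist.".format(path))
    return path
```

With such a helper, a line like `DBSNP = require_path(config, "DBSNP")` would cover both checks for one value.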
@@ -322,15 +316,14 @@ rule baserecal:
"""Base recalibrated BAM files"""
input:
bam = "{sample}/bams/{sample}.markdup.bam",
gatk = GATK,
ref = REFERENCE,
dbsnp = DBSNP,
one1kg = ONETHOUSAND,
hapmap = HAPMAP
output:
grp = "{sample}/bams/{sample}.baserecal.grp"
singularity: "docker://quay.io/biocontainers/gatk:3.7--py36_1"
shell: "java -XX:ParallelGCThreads=1 -jar {input.gatk} -T "
singularity: "docker://broadinstitute/gatk3:3.7-0"
shell: "java -XX:ParallelGCThreads=1 -jar /usr/GenomeAnalysisTK.jar -T "
"BaseRecalibrator -I {input.bam} -o {output.grp} -nct 8 "
"-R {input.ref} -cov ReadGroupCovariate -cov QualityScoreCovariate "
"-cov CycleCovariate -cov ContextCovariate -knownSites "
...
...
@@ -344,14 +337,13 @@ rule gvcf_scatter:
bqsr="{sample}/bams/{sample}.baserecal.grp",
dbsnp=DBSNP,
ref=REFERENCE,
gatk=GATK
params:
chunk="{chunk}"
output:
gvcf=temp("{sample}/vcf/{sample}.{chunk}.part.vcf.gz"),
gvcf_tbi=temp("{sample}/vcf/{sample}.{chunk}.part.vcf.gz.tbi")
singularity: "docker://quay.io/biocontainers/gatk:3.7--py36_1"
shell: "java -jar -Xmx4G -XX:ParallelGCThreads=1 {input.gatk} "
singularity: "docker://broadinstitute/gatk3:3.7-0"
shell: "java -jar -Xmx4G -XX:ParallelGCThreads=1 /usr/GenomeAnalysisTK.jar "
"-T HaplotypeCaller -ERC GVCF -I "
"{input.bam} -R {input.ref} -D {input.dbsnp} "
"-L '{params.chunk}' -o '{output.gvcf}' "
...
...
@@ -362,14 +354,14 @@ rule gvcf_scatter:
rule gvcf_chunkfile:
"""
Create simple text file with paths to chunks for GVCF.
This uses a run directive in stead of a shell directive because
the amount of chunks may be so large the shell would error out with
an "argument list too long" error.
See https://unix.stackexchange.com/a/120842 for more info
This also means this rule lives outside of singularity and is
executed in snakemake's own environment.
"""
params:
chunkfiles = expand("{{sample}}/vcf/{{sample}}.{chunk}.part.vcf.gz",
...
...
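
The docstring above explains that the chunk paths are written from Python rather than passed on a shell command line, because a very long argument list can exceed the operating system's limit. A rough sketch of that idea follows. `write_chunkfile` is a hypothetical helper, and the actual body of the rule is elided in this diff, so this is not necessarily how hutspot implements it.

```python
def write_chunkfile(paths, out_path):
    """Write one path per line to out_path and return the number of lines.

    Hypothetical helper: the paths never appear on a command line, so the
    shell's argument-length limit does not apply.
    """
    count = 0
    with open(out_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
            count += 1
    return count
```

A downstream tool can then be handed the single file name instead of every chunk path.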
@@ -410,8 +402,7 @@ rule genotype_scatter:
gvcfs = expand("{sample}/vcf/{sample}.g.vcf.gz", sample=SAMPLES),
tbis = expand("{sample}/vcf/{sample}.g.vcf.gz.tbi",
sample=SAMPLES),
ref=REFERENCE,
gatk=GATK
ref=REFERENCE
params:
li=" -V ".join(expand("{sample}/vcf/{sample}.g.vcf.gz",
sample=SAMPLES)),
...
...
@@ -419,8 +410,8 @@ rule genotype_scatter:
output:
vcf=temp("multisample/genotype.{chunk}.part.vcf.gz"),
vcf_tbi=temp("multisample/genotype.{chunk}.part.vcf.gz.tbi")
singularity: "docker://quay.io/biocontainers/gatk:3.7--py36_1"
shell: "java -jar -Xmx15G -XX:ParallelGCThreads=1 {input.gatk} -T "
singularity: "docker://broadinstitute/gatk3:3.7-0"
shell: "java -jar -Xmx15G -XX:ParallelGCThreads=1 /usr/GenomeAnalysisTK.jar -T "
"GenotypeGVCFs -R {input.ref} "
"-V {params.li} -L '{params.chunk}' -o '{output.vcf}'"
...
...
@@ -428,14 +419,14 @@ rule genotype_scatter:
rule genotype_chunkfile:
"""
Create simple text file with paths to chunks for genotyping
This uses a run directive in stead of a shell directive because
the amount of chunks may be so large the shell would error out with
an "argument list too long" error.
See https://unix.stackexchange.com/a/120842 for more info
This also means this rule lives outside of singularity and is
executed in snakemake's own environment.
"""
params:
vcfs = expand("multisample/genotype.{chunk}.part.vcf.gz",
...
...
@@ -475,14 +466,13 @@ rule split_vcf:
input:
vcf="multisample/genotyped.vcf.gz",
tbi = "multisample/genotyped.vcf.gz.tbi",
gatk=GATK,
ref=REFERENCE
params:
s="{sample}"
output:
splitted="{sample}/vcf/{sample}_single.vcf.gz"
singularity: "docker://quay.io/biocontainers/gatk:3.7--py36_1"
shell: "java -Xmx15G -XX:ParallelGCThreads=1 -jar {input.gatk} "
singularity: "docker://broadinstitute/gatk3:3.7-0"
shell: "java -Xmx15G -XX:ParallelGCThreads=1 -jar /usr/GenomeAnalysisTK.jar "
"-T SelectVariants -sn {params.s} -env -R {input.ref} -V "
"{input.vcf} -o {output.splitted}"
...
...
@@ -535,7 +525,7 @@ rule usable_basenum:
rule fastqc_raw:
"""
Run fastqc on raw fastq files
NOTE: singularity version uses 0.11.7 in stead of 0.11.5 due to
perl missing in the container of 0.11.5
"""
input:
...
...
@@ -552,8 +542,8 @@ rule fastqc_raw:
rule fastqc_merged:
"""
Run fastqc on merged fastq files
NOTE: singularity version uses 0.11.7 in stead of 0.11.5 due to
perl missing in the container of 0.11.5
"""
input:
...
...
@@ -573,8 +563,8 @@ rule fastqc_merged:
rule fastqc_postqc:
"""
Run fastqc on fastq files post pre-processing
NOTE: singularity version uses 0.11.7 in stead of 0.11.5 due to
perl missing in the container of 0.11.5
"""
input:
r1="{sample}/pre_process/{sample}.cutadapt_R1.fastq",
...
...
van den Berg (@rrvandenberg) mentioned in issue #34 (closed) · Dec 19, 2019