Commit 026b2015 authored by sajvanderzeeuw

Changes in help screen GatkPipeline

parent 9581d328
# GATK-pipeline
## Introduction
The GATK-pipeline is built for variant calling on NGS data (preferably Illumina data).
It follows the GATK <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a> for variant calling.
The pipeline accepts ```.fastq``` and ```.bam``` files as input.
## Tools for this pipeline
* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
* [Flexiprep](flexiprep.md)
* <a href="https://www.broadinstitute.org/gatk/" target="_blank">GATK tools</a>:
    * RealignerTargetCreator
    * IndelRealigner
    * BaseRecalibrator
    * PrintReads
    * SplitNCigarReads
    * HaplotypeCaller
    * VariantRecalibrator
    * ApplyRecalibration
    * GenotypeGVCFs
    * VariantAnnotator
## Example
Note that one should first create the appropriate [configs](../config.md).
To get the help menu:
~~~
java -jar Biopet.0.2.0.jar pipeline gatkPipeline -h
Arguments for GatkPipeline:
-outDir,--output_directory <output_directory> Output directory
-sample,--onlysample <onlysample> Only Sample
-skipgenotyping,--skipgenotyping Skip Genotyping step
-mergegvcfs,--mergegvcfs Merge gvcfs
-jointVariantCalling,--jointvariantcalling Joint variantcalling
-jointGenotyping,--jointgenotyping Joint genotyping
-config,--config_file <config_file> JSON config file(s)
-DSC,--disablescatterdefault Disable all scatters
~~~
To run the pipeline:
~~~
java -jar Biopet.0.2.0.jar pipeline gatkPipeline -run -config MySamples.json -config MySettings.json -outDir myOutDir
~~~
For LUMC/researchSHARK users there is a module available that sets all your environment settings and default executables/settings.
~~~
module load Biopet/0.2.0
biopet pipeline gatkPipeline -run -config MySamples.json -config MySettings.json -outDir myOutDir
~~~
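The run commands above pass one `-config` flag per JSON file. As a minimal sketch of that pattern, the helper below assembles such a command line and refuses to do so when a config file is not readable. The function name `biopet_cmd` is hypothetical and not part of Biopet; the jar name and flags are copied from the example above.

```shell
# Hypothetical helper (not part of Biopet): print a pipeline command line
# with one -config flag per JSON file, as in the example above.
# Usage: biopet_cmd <pipeline> <outDir> <config.json> [<config.json> ...]
biopet_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: biopet_cmd <pipeline> <outDir> <config.json>..." >&2
        return 2
    fi
    pipeline=$1
    out_dir=$2
    shift 2

    cmd="java -jar Biopet.0.2.0.jar pipeline $pipeline -run"
    for cfg in "$@"; do
        # Stop early instead of building a command for a missing config file.
        if [ ! -r "$cfg" ]; then
            echo "config file not readable: $cfg" >&2
            return 1
        fi
        cmd="$cmd -config $cfg"
    done
    # Note: paths containing spaces are not quoted in the printed command.
    echo "$cmd -outDir $out_dir"
}
```

Calling `biopet_cmd gatkPipeline myOutDir MySamples.json MySettings.json` prints the command so it can be inspected before it is executed.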
## Examine results
### Result files
### Best practice
## References
\ No newline at end of file
......@@ -2,22 +2,15 @@
## <a href="https://git.lumc.nl/biopet/biopet/tree/develop/protected/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty" target="_blank">Basty</a>
A pipeline for aligning bacterial genomes and detecting structural variations on the level of SNPs. Basty will output phylogenetic trees.
Which makes it very easy to look at the variations between certain species.
Which makes it very easy to look at the variations between certain species or strains.
# Tools for this pipeline
## Tools for this pipeline
* [GATK-pipeline](GATK-pipeline.md)
* [BastyGenerateFasta](../tools/BastyGenerateFasta.md)
* <a href="http://sco.h-its.org/exelixis/software.html" target="_blank">RAxML</a>
* <a href="https://github.com/sanger-pathogens/Gubbins" target="_blank">Gubbins</a>
# Invocation
~~~
java -jar Biopet.0.2.0.jar pipeline basty
~~~
# Example
## Example
To run for a specific species, please do not forget to create the proper index files:
* ```.dict``` (can be produced with <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>)
......@@ -25,10 +18,24 @@ To run for a specific species, please do not forget to create the proper index f
* ```.idxSpecificForAligner``` (depending on which aligner is used, one should create a suitable index for that aligner.
Each aligner has its own way of creating index files, so the options for creating them can be found in the aligner itself)
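Before starting a run it can help to check that these index files are actually in place. The sketch below is a hypothetical helper, not part of Biopet. It assumes the sequence dictionary sits next to the reference and is named like the FASTA with its extension replaced by ```.dict```; the aligner-specific suffixes are supplied by the caller, since they differ per aligner.

```shell
# Hypothetical helper (not part of Biopet): list index files that are missing
# next to a reference FASTA. Assumes the dictionary is <reference minus
# extension>.dict; aligner-specific suffixes are passed in by the caller.
# Usage: missing_indexes <reference.fasta> [<suffix> ...]
missing_indexes() {
    ref=$1
    shift
    missing=0

    dict="${ref%.*}.dict"
    if [ ! -e "$dict" ]; then
        echo "$dict"
        missing=1
    fi

    for suffix in "$@"; do
        # Each suffix is appended to the full reference file name.
        if [ ! -e "$ref$suffix" ]; then
            echo "$ref$suffix"
            missing=1
        fi
    done
    # Non-zero exit status when at least one file is missing.
    return "$missing"
}
```

For example, `missing_indexes ref.fasta .bwt` would also check for `ref.fasta.bwt`, one of the files a BWA index consists of; the suffix to use depends on the aligner chosen.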
# Testcase A
For the help screen:
~~~
java -jar Biopet.0.2.0.jar pipeline basty -h
~~~
#### Run the pipeline:
Note that one should first create the appropriate [configs](../config.md).
~~~
java -jar Biopet.0.2.0.jar pipeline basty -run -config MySamples.json -config MySettings.json -outDir myOutDir
~~~
# Examine results
For LUMC/researchSHARK users there is a module available that sets all your environment settings and default executables/settings.
~~~
module load Biopet/0.2.0
biopet pipeline basty -run -config MySamples.json -config MySettings.json -outDir myOutDir
~~~
## Result files
......@@ -42,7 +49,6 @@ The output files this pipeline produces are:
* FASTA containing all the consensus sequences based on a minimum coverage (default: 8), which can be modified in the config
* A phylogenetic tree based on the variants called with the GATK-pipeline generated with the tool [BastyGenerateFasta](../tools/BastyGenerateFasta.md)
## Best practice
......
......@@ -60,13 +60,8 @@ This can be used in the root of the config or within the flexiprep, within flexi
A dual licensing model is applied. The source code within this project is freely available for non-commercial use under an AGPL license. For commercial users or users who do not want to follow the AGPL license, please contact sasc@lumc.nl to purchase a separate license.
# Invocation
# Example
Note that one should first create the appropriate [configs](../config.md).
# Testcase A
......
......@@ -3,6 +3,7 @@
# Invocation
# Example
Note that one should first create the appropriate [configs](../config.md).
# Testcase A
......
# BastyGenerateFasta
This tool generates Fasta files out of variant (SNP) alignments or full alignments (consensus).
It can be very useful to produce the right input needed for follow up tools, for example phylogenetic tree building.
## Example
To get the help menu:
~~~bash
java -jar Biopet-0.2.0-DEV-801b72ed.jar tool BastyGenerateFasta -h
Usage: BastyGenerateFasta [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-V <file> | --inputVcf <file>
vcf file, needed for outputVariants and outputConsensusVariants
--bamFile <file>
bam file, needed for outputConsensus and outputConsensusVariants
--outputVariants <file>
fasta with only variants from vcf file
--outputConsensus <file>
Consensus fasta from bam, always reference bases else 'N'
--outputConsensusVariants <file>
Consensus fasta from bam with variants from vcf file, always reference bases else 'N'
--snpsOnly
Only use snps from vcf file
--sampleName <value>
Sample name in vcf file
--outputName <value>
Output name in fasta file header
--minAD <value>
min AD value in vcf file for sample
--minDepth <value>
min detp in bam file
--reference <value>
Indexed reference fasta file
~~~
To run the tool please use:
~~~bash
# Minimal example for option: outputVariants (VCF based)
java -jar Biopet-0.2.0.jar tool BastyGenerateFasta --inputVcf myVCF.vcf \
--outputName NiceTool --outputVariants myVariants.fasta
# Minimal example for option: outputConsensus (BAM based)
java -jar Biopet-0.2.0.jar tool BastyGenerateFasta --bamFile myBam.bam \
--outputName NiceTool --outputConsensus myConsensus.fasta
# Minimal example for option: outputConsensusVariants
java -jar Biopet-0.2.0.jar tool BastyGenerateFasta --inputVcf myVCF.vcf --bamFile myBam.bam \
--outputName NiceTool --outputConsensusVariants myConsensusVariants.fasta
~~~
For LUMC/researchSHARK users there is a module available that sets all your environment settings and default executables/settings.
~~~
module load Biopet/0.2.0
# Minimal example for option: outputVariants (VCF based)
biopet tool BastyGenerateFasta --inputVcf myVCF.vcf \
--outputName NiceTool --outputVariants myVariants.fasta
# Minimal example for option: outputConsensus (BAM based)
biopet tool BastyGenerateFasta --bamFile myBam.bam \
--outputName NiceTool --outputConsensus myConsensus.fasta
# Minimal example for option: outputConsensusVariants
biopet tool BastyGenerateFasta --inputVcf myVCF.vcf --bamFile myBam.bam \
--outputName NiceTool --outputConsensusVariants myConsensusVariants.fasta
~~~
## Output
* FASTA containing variants only
* FASTA containing all the consensus sequences based on a minimum coverage (default: 8), which can be modified in the settings config
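To illustrate what a coverage-based consensus means, the sketch below masks every position whose depth falls below a threshold with `N` and keeps the reference base otherwise. It is not the tool's implementation: the function name `mask_low_depth` and the tab-separated input layout (position, reference base, depth) are assumptions made for this example, as is treating the threshold as inclusive.

```shell
# Hypothetical sketch (not BastyGenerateFasta itself): read lines of
# "position<TAB>reference_base<TAB>depth" on stdin and print one consensus
# string, writing N for positions below the minimum depth.
# Usage: mask_low_depth [<min_depth>]   (defaults to 8)
mask_low_depth() {
    awk -F '\t' -v min="${1:-8}" '
        # Keep the reference base when depth >= min, otherwise emit N.
        { printf "%s", ($3 + 0 >= min + 0 ? $2 : "N") }
        END { printf "\n" }
    '
}
```

With the default threshold of 8, three positions with depths 10, 3 and 8 and reference bases A, C and G yield `ANG`.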
# SamplesTsvToJson
This tool enables a user to create a full sample sheet in JSON format suitable for all our Queue pipelines.
The tool can be started as follows:
......@@ -68,6 +70,7 @@ To get the above example out of the tool one should provide 2 TSV files as follo
The second TSV file can contain as many properties as you would like. Possible options include gender, age and family.
Basically anything you want to pass to your pipeline is possible.
----
| sample | treatment |
......
......@@ -210,8 +210,8 @@ class GatkPipeline(val root: Configurable) extends QScript with MultiSampleQScri
if (runConfig.contains("CN")) aorrg.RGCN = runConfig("CN").toString
add(aorrg, isIntermediate = true)
bamFile = aorrg.output
} else throw new IllegalStateException("Readgroup sample and/or library of input bamfile is not correct, file: " + bamFile +
"\nPossible to set 'correct_readgroups' to true on config to automatic fix this")
} else throw new IllegalStateException("Sample readgroup and/or library of input bamfile is not correct, file: " + bamFile +
"\nPlease note that it is possible to set 'correct_readgroups' to true in the config to automatic fix this")
}
addAll(BamMetrics(this, bamFile, runDir + "metrics/").functions)
......