diff --git a/docs/config.md b/docs/config.md
index de3342b195b1dc7acb338729792d9dabc59c5228..79bf59bedcd02c908488d9ba311ef138eac85b67 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -69,10 +69,18 @@ Global setting examples are:
 
 ----
 
+#### References
+Pipelines and tools that use references should now use the reference module. This gives more fine-grained control over references.
+E.g. pipelines and tools that use a fasta reference file should now set the value `reference_fasta`.
+Additionally, we can set `reference_name` for the name to be used (e.g. `hg19`). If unset, Biopet will default to `unknown`.
+It is also possible to set the `species` flag. Again, we will default to `unknown` if unset.
+
 #### Example settings config
 ~~~
 {
-    "reference": "/references/hg19_nohap/ucsc.hg19_nohap.fasta",
+    "reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta",
+    "reference_name": "hg19_nohap",
+    "species": "homo_sapiens",
     "dbsnp": "/references/hg19_nohap/dbsnp_137.hg19_nohap.vcf",
     "joint_variantcalling": false,
     "haplotypecaller": { "scattercount": 100 },
diff --git a/docs/general/config.md b/docs/general/config.md
index 10d69a2699d3a2a811d0beaa943c71a3aabdc62a..b08d11bf6ae4b03882d1007a5399b53327024ef4 100644
--- a/docs/general/config.md
+++ b/docs/general/config.md
@@ -69,10 +69,17 @@ Global setting examples are:
 
 ----
 
+#### References
+Pipelines and tools that use references should now use the reference module. This gives more fine-grained control over references.
+E.g. pipelines and tools that use a fasta reference file should now set the value `reference_fasta`.
+Additionally, we can set `reference_name` for the name to be used (e.g. `hg19`). If unset, Biopet will default to `unknown`.
+It is also possible to set the `species` flag. Again, we will default to `unknown` if unset.
 #### Example settings config
 ~~~
 {
-    "reference": "/data/LGTC/projects/vandoorn-melanoma/data/references/hg19_nohap/ucsc.hg19_nohap.fasta",
+    "reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta",
+    "reference_name": "hg19_nohap",
+    "species": "homo_sapiens",
     "dbsnp": "/data/LGTC/projects/vandoorn-melanoma/data/references/hg19_nohap/dbsnp_137.hg19_nohap.vcf",
     "joint_variantcalling": false,
     "haplotypecaller": { "scattercount": 100 },
diff --git a/docs/pipelines/bam2wig.md b/docs/pipelines/bam2wig.md
index b072cff2d7d4e2fc46870819fbfaa120331d4c92..e683f950fb106c3087b7c5e24aed9085f2f47d43 100644
--- a/docs/pipelines/bam2wig.md
+++ b/docs/pipelines/bam2wig.md
@@ -2,35 +2,45 @@
 
 ## Introduction
 
-Bam2Wig is a small pipeline consisting of three steps that is used to convert BAM files into track coverage files: bigWig, wiggle, and TDF. While this seems like a task that should be tool, at the time of writing, there are no command line tools that can do such conversion in one go. Thus, the Bam2Wig pipeline was written.
+Bam2Wig is a small pipeline consisting of three steps that are used to convert BAM files into track coverage files: bigWig, wiggle, and TDF. While this seems like a task that should be handled by a single tool, at the time of writing there are no command line tools that can do such a conversion in one go. Thus, the Bam2Wig pipeline was written.
 
 ## Configuration
-
 The required configuration file for Bam2Wig is really minimal, only a single JSON file containing an `output_dir` entry:
 
 ~~~
 {"output_dir": "/path/to/output/dir"}
 ~~~
+For technical reasons, single sample pipelines, such as this pipeline, do **not** take a sample config.
+Input files are instead given on the command line as a flag.
+Bam2Wig requires one to set the `--bamfile` command line argument to point to the to-be-converted BAM file.
 
 ## Running Bam2Wig
 
 As with other pipelines, you can run the Bam2Wig pipeline by invoking the `pipeline` subcommand.
 There is also a general help available which can be invoked using the `-h` flag:
 
-~~~
-$ java -jar /path/to/biopet.jar pipeline sage -h
+~~~bash
+$ java -jar /path/to/biopet.jar pipeline bam2wig -h
+
+Arguments for Bam2Wig:
+   --bamfile <bamfile>                    Input bam file
+   -config,--config_file <config_file>    JSON / YAML config file(s)
+   -cv,--config_value <config_value>      Config values, value should be formatted like 'key=value' or
+                                          'path:path:key=value'
+   -DSC,--disablescatter                  Disable all scatters
+
 ~~~
 
 If you are on SHARK, you can also load the `biopet` module and execute `biopet pipeline` instead:
 
-~~~
+~~~bash
 $ module load biopet/v0.3.0
 $ biopet pipeline bam2wig
 ~~~
 
 To run the pipeline:
-~~~
- biopet pipeline bam2wig -config </path/to/config.json> -qsub -jobParaEnv BWA -run
+~~~bash
+ biopet pipeline bam2wig -config </path/to/config.json> --bamfile </path/to/bam.bam> -qsub -jobParaEnv BWA -run
 ~~~
 
 ## Output Files
diff --git a/docs/pipelines/basty.md b/docs/pipelines/basty.md
index 3581a930ff50f7c1b2ff2d31bd7b6480ff9ba396..bd51752cb33746bfe0b1d069dc4e2a4f8b851317 100644
--- a/docs/pipelines/basty.md
+++ b/docs/pipelines/basty.md
@@ -3,8 +3,8 @@
 
 ## Introduction
 
-A pipeline for aligning bacterial genomes and detect structural variations on the level of SNPs. Basty will output phylogenetic trees.
-Which makes it very easy to look at the variations between certain species or strains.
+Basty is a pipeline for aligning bacterial genomes and detecting structural variations on the level of SNPs.
+Basty will output phylogenetic trees, which makes it very easy to look at the variations between certain species or strains.
 
 ### Tools for this pipeline
 * [Shiva](../pipelines/shiva.md)
@@ -14,7 +14,7 @@ Which makes it very easy to look at the variations between certain species or st
 
 ### Requirements
 
-To run for a specific species, please do not forget to create the proper index files.
+To run with a specific species, please do not forget to create the proper index files.
 The index files are created from the supplied reference:
 
 * ```.dict``` (can be produced with <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>)
@@ -22,18 +22,59 @@ The index files are created from the supplied reference:
 * ```.idxSpecificForAligner``` (depending on which aligner is used one should create a suitable index specific for that aligner. Each aligner has his own way of creating index files. Therefore the options for creating the index files can be found inside the aligner itself)
 
+### Configuration
+To run Basty, please create the proper [Config](../general/config.md) files.
+
+Basty uses the [Shiva](shiva.md) pipeline internally. Please check the documentation of that pipeline for its options.
+
+#### Required configuration values
+
+| Submodule | Name | Type | Default | Function |
+| --------- | ---- | ---- | ------- | -------- |
+| shiva | variantcallers | List[String] | | Which variant caller to use |
+| - | output_dir | Path | | Path to output directory |
+
+
+#### Other options
+
+Specific configuration options additional to Basty are:
+
+| Submodule | Name | Type | Default | Function |
+| --------- | ---- | ---- | ------- | -------- |
+| raxml | seed | Integer | 12345 | RAxML random seed |
+| raxml | raxml_ml_model | String | GTRGAMMAX | RAxML model |
+| raxml | ml_runs | Integer | 20 | Number of RAxML runs |
+| raxml | boot_runs | Integer | 100 | Number of RAxML bootstrap runs |
+
+
+#### Example settings config
+
+```json
+
+{
+    "output_dir": "</path/to/out_directory>",
+    "shiva": {
+        "variantcallers": ["freeBayes"]
+    },
+    "raxml" : {
+        "ml_runs": 50
+    }
+}
+
+```
+
 ### Example
 
 ##### For the help screen:
 ~~~
-java -jar Biopet.0.2.0.jar pipeline basty -h
+java -jar </path/to/biopet.jar> pipeline basty -h
 ~~~
 
 ##### Run the pipeline:
 Note that one should first create the appropriate [configs](../general/config.md).
 
 ~~~
-java -jar Biopet.0.2.0.jar pipeline basty -run -config MySamples.json -config MySettings.json -outDir myOutDir
+java -jar </path/to/biopet.jar> pipeline basty -run -config MySamples.json -config MySettings.json
 ~~~
 
 ### Result files
diff --git a/docs/pipelines/flexiprep.md b/docs/pipelines/flexiprep.md
index 56da30f1f2e08e16d185d4ed45603f991bbf391b..613b534633b5e34993e7d4d4d0e76210de10c5e7 100644
--- a/docs/pipelines/flexiprep.md
+++ b/docs/pipelines/flexiprep.md
@@ -1,32 +1,29 @@
 # Flexiprep
 
 ## Introduction
-Flexiprep is out quality control pipeline. This pipeline checks for possible barcode contamination, clips reads, trims reads and runs
-the tool <a href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc/" target="_blank">Fastqc</a>.
-The adapter clipping is performed by <a href="https://github.com/marcelm/cutadapt" target="_blank">Cutadapt</a>.
-For the quality trimming we use: <a href="https://github.com/najoshi/sickle" target="_blank">Sickle</a>. Flexiprep works on `.fastq` files.
+Flexiprep is our quality control pipeline. This pipeline checks for possible barcode contamination, clips reads, trims reads and runs
+the <a href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc/" target="_blank">Fastqc</a> tool.
+Adapter clipping is performed by <a href="https://github.com/marcelm/cutadapt" target="_blank">Cutadapt</a>.
+For quality trimming we use <a href="https://github.com/najoshi/sickle" target="_blank">Sickle</a>.
+Flexiprep works on `.fastq` files.
 
 ## Example
 
 To get the help menu:
 ~~~
-java -jar Biopet-0.2.0-DEV.jar pipeline Flexiprep -h
+java -jar </path/to/biopet.jar> pipeline Flexiprep -h
+
 Arguments for Flexiprep:
-   -R1,--input_r1 <input_r1>                      R1 fastq file (gzipped allowed)
-   -sample,--samplename <samplename>              Sample name
-   -library,--libraryname <libraryname>           Library name
-   -outDir,--output_directory <output_directory>  Output directory
-   -R2,--input_r2 <input_r2>                      R2 fastq file (gzipped allowed)
-   -skiptrim,--skiptrim                           Skip Trim fastq files
-   -skipclip,--skipclip                           Skip Clip fastq files
-   -config,--config_file <config_file>            JSON config file(s)
-   -DSC,--disablescatterdefault                   Disable all scatters
+   -R1,--input_r1 <input_r1>              R1 fastq file (gzipped allowed)
+   -R2,--input_r2 <input_r2>              R2 fastq file (gzipped allowed)
+   -sample,--sampleid <sampleid>          Sample ID
+   -library,--libid <libid>               Library ID
+   -config,--config_file <config_file>    JSON config file(s)
+   -DSC,--disablescatter                  Disable all scatters
 ~~~
 
-As we can see in the above example we provide the options to skip trimming or clipping
-since sometimes you want to have the possibility to not perform these tasks e.g.
-if there are no adapters present in your .fastq. Note that the pipeline also works on unpaired reads where one should only provide R1.
+Note that the pipeline also works on unpaired reads where one should only provide R1.
 
 To start the pipeline (remove `-run` for a dry run):
 
@@ -36,9 +36,34 @@ java -jar Biopet-0.2.0.jar pipeline Flexiprep -run -outDir myDir \
 -library myLibname -config mySettings.json
 ~~~
 
+
+## Configuration and flags
+For technical reasons, single sample pipelines, such as this pipeline, do **not** take a sample config.
+Input files are instead given on the command line as a flag.
+
+Command line flags for Flexiprep are:
+
+| Flag (short) | Flag (long) | Type | Function |
+| ------------ | ----------- | ---- | -------- |
+| -R1 | --input_r1 | Path (**required**) | Path to input fastq file |
+| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file |
+| -sample | --sampleid | String (**required**) | Name of sample |
+| -library | --libid | String (**required**) | Name of library |
+
+If `-R2` is given, the pipeline will assume a paired-end setup.
+
+### Config
+
+All other values should be provided in the config. Specific config values for the Flexiprep pipeline are:
+
+| Name | Type | Function |
+| ---- | ---- | -------- |
+| skiptrim | Boolean | Skip the trimming step |
+| skipclip | Boolean | Skip the clipping step |
+
 ## Result files
-The results from this pipeline will be a fastq file which is depending on the options either clipped and trimmed, only clipped,
- only trimmed or no quality control at all. The pipeline also outputs 2 Fastqc runs one before and one after quality control.
+The results from this pipeline will be a fastq file.
+The pipeline also outputs two Fastqc runs: one before and one after quality control.
 
 ### Example output
diff --git a/docs/pipelines/gentrap.md b/docs/pipelines/gentrap.md
index cfb99916cbbc063a735cbb37657df9b954f626e1..0c73201fb8c8563d4c0c554d45e8f65bd1a6fe9d 100644
--- a/docs/pipelines/gentrap.md
+++ b/docs/pipelines/gentrap.md
@@ -78,7 +78,7 @@ For the pipeline settings, there are some values that you need to specify while
 
 1. `output_dir`: path to output directory (if it does not exist, Gentrap will create it for you).
 2. `aligner`: which aligner to use (`gsnap` or `tophat`)
-3. `reference`: this must point to a reference FASTA file and in the same directory, there must be a `.dict` file of the FASTA file.
+3. `reference_fasta`: this must point to a reference FASTA file and in the same directory, there must be a `.dict` file of the FASTA file.
 4. `expression_measures`: this entry determines which expression measurement modes Gentrap will do. You can choose zero or more from the following: `fragments_per_gene`, `bases_per_gene`, `bases_per_exon`, `cufflinks_strict`, `cufflinks_guided`, and/or `cufflinks_blind`. If you only wish to align, you can set the value as an empty list (`[]`).
 5. `strand_protocol`: this determines whether your library is prepared with a specific stranded protocol or not. There are two protocols currently supported now: `dutp` for dUTP-based protocols and `non_specific` for non-strand-specific protocols.
 6. `annotation_refflat`: contains the path to an annotation refFlat file of the entire genome
@@ -100,7 +100,7 @@ Thus, an example settings configuration is as follows:
     "output_dir": "/path/to/output/dir",
     "expression_measures": ["fragments_per_gene", "bases_per_gene"],
     "strand_protocol": "dutp",
-    "reference": "/path/to/reference",
+    "reference_fasta": "/path/to/reference",
     "annotation_gtf": "/path/to/gtf",
     "annotation_refflat": "/path/to/refflat",
     "gsnap": {
diff --git a/docs/pipelines/mapping.md b/docs/pipelines/mapping.md
index c6375c701fd91765a6b0927544c9e1e8d3f17336..5fae9a6a64c9fc43d6c896cb34ec03a662300c33 100644
--- a/docs/pipelines/mapping.md
+++ b/docs/pipelines/mapping.md
@@ -17,46 +17,86 @@ After the QC, the pipeline simply maps the reads with the chosen aligner. The re
+ +Command line flags for the mapping pipeline are: + +| Flag (short)| Flag (long) | Type | Function | +| ------------ | ----------- | ---- | -------- | +| -R1 | --input_r1 | Path (**required**) | Path to input fastq file | +| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file. | +| -sample | --sampleid | String (**required**) | Name of sample | +| -library | --libid | String (**required**) | Name of library | + +If `-R2` is given, the pipeline will assume a paired-end setup. + +### Config + +All other values should be provided in the config. Specific config values towards the mapping pipeline are: + +| Name | Type | Function | +| ---- | ---- | -------- | +| aligner | String (optional) | Which aligner to use. Defaults to `bwa`. Choose from [`bwa`, `bwa-aln`, `bowtie`, `gsnap`, `tophat`, `stampy`, `star`, `star-2pass`] | +| skip_flexiprep | Boolean (optional) | Whether to skip the flexiprep QC step (default = False) | +| skip_markduplicates | Boolean (optional) | Whether to skip the Picard Markduplicates step (default = False) | +| skip_metrics | Boolean (optional) | Whether to skip the metrics gathering step (default = False) | +| reference_fasta | Path (**required**) | Path to indexed fasta file to be used as reference | +| platform | String (optional) | Read group Platform (defaults to `illumina`)| +| platform_unit | String (**required**) | Read group platform unit | +| readgroup_sequencing_center | String (**required**) | Read group sequencing center | +| readgroup_description | String (**required**) | Read group description | +| predicted_insertsize | Integer (**required**) | Read group predicted insert size | + +It is possible to provide any config value as a command line argument as well, using the `-cv` flag. +E.g. `-cv reference=<path/to/reference>` would set value `reference`. + ## Example -Note that one should first create the appropriate [configs](../general/config.md). 
+Note that one should first create the appropriate [settings config](../general/config.md).
+Any supplied sample config will be ignored.
+
+### Example config
+
+```json
+{
+    "reference_fasta": "<path/to/reference>",
+    "aligner": "bwa",
+    "skip_metrics": true,
+    "platform": "our_platform",
+    "platform_unit": "our_unit",
+    "readgroup_sequencing_center": "our_center",
+    "readgroup_description": "our_description",
+    "predicted_insertsize": 300,
+    "output_dir": "<path/to/output/dir>"
+}
+```
+
+
+### Running the pipeline
 
 For the help menu:
 ~~~
 java -jar </path/to/biopet.jar> pipeline mapping -h
 
 Arguments for Mapping:
-   -R1,--input_r1 <input_r1>                      R1 fastq file
-   -outDir,--output_directory <output_directory>  Output directory
-   -R2,--input_r2 <input_r2>                      R2 fastq file
-   -outputName,--outputname <outputname>          Output name
-   -skipflexiprep,--skipflexiprep                 Skip flexiprep
-   -skipmarkduplicates,--skipmarkduplicates       Skip mark duplicates
-   -skipmetrics,--skipmetrics                     Skip metrics
-   -ALN,--aligner <aligner>                       Aligner
-   -R,--reference <reference>                     Reference
-   -chunking,--chunking                           Chunking
-   -numberChunks,--numberchunks <numberchunks>    Number of chunks, if not defined pipeline will automatically calculate the number of chunks
-   -RGID,--rgid <rgid>                            Readgroup ID
-   -RGLB,--rglb <rglb>                            Readgroup Library
-   -RGPL,--rgpl <rgpl>                            Readgroup Platform
-   -RGPU,--rgpu <rgpu>                            Readgroup platform unit
-   -RGSM,--rgsm <rgsm>                            Readgroup sample
-   -RGCN,--rgcn <rgcn>                            Readgroup sequencing center
-   -RGDS,--rgds <rgds>                            Readgroup description
-   -RGDT,--rgdt <rgdt>                            Readgroup sequencing date
-   -RGPI,--rgpi <rgpi>                            Readgroup predicted insert size
-   -config,--config_file <config_file>            JSON config file(s)
-   -DSC,--disablescatterdefault                   Disable all scatters
+   -R1,--input_r1 <input_r1>              R1 fastq file
+   -R2,--input_r2 <input_r2>              R2 fastq file
+   -sample,--sampleid <sampleid>          Sample ID
+   -library,--libid <libid>               Library ID
+   -config,--config_file <config_file>    JSON / YAML config file(s)
+   -cv,--config_value <config_value>      Config values, value should be formatted like 'key=value' or
+                                          'path:path:key=value'
+   -DSC,--disablescatter                  Disable all scatters
+
 ~~~
 
 To run the pipeline:
 ~~~
 java -jar </path/to/biopet.jar> pipeline mapping -run --config mySettings.json \
--R1 myReads1.fastq -R2 myReads2.fastq -outDir myOutDir -OutputName myReadsOutput \
--R hg19.fasta -RGSM mySampleName -RGLB myLib1
+-R1 myReads1.fastq -R2 myReads2.fastq
 ~~~
 
-Note that removing -R2 causes the pipeline to be able of handlind single end `.fastq` files.
+Note that removing `-R2` causes the pipeline to assume single-end `.fastq` files.
 
 To perform a dry run simply remove `-run` from the commandline call.
diff --git a/docs/pipelines/sage.md b/docs/pipelines/sage.md
index 889054cc4f0ba746c6b005cbd82253f28400cf4f..97e3ceab4fab0c6221c187135f647d1579fb426f 100644
--- a/docs/pipelines/sage.md
+++ b/docs/pipelines/sage.md
@@ -15,10 +15,21 @@ This pipeline uses the following modules and tools:
 * [SageCreateTagCounts](../tools/sagetools.md)
 
-
-## Configuration
+## Configuration and flags
 
 Note that one should first create the appropriate [configs](../general/config.md).
 
+Please see the documentation for the wrapped pipelines (`Mapping` and `Flexiprep`) for their configuration options and flags.
+
+Specific configuration values for the Sage pipeline are:
+
+| Name | Type | Function |
+| ---- | ---- | -------- |
+| countbed | Path (required) | Path to count bed file |
+| squishedcountbed | Path (optional) | By supplying this file the auto squish job will be skipped |
+| transcriptome | Path (required) | Fasta file for transcriptome. Note: must come from Ensembl! |
+| tags_library | Path (optional) | Five-column tab-delimited file (`<tag> <firstTag> <AllTags> <FirstAntiTag> <AllAntiTags>`). Unsupported option |
+
 ## Running Sage
 
 As with other pipelines, you can run the Sage pipeline by invoking the `pipeline` subcommand.
 There is also a general help available which can be invoked using the `-h` flag:
@@ -27,13 +38,12 @@
 $ java -jar /path/to/biopet.jar pipeline sage -h
 
 Arguments for Sage:
-   -outDir,--output_directory <output_directory>  Output directory
-   --countbed <countbed>                          countBed
-   --squishedcountbed <squishedcountbed>          squishedCountBed, by suppling this file the auto squish job will be
-                                                  skipped
-   --transcriptome <transcriptome>                Transcriptome, used for generation of tag library
-   -config,--config_file <config_file>            JSON config file(s)
-   -DSC,--disablescatterdefault                   Disable all scatters
+   -s,--sample <sample>                   Only Sample
+   -config,--config_file <config_file>    JSON / YAML config file(s)
+   -cv,--config_value <config_value>      Config values, value should be formatted like 'key=value' or
+                                          'path:path:key=value'
+   -DSC,--disablescatter                  Disable all scatters
+
 ~~~
 
 If you are on SHARK, you can also load the `biopet` module and execute `biopet pipeline` instead:
diff --git a/docs/pipelines/shiva.md b/docs/pipelines/shiva.md
index c55892564ee30f6158d3f712fba73d6c6eca221c..9f3b0076fb056289f89bd7b59e2fa757680234c1 100644
--- a/docs/pipelines/shiva.md
+++ b/docs/pipelines/shiva.md
@@ -3,7 +3,7 @@
 ## Introduction
 
 This pipeline is build for variant calling on NGS data (preferably Illumina data).
-It is based on the <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a>) of GATK in terms of there approach to variant calling.
+It is based on the <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a> of GATK in terms of their approach to variant calling.
 The pipeline accepts ```.fastq & .bam``` files as input.
 
 ----
@@ -26,9 +26,9 @@ Note that one should first create the appropriate [configs](../general/config.md
 
 ### Full pipeline
 
-The full pipeline can start from fastq or from bam file. This pipeline will include pre process steps for the bam files.
+The full pipeline can start from fastq or from bam files. This pipeline will include pre-processing steps for the bam files.
 
-To get the help menu:
+To view the help menu, execute:
 
 ~~~
 java -jar </path/to/biopet.jar> pipeline shiva -h
@@ -44,13 +44,15 @@ To run the pipeline:
 java -jar </path/to/biopet.jar> pipeline shiva -config MySamples.json -config MySettings.json -run
 ~~~
 
-To perform a dry run simply remove `-run` from the commandline call.
+A dry run can be performed by simply removing the `-run` flag from the command line call.
 
-### Just variantcalling
+### Only variant calling
 
-This will not do any pre process steps on the bam files.
+It is possible to run Shiva while only performing its variant calling steps.
+This has been separated into its own pipeline, named `shivavariantcalling`.
+As this calling pipeline starts from BAM files, it will naturally not perform any pre-processing steps.
 
-To get the help menu:
+To view the help menu, execute:
 
 ~~~
 java -jar </path/to/biopet.jar> pipeline shivavariantcalling -h
@@ -68,13 +70,15 @@ To run the pipeline:
 java -jar </path/to/biopet.jar> pipeline shivavariantcalling -config MySettings.json -run
 ~~~
 
-To perform a dry run simply remove `-run` from the commandline call.
+A dry run can be performed by simply removing the `-run` flag from the command line call.
 
 ----
 
-## Variantcaller
-At this moment the following variantcallers modes can be used
+## Variant caller
+At this moment the following variant callers can be used:
+
+`TODO: explain them briefly`
 
 * haplotypecaller
 * haplotypecaller_gvcf
@@ -85,36 +89,59 @@ At this moment the following variantcallers modes can be used
 * freebayes
 * raw
 
-----
-
-## Multisample and Singlesample
-### Multisample
-With <a href="https://www.broadinstitute.org/gatk/guide/tagged?tag=multi-sample">multisample</a>
- one can perform variantcalling with all samples combined for more statistical power and accuracy.
-
-
-### Singlesample
-If one prefers single sample variantcalling (which is the default) there is no need of setting the joint_variantcalling inside the config.
-The single sample variantcalling has 2 modes as well:
-
-
-----
-
 ## Config options
 
 To view all possible config options please navigate to our Gitlab wiki page
 <a href="https://git.lumc.nl/biopet/biopet/wikis/GATK-Variantcalling-Pipeline" target="_blank">Config</a>
 
+### Required settings
+| Namespace | Name | Type | Default | Function |
+| ----------- | ---- | ---- | ------- | -------- |
+| - | output_dir | String | | Path to output directory |
+| shiva | variantcallers | List[String] | | Which variant callers to use |
+
+
 ### Config options
 
-| Config Name | Name | Type | Default | Function |
+| Namespace | Name | Type | Default | Function |
 | ----------- | ---- | ----- | ------- | -------- |
-| shiva | reference | String | | reference to align to |
+| shiva | reference_fasta | String | | reference to align to |
 | shiva | dbsnp | String | | vcf file of dbsnp records |
 | shiva | variantcallers | List[String] | | variantcaller to use, see list |
-| shiva | multisample_sample_variantcalling | Boolean | true | |
-| shiva | single_sample_variantcalling | Boolean | false | |
-| shiva | library_variantcalling | Boolean | false | |
+| shiva | use_indel_realigner | Boolean | true | Realign indels |
+| shiva | use_base_recalibration | Boolean | true | Base recalibrate |
+| shiva | use_analyze_covariates | Boolean | false | Analyze covariates during base recalibration step |
+| shiva | bam_to_fastq | Boolean | false | Convert bam files to fastq files |
+| shiva | correct_readgroups | Boolean | false | Attempt to correct read groups |
+| vcffilter | min_sample_depth | Integer | 8 | Filter variants with at least x coverage |
+| vcffilter | min_alternate_depth | Integer | 2 | Filter variants with at least x depth on the alternate allele |
+| vcffilter | min_samples_pass | Integer | 1 | Minimum amount of samples which pass custom filter (requires additional flags) |
+| vcffilter | filter_ref_calls | Boolean | true | Remove reference calls |
+| vcfstats | reference | String | | Path to reference to be used by `vcfstats` |
+
+Since Shiva uses the [Mapping](mapping.md) pipeline internally, mapping config values can be specified as well.
+For all the options, please see the corresponding documentation for the mapping pipeline.
+
+### Modes
+
+Shiva furthermore supports three modes. The default and recommended option is `multisample_variantcalling`.
+In this mode, all bam files will be called simultaneously into one big VCF file. It will work with any number of samples.
+
+On top of that, Shiva provides two separate modes that only work with a single sample.
+Those are not recommended, but may be useful to those who need to validate replicates.
+
+Mode `single_sample_variantcalling` calls a single sample as a merged bam file.
+I.e., it will merge all libraries into one bam file, then call on that.
+
+The other mode, `library_variantcalling`, will simultaneously call all library bam files.
+
+The config for these therefore is:
+
+| Namespace | Name | Type | Default | Function |
+| ----------- | ---- | ---- | ------- | -------- |
+| shiva | multisample_variantcalling | Boolean | true | Default, multisample calling |
+| shiva | single_sample_variantcalling | Boolean | false | Not recommended, single sample, merged bam |
+| shiva | library_variantcalling | Boolean | false | Not recommended, single sample, per library |
 
 **Config example**
 
@@ -124,13 +151,19 @@
 "samples": {
     "SampleID": {
         "libraries": {
-            "lib_id_1": { "bam": "YoureBam.bam" },
+            "lib_id_1": { "bam": "YourBam.bam" },
             "lib_id_2": { "R1": "file_R1.fq.gz", "R2": "file_R2.fq.gz" }
         }
     }
 },
-"reference": "<location of fasta of reference>",
-"variantcallers": [ "haplotypecaller", "unifiedgenotyper" ],
+"shiva": {
+    "reference_fasta": "<location of fasta of reference>",
+    "variantcallers": [ "haplotypecaller", "unifiedgenotyper" ],
+    "dbsnp": "</path/to/dbsnp.vcf>",
+    "vcffilter": {
+        "min_alternate_depth": 1
+    }
+},
 "output_dir": "<output directory>"
 }
 ```
diff --git a/docs/pipelines/toucan.md b/docs/pipelines/toucan.md
index c0c812197aa863530c1b3d99966c0ef4b4e4d19b..633f3cac24d8b0520dec97c1c5ef596ca6ae901b 100644
--- a/docs/pipelines/toucan.md
+++ b/docs/pipelines/toucan.md
@@ -22,8 +22,9 @@ Arguments for Toucan:
 
 Configuration
 -------------
 You can set all the usual [flags and options](http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html) of the VEP in the configuration,
-with the same name used by native VEP.
-As some of these flags might conflict with other Biopet tools/pipelines, it is wise to put the VEP in its own JSON object.
+with the same name used by native VEP, except those added after version 75.
+The naming scheme for flags and options is identical to the one used by the VEP.
+As some of these flags might conflict with other Biopet tools/pipelines, it is wise to put the VEP in its own namespace.
 You **MUST** set the following fields:
@@ -46,7 +47,7 @@ With that in mind, an example configuration using mode `standard` of the VepNorm
   "vepnormalizer": {
     "mode": "standard"
   },
-  "out_dir": <path_to_output_directory>
+  "output_dir": <path_to_output_directory>
 }
 ~~~~