Commit aa04f501 authored by bow

Merge branch 'docs-toucan' into 'release-0.3.0'

Docs toucan

See merge request !133
parents bb337946 a141edb3

TOUCAN
===========

Introduction
-----------

The Toucan pipeline is an annotation pipeline based on the Variant Effect Predictor (VEP).
Currently, it comprises just two steps:

* A Variant Effect Predictor run
* [VEP Normalizer on the VEP output](../tools/VEPNormalizer.md)

Example
-----------

~~~~bash
java -jar Biopet-0.3.0.jar pipeline Toucan -h
Arguments for Toucan:
 -Input,--inputvcf <inputvcf>          Input VCF file
 -config,--config_file <config_file>   JSON config file(s)
 -DSC,--disablescatter                 Disable all scatters
~~~~

Configuration
-------------

You can set all the usual [flags and options](http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html) of VEP in the configuration,
using the same names as native VEP.
Because some of these flags might conflict with options of other Biopet tools or pipelines, it is wise to put the VEP options in their own JSON object.

You **MUST** set the following fields:

* `vep_script`: the path to the VEP executable
* `dir` or `dir_cache`: the path to the VEP cache

It is wise to set the `cache_version` field as well.
Furthermore, the `fork` field is overwritten by `threads` if the latter exists in the config.
Therefore, it is recommended to use `threads` rather than `fork`.
With that in mind, an example configuration that runs the VEPNormalizer in `standard` mode is:
~~~~json
{
  "varianteffectpredictor": {
    "vep_script": "<path_to_exe>",
    "dir": "<path_to_cache>",
    "cache_version": "<cache_version>",
    "threads": 8
  },
  "vepnormalizer": {
    "mode": "standard"
  },
  "out_dir": "<path_to_output_directory>"
}
~~~~
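
If you generate configs from a script, you can also produce the example above with Python's standard `json` module. This is a minimal sketch under stated assumptions. The function name `toucan_config` and all example values (paths, cache version `79`) are hypothetical. The code only mirrors the recommendation above to set `threads` rather than `fork`, and it is not part of Biopet:

```python
import json


def toucan_config(vep_script, cache_dir, cache_version, threads, out_dir):
    """Build a Toucan config dict, keeping the VEP options in their own object."""
    return {
        "varianteffectpredictor": {
            "vep_script": vep_script,
            "dir": cache_dir,
            "cache_version": cache_version,
            # Set `threads`, not `fork`: `fork` is overwritten by `threads` anyway.
            "threads": threads,
        },
        "vepnormalizer": {"mode": "standard"},
        "out_dir": out_dir,
    }


if __name__ == "__main__":
    # Hypothetical example values; replace them with your own paths.
    config = toucan_config(
        vep_script="/opt/vep/variant_effect_predictor.pl",
        cache_dir="/data/vep_cache",
        cache_version=79,
        threads=8,
        out_dir="/data/toucan_output",
    )
    # Print the JSON; redirect it to a file and pass that file via -config.
    print(json.dumps(config, indent=2))
```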

Running the pipeline
---------------

The command to run the pipeline is:

~~~~bash
java -jar Biopet-0.3.0.jar pipeline Toucan -Input <input_vcf> -config <config_json> -run
~~~~

To run it on a cluster, add the `-qsub` and `-jobParaEnv` options, where `<PE>` is the name of the cluster's parallel environment:

~~~~bash
java -jar Biopet-0.3.0.jar pipeline Toucan -Input <input_vcf> -config <config_json> -run -qsub -jobParaEnv <PE>
~~~~

VEPNormalizer
============

Introduction
------------

This tool normalizes a VCF file annotated with the Variant Effect Predictor (VEP).
VEP does not write each annotation to its own INFO field. Instead, it puts all its annotations in one big string inside a single "CSQ" INFO tag, so this string must be normalized before the annotations can be used as regular INFO fields.
The normalizer uses the field list in the CSQ header to create a separate INFO field for each annotation field.
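
The field names come from the `Format: ` part of the CSQ header line's description. The following minimal Python sketch shows that lookup. It assumes the header follows the usual VEP layout shown in the hypothetical, shortened example line (real headers list many more fields). `csq_fields` is an illustrative name, not a Biopet function:

```python
def csq_fields(header_line):
    """Return the annotation field names listed in a CSQ ##INFO header line."""
    if not header_line.startswith("##INFO=<ID=CSQ,"):
        raise ValueError("not a CSQ INFO header line")
    if "Format: " not in header_line:
        raise ValueError("CSQ header has no 'Format: ' field list")
    # The '|'-separated field list runs from 'Format: ' to the closing quote.
    field_list = header_line.split("Format: ", 1)[1].split('"', 1)[0]
    return field_list.split("|")


# Hypothetical, shortened VEP header line.
HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type '
          'as predicted by VEP. Format: Allele|Gene|Feature|Consequence">')
print(csq_fields(HEADER))  # ['Allele', 'Gene', 'Feature', 'Consequence']
```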

It has two modes: `standard` and `explode`. The `standard` mode produces a VCF that follows the VCF specification:
every VEP INFO tag consists of a comma-separated list with one value per transcript.
If the value is empty, the VEP INFO tag is omitted for that record.
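
The following is a sketch of the `standard` transformation for a single record. It assumes that the CSQ value separates transcripts with commas and fields with `|`, and that a tag is left out only when it is empty for every transcript. The `VEP_` prefix for the new INFO keys is a hypothetical naming choice, and `normalize_standard` is illustrative rather than the tool's code:

```python
def normalize_standard(csq_value, fields, prefix="VEP_"):
    """Map one CSQ value to INFO tags holding one comma-separated value per transcript."""
    # One list of '|'-separated values per transcript.
    transcripts = [entry.split("|") for entry in csq_value.split(",")]
    info = {}
    for i, name in enumerate(fields):
        values = [t[i] if i < len(t) else "" for t in transcripts]
        if all(value == "" for value in values):
            continue  # empty for this record: leave the tag out
        info[prefix + name] = ",".join(values)
    return info


fields = ["Allele", "Gene", "Feature", "Consequence"]
# Hypothetical CSQ value with two transcripts and no Consequence annotated.
print(normalize_standard("A|ENSG1|ENST1|,A|ENSG1|ENST2|", fields))
# {'VEP_Allele': 'A,A', 'VEP_Gene': 'ENSG1,ENSG1', 'VEP_Feature': 'ENST1,ENST2'}
```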

The `explode` mode, on the other hand, creates a new VCF record for each transcript it encounters,
so each VEP INFO tag consists of a single value (if present at all). This is useful if you must work on a per-transcript basis.
Note, however, that records may then appear "duplicated": a variant overlapping several transcripts is written once per transcript.
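
The following is a sketch of the `explode` mode. It assumes comma-separated transcripts and `|`-separated fields in the CSQ value. The `VEP_` prefix and `normalize_explode` are hypothetical names, not taken from the tool. Each returned dictionary stands for one output record, which is why a variant with two transcripts appears twice:

```python
def normalize_explode(csq_value, fields, prefix="VEP_"):
    """Return one INFO dict per transcript; each becomes its own copy of the record."""
    records = []
    for entry in csq_value.split(","):
        values = entry.split("|")
        # A single value per tag; empty values are left out.
        records.append({prefix + name: value
                        for name, value in zip(fields, values) if value != ""})
    return records


fields = ["Allele", "Gene", "Feature", "Consequence"]
for info in normalize_explode("A|ENSG1|ENST1|missense_variant,A|ENSG1|ENST2|", fields):
    print(info)
# {'VEP_Allele': 'A', 'VEP_Gene': 'ENSG1', 'VEP_Feature': 'ENST1', 'VEP_Consequence': 'missense_variant'}
# {'VEP_Allele': 'A', 'VEP_Gene': 'ENSG1', 'VEP_Feature': 'ENST2'}
```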

By default, the CSQ tag is removed from the output VCF file. To retain it, set the `--do-not-remove` option.

Example
---------

~~~~bash
java -jar Biopet-0.3.0.jar tool VEPNormalizer -h
|VEPNormalizer - Parse VEP-annotated VCF to standard VCF format
Usage: VEPNormalizer [options]
  -l <value> | --log_level <value>
        Log level
  -h | --help
        Print usage
  -v | --version
        Print version
  -I <vcf> | --InputFile <vcf>
        Input VCF file
  -O <vcf> | --OutputFile <vcf>
        Output VCF file
  -m <mode> | --mode <mode>
        Mode
  --do-not-remove
        Do not remove CSQ tag
~~~~
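
To call the tool from a script, you can assemble the options listed above into an argument list. The following is a minimal Python sketch. `vep_normalizer_cmd` is a hypothetical helper, and the jar name is taken from the example above and may differ for your installation:

```python
def vep_normalizer_cmd(input_vcf, output_vcf, mode="standard", keep_csq=False,
                       jar="Biopet-0.3.0.jar"):
    """Build a VEPNormalizer command line from the documented options."""
    if mode not in ("standard", "explode"):
        raise ValueError("mode must be 'standard' or 'explode'")
    cmd = ["java", "-jar", jar, "tool", "VEPNormalizer",
           "-I", input_vcf, "-O", output_vcf, "-m", mode]
    if keep_csq:
        cmd.append("--do-not-remove")  # keep the original CSQ tag in the output
    return cmd


# Run it with e.g. subprocess.run(cmd, check=True); this requires Java and the jar.
cmd = vep_normalizer_cmd("annotated.vcf", "normalized.vcf", mode="explode")
print(" ".join(cmd))
# java -jar Biopet-0.3.0.jar tool VEPNormalizer -I annotated.vcf -O normalized.vcf -m explode
```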

~~~~diff
@@ -7,6 +7,7 @@ pages:
 - ['pipelines/flexiprep.md', 'Pipelines', 'Flexiprep']
 - ['pipelines/mapping.md', 'Pipelines', 'Mapping']
 - ['pipelines/sage.md', 'Pipelines', 'Sage']
+- ['pipelines/toucan.md', 'Pipelines', 'Toucan']
 - ['tools/SamplesTsvToJson.md','Tools','SamplesTsvToJson']
 - ['tools/BastyGenerateFasta.md','Tools','BastyGenerateFasta']
 - ['tools/bedtointerval.md','Tools','BedToInterval']
@@ -19,9 +20,10 @@ pages:
 - ['tools/VcfFilter.md','Tools','VcfFilter']
 - ['tools/MpileupToVcf.md', 'Tools', 'MpileupToVcf']
 - ['tools/sagetools.md', 'Tools', 'Sagetools']
+- ['tools/VEPNormalizer.md', 'Tools', 'VEPNormalizer']
 - ['tools/WipeReads.md', 'Tools', 'WipeReads']
 #- ['developing/Setup.md', 'Developing', 'Setting up your local development environment']
 - ['about.md', 'About']
 - ['license.md', 'License']
 #theme: readthedocs
-repo_url: https://git.lumc.nl/biopet/biopet
\ No newline at end of file
+repo_url: https://git.lumc.nl/biopet/biopet
~~~~