shiva.md 14.2 KB
Newer Older
Peter van 't Hof's avatar
Peter van 't Hof committed
1 2 3 4
# Shiva

## Introduction

5
This pipeline is built for variant calling on NGS data (preferably Illumina data). Part of this pipeline resembles the <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a>) of GATK in terms of their approach to variant calling.
Peter van 't Hof's avatar
Peter van 't Hof committed
6 7 8 9
The pipeline accepts ```.fastq & .bam``` files as input.

----

10
## Overview of tools and sub-pipelines for this pipeline
Peter van 't Hof's avatar
Peter van 't Hof committed
11

12 13 14 15 16
* [Flexiprep for QC](flexiprep.md)
* [Metagenomics analysis](gears.md)
* [Mapping](mapping.md)
* [VEP annotation](toucan.md)
* [CNV analysis](kopisu.md)
Peter van 't Hof's avatar
Peter van 't Hof committed
17
* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
Leon Mei's avatar
Leon Mei committed
18 19 20 21 22
* <a href="https://www.broadinstitute.org/gatk/" target="_blank">GATK tools</a>
* <a href="https://github.com/ekg/freebayes" target="_blank">Freebayes</a>
* <a href="http://dkoboldt.github.io/varscan/" target="_blank">Varscan</a>
* <a href="https://samtools.github.io/bcftools/bcftools.html" target="_blank">Bcftools</a>
* <a href="http://www.htslib.org/" target="_blank">Samtools</a>
Peter van 't Hof's avatar
Peter van 't Hof committed
23 24 25

----

26
## Basic usage
Peter van 't Hof's avatar
Peter van 't Hof committed
27

28
Note that one should first create the appropriate sample and pipeline setting [configs](../general/config.md).
Peter van 't Hof's avatar
Peter van 't Hof committed
29

30
Shiva pipeline can start from FASTQ or BAM files. This pipeline will include pre-process steps for the BAM files. 
31

32
When using BAM files as input, Note that one should alter the sample config field from `R1` into `bam`.
33

Sander Bollen's avatar
Sander Bollen committed
34
To view the help menu, execute:
Peter van 't Hof's avatar
Peter van 't Hof committed
35
~~~
36
biopet pipeline shiva -h
37
 
Peter van 't Hof's avatar
Peter van 't Hof committed
38
Arguments for Shiva:
39 40 41 42 43
 -s,--sample <sample>                  Only Process This Sample
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or 
                                       'namespace:namespace:key=value'
 -DSC,--disablescatter                 Disable all scatters
Peter van 't Hof's avatar
Peter van 't Hof committed
44 45 46 47 48

~~~

To run the pipeline:
~~~
49
biopet pipeline shiva -config MySamples.yml -config MySettings.yml -run
Peter van 't Hof's avatar
Peter van 't Hof committed
50 51
~~~

Sander Bollen's avatar
Sander Bollen committed
52
A dry run can be performed by simply removing the `-run` flag from the command line call.
Peter van 't Hof's avatar
Peter van 't Hof committed
53

npappas's avatar
npappas committed
54

55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
An example of MySettings.yml file is provided here and more detailed config options are explained in [config options](#config-options).
``` yaml
samples:
    SampleID:
        libraries:
            lib_id_1:
                bam: YourBam.bam
            lib_id_2:
                R1: file_R1.fq.gz
                R2: file_R2.fq.gz
species: H.sapiens
reference_name: GRCh38_no_alt_analysis_set
dbsnp_vcf: <dbsnp.vcf.gz>
vcffilter:
    min_alternate_depth: 1
output_dir: <output directory>
variantcallers:
    - haplotypecaller
    - unifiedgenotyper
    - haplotypecaller_gvcf
unifiedgenotyper:
    merge_vcf_results: false # This will do the variantcalling but will not merged into the final vcf file
```
Peter van 't Hof's avatar
Peter van 't Hof committed
78 79
----

80
## Supported variant callers
Sander Bollen's avatar
Sander Bollen committed
81 82
At this moment the following variant callers can be used

83 84 85 86 87 88 89 90 91 92 93 94
| ConfigName | Tool | Description |
| ---------- | ---- | ----------- |
| haplotypecaller_gvcf | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a> | Running HaplotypeCaller in gvcf mode |
| haplotypecaller_allele | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a> | Only genotype a given list of alleles with HaplotypeCaller |
| unifiedgenotyper_allele | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper</a> | Only genotype a given list of alleles with UnifiedGenotyper |
| unifiedgenotyper | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper</a> | Running default UnifiedGenotyper |
| haplotypecaller | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a> | Running default HaplotypeCaller |
| freebayes | <a href="https://github.com/ekg/freebayes">freebayes</a> |  |
| raw | [Naive variant caller](../tools/MpileupToVcf) |  |
| bcftools | <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools</a> |  |
| bcftools_singlesample | <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools</a> |  |
| varscan_cns_singlesample | <a href="http://varscan.sourceforge.net/">varscan</a> |  |
95

Peter van 't Hof's avatar
Peter van 't Hof committed
96 97
## Config options

Sander Bollen's avatar
Sander Bollen committed
98
### Required settings
Leon Mei's avatar
Leon Mei committed
99
| ConfigNamespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
100
| ----------- | ---- | ---- | ------- | -------- |
101
| - | output_dir | String |  | Path to output directory |
Sander Bollen's avatar
Sander Bollen committed
102 103
| Shiva | variantcallers | List[String] | | Which variant callers to use |

Peter van 't Hof's avatar
Peter van 't Hof committed
104 105
### Config options

Leon Mei's avatar
Leon Mei committed
106 107 108 109 110
| ConfigNamespace | Name |  Type | Default | Function | Applicable variant caller |
| ----------- | ---- | ----- | ------- | -------- | -------- |
| shiva | species | String | unknown_species | Name of species, like H.sapiens | all |
| shiva | reference_name | String | unknown_reference_name | Name of reference, like hg19 | all |
| shiva | reference_fasta | String |  | reference to align to | all |
111
| shiva | dbsnp_vcf | String |  | vcf file of dbsnp records | haplotypecaller, haplotypecaller_gvcf, haplotypecaller_allele, unifiedgenotyper, unifiedgenotyper_allele |
Leon Mei's avatar
Leon Mei committed
112
| shiva | variantcallers | List[String] |  | variantcaller to use, see list | all |
113
| shiva | input_alleles | String |  | vcf file contains sites of interest for genotyping (including HOM REF calls). Only used when haplotypecaller_allele or unifiedgenotyper_allele is used.  | haplotypecaller_allele, unifiedgenotyper_allele |
Leon Mei's avatar
Leon Mei committed
114 115
| shiva | use_indel_realigner | Boolean | true | Realign indels | all |
| shiva | use_base_recalibration | Boolean | true | Base recalibrate | all |
116 117 118
| shiva | use_analyze_covariates | Boolean | true | Analyze covariates during base recalibration step | all |
| shiva | bam_to_fastq | Boolean | false | Convert bam files to fastq files | Only used when input is a bam file |
| shiva | correct_readgroups | Boolean | false | Attempt to correct read groups | Only used when input is a bam file |
Peter van 't Hof's avatar
Peter van 't Hof committed
119 120
| shiva | amplicon_bed | Path |  | Path to target bed file | all |
| shiva | regions_of_interest | Array of paths |  | Array of paths to region of interest (e.g. gene panels) bed files | all |
Peter van 't Hof's avatar
Peter van 't Hof committed
121 122 123 124 125
| shivavariantcalling | gender_aware_calling | Boolean | false | Enables gander aware variantcalling | haplotypecaller_gvcf |
| shivavariantcalling | hap̦loid_regions | Bed file |  | Haploid regions for all genders | haplotypecaller_gvcf |
| shivavariantcalling | hap̦loid_regions_male | Bed file |  | Haploid regions for males | haplotypecaller_gvcf |
| shivavariantcalling | hap̦loid_regions_female | Bed file |  | Haploid regions for females | haplotypecaller_gvcf |
| shiva | amplicon_bed | Path |  | Path to target bed file | all |
126 127 128 129
| vcffilter | min_sample_depth | Integer | 8 | Filter variants with at least x coverage | raw |
| vcffilter | min_alternate_depth | Integer | 2 | Filter variants with at least x depth on the alternate allele | raw |
| vcffilter | min_samples_pass | Integer | 1 | Minimum amount of samples which pass custom filter (requires additional flags) | raw |
| vcffilter | filter_ref_calls | Boolean | true | Remove reference calls | raw |
Sander Bollen's avatar
Sander Bollen committed
130

Peter van 't Hof's avatar
Peter van 't Hof committed
131
Since Shiva uses the [Mapping](mapping.md) pipeline internally, mapping config values can be specified as well.
Sander Bollen's avatar
Sander Bollen committed
132 133
For all the options, please see the corresponding documentation for the mapping pipeline.

134
----
135

136
## Advanced usage
137

Peter van 't Hof's avatar
Peter van 't Hof committed
138 139 140 141 142 143 144 145
### Gender aware variantcalling

In Shiva and ShivaVariantcalling while using haplotypecaller_gvcf it is possible to do gender aware variantcalling. In this mode it required to supply bed files to define haploid regions (see config values). 
- For males is `hap̦loid_regions` and `hap̦loid_regions_male` used.
- For females is `hap̦loid_regions` and `hap̦loid_regions_female` used.

The pipeline will use a union of those files. At least 1 file is required while using this mode.

146
### Reporting modes
Sander Bollen's avatar
Sander Bollen committed
147 148 149 150

Shiva furthermore supports three modes. The default and recommended option is `multisample_variantcalling`.
During this mode, all bam files will be simultaneously called in one big VCF file. It will work with any number of samples.

151
Additionally, Shiva provides two separate modes that only work with a single sample.
Sander Bollen's avatar
Sander Bollen committed
152
Those are not recommended, but may be useful to those who need to validate replicates.
Sander Bollen's avatar
Sander Bollen committed
153 154 155 156 157 158 159 160

Mode `single_sample_variantcalling` calls a single sample as a merged bam file.
I.e., it will merge all libraries in one bam file, then calls on that.

The other mode, `library_variantcalling`, will call simultaneously call all library bam files.

The config for these therefore is:

Sander Bollen's avatar
Sander Bollen committed
161
| namespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
162 163 164 165
| ----------- | ---- | ---- | ------- | -------- |
| shiva | multisample_variantcalling | Boolean | true | Default, multisample calling |
| shiva | single_sample_variantcalling | Boolean | false | Not-recommended, single sample, merged bam |
| shiva | library_variantcalling | Boolean | false | Not-recommended, single sample, per library |
Peter van 't Hof's avatar
Peter van 't Hof committed
166

Leon Mei's avatar
Leon Mei committed
167 168 169 170 171 172 173 174 175
### Additional metagenomics analysis

[Gears](gears.md) can be ran for the data analysed with `Shiva`. There are two stages at which this metagenomics sub-pipeline can be called 
and this should be specified in the [config](../general/config) file. To call Gears, please use the following config values.

*`mapping_to_gears: none` : Disable this functionality. (default)
*`mapping_to_gears: all` : Trimmed and clipped reads from [Flexiprep](flexiprep).
*`mapping_to_gears: unmapped` : Only send unmapped reads after alignment to Gears, e.g., a kind of "trash bin" analysis.

176
### Only variant calling
Sander Bollen's avatar
Sander Bollen committed
177

178 179 180
It is possible to run Shiva while only performing its variant calling steps starting from BAM files.
This has been separated in its own pipeline named `shivavariantcalling`. Different than running shiva which converts BAM files to fastq files first, 
shivavariantcalling will not perform any pre-processing and mapping steps. But just to call variants based on the input BAM files.
Sander Bollen's avatar
Sander Bollen committed
181

182 183 184
To view the help menu, execute:
~~~
biopet pipeline shivavariantcalling -h
Sander Bollen's avatar
Sander Bollen committed
185

186 187 188 189 190 191 192 193
Arguments for ShivaVariantcalling:
 -BAM,--inputbamsarg <inputbamsarg>    Bam files (should be deduped bams)
 -sample,--sampleid <sampleid>         Sample ID
 -library,--libid <libid>              Library ID
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or 
                                       'namespace:namespace:key=value'
 -DSC,--disablescatter                 Disable all scatters 
Peter van 't Hof's avatar
Peter van 't Hof committed
194

195
~~~
Peter van 't Hof's avatar
Peter van 't Hof committed
196

197 198 199 200 201
To run the pipeline:
~~~
biopet pipeline shivavariantcalling -config MySettings.yml -run
~~~

202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228
### Only Structural Variant calling
It is possible to run Shiva while only performing the Structural Variant calling steps starting from BAM files. For this, there is a separate pipeline named `ShivaSvCalling`. 
The difference from running Shiva, is that it will not convert the BAM files into fastq files first and it will omit any pre-processing or alignment steps. 
It will call SVs based on the input alignment (BAM) files. 

To view the help menu, type:
~~~
biopet pipeline ShivaSvCalling -h
 
Arguments for ShivaSvCalling:
 -BAM,--inputbamsarg <inputbamsarg>    Bam files (should be deduped bams)
 -sample,--sampleid <sampleid>         Sample ID
 -library,--libid <libid>              Library ID
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or 
                                       'namespace:namespace:key=value'
 -DSC,--disablescatter                 Disable all scatters
~~~ 
 
To run `ShivaSvCalling`, the user needs to supply the input BAM files from the command line using the `-BAM` flag. It is not possible to provide them in a sample sheet as a config file. 
No sample ID or library information is necessary.
  
To run the pipeline, you can type something like:
~~~
biopet pipeline ShivaSvCalling -BAM sampleA.bam -BAM sampleB.bam -config MySettings.yml -run
~~~

229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247
### Exome variant calling

If one calls variants with Shiva on exome samples and an ```amplicon_bed``` file is available, the user is able to add this file to the config file.
 When the file is given, the coverage over the positions in the bed file will be calculated plus the number of variants on each position. If there is an interest
 in a specific region of the genome/exome one is capable to give multiple ```regionOfInterest.bed``` files with the option ```regions_of_interest``` (in list/array format).
 
 A short recap: the option ```amplicon_bed``` can only be given one time and should be composed of the amplicon kit used to obtain the exome data.
 The option ```regions_of_interest``` can contain multiple bed files in ```list``` format and can contain any region a user wants. If multiple regions are given,
 the pipeline will make an coverage plot over each bed file separately. 

### VEP annotation

Shiva can be linked to our VEP based annotation pipeline to annotate the VCF files. 

**example config**
```yaml
toucan:
  vep_version: 86
  enable_scatter: false
Peter van 't Hof's avatar
Peter van 't Hof committed
248 249
```

250 251 252 253 254 255 256 257 258 259 260 261
### SV calling 

In addition to standard variant calling, Shiva also supports SV calling. 
One can enable this option by setting the `sv_calling` config option to `true`.

**example config**
```yaml
shiva:
    sv_calling: true
sv_callers:
- breakdancer
- delly
Peter van 't Hof's avatar
Peter van 't Hof committed
262
- clever
263 264 265 266 267 268 269 270 271 272 273
pysvtools:
  flanking: 100
```

### CNV calling 

In addition to standard variant calling, Shiva also supports CNV calling. 
One can enable this option by setting the `cnv_calling` config option to `true`.

For CNV calling Shiva uses the [Kopisu](kopisu.md) as a sub-pipeline. 
Please see the documentation for Kopisu.
Sander Bollen's avatar
Sander Bollen committed
274

275
**example config**
Sander Bollen's avatar
Sander Bollen committed
276 277 278 279 280 281 282 283 284 285 286 287 288
```yaml
shiva:
    cnv_calling: true
kopisu:
    use_cnmops_method: false
    use_freec_method: false
    use_xhmm_method: true
amplicon_bed: <path_to_bed>
xhmm:
    discover_params: <path_to_file>
    exe: <path_to_executable>
```

Peter van 't Hof's avatar
Peter van 't Hof committed
289
## References
290 291 292 293 294 295 296 297

* Shiva follows the best practices of GATK: [GATK best practices](https://www.broadinstitute.org/gatk/guide/best-practices)


## Getting Help

If you have any questions on running Shiva, suggestions on how to improve the overall flow, or requests for your favorite variant calling related program to be added, feel free to post an issue to our issue tracker at [GitHub](https://github.com/biopet/biopet).
Or contact us directly via: [SASC email](mailto:SASC@lumc.nl)