shiva.md 14.3 KB
Newer Older
Peter van 't Hof's avatar
Peter van 't Hof committed
1
2
3
4
# Shiva

## Introduction

5
This pipeline is built for variant calling on NGS data (preferably Illumina data). Part of this pipeline resembles the <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a>) of GATK in terms of their approach to variant calling.
Peter van 't Hof's avatar
Peter van 't Hof committed
6
7
8
9
The pipeline accepts ```.fastq & .bam``` files as input.

----

10
## Overview of tools and sub-pipelines for this pipeline
Peter van 't Hof's avatar
Peter van 't Hof committed
11

12
13
14
15
16
* [Flexiprep for QC](flexiprep.md)
* [Metagenomics analysis](gears.md)
* [Mapping](mapping.md)
* [VEP annotation](toucan.md)
* [CNV analysis](kopisu.md)
Peter van 't Hof's avatar
Peter van 't Hof committed
17
* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
Mei's avatar
Mei committed
18
19
20
21
22
* <a href="https://www.broadinstitute.org/gatk/" target="_blank">GATK tools</a>
* <a href="https://github.com/ekg/freebayes" target="_blank">Freebayes</a>
* <a href="http://dkoboldt.github.io/varscan/" target="_blank">Varscan</a>
* <a href="https://samtools.github.io/bcftools/bcftools.html" target="_blank">Bcftools</a>
* <a href="http://www.htslib.org/" target="_blank">Samtools</a>
Peter van 't Hof's avatar
Peter van 't Hof committed
23
24
25

----

26
## Basic usage
Peter van 't Hof's avatar
Peter van 't Hof committed
27

28
Note that one should first create the appropriate sample and pipeline setting [configs](../general/config.md).
Peter van 't Hof's avatar
Peter van 't Hof committed
29

30
Shiva pipeline can start from FASTQ or BAM files. This pipeline will include pre-process steps for the BAM files. 
31

32
When using BAM files as input, Note that one should alter the sample config field from `R1` into `bam`.
33

Sander Bollen's avatar
Sander Bollen committed
34
To view the help menu, execute:
Peter van 't Hof's avatar
Peter van 't Hof committed
35
~~~
36
biopet pipeline shiva -h
37
 
Peter van 't Hof's avatar
Peter van 't Hof committed
38
Arguments for Shiva:
39
40
41
42
43
 -s,--sample <sample>                  Only Process This Sample
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or 
                                       'namespace:namespace:key=value'
 -DSC,--disablescatter                 Disable all scatters
Peter van 't Hof's avatar
Peter van 't Hof committed
44
45
46
47
48

~~~

To run the pipeline:
~~~
49
biopet pipeline shiva -config MySamples.yml -config MySettings.yml -run
Peter van 't Hof's avatar
Peter van 't Hof committed
50
51
~~~

Sander Bollen's avatar
Sander Bollen committed
52
A dry run can be performed by simply removing the `-run` flag from the command line call.
Peter van 't Hof's avatar
Peter van 't Hof committed
53

Pappas's avatar
Pappas committed
54

55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
An example of MySettings.yml file is provided here and more detailed config options are explained in [config options](#config-options).
``` yaml
samples:
    SampleID:
        libraries:
            lib_id_1:
                bam: YourBam.bam
            lib_id_2:
                R1: file_R1.fq.gz
                R2: file_R2.fq.gz
species: H.sapiens
reference_name: GRCh38_no_alt_analysis_set
dbsnp_vcf: <dbsnp.vcf.gz>
vcffilter:
    min_alternate_depth: 1
output_dir: <output directory>
variantcallers:
    - haplotypecaller
    - unifiedgenotyper
    - haplotypecaller_gvcf
unifiedgenotyper:
    merge_vcf_results: false # This will do the variantcalling but will not merged into the final vcf file
```
Peter van 't Hof's avatar
Peter van 't Hof committed
78
79
----

80
## Supported variant callers
Sander Bollen's avatar
Sander Bollen committed
81
82
At this moment the following variant callers can be used

83
84
85
86
87
88
89
90
91
92
93
94
| ConfigName | Tool | Description |
| ---------- | ---- | ----------- |
| haplotypecaller_gvcf | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a> | Running HaplotypeCaller in gvcf mode |
| haplotypecaller_allele | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a> | Only genotype a given list of alleles with HaplotypeCaller |
| unifiedgenotyper_allele | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper</a> | Only genotype a given list of alleles with UnifiedGenotyper |
| unifiedgenotyper | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper</a> | Running default UnifiedGenotyper |
| haplotypecaller | <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a> | Running default HaplotypeCaller |
| freebayes | <a href="https://github.com/ekg/freebayes">freebayes</a> |  |
| raw | [Naive variant caller](../tools/MpileupToVcf) |  |
| bcftools | <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools</a> |  |
| bcftools_singlesample | <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools</a> |  |
| varscan_cns_singlesample | <a href="http://varscan.sourceforge.net/">varscan</a> |  |
95

Peter van 't Hof's avatar
Peter van 't Hof committed
96
97
## Config options

Sander Bollen's avatar
Sander Bollen committed
98
### Required settings
Mei's avatar
Mei committed
99
| ConfigNamespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
100
| ----------- | ---- | ---- | ------- | -------- |
101
| - | output_dir | String |  | Path to output directory |
Sander Bollen's avatar
Sander Bollen committed
102
103
| Shiva | variantcallers | List[String] | | Which variant callers to use |

Peter van 't Hof's avatar
Peter van 't Hof committed
104
105
### Config options

Mei's avatar
Mei committed
106
107
108
109
110
| ConfigNamespace | Name |  Type | Default | Function | Applicable variant caller |
| ----------- | ---- | ----- | ------- | -------- | -------- |
| shiva | species | String | unknown_species | Name of species, like H.sapiens | all |
| shiva | reference_name | String | unknown_reference_name | Name of reference, like hg19 | all |
| shiva | reference_fasta | String |  | reference to align to | all |
111
| shiva | dbsnp_vcf | String |  | vcf file of dbsnp records | haplotypecaller, haplotypecaller_gvcf, haplotypecaller_allele, unifiedgenotyper, unifiedgenotyper_allele |
Mei's avatar
Mei committed
112
| shiva | variantcallers | List[String] |  | variantcaller to use, see list | all |
113
| shiva | input_alleles | String |  | vcf file contains sites of interest for genotyping (including HOM REF calls). Only used when haplotypecaller_allele or unifiedgenotyper_allele is used.  | haplotypecaller_allele, unifiedgenotyper_allele |
Mei's avatar
Mei committed
114
115
| shiva | use_indel_realigner | Boolean | true | Realign indels | all |
| shiva | use_base_recalibration | Boolean | true | Base recalibrate | all |
116
117
118
| shiva | use_analyze_covariates | Boolean | true | Analyze covariates during base recalibration step | all |
| shiva | bam_to_fastq | Boolean | false | Convert bam files to fastq files | Only used when input is a bam file |
| shiva | correct_readgroups | Boolean | false | Attempt to correct read groups | Only used when input is a bam file |
Peter van 't Hof's avatar
Peter van 't Hof committed
119
120
| shiva | amplicon_bed | Path |  | Path to target bed file | all |
| shiva | regions_of_interest | Array of paths |  | Array of paths to region of interest (e.g. gene panels) bed files | all |
Peter van 't Hof's avatar
Peter van 't Hof committed
121
122
123
124
125
| shivavariantcalling | gender_aware_calling | Boolean | false | Enables gander aware variantcalling | haplotypecaller_gvcf |
| shivavariantcalling | hap̦loid_regions | Bed file |  | Haploid regions for all genders | haplotypecaller_gvcf |
| shivavariantcalling | hap̦loid_regions_male | Bed file |  | Haploid regions for males | haplotypecaller_gvcf |
| shivavariantcalling | hap̦loid_regions_female | Bed file |  | Haploid regions for females | haplotypecaller_gvcf |
| shiva | amplicon_bed | Path |  | Path to target bed file | all |
126
127
128
129
| vcffilter | min_sample_depth | Integer | 8 | Filter variants with at least x coverage | raw |
| vcffilter | min_alternate_depth | Integer | 2 | Filter variants with at least x depth on the alternate allele | raw |
| vcffilter | min_samples_pass | Integer | 1 | Minimum amount of samples which pass custom filter (requires additional flags) | raw |
| vcffilter | filter_ref_calls | Boolean | true | Remove reference calls | raw |
Sander Bollen's avatar
Sander Bollen committed
130

Peter van 't Hof's avatar
Peter van 't Hof committed
131
Since Shiva uses the [Mapping](mapping.md) pipeline internally, mapping config values can be specified as well.
Sander Bollen's avatar
Sander Bollen committed
132
133
For all the options, please see the corresponding documentation for the mapping pipeline.

134
----
135

136
## Advanced usage
137

Peter van 't Hof's avatar
Peter van 't Hof committed
138
139
140
### Gender aware variantcalling

In Shiva and ShivaVariantcalling while using haplotypecaller_gvcf it is possible to do gender aware variantcalling. In this mode it required to supply bed files to define haploid regions (see config values). 
Peter van 't Hof's avatar
Peter van 't Hof committed
141
142
- For males, `hap̦loid_regions` and `hap̦loid_regions_male` is used.
- For females, `hap̦loid_regions` and `hap̦loid_regions_female` is used.
Peter van 't Hof's avatar
Peter van 't Hof committed
143
144
145

The pipeline will use a union of those files. At least 1 file is required while using this mode.

146
### Reporting modes
Sander Bollen's avatar
Sander Bollen committed
147
148
149
150

Shiva furthermore supports three modes. The default and recommended option is `multisample_variantcalling`.
During this mode, all bam files will be simultaneously called in one big VCF file. It will work with any number of samples.

151
Additionally, Shiva provides two separate modes that only work with a single sample.
Sander Bollen's avatar
Sander Bollen committed
152
Those are not recommended, but may be useful to those who need to validate replicates.
Sander Bollen's avatar
Sander Bollen committed
153
154
155
156
157
158
159
160

Mode `single_sample_variantcalling` calls a single sample as a merged bam file.
I.e., it will merge all libraries in one bam file, then calls on that.

The other mode, `library_variantcalling`, will call simultaneously call all library bam files.

The config for these therefore is:

Sander Bollen's avatar
Sander Bollen committed
161
| namespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
162
163
164
165
| ----------- | ---- | ---- | ------- | -------- |
| shiva | multisample_variantcalling | Boolean | true | Default, multisample calling |
| shiva | single_sample_variantcalling | Boolean | false | Not-recommended, single sample, merged bam |
| shiva | library_variantcalling | Boolean | false | Not-recommended, single sample, per library |
Peter van 't Hof's avatar
Peter van 't Hof committed
166

Mei's avatar
Mei committed
167
168
169
170
171
172
173
174
175
### Additional metagenomics analysis

[Gears](gears.md) can be ran for the data analysed with `Shiva`. There are two stages at which this metagenomics sub-pipeline can be called 
and this should be specified in the [config](../general/config) file. To call Gears, please use the following config values.

*`mapping_to_gears: none` : Disable this functionality. (default)
*`mapping_to_gears: all` : Trimmed and clipped reads from [Flexiprep](flexiprep).
*`mapping_to_gears: unmapped` : Only send unmapped reads after alignment to Gears, e.g., a kind of "trash bin" analysis.

176
### Only variant calling
Sander Bollen's avatar
Sander Bollen committed
177

178
179
180
It is possible to run Shiva while only performing its variant calling steps starting from BAM files.
This has been separated in its own pipeline named `shivavariantcalling`. Different than running shiva which converts BAM files to fastq files first, 
shivavariantcalling will not perform any pre-processing and mapping steps. But just to call variants based on the input BAM files.
Sander Bollen's avatar
Sander Bollen committed
181

182
183
184
To view the help menu, execute:
~~~
biopet pipeline shivavariantcalling -h
Sander Bollen's avatar
Sander Bollen committed
185

186
187
188
189
190
191
192
193
Arguments for ShivaVariantcalling:
 -BAM,--inputbamsarg <inputbamsarg>    Bam files (should be deduped bams)
 -sample,--sampleid <sampleid>         Sample ID
 -library,--libid <libid>              Library ID
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or 
                                       'namespace:namespace:key=value'
 -DSC,--disablescatter                 Disable all scatters 
Peter van 't Hof's avatar
Peter van 't Hof committed
194

195
~~~
Peter van 't Hof's avatar
Peter van 't Hof committed
196

197
198
199
200
201
To run the pipeline:
~~~
biopet pipeline shivavariantcalling -config MySettings.yml -run
~~~

Moustakas's avatar
Moustakas committed
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
### Only Structural Variant calling
It is possible to run Shiva while only performing the Structural Variant calling steps starting from BAM files. For this, there is a separate pipeline named `ShivaSvCalling`. 
The difference from running Shiva, is that it will not convert the BAM files into fastq files first and it will omit any pre-processing or alignment steps. 
It will call SVs based on the input alignment (BAM) files. 

To view the help menu, type:
~~~
biopet pipeline ShivaSvCalling -h
 
Arguments for ShivaSvCalling:
 -BAM,--inputbamsarg <inputbamsarg>    Bam files (should be deduped bams)
 -sample,--sampleid <sampleid>         Sample ID
 -library,--libid <libid>              Library ID
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or 
                                       'namespace:namespace:key=value'
 -DSC,--disablescatter                 Disable all scatters
~~~ 
 
To run `ShivaSvCalling`, the user needs to supply the input BAM files from the command line using the `-BAM` flag. It is not possible to provide them in a sample sheet as a config file. 
No sample ID or library information is necessary.
  
To run the pipeline, you can type something like:
~~~
biopet pipeline ShivaSvCalling -BAM sampleA.bam -BAM sampleB.bam -config MySettings.yml -run
~~~

229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
### Exome variant calling

If one calls variants with Shiva on exome samples and an ```amplicon_bed``` file is available, the user is able to add this file to the config file.
 When the file is given, the coverage over the positions in the bed file will be calculated plus the number of variants on each position. If there is an interest
 in a specific region of the genome/exome one is capable to give multiple ```regionOfInterest.bed``` files with the option ```regions_of_interest``` (in list/array format).
 
 A short recap: the option ```amplicon_bed``` can only be given one time and should be composed of the amplicon kit used to obtain the exome data.
 The option ```regions_of_interest``` can contain multiple bed files in ```list``` format and can contain any region a user wants. If multiple regions are given,
 the pipeline will make an coverage plot over each bed file separately. 

### VEP annotation

Shiva can be linked to our VEP based annotation pipeline to annotate the VCF files. 

**example config**
```yaml
toucan:
  vep_version: 86
  enable_scatter: false
Peter van 't Hof's avatar
Peter van 't Hof committed
248
249
```

250
251
252
253
254
255
256
257
258
259
260
261
### SV calling 

In addition to standard variant calling, Shiva also supports SV calling. 
One can enable this option by setting the `sv_calling` config option to `true`.

**example config**
```yaml
shiva:
    sv_calling: true
sv_callers:
- breakdancer
- delly
Peter van 't Hof's avatar
Peter van 't Hof committed
262
- clever
263
264
265
266
267
268
269
270
271
272
273
pysvtools:
  flanking: 100
```

### CNV calling 

In addition to standard variant calling, Shiva also supports CNV calling. 
One can enable this option by setting the `cnv_calling` config option to `true`.

For CNV calling Shiva uses the [Kopisu](kopisu.md) as a sub-pipeline. 
Please see the documentation for Kopisu.
Sander Bollen's avatar
Sander Bollen committed
274

275
**example config**
Sander Bollen's avatar
Sander Bollen committed
276
277
278
279
280
281
282
283
284
285
286
287
288
```yaml
shiva:
    cnv_calling: true
kopisu:
    use_cnmops_method: false
    use_freec_method: false
    use_xhmm_method: true
amplicon_bed: <path_to_bed>
xhmm:
    discover_params: <path_to_file>
    exe: <path_to_executable>
```

Peter van 't Hof's avatar
Peter van 't Hof committed
289
## References
290
291
292
293
294
295
296
297

* Shiva follows the best practices of GATK: [GATK best practices](https://www.broadinstitute.org/gatk/guide/best-practices)


## Getting Help

If you have any questions on running Shiva, suggestions on how to improve the overall flow, or requests for your favorite variant calling related program to be added, feel free to post an issue to our issue tracker at [GitHub](https://github.com/biopet/biopet).
Or contact us directly via: [SASC email](mailto:SASC@lumc.nl)