shiva.md 11.2 KB
Newer Older
Peter van 't Hof's avatar
Peter van 't Hof committed
1
2
3
4
5
# Shiva

## Introduction

This pipeline is build for variant calling on NGS data (preferably Illumina data).
Sander Bollen's avatar
Sander Bollen committed
6
It is based on the <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a>) of GATK in terms of their approach to variant calling.
Peter van 't Hof's avatar
Peter van 't Hof committed
7
8
9
10
11
12
13
14
15
16
17
18
19
The pipeline accepts ```.fastq & .bam``` files as input.

----

## Tools for this pipeline

* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
* [Flexiprep](flexiprep.md)
* <a href="https://www.broadinstitute.org/gatk/" target="_blank">GATK tools</a>:
    * GATK
    * Freebayes
    * Bcftools
    * Samtools
Sander Bollen's avatar
Sander Bollen committed
20
21
22
* [FreeC](http://boevalab.com/FREEC/)
* [cn.mops](http://bioconductor.org/packages/release/bioc/html/cn.mops.html)
* [XHMM](http://atgu.mgh.harvard.edu/xhmm/tutorial.shtml)
Peter van 't Hof's avatar
Peter van 't Hof committed
23
24
25
26
27
28
29

----

## Example

Note that one should first create the appropriate [configs](../general/config.md).

30
31
32
33
34
35
### Sample input extensions

Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled. 

Shiva is a special pipeline in the sense that it can also start directly from `bam` files. Note that one should alter the sample config field from `R1` into `bam`.

36
37
### Full pipeline

Sander Bollen's avatar
Sander Bollen committed
38
The full pipeline can start from fastq or from bam file. This pipeline will include pre-process steps for the bam files.
39

Sander Bollen's avatar
Sander Bollen committed
40
To view the help menu, execute:
Peter van 't Hof's avatar
Peter van 't Hof committed
41
~~~
42
biopet pipeline shiva -h
Peter van 't Hof's avatar
Peter van 't Hof committed
43
44
45
46
47
48
49
50
51
52

Arguments for Shiva:
 -sample,--onlysample <onlysample>               Only Sample
 -config,--config_file <config_file>             JSON config file(s)
 -DSC,--disablescatterdefault                    Disable all scatters

~~~

To run the pipeline:
~~~
53
biopet pipeline shiva -config MySamples.json -config MySettings.json -run
Peter van 't Hof's avatar
Peter van 't Hof committed
54
55
~~~

Sander Bollen's avatar
Sander Bollen committed
56
A dry run can be performed by simply removing the `-run` flag from the command line call.
Peter van 't Hof's avatar
Peter van 't Hof committed
57

Sander Bollen's avatar
Sander Bollen committed
58
### Only variant calling
59

Sander Bollen's avatar
Sander Bollen committed
60
61
62
It is possible to run Shiva while only performing its variant calling steps.
This has been separated in its own pipeline named `shivavariantcalling`.
As this calling pipeline starts from BAM files, it will naturally not perform any pre-processing steps.
63

Sander Bollen's avatar
Sander Bollen committed
64
To view the help menu, execute:
65
66
67
68
69
70
71
72
73
74
75
76
77
78
~~~
java -jar </path/to/biopet.jar> pipeline shivavariantcalling -h

Arguments for ShivaVariantcalling:
 -BAM,--inputbams <inputbams>          Bam files (should be deduped bams)
 -sample,--sampleid <sampleid>         Sample ID (only effects summary and not required)
 -library,--libid <libid>              Library ID (only effects summary and not required)
 -config,--config_file <config_file>   JSON config file(s)
 -DSC,--disablescatter                 Disable all scatters

~~~

To run the pipeline:
~~~
79
biopet pipeline shivavariantcalling -config MySettings.json -run
80
81
~~~

Sander Bollen's avatar
Sander Bollen committed
82
A dry run can be performed by simply removing the `-run` flag from the command line call.
83
84


Peter van 't Hof's avatar
Peter van 't Hof committed
85
86
----

Sander Bollen's avatar
Sander Bollen committed
87
88
89
## Variant caller
At this moment the following variant callers can be used

90
91
92
93
94
95
96
97
98
99
100
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a>
    * Running default HaplotypeCaller
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller_gvcf</a>
    * Running HaplotypeCaller in gvcf mode
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller_allele</a>
    * Only genotype a given list of alleles with HaplotypeCaller
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper</a>
    * Running default UnifiedGenotyper
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper_allele</a>
    * Only genotype a given list of alleles with UnifiedGenotyper
* <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools</a>
Peter van 't Hof's avatar
Peter van 't Hof committed
101
* <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools_singlesample</a>
102
103
* <a href="https://github.com/ekg/freebayes">freebayes</a>
* [raw](../tools/MpileupToVcf)
Peter van 't Hof's avatar
Peter van 't Hof committed
104
105
106
107
108
109

## Config options

To view all possible config options please navigate to our Gitlab wiki page
<a href="https://git.lumc.nl/biopet/biopet/wikis/GATK-Variantcalling-Pipeline" target="_blank">Config</a>

Sander Bollen's avatar
Sander Bollen committed
110
### Required settings
Sander Bollen's avatar
Sander Bollen committed
111
| Confignamespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
112
| ----------- | ---- | ---- | ------- | -------- |
113
| - | output_dir | String |  | Path to output directory |
Sander Bollen's avatar
Sander Bollen committed
114
115
116
| Shiva | variantcallers | List[String] | | Which variant callers to use |


Peter van 't Hof's avatar
Peter van 't Hof committed
117
118
### Config options

Sander Bollen's avatar
Sander Bollen committed
119
| ConfignNamespace | Name |  Type | Default | Function |
Peter van 't Hof's avatar
Peter van 't Hof committed
120
| ----------- | ---- | ----- | ------- | -------- |
Peter van 't Hof's avatar
Peter van 't Hof committed
121
122
| shiva | species | String | unknown_species | Name of species, like H.sapiens |
| shiva | reference_name | String | unknown_reference_name | Name of reference, like hg19 |
Sander Bollen's avatar
Sander Bollen committed
123
| shiva | reference_fasta | String |  | reference to align to |
Peter van 't Hof's avatar
Peter van 't Hof committed
124
125
| shiva | dbsnp | String |  | vcf file of dbsnp records |
| shiva | variantcallers | List[String] |  | variantcaller to use, see list |
Sander Bollen's avatar
Sander Bollen committed
126
127
128
129
130
| shiva | use_indel_realigner | Boolean | true | Realign indels |
| shiva | use_base_recalibration | Boolean | true | Base recalibrate |
| shiva | use_analyze_covariates | Boolean | false | Analyze covariates during base recalibration step |
| shiva | bam_to_fastq | Boolean | false | Convert bam files to fastq files |
| shiva | correct_readgroups | Boolean | false | Attempt to correct read groups |
Sander Bollen's avatar
Sander Bollen committed
131
132
| shiva | amplicon_bed | Path | Path to target bed file |
| shiva | regions_of_interest | Array of paths | Array of paths to region of interest (e.g. gene panels) bed files |
Sander Bollen's avatar
Sander Bollen committed
133
134
135
136
137
| vcffilter | min_sample_depth | Integer | 8 | Filter variants with at least x coverage |
| vcffilter | min_alternate_depth | Integer | 2 | Filter variants with at least x depth on the alternate allele |
| vcffilter | min_samples_pass | Integer | 1 | Minimum amount of samples which pass custom filter (requires additional flags) |
| vcffilter | filter_ref_calls | Boolean | true | Remove reference calls |

Peter van 't Hof's avatar
Peter van 't Hof committed
138
Since Shiva uses the [Mapping](mapping.md) pipeline internally, mapping config values can be specified as well.
Sander Bollen's avatar
Sander Bollen committed
139
140
For all the options, please see the corresponding documentation for the mapping pipeline.

141
142
143
144
145
146
147
148
149
150
151
### Exome variant calling

If one calls variants with Shiva on exome samples and a ```amplicon_bed``` file is available, the user is able to add this file to the config file.
 When the file is given, the coverage over the positions in the bed file will be calculated plus the number of variants on each position. If there is an interest
 in a specific region of the genome/exome one is capable to give multiple ```regionOfInterest.bed``` files with the option ```regions_of_interest``` (in list/array format).
 
 A short recap: the option ```amplicon_bed``` can only be given one time and should be composed of the amplicon kit used to obtain the exome data.
 The option ```regions_of_interest``` can contain multiple bed files in ```list``` format and can contain any region a user wants. If multiple regions are given,
 the pipeline will make an coverage plot over each bed file separately. 
 

Sander Bollen's avatar
Sander Bollen committed
152
153
154
155
156
157
### Modes

Shiva furthermore supports three modes. The default and recommended option is `multisample_variantcalling`.
During this mode, all bam files will be simultaneously called in one big VCF file. It will work with any number of samples.

On top of that, Shiva provides two separate modes that only work with a single sample.
Sander Bollen's avatar
Sander Bollen committed
158
Those are not recommended, but may be useful to those who need to validate replicates.
Sander Bollen's avatar
Sander Bollen committed
159
160
161
162
163
164
165
166

Mode `single_sample_variantcalling` calls a single sample as a merged bam file.
I.e., it will merge all libraries in one bam file, then calls on that.

The other mode, `library_variantcalling`, will call simultaneously call all library bam files.

The config for these therefore is:

Sander Bollen's avatar
Sander Bollen committed
167
| namespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
168
169
170
171
| ----------- | ---- | ---- | ------- | -------- |
| shiva | multisample_variantcalling | Boolean | true | Default, multisample calling |
| shiva | single_sample_variantcalling | Boolean | false | Not-recommended, single sample, merged bam |
| shiva | library_variantcalling | Boolean | false | Not-recommended, single sample, per library |
Peter van 't Hof's avatar
Peter van 't Hof committed
172

Sander Bollen's avatar
Sander Bollen committed
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
## CNV calling 

In addition to standard variant calling, Shiva also supports CNV calling. 
One can enable this option by setting the `cnv_calling` config option to `true`.

For CNV calling, Shiva supports three methods, each with different use cases: 
 
 * FreeC: Use for WGS or Exomes. For WGS, a control sample set is not required
 * Cn.mops: Use for Exomes. Must be used on a set of samples, preferably with N>20. Do not use on low-coverage samples.
 * XHMM: Use for Exomes or targeted approach. Number of target regions _must_ be larger than the amount of samples. Use with N >= 40.
 
All three methods can be run concurrently. However, we do not yet provide any merging of calls, since output formats vary widely.
 
To use any of these methods, you must enable them in the config. 
The options to do so are as follows:

| namespace | Name | Type | Default | Function | 
| --------- | ---- | ---- | ------- | -------- |
| kopisu | use_freec_method | Boolean | true | Enable Freec |
| kopisu | use_cnmops_method | Boolean | false | Enable Cn.mops
| kopisu | use_xhmm_method | Boolean | false | Enable XHMM | 

### Freec 

TODO

### Cn.mops

TODO


### XHMM 

When using the XHMM method, one _must_ provide a target bed file. 
XHMM cannot work without it. 
Additionally, the XHMM method requires the path a parameters file for XHMM. 
Please see the XHMM website for what this file should contain.
 
This means the following config values are required:

| Namespace | Name | Type | Default | Meaning |
| --------- | ---- | ---- | ------- | ------- |
| - | amplicon_bed | path | - | Path to target bed file |
| xhmm | discover_params | path | - | Path to XHMM params file |

It is recommended you use at least 40 samples with this method. 
One should also have more _covered_ target regions than there are samples.
This means this method is not suited for very small target kits. 

Please note that it is _not_ possible to run this method without GATK dependencies. 

Peter van 't Hof's avatar
Peter van 't Hof committed
224

Sander Bollen's avatar
Sander Bollen committed
225
## Example configs 
Peter van 't Hof's avatar
Peter van 't Hof committed
226
227
**Config example**

Peter van 't Hof's avatar
Peter van 't Hof committed
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
``` yaml
samples:
    SampleID:
        libraries:
            lib_id_1:
                bam: YourBam.bam
            lib_id_2:
                R1: file_R1.fq.gz
                R2: file_R2.fq.gz
dbsnp: <dbsnp.vcf.gz>
vcffilter:
    min_alternate_depth: 1
output_dir: <output directory>
variantcallers:
    - haplotypecaller
    - unifiedgenotyper
    - haplotypecaller_gvcf
Peter van 't Hof's avatar
Peter van 't Hof committed
245
246
```

Sander Bollen's avatar
Sander Bollen committed
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
**Additional XHMM example**

```yaml
shiva:
    cnv_calling: true
kopisu:
    use_cnmops_method: false
    use_freec_method: false
    use_xhmm_method: true
amplicon_bed: <path_to_bed>
xhmm:
    discover_params: <path_to_file>
    exe: <path_to_executable>
```

Peter van 't Hof's avatar
Peter van 't Hof committed
262
## References
263
264
265
266
267
268
269
270

* Shiva follows the best practices of GATK: [GATK best practices](https://www.broadinstitute.org/gatk/guide/best-practices)


## Getting Help

If you have any questions on running Shiva, suggestions on how to improve the overall flow, or requests for your favorite variant calling related program to be added, feel free to post an issue to our issue tracker at [GitHub](https://github.com/biopet/biopet).
Or contact us directly via: [SASC email](mailto:SASC@lumc.nl)