shiva.md 9.96 KB
Newer Older
Peter van 't Hof's avatar
Peter van 't Hof committed
1
2
3
4
5
# Shiva

## Introduction

This pipeline is build for variant calling on NGS data (preferably Illumina data).
Sander Bollen's avatar
Sander Bollen committed
6
It is based on the <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a>) of GATK in terms of their approach to variant calling.
Peter van 't Hof's avatar
Peter van 't Hof committed
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
The pipeline accepts ```.fastq & .bam``` files as input.

----

## Tools for this pipeline

* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
* [Flexiprep](flexiprep.md)
* <a href="https://www.broadinstitute.org/gatk/" target="_blank">GATK tools</a>:
    * GATK
    * Freebayes
    * Bcftools
    * Samtools

----

## Example

Note that one should first create the appropriate [configs](../general/config.md).

27
28
29
30
31
32
### Sample input extensions

Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled. 

Shiva is a special pipeline in the sense that it can also start directly from `bam` files. Note that one should alter the sample config field from `R1` into `bam`.

33
34
### Full pipeline

Sander Bollen's avatar
Sander Bollen committed
35
The full pipeline can start from fastq or from bam file. This pipeline will include pre-process steps for the bam files.
36

Sander Bollen's avatar
Sander Bollen committed
37
To view the help menu, execute:
Peter van 't Hof's avatar
Peter van 't Hof committed
38
~~~
39
biopet pipeline shiva -h
Peter van 't Hof's avatar
Peter van 't Hof committed
40
41
42
43
44
45
46
47
48
49

Arguments for Shiva:
 -sample,--onlysample <onlysample>               Only Sample
 -config,--config_file <config_file>             JSON config file(s)
 -DSC,--disablescatterdefault                    Disable all scatters

~~~

To run the pipeline:
~~~
50
biopet pipeline shiva -config MySamples.json -config MySettings.json -run
Peter van 't Hof's avatar
Peter van 't Hof committed
51
52
~~~

Sander Bollen's avatar
Sander Bollen committed
53
A dry run can be performed by simply removing the `-run` flag from the command line call.
Peter van 't Hof's avatar
Peter van 't Hof committed
54

Pappas's avatar
Pappas committed
55
56
57
58
59
60
[Gears](gears) is run automatically for the data analysed with `Shiva`. There are two levels on which this can be done and this should be specified in the [config](../general/config) file:

*`mapping_to_gears: unmapped` : Unmapped reads after alignment. (default)
*`mapping_to_gears: all` : Trimmed and clipped reads from [Flexiprep](flexiprep).
*`mapping_to_gears: none` : Disable this functionality.

Sander Bollen's avatar
Sander Bollen committed
61
### Only variant calling
62

Sander Bollen's avatar
Sander Bollen committed
63
64
65
It is possible to run Shiva while only performing its variant calling steps.
This has been separated in its own pipeline named `shivavariantcalling`.
As this calling pipeline starts from BAM files, it will naturally not perform any pre-processing steps.
66

Sander Bollen's avatar
Sander Bollen committed
67
To view the help menu, execute:
68
69
70
71
72
73
74
75
76
77
78
79
80
81
~~~
java -jar </path/to/biopet.jar> pipeline shivavariantcalling -h

Arguments for ShivaVariantcalling:
 -BAM,--inputbams <inputbams>          Bam files (should be deduped bams)
 -sample,--sampleid <sampleid>         Sample ID (only effects summary and not required)
 -library,--libid <libid>              Library ID (only effects summary and not required)
 -config,--config_file <config_file>   JSON config file(s)
 -DSC,--disablescatter                 Disable all scatters

~~~

To run the pipeline:
~~~
82
biopet pipeline shivavariantcalling -config MySettings.json -run
83
84
~~~

Sander Bollen's avatar
Sander Bollen committed
85
A dry run can be performed by simply removing the `-run` flag from the command line call.
86
87


Peter van 't Hof's avatar
Peter van 't Hof committed
88
89
----

Sander Bollen's avatar
Sander Bollen committed
90
91
92
## Variant caller
At this moment the following variant callers can be used

93
94
95
96
97
98
99
100
101
102
103
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller</a>
    * Running default HaplotypeCaller
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller_gvcf</a>
    * Running HaplotypeCaller in gvcf mode
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php">haplotypecaller_allele</a>
    * Only genotype a given list of alleles with HaplotypeCaller
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper</a>
    * Running default UnifiedGenotyper
* <a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php">unifiedgenotyper_allele</a>
    * Only genotype a given list of alleles with UnifiedGenotyper
* <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools</a>
Peter van 't Hof's avatar
Peter van 't Hof committed
104
* <a href="https://samtools.github.io/bcftools/bcftools.html">bcftools_singlesample</a>
105
* <a href="https://github.com/ekg/freebayes">freebayes</a>
106
* <a href="http://varscan.sourceforge.net/">varscan</a>
107
* [raw](../tools/MpileupToVcf)
Peter van 't Hof's avatar
Peter van 't Hof committed
108

109

Peter van 't Hof's avatar
Peter van 't Hof committed
110
111
112
113
114
## Config options

To view all possible config options please navigate to our Gitlab wiki page
<a href="https://git.lumc.nl/biopet/biopet/wikis/GATK-Variantcalling-Pipeline" target="_blank">Config</a>

Sander Bollen's avatar
Sander Bollen committed
115
### Required settings
Sander Bollen's avatar
Sander Bollen committed
116
| Confignamespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
117
| ----------- | ---- | ---- | ------- | -------- |
118
| - | output_dir | String |  | Path to output directory |
Sander Bollen's avatar
Sander Bollen committed
119
120
121
| Shiva | variantcallers | List[String] | | Which variant callers to use |


Peter van 't Hof's avatar
Peter van 't Hof committed
122
123
### Config options

Sander Bollen's avatar
Sander Bollen committed
124
| ConfignNamespace | Name |  Type | Default | Function |
Peter van 't Hof's avatar
Peter van 't Hof committed
125
| ----------- | ---- | ----- | ------- | -------- |
Peter van 't Hof's avatar
Peter van 't Hof committed
126
127
| shiva | species | String | unknown_species | Name of species, like H.sapiens |
| shiva | reference_name | String | unknown_reference_name | Name of reference, like hg19 |
Sander Bollen's avatar
Sander Bollen committed
128
| shiva | reference_fasta | String |  | reference to align to |
Peter van 't Hof's avatar
Peter van 't Hof committed
129
130
| shiva | dbsnp | String |  | vcf file of dbsnp records |
| shiva | variantcallers | List[String] |  | variantcaller to use, see list |
Sander Bollen's avatar
Sander Bollen committed
131
132
133
134
135
| shiva | use_indel_realigner | Boolean | true | Realign indels |
| shiva | use_base_recalibration | Boolean | true | Base recalibrate |
| shiva | use_analyze_covariates | Boolean | false | Analyze covariates during base recalibration step |
| shiva | bam_to_fastq | Boolean | false | Convert bam files to fastq files |
| shiva | correct_readgroups | Boolean | false | Attempt to correct read groups |
Sander Bollen's avatar
Sander Bollen committed
136
137
| shiva | amplicon_bed | Path | Path to target bed file |
| shiva | regions_of_interest | Array of paths | Array of paths to region of interest (e.g. gene panels) bed files |
Sander Bollen's avatar
Sander Bollen committed
138
139
140
141
142
| vcffilter | min_sample_depth | Integer | 8 | Filter variants with at least x coverage |
| vcffilter | min_alternate_depth | Integer | 2 | Filter variants with at least x depth on the alternate allele |
| vcffilter | min_samples_pass | Integer | 1 | Minimum amount of samples which pass custom filter (requires additional flags) |
| vcffilter | filter_ref_calls | Boolean | true | Remove reference calls |

Peter van 't Hof's avatar
Peter van 't Hof committed
143
Since Shiva uses the [Mapping](mapping.md) pipeline internally, mapping config values can be specified as well.
Sander Bollen's avatar
Sander Bollen committed
144
145
For all the options, please see the corresponding documentation for the mapping pipeline.

146
147
148
149
150
151
152
153
154
155
156
### Exome variant calling

If one calls variants with Shiva on exome samples and a ```amplicon_bed``` file is available, the user is able to add this file to the config file.
 When the file is given, the coverage over the positions in the bed file will be calculated plus the number of variants on each position. If there is an interest
 in a specific region of the genome/exome one is capable to give multiple ```regionOfInterest.bed``` files with the option ```regions_of_interest``` (in list/array format).
 
 A short recap: the option ```amplicon_bed``` can only be given one time and should be composed of the amplicon kit used to obtain the exome data.
 The option ```regions_of_interest``` can contain multiple bed files in ```list``` format and can contain any region a user wants. If multiple regions are given,
 the pipeline will make an coverage plot over each bed file separately. 
 

Sander Bollen's avatar
Sander Bollen committed
157
158
159
160
161
162
### Modes

Shiva furthermore supports three modes. The default and recommended option is `multisample_variantcalling`.
During this mode, all bam files will be simultaneously called in one big VCF file. It will work with any number of samples.

On top of that, Shiva provides two separate modes that only work with a single sample.
Sander Bollen's avatar
Sander Bollen committed
163
Those are not recommended, but may be useful to those who need to validate replicates.
Sander Bollen's avatar
Sander Bollen committed
164
165
166
167
168
169
170
171

Mode `single_sample_variantcalling` calls a single sample as a merged bam file.
I.e., it will merge all libraries in one bam file, then calls on that.

The other mode, `library_variantcalling`, will call simultaneously call all library bam files.

The config for these therefore is:

Sander Bollen's avatar
Sander Bollen committed
172
| namespace | Name | Type | Default | Function |
Sander Bollen's avatar
Sander Bollen committed
173
174
175
176
| ----------- | ---- | ---- | ------- | -------- |
| shiva | multisample_variantcalling | Boolean | true | Default, multisample calling |
| shiva | single_sample_variantcalling | Boolean | false | Not-recommended, single sample, merged bam |
| shiva | library_variantcalling | Boolean | false | Not-recommended, single sample, per library |
Peter van 't Hof's avatar
Peter van 't Hof committed
177

Sander Bollen's avatar
Sander Bollen committed
178
179
180
181
182
## CNV calling 

In addition to standard variant calling, Shiva also supports CNV calling. 
One can enable this option by setting the `cnv_calling` config option to `true`.

Sander Bollen's avatar
Sander Bollen committed
183
184
For CNV calling Shiva uses the [Kopisu](kopisu.md) as a module. 
Please see the documentation for Kopisu.  
Sander Bollen's avatar
Sander Bollen committed
185

Peter van 't Hof's avatar
Peter van 't Hof committed
186

Sander Bollen's avatar
Sander Bollen committed
187
## Example configs 
Peter van 't Hof's avatar
Peter van 't Hof committed
188
189
**Config example**

Peter van 't Hof's avatar
Peter van 't Hof committed
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
``` yaml
samples:
    SampleID:
        libraries:
            lib_id_1:
                bam: YourBam.bam
            lib_id_2:
                R1: file_R1.fq.gz
                R2: file_R2.fq.gz
dbsnp: <dbsnp.vcf.gz>
vcffilter:
    min_alternate_depth: 1
output_dir: <output directory>
variantcallers:
    - haplotypecaller
    - unifiedgenotyper
    - haplotypecaller_gvcf
Peter van 't Hof's avatar
Peter van 't Hof committed
207
208
unifiedgenotyper:
    merge_vcf_results: false # This will do the variantcalling but will not merged into the final vcf file
Peter van 't Hof's avatar
Peter van 't Hof committed
209
210
```

Sander Bollen's avatar
Sander Bollen committed
211
**Additional XHMM CNV calling example**
Sander Bollen's avatar
Sander Bollen committed
212
213
214
215
216
217
218
219
220
221
222
223
224
225

```yaml
shiva:
    cnv_calling: true
kopisu:
    use_cnmops_method: false
    use_freec_method: false
    use_xhmm_method: true
amplicon_bed: <path_to_bed>
xhmm:
    discover_params: <path_to_file>
    exe: <path_to_executable>
```

Peter van 't Hof's avatar
Peter van 't Hof committed
226
## References
227
228
229
230
231
232
233
234

* Shiva follows the best practices of GATK: [GATK best practices](https://www.broadinstitute.org/gatk/guide/best-practices)


## Getting Help

If you have any questions on running Shiva, suggestions on how to improve the overall flow, or requests for your favorite variant calling related program to be added, feel free to post an issue to our issue tracker at [GitHub](https://github.com/biopet/biopet).
Or contact us directly via: [SASC email](mailto:SASC@lumc.nl)