# GATK-pipeline

## Introduction

The GATK-pipeline is built for variant calling on NGS data (preferably Illumina data).
It is based on the GATK <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a> for variant calling.

The pipeline accepts `.fastq` and `.bam` files as input.

## Tools for this pipeline

* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
* [Flexiprep](flexiprep.md)
* <a href="https://www.broadinstitute.org/gatk/" target="_blank">GATK tools</a>:
    * RealignerTargetCreator
    * IndelRealigner
    * BaseRecalibrator
    * PrintReads
    * SplitNCigarReads
    * HaplotypeCaller
    * VariantRecalibrator
    * ApplyRecalibration
    * GenotypeGVCFs
    * VariantAnnotator

## Example

Note that one should first create the appropriate [configs](../config.md).

To get the help menu:
~~~
java -jar Biopet.0.2.0.jar pipeline gatkPipeline -h

Arguments for GatkPipeline:
 -outDir,--output_directory <output_directory>   Output directory
 -sample,--onlysample <onlysample>               Only Sample
 -skipgenotyping,--skipgenotyping                Skip Genotyping step
 -mergegvcfs,--mergegvcfs                        Merge gvcfs
 -jointVariantCalling,--jointvariantcalling      Joint variantcalling
 -jointGenotyping,--jointgenotyping              Joint genotyping
 -config,--config_file <config_file>             JSON config file(s)
 -DSC,--disablescatterdefault                    Disable all scatters

~~~

To run the pipeline:
~~~
java -jar Biopet.0.2.0.jar pipeline gatkPipeline -run -config MySamples.json -config MySettings.json -outDir myOutDir
~~~
To check whether the pipeline can create all the jobs (a dry run), omit the `-run` flag:
~~~
java -jar Biopet.0.2.0.jar pipeline gatkPipeline -config MySamples.json -config MySettings.json -outDir myOutDir
~~~
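
The two invocations above differ only in the `-run` flag, so a small helper can keep them in sync. The following is a minimal sketch, not part of Biopet: the function name `gatk_cmd` and the `BIOPET_JAR` variable are hypothetical, and only the flags shown above (`-run`, `-config`, `-outDir`) are used. It prints the command line for inspection rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: BIOPET_JAR and gatk_cmd are illustrative names,
# not something Biopet itself provides.
BIOPET_JAR="${BIOPET_JAR:-Biopet.0.2.0.jar}"

# Print the gatkPipeline command line.
#   $1: "dry" (no -run, jobs are only checked) or "run" (adds -run)
#   $2: output directory
#   $3...: one or more JSON config files
gatk_cmd() {
    local mode="$1" out_dir="$2"
    shift 2
    local cmd=(java -jar "$BIOPET_JAR" pipeline gatkPipeline)
    if [ "$mode" = "run" ]; then
        cmd+=(-run)
    fi
    local cfg
    for cfg in "$@"; do
        cmd+=(-config "$cfg")
    done
    cmd+=(-outDir "$out_dir")
    # Joined with spaces for display only; quoting of odd paths is not preserved.
    printf '%s\n' "${cmd[*]}"
}

gatk_cmd dry myOutDir MySamples.json MySettings.json
gatk_cmd run myOutDir MySamples.json MySettings.json
```

Printing the dry-run form first and the `-run` form second mirrors the order suggested above: check that all jobs can be created, then start the actual run.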

## Results

### Result files
~~~
.
└── samples
    ├── my_sample1
    │   ├── run_lib1
    │   │   ├── chunks
    │   │   │   └── 1
    │   │   │       └── flexiprep
    │   │   ├── flexiprep
    │   │   │   ├── input.R1.fastqc
    │   │   │   │   └── input.R1_fastqc
    │   │   │   │       ├── Icons
    │   │   │   │       └── Images
    │   │   │   ├── input.R1.qc.fastqc
    │   │   │   │   └── input.R1.qc_fastqc
    │   │   │   │       ├── Icons
    │   │   │   │       └── Images
    │   │   │   ├── input.R2.fastqc
    │   │   │   │   └── input.R2_fastqc
    │   │   │   │       ├── Icons
    │   │   │   │       └── Images
    │   │   │   └── input.R2.qc.fastqc
    │   │   │       └── input.R2.qc_fastqc
    │   │   │           ├── Icons
    │   │   │           └── Images
    │   │   └── metrics
    │   ├── run_lib2
    │   │   ├── chunks
    │   │   │   └── 1
    │   │   │       └── flexiprep
    │   │   ├── flexiprep
    │   │   │   ├── input.R1.fastqc
    │   │   │   │   └── input.R1_fastqc
    │   │   │   │       ├── Icons
    │   │   │   │       └── Images
    │   │   │   ├── input.R1.qc.fastqc
    │   │   │   │   └── input.R1.qc_fastqc
    │   │   │   │       ├── Icons
    │   │   │   │       └── Images
    │   │   │   ├── input.R2.fastqc
    │   │   │   │   └── input.R2_fastqc
    │   │   │   │       ├── Icons
    │   │   │   │       └── Images
    │   │   │   └── input.R2.qc.fastqc
    │   │   │       └── input.R2.qc_fastqc
    │   │   │           ├── Icons
    │   │   │           └── Images
    │   │   └── metrics
    │   └── variantcalling
~~~
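
Given the layout above, the per-library FastQC report directories can be located with `find`. This is a sketch that assumes the output directory follows exactly the tree shown (`samples/<sample>/run_<library>/flexiprep/*.fastqc`); the helper name `list_fastqc_dirs` is hypothetical and not part of Biopet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists FastQC report directories per sample and library,
# assuming the layout samples/<sample>/run_<library>/flexiprep/*.fastqc.
list_fastqc_dirs() {
    local out_dir="$1"
    # Depth 4 below samples/ is <sample>/run_<library>/flexiprep/<report>,
    # which skips the flexiprep directories nested under chunks/.
    find "$out_dir/samples" -mindepth 4 -maxdepth 4 -type d \
        -path '*/run_*/flexiprep/*.fastqc' | sort
}

# Only list when an existing output directory is passed as the first argument.
if [ -d "${1:-}" ]; then
    list_fastqc_dirs "$1"
fi
```

For example, saving this as a script and calling it with `myOutDir` as its argument would list directories such as `input.R1.fastqc` and `input.R1.qc.fastqc` for every library of every sample.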


### Best practice

## References