# Gears

## Introduction
Gears (``GE``nome ``A``nnotation of ``R``esidual ``S``equences) is a metagenomics pipeline. It can be used to identify contamination in sequencing runs, starting from either raw FastQ files or BAM files.
When a BAM file is given as input, the unaligned reads (or read pairs) are extracted for analysis.

The analysis results are reported as a Krona graph, which can be viewed and navigated in a web browser.

Pipeline analysis components include:
 
 - Centrifuge
 - [Kraken, DerrickWood](https://github.com/DerrickWood/kraken)
 - [Qiime closed reference](http://qiime.org)
 - [Qiime open reference](http://qiime.org)
 - [Qiime rtax](http://qiime.org) (**Experimental**)
 - SeqCount (**Experimental**)

## Gears

This pipeline is used to analyse a group of samples and only accepts fastq files. The fastq files are first trimmed and clipped with [Flexiprep](Flexiprep); this step can be disabled through the config flags of [Flexiprep](Flexiprep). The samples are specified in a sample config file; see [Config](../general/Config).
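
A minimal sketch of a sample config, written from the shell so it can be pasted directly. It assumes the usual biopet nesting of `samples` → sample ID → `libraries` → library ID → `R1`/`R2`; see [Config](../general/Config) for the authoritative format. The sample IDs, library IDs and paths are placeholders.

```bash
# Write a sample config for Gears.
# Assumed layout: samples -> <sampleId> -> libraries -> <libId> -> R1/R2.
# Sample IDs, library IDs and paths are placeholders.
cat > samples.json <<'EOF'
{
  "samples": {
    "sampleA": {
      "libraries": {
        "lib1": {
          "R1": "/data/sampleA_R1.fastq.gz",
          "R2": "/data/sampleA_R2.fastq.gz"
        }
      }
    },
    "sampleB": {
      "libraries": {
        "lib1": {
          "R1": "/data/sampleB_R1.fastq.gz"
        }
      }
    }
  }
}
EOF
```

Here `sampleB` only has `R1`, so it would be treated as single-end. The resulting file is what is passed as `-config samples.json` when starting the pipeline.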

### Config

| Key | Type | Default | Function |
| --- | ---- | ------- | -------- |
| gears_use_centrifuge | Boolean | true | Run fastq files with centrifuge |
| gears_use_kraken | Boolean | false | Run fastq files with kraken |
| gears_use_qiime_closed | Boolean | false | Run fastq files with qiime using the closed reference module |
| gears_use_qiime_open | Boolean | false | Run fastq files with qiime using the open reference module |
| gears_use_qiime_rtax | Boolean | false | Run fastq files with qiime using the rtax module |
| gears_use_seq_count | Boolean | false | Produce raw count files |
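
As a sketch, the keys above can be set in the run config (`mySettings.json` in the example below). This assumes the `gears_use_*` keys are read from the top level of the config file; keys that are left out keep the defaults listed in the table.

```bash
# Switch analysis modules on or off for Gears.
# Assumption: gears_use_* keys are read from the top level of the config.
# Keys that are not listed keep their default value.
cat > mySettings.json <<'EOF'
{
  "gears_use_centrifuge": false,
  "gears_use_kraken": true,
  "gears_use_qiime_closed": true
}
EOF
```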

### Example

To start the pipeline (remove `-run` for a dry run):

``` bash
biopet pipeline Gears -run  \
-config mySettings.json -config samples.json
```

## GearsSingle

This pipeline analyses a single sample, given either as fastq files or as a BAM file. When a BAM file is given, only the unmapped reads are extracted.

### Example

To start the pipeline (remove `-run` for a dry run):

``` bash
biopet pipeline GearsSingle -run  \
-R1 myFirstReadPair -R2 mySecondReadPair -sample mySampleName \
-library myLibname -config mySettings.json
```

### Command line flags

For technical reasons, single-sample pipelines such as this one do **not** take a sample config.
Instead, input files are given on the command line as flags.

Command line flags for GearsSingle are:

| Flag (short) | Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --input_r1 | Path (optional) | Path to input fastq file |
| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file |
| -bam | --bamfile | Path (optional) | Path to bam file |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (optional) | Name of library |

If `-R2` is given, the pipeline assumes a paired-end setup. `-bam` is mutually exclusive with the `-R1` and `-R2` flags: specify either `-bam`, or `-R1` and/or `-R2`.
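
The input rules above can be checked before the pipeline is launched. The function below is a hypothetical convenience wrapper (not part of biopet) that accepts the short flags from the table, requires `-sample`, rejects `-bam` combined with `-R1`/`-R2`, and then calls `biopet pipeline GearsSingle -run`. Setting the `BIOPET` variable to `echo` prints the command instead of running it.

```bash
# Hypothetical wrapper around `biopet pipeline GearsSingle` (not part of biopet).
# Rules enforced: -sample is required; -bam excludes -R1/-R2;
# at least one of -bam, -R1 or -R2 must be given.
# Set BIOPET=echo to print the command instead of running it.
run_gears_single() {
  local r1="" r2="" bam="" sample="" library=""
  local -a configs=()

  # Every supported flag takes exactly one value.
  while [ "$#" -gt 0 ]; do
    if [ "$#" -lt 2 ]; then
      echo "error: missing value for $1" >&2
      return 2
    fi
    case "$1" in
      -R1) r1="$2" ;;
      -R2) r2="$2" ;;
      -bam) bam="$2" ;;
      -sample) sample="$2" ;;
      -library) library="$2" ;;
      -config) configs+=(-config "$2") ;;   # may be repeated
      *) echo "error: unknown flag $1" >&2; return 2 ;;
    esac
    shift 2
  done

  if [ -z "$sample" ]; then
    echo "error: -sample is required" >&2
    return 1
  fi
  if [ -n "$bam" ] && { [ -n "$r1" ] || [ -n "$r2" ]; }; then
    echo "error: -bam cannot be combined with -R1/-R2" >&2
    return 1
  fi
  if [ -z "$bam" ] && [ -z "$r1" ] && [ -z "$r2" ]; then
    echo "error: specify -bam, or -R1 and/or -R2" >&2
    return 1
  fi

  local -a args=(pipeline GearsSingle -run -sample "$sample")
  if [ -n "$bam" ]; then
    args+=(-bam "$bam")       # only the unmapped reads are analysed
  fi
  if [ -n "$r1" ]; then
    args+=(-R1 "$r1")
  fi
  if [ -n "$r2" ]; then
    args+=(-R2 "$r2")         # -R2 implies a paired-end setup
  fi
  if [ -n "$library" ]; then
    args+=(-library "$library")
  fi
  if [ "${#configs[@]}" -gt 0 ]; then
    args+=("${configs[@]}")
  fi

  "${BIOPET:-biopet}" "${args[@]}"
}
```

For example, `run_gears_single -bam myAlignment.bam -sample mySampleName -config mySettings.json` analyses the unmapped reads of a BAM file, while passing both `-R1` and `-bam` fails before anything is started. Remove `-run` from the `args` line to make the wrapper do a dry run instead.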

### Sample input extensions

Please refer to [our mapping pipeline](mapping.md) for information about how input samples should be handled.

### Config

| Key | Type | Default | Function |
| --- | ---- | ------- | -------- |
| gears_use_kraken | Boolean | true | Run fastq files with kraken |
| gears_use_qiime_closed | Boolean | false | Run fastq files with qiime using the closed reference module |
| gears_use_qiime_open | Boolean | false | Run fastq files with qiime using the open reference module |
| gears_use_qiime_rtax | Boolean | false | Run fastq files with qiime using the rtax module |
| gears_use_seq_count | Boolean | false | Produce raw count files |

### Result files

The results of `GearsSingle` are stored in the following files:

| File suffix | Application | Content | Description |
| ----------- | ----------- | ------- | ----------- |
| *.krkn.raw  | kraken      | tsv     | Annotation per sequence |
| *.krkn.full | kraken-report | tsv | List of all possible annotations, with counts filled in for this specific sample |
| *.krkn.json | krakenreport2json| json | JSON representation of the taxonomy report, for postprocessing |
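
To get a quick overview from `*.krkn.full` without opening the HTML report, the hypothetical helper below lists the species with the most reads. It assumes the six tab-separated columns of the standard `kraken-report` output (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name) and that species rows carry rank code `S`; check this against your own files before relying on it.

```bash
# top_species REPORT [N]  (hypothetical helper, not part of Gears)
# Print the N (default 10) species with the most reads in their clade
# from a kraken-report style file (*.krkn.full).
# Assumed columns: pct, clade_reads, direct_reads, rank, taxid, name.
# Output columns: clade_reads, taxid, name (tab-separated).
top_species() {
  local report="$1" n="${2:-10}"
  awk -F'\t' '$4 == "S" && $2 > 0 {
      name = $6
      sub(/^ +/, "", name)          # strip the indentation of the name
      printf "%s\t%s\t%s\n", $2, $5, name
    }' "$report" \
    | sort -t "$(printf '\t')" -k1,1nr \
    | head -n "$n"
}
```

For example, `top_species mySampleName.krkn.full 20` prints the 20 most abundant species. Species rows with zero reads are skipped, which matters because this file lists all possible annotations.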

A separate `report` folder contains the HTML report, which displays a summary and provides a navigable view of the taxonomy graph and its results.

## Getting Help
For questions and suggestions about this pipeline, you can submit your ideas and thoughts on our [GitHub](https://github.com/biopet/biopet) page.
Alternatively, contact us directly via [SASC email](mailto:SASC@lumc.nl).