# Gears

## Introduction
Gears (``GE``nome ``A``nnotation of ``R``esidual ``S``equences) is a metagenomics pipeline. It can be used to identify contamination in sequencing runs, using either raw FastQ files or BAM files as input.
When a BAM file is given as input, the pipeline extracts the unaligned reads (or read pairs) for analysis.

The analysis result is reported in a Krona graph, which can be viewed and navigated in a web browser.

Pipeline analysis components include:

 - [Centrifuge](https://github.com/infphilo/centrifuge)
 - [Kraken, DerrickWood](https://github.com/DerrickWood/kraken)
 - [Qiime closed reference](http://qiime.org)
 - [Qiime open reference](http://qiime.org)
 - [Qiime rtax](http://qiime.org) (**Experimental**)
 - SeqCount (**Experimental**)

## Gears

This pipeline is used to analyse a group of samples and only accepts fastq files. The fastq files are first trimmed and clipped with [Flexiprep](Flexiprep); this step can be disabled with the config flags of [Flexiprep](Flexiprep). The samples are specified in a sample config file, see [Config](../general/Config).
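
As an illustration only, a sample config for one paired-end sample might be written as shown here. The nesting (`samples`, `libraries`, `R1`, `R2`) and all sample, library and file names are assumptions made for this sketch; the [Config](../general/Config) page is the authoritative description of the format.

```bash
# Write a hypothetical sample config with one sample and one library.
# The key layout and every name below are assumptions; check the Config docs.
cat > samples.json <<'EOF'
{
  "samples": {
    "mySampleName": {
      "libraries": {
        "myLibname": {
          "R1": "myFirstReadPair.fastq.gz",
          "R2": "mySecondReadPair.fastq.gz"
        }
      }
    }
  }
}
EOF
```

The resulting file would then be passed to the pipeline with `-config samples.json`.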

### Config

| Key | Type | Default | Function |
| --- | ---- | ------- | -------- |
| gears_use_centrifuge | Boolean | true | Run fastq files with centrifuge |
| gears_use_kraken | Boolean | false | Run fastq files with kraken |
| gears_use_qiime_closed | Boolean | false | Run fastq files with qiime with the closed reference module |
| gears_use_qiime_open | Boolean | false | Run fastq files with qiime with the open reference module |
| gears_use_qiime_rtax | Boolean | false | Run fastq files with qiime with the rtax module |
| gears_use_seq_count | Boolean | false | Produces raw count files |
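
A minimal sketch of a settings file that switches on Kraken next to the default Centrifuge run could look like this. It assumes the keys from the table can be set at the top level of the JSON settings file; the file name `mySettings.json` is taken from the example command, and a real run will likely need further settings (for example tool databases) that are not covered here.

```bash
# Write a hypothetical settings file using keys from the table above.
# Placing the keys at the top level of the JSON is an assumption.
cat > mySettings.json <<'EOF'
{
  "gears_use_centrifuge": true,
  "gears_use_kraken": true,
  "gears_use_qiime_closed": false,
  "gears_use_qiime_open": false
}
EOF
```

Keys that are left out keep the defaults listed in the table.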

### Example

To start the pipeline (remove `-run` for a dry run):

``` bash
biopet pipeline Gears -run \
-config mySettings.json -config samples.json
```

## GearsSingle

This pipeline can be used to analyse a single sample, given as either fastq files or a bam file. When a bam file is given, only the unmapped reads are extracted.

### Example

To start the pipeline (remove `-run` for a dry run):

``` bash
biopet pipeline GearsSingle -run \
-R1 myFirstReadPair -R2 mySecondReadPair -sample mySampleName \
-library myLibname -config mySettings.json
```

### Commandline flags
For technical reasons, single-sample pipelines such as this one do **not** take a sample config.
Input files are instead given on the command line as flags.

Command line flags for GearsSingle are:

| Flag (short) | Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --input_r1 | Path (optional) | Path to input fastq file |
| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file. |
| -bam | --bamfile | Path (optional) | Path to bam file. |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (optional) | Name of library |

If `-R2` is given, the pipeline will assume a paired-end setup. `-bam` is mutually exclusive with the `-R1` and `-R2` flags. Either specify `-bam` or `-R1` and/or `-R2`.
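
The flag rules above can be checked in a wrapper script before the pipeline is started. The function below is a hypothetical helper, not part of Biopet; it only mirrors the rules stated in this section and takes the `-R1`, `-R2` and `-bam` values as three positional arguments (empty string when a flag is not used).

```bash
# Hypothetical pre-flight check for GearsSingle inputs (not a Biopet command).
# Arguments: $1 = -R1 value, $2 = -R2 value, $3 = -bam value ("" if unused).
check_gears_single_inputs() {
  r1="$1"; r2="$2"; bam="$3"

  # -bam is mutually exclusive with -R1 and -R2.
  if [ -n "$bam" ] && { [ -n "$r1" ] || [ -n "$r2" ]; }; then
    echo "error: -bam cannot be combined with -R1 or -R2" >&2
    return 1
  fi

  # At least one kind of input has to be given.
  if [ -z "$bam" ] && [ -z "$r1" ] && [ -z "$r2" ]; then
    echo "error: give either -bam or -R1 and/or -R2" >&2
    return 1
  fi

  # Report which input mode the flags imply.
  if [ -n "$bam" ]; then
    echo "bam"
  elif [ -n "$r2" ]; then
    echo "paired-end"
  else
    echo "single-end"
  fi
}

check_gears_single_inputs myFirstReadPair mySecondReadPair ""   # prints: paired-end
```

A wrapper could run this check first and only call `biopet pipeline GearsSingle` when it succeeds.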

### Sample input extensions

Please refer to [our mapping pipeline](mapping.md) for information about how the input samples should be handled.

### Config

| Key | Type | Default | Function |
| --- | ---- | ------- | -------- |
| gears_use_kraken | Boolean | true | Run fastq file with kraken |
| gears_use_qiime_closed | Boolean | false | Run fastq files with qiime with the closed reference module |
| gears_use_qiime_open | Boolean | false | Run fastq files with qiime with the open reference module |
| gears_use_qiime_rtax | Boolean | false | Run fastq files with qiime with the rtax module |
| gears_use_seq_count | Boolean | false | Produces raw count files |

### Result files

The results of `GearsSingle` are stored in the following files:

| File suffix | Application | Content | Description |
| ----------- | ----------- | ------- | ----------- |
| *.krkn.raw  | kraken      | tsv     | Annotation per sequence |
| *.krkn.full | kraken-report | tsv | List of all possible annotations, with counts filled in for this specific sample |
| *.krkn.json | krakenreport2json| json | JSON representation of the taxonomy report, for postprocessing |

In a separate `report` folder, one can find the HTML report, which displays the summary and provides a navigable view of the taxonomy graph and its results.

## Getting Help
For questions about this pipeline and suggestions, we have a GitHub page where you can submit your ideas and thoughts: [GitHub](https://github.com/biopet/biopet).
Or contact us directly via: [SASC email](mailto:SASC@lumc.nl)