# Mapping

## Introduction

The mapping pipeline is intended for NGS users who want to align their data with the most commonly used alignment programs.
The pipeline performs quality control (QC) on the raw fastq files with our [Flexiprep](flexiprep.md) pipeline.
After the QC, the pipeline maps the reads with the chosen aligner. The resulting BAM files are sorted by coordinate and indexed for downstream analysis.

## Tools for this pipeline:

* [Flexiprep](flexiprep.md)
* Alignment programs:
    * <a href="http://bio-bwa.sourceforge.net/bwa.shtml" target="_blank">Bwa mem</a>
    * <a href="http://bio-bwa.sourceforge.net/bwa.shtml" target="_blank">Bwa aln</a>
    * <a href="http://bowtie-bio.sourceforge.net/index.shtml" target="_blank">Bowtie version 1.1.1</a>
    * <a href="http://www.well.ox.ac.uk/project-stampy" target="_blank">Stampy</a>
    * <a href="http://research-pub.gene.com/gmap/" target="_blank">Gsnap</a>
    * <a href="https://ccb.jhu.edu/software/tophat" target="_blank">TopHat</a>
    * <a href="https://ccb.jhu.edu/software/hisat2/index.shtml" target="_blank">Hisat2</a>
    * <a href="https://github.com/alexdobin/STAR" target="_blank">Star</a>
    * <a href="https://github.com/alexdobin/STAR" target="_blank">Star-2pass</a>
* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>

## Configuration and flags
For technical reasons, single-sample pipelines, such as this mapping pipeline, do **not** take a sample config.
Input files are instead given on the command line as flags.

Command line flags for the mapping pipeline are:

| Flag (short) | Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --inputR1 | Path (**required**) | Path to input fastq file |
| -R2 | --inputR2 | Path (optional) | Path to second read pair fastq file. |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (**required**) | Name of library |

If `-R2` is given, the pipeline will assume a paired-end setup.

### Sample input extensions

It is a good idea to check the format of your input files before starting any pipeline, since the pipeline expects a specific format based on the file extension.
For example, if the input files have a `.fastq` or `.fq` extension, the pipeline expects an uncompressed `fastq` file. If the extension is `.fastq.gz` or `.fq.gz`, the pipeline expects a bgzipped or gzipped `fastq` file.
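
One way to check this before starting a run is to compare the extension with the first bytes of the file. The sketch below is only an illustration and not part of the pipeline: the function name `extension_matches_content` is hypothetical, and it relies on the fact that gzip files (and therefore bgzip files) start with the two bytes `1f 8b`.

```python
from pathlib import Path

# Every gzip member starts with these two bytes; bgzip output is valid gzip.
GZIP_MAGIC = b"\x1f\x8b"


def extension_matches_content(path):
    """Return True when the extension agrees with the actual compression.

    Hypothetical helper, not provided by the pipeline itself.
    """
    path = Path(path)
    name = path.name.lower()
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == GZIP_MAGIC
    if name.endswith((".fastq.gz", ".fq.gz")):
        return is_gzipped
    if name.endswith((".fastq", ".fq")):
        return not is_gzipped
    # Any other extension is outside the cases described above.
    return False
```

A file named `reads.fastq.gz` that actually contains plain text would make this helper return `False`, which is the kind of mismatch worth fixing before the run.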

### Config

All other values should be provided in the config. Config values specific to the mapping pipeline are:

| Name | Type | Function |
| ---- | ---- | -------- |
| output_dir | Path (**required**) | directory for output files |
| reference_fasta | Path (**required**) | Path to indexed fasta file to be used as reference |
| aligner | String (optional) | Which aligner to use. Defaults to `bwa-mem`. Choose from [`bwa-mem`, `bwa-aln`, `bowtie`, `bowtie2`, `gsnap`, `tophat`, `stampy`, `star`, `star-2pass`, `hisat2`] |
| skip_flexiprep | Boolean (optional) | Whether to skip the flexiprep QC step (default = False) |
| skip_markduplicates | Boolean (optional) | Whether to skip the Picard Markduplicates step (default = False) |
| skip_metrics | Boolean (optional) | Whether to skip the metrics gathering step (default = False) |
| platform | String (optional) | Read group Platform (defaults to `illumina`)|
| platform_unit | String (optional) | Read group platform unit |
| readgroup_sequencing_center | String (optional) | Read group sequencing center |
| readgroup_description | String (optional) | Read group description |
| predicted_insertsize | Integer (optional) | Read group predicted insert size |
| keep_final_bam_file | Boolean (optional) | Whether to keep the final BAM file. When set to false, the pipeline removes the BAM file once no other job needs it (default = True) |

Any config value can also be provided as a command line argument, using the `-cv` flag.
E.g. `-cv reference_fasta=<path/to/reference>` would set the value `reference_fasta`.
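
When a run is launched from a script, the `key=value` pairs can be generated from a dictionary. The following is a minimal sketch: the helper name `config_value_args` is hypothetical, and it assumes that `-cv` may be repeated once per value, which is not confirmed here.

```python
def config_value_args(values):
    """Turn {"key": "value"} into ["-cv", "key=value", ...].

    Hypothetical helper; assumes -cv can be given once per config value.
    """
    args = []
    for key, value in values.items():
        args.extend(["-cv", f"{key}={value}"])
    return args


# Example: override the reference from the command line.
print(config_value_args({"reference_fasta": "/data/reference.fa"}))
```

The resulting list can be appended to the argument list of the `biopet` call. Only string values are shown here; how other types should be written on the command line is not covered by this sketch.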

## Example

Note that one should first create the appropriate [settings config](../general/config.md).
Any supplied sample config will be ignored.

### Example config

#### Minimal
```json
{
"reference_fasta": "<path/to/reference>",
"output_dir": "<path/to/output/dir>"
}
```

#### With options
```json
{
"reference_fasta": "<path/to/reference>",
"aligner": "bwa-mem",
"skip_metrics": true,
"platform": "our_platform",
"platform_unit":  "our_unit",
"readgroup_sequencing_center": "our_center",
"readgroup_description": "our_description",
"predicted_insertsize": 300,
"output_dir": "<path/to/output/dir>"
}
```
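
Before launching a run, a settings file like the ones above can be sanity-checked against the config table. The sketch below is not part of the pipeline: `check_config` is a hypothetical helper that only covers the required keys and the value types listed in the table, and the paths in the example are placeholders.

```python
import json

# Key names and types taken from the config table in this document.
REQUIRED_KEYS = ("output_dir", "reference_fasta")
BOOLEAN_KEYS = (
    "skip_flexiprep",
    "skip_markduplicates",
    "skip_metrics",
    "keep_final_bam_file",
)


def check_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config
    ]
    for key in BOOLEAN_KEYS:
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} should be true or false")
    size = config.get("predicted_insertsize")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if size is not None and (isinstance(size, bool) or not isinstance(size, int)):
        problems.append("predicted_insertsize should be an integer")
    return problems


# Placeholder settings with two deliberate mistakes.
example = json.loads('{"reference_fasta": "/data/ref.fa", "skip_metrics": "yes"}')
print(check_config(example))
```

An empty list does not guarantee that the pipeline accepts the file. It only rules out the mistakes this sketch knows about.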


### Running the pipeline

For the help menu:
~~~
biopet pipeline mapping -h

Arguments for Mapping:
 -R1,--input_r1 <input_r1>             R1 fastq file
 -R2,--input_r2 <input_r2>             R2 fastq file
 -sample,--sampleid <sampleid>         Sample ID
 -library,--libid <libid>              Library ID
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or
                                       'path:path:key=value'
 -DSC,--disablescatter                 Disable all scatters

~~~

To run the pipeline:
~~~
biopet pipeline mapping -run --config mySettings.json \
-R1 myReads1.fastq -R2 myReads2.fastq
~~~
Note that omitting `-R2` causes the pipeline to assume single-end `.fastq` files.

To perform a dry run, simply remove `-run` from the command line call.
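
The command line can also be assembled programmatically, for example when launching many libraries from a script. The sketch below is an illustration under stated assumptions: `mapping_command` is a hypothetical helper, the flags are the short forms from the help menu shown earlier, and the file and sample names are placeholders.

```python
import shlex


def mapping_command(config_file, r1, sample, library, r2=None, run=False):
    """Build the argument list for a mapping call (hypothetical helper).

    Without run=True the '-run' flag is left out, which gives a dry run.
    Without r2 the '-R2' flag is left out, which gives a single-end setup.
    """
    cmd = ["biopet", "pipeline", "mapping"]
    if run:
        cmd.append("-run")
    cmd += ["-config", config_file, "-R1", r1]
    if r2 is not None:
        cmd += ["-R2", r2]
    cmd += ["-sample", sample, "-library", library]
    return cmd


# Dry run on single-end data; the names below are placeholders.
dry_run = mapping_command("mySettings.json", "myReads1.fastq", "mySample", "myLibrary")
print(shlex.join(dry_run))
```

Returning a list rather than a single string keeps paths with spaces intact when the list is handed to `subprocess.run`. `shlex.join` (Python 3.8 or newer) is only used here to print a shell-safe version of the command.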

----

## Result files
~~~
├── OutDir
    ├── <samplename>-lib_1.dedup.bai
    ├── <samplename>-lib_1.dedup.bam
    ├── <samplename>-lib_1.dedup.metrics
    ├── flexiprep
    ├── metrics
    └── report
~~~
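
After a run, the listing above can be used to check that nothing is missing. The sketch below rests on an assumption that is not confirmed here: that `lib_1` in the listing is the library ID, so that files are named `<sample>-<library>.dedup.*`. Both function names are hypothetical, and the sketch only covers the layout shown above (for example, it does not account for runs with `skip_markduplicates` enabled).

```python
from pathlib import Path


def expected_outputs(out_dir, sample, library):
    """List the paths from the result listing for one sample and library.

    Assumes the '<sample>-<library>.dedup' naming pattern (unconfirmed).
    """
    out_dir = Path(out_dir)
    prefix = f"{sample}-{library}.dedup"
    files = [out_dir / f"{prefix}{ext}" for ext in (".bai", ".bam", ".metrics")]
    dirs = [out_dir / name for name in ("flexiprep", "metrics", "report")]
    return files + dirs


def missing_outputs(out_dir, sample, library):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(out_dir, sample, library) if not p.exists()]
```

Calling `missing_outputs("OutDir", "mySample", "lib_1")` returns an empty list when every entry from the listing is present.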

## Getting Help

If you have any questions about running Mapping, suggestions on how to improve the overall flow, or requests for your favorite aligner to be added, feel free to post an issue on our issue tracker at
[GitHub](https://github.com/biopet/biopet), or contact us directly via [SASC email](mailto:SASC@lumc.nl).