# Mapping

## Introduction

The mapping pipeline has been created for NGS users who want to align their data with the most commonly used alignment programs.
The pipeline performs quality control (QC) on the raw fastq files with our [Flexiprep](flexiprep.md) pipeline.
After the QC, the pipeline maps the reads with the chosen aligner. The resulting BAM files are sorted by coordinate and indexed for downstream analysis.

## Tools for this pipeline:

* [Flexiprep](flexiprep.md)
* Alignment programs:
    * <a href="http://bio-bwa.sourceforge.net/bwa.shtml" target="_blank">Bwa mem</a>
    * <a href="http://bio-bwa.sourceforge.net/bwa.shtml" target="_blank">Bwa aln</a>
    * <a href="http://bowtie-bio.sourceforge.net/index.shtml" target="_blank">Bowtie version 1.1.1</a>
    * <a href="http://www.well.ox.ac.uk/project-stampy" target="_blank">Stampy</a>
    * <a href="http://research-pub.gene.com/gmap/" target="_blank">Gsnap</a>
    * <a href="https://ccb.jhu.edu/software/tophat" target="_blank">TopHat</a>
    * <a href="https://github.com/alexdobin/STAR" target="_blank">Star</a>
    * <a href="https://github.com/alexdobin/STAR" target="_blank">Star-2pass</a>
* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>

## Configuration and flags
For technical reasons, single-sample pipelines such as this mapping pipeline do **not** take a sample config.
Input files are instead given on the command line as flags.

Command line flags for the mapping pipeline are:

| Flag  (short)| Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --input_r1 | Path (**required**) | Path to input fastq file |
| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file. |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (**required**) | Name of library |

If `-R2` is given, the pipeline will assume a paired-end setup.
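
As a rough sketch of how these flags combine, the shell function below assembles a mapping command and adds `-R2` only when a second fastq file is supplied. The function name `build_mapping_cmd`, the `BIOPET_JAR` variable and the file names are hypothetical and not part of Biopet. The function only prints the command; it does not run the pipeline.

```sh
#!/bin/sh
# Hypothetical helper, not part of Biopet: prints a mapping command line.
# Assumption: BIOPET_JAR holds the path to the Biopet jar (placeholder default).
# Note: this simple sketch does not handle paths containing spaces.
BIOPET_JAR="${BIOPET_JAR:-biopet.jar}"

build_mapping_cmd() {
    # $1 = sample ID, $2 = library ID, $3 = R1 fastq, $4 = optional R2 fastq
    if [ "$#" -lt 3 ]; then
        echo "usage: build_mapping_cmd <sample> <library> <r1.fastq> [r2.fastq]" >&2
        return 1
    fi
    cmd="java -jar $BIOPET_JAR pipeline mapping -sample $1 -library $2 -R1 $3"
    # Only a paired-end setup passes -R2.
    if [ -n "${4:-}" ]; then
        cmd="$cmd -R2 $4"
    fi
    echo "$cmd"
}

# Single-end: no -R2 in the printed command.
build_mapping_cmd mySample myLibrary myReads1.fastq
# Paired-end: -R2 is appended.
build_mapping_cmd mySample myLibrary myReads1.fastq myReads2.fastq
```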

### Config

All other values should be provided in the config. Config values specific to the mapping pipeline are:

| Name | Type | Function |
| ---- | ---- | -------- |
| aligner | String (optional) | Which aligner to use. Defaults to `bwa`. Choose from [`bwa`, `bwa-aln`, `bowtie`, `gsnap`, `tophat`, `stampy`, `star`, `star-2pass`] |
| skip_flexiprep | Boolean (optional) | Whether to skip the flexiprep QC step (default = False) |
| skip_markduplicates | Boolean (optional) | Whether to skip the Picard Markduplicates step (default = False) |
| skip_metrics | Boolean (optional) | Whether to skip the metrics gathering step (default = False) |
| reference_fasta | Path (**required**) | Path to indexed fasta file to be used as reference |
| platform | String (optional) | Read group Platform (defaults to `illumina`)|
| platform_unit | String (**required**) | Read group platform unit |
| readgroup_sequencing_center | String (**required**) | Read group sequencing center |
| readgroup_description | String (**required**) | Read group description |
| predicted_insertsize | Integer (**required**) | Read group predicted insert size |

It is possible to provide any config value as a command line argument as well, using the `-cv` flag.
E.g. `-cv reference=<path/to/reference>` would set value `reference`.
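
To illustrate the `key=value` format, the sketch below turns a list of pairs into `-cv` arguments. The helper name `cv_args` is hypothetical, and it assumes that the `-cv` flag may be repeated once per value; check the help output of your Biopet version before relying on that.

```sh
#!/bin/sh
# Hypothetical helper, not part of Biopet: formats key=value pairs as -cv arguments.
# Assumption: -cv may be given more than once on the same command line.
cv_args() {
    args=""
    for pair in "$@"; do
        case "$pair" in
            # Accept only pairs with a non-empty key followed by '='.
            ?*=*) args="$args -cv $pair" ;;
            *) echo "not in key=value form: $pair" >&2; return 1 ;;
        esac
    done
    # Strip the leading space left by the first append.
    echo "${args# }"
}

cv_args aligner=bowtie skip_metrics=true
```

The printed arguments can then be appended to the mapping command in place of, or in addition to, values from the config file.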

## Example

Note that one should first create the appropriate [settings config](../general/config.md).
Any supplied sample config will be ignored.

### Example config

#### Minimal
```json
{
"reference_fasta": "<path/to/reference>",
"output_dir": "<path/to/output/dir>"
}
```

#### With options
```json
{
"reference_fasta": "<path/to/reference>",
"aligner": "bwa",
"skip_metrics": true,
"platform": "our_platform",
"platform_unit":  "our_unit",
"readgroup_sequencing_center": "our_center",
"readgroup_description": "our_description",
"predicted_insertsize": 300,
"output_dir": "<path/to/output/dir>"
}
```


### Running the pipeline

For the help menu:
~~~
java -jar </path/to/biopet.jar> pipeline mapping -h

Arguments for Mapping:
 -R1,--input_r1 <input_r1>             R1 fastq file
 -R2,--input_r2 <input_r2>             R2 fastq file
 -sample,--sampleid <sampleid>         Sample ID
 -library,--libid <libid>              Library ID
 -config,--config_file <config_file>   JSON / YAML config file(s)
 -cv,--config_value <config_value>     Config values, value should be formatted like 'key=value' or
                                       'path:path:key=value'
 -DSC,--disablescatter                 Disable all scatters

~~~

To run the pipeline:
~~~
java -jar </path/to/biopet.jar> pipeline mapping -run --config mySettings.json \
-R1 myReads1.fastq -R2 myReads2.fastq -sample mySample -library myLibrary
~~~
Note that omitting `-R2` causes the pipeline to assume single-end `.fastq` files.

To perform a dry run, simply remove `-run` from the command-line call.

----

## Result files
~~~
├── OutDir
    ├── <samplename>-lib_1.dedup.bai
    ├── <samplename>-lib_1.dedup.bam
    ├── <samplename>-lib_1.dedup.metrics
    ├── flexiprep
    ├── metrics
    └── report
~~~
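
A quick way to confirm that a run produced the files listed above is a small existence check. The sketch below is hypothetical and not part of Biopet. It assumes the files are named `<sample>-<library>.dedup.bam`, `.dedup.bai` and `.dedup.metrics`, as the listing suggests for a library called `lib_1`, and that the MarkDuplicates step was not skipped.

```sh
#!/bin/sh
# Hypothetical check, not part of Biopet.
# Assumption: output files follow the pattern <sample>-<library>.dedup.<ext>.
check_mapping_output() {
    # $1 = output directory, $2 = sample ID, $3 = library ID
    prefix="$1/$2-$3.dedup"
    missing=0
    for f in "$prefix.bam" "$prefix.bai" "$prefix.metrics"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Exit status 0 when all three files exist, 1 otherwise.
    return "$missing"
}
```

For example, `check_mapping_output OutDir mySample lib_1 && echo "all expected files present"` reports success only when all three files exist.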