Commit bff41e20 authored by Sander Bollen

Rewrote mapping documentation

parent 5d2725e2
......@@ -17,46 +17,94 @@ After the QC, the pipeline simply maps the reads with the chosen aligner. The re
* <a href="https://github.com/alexdobin/STAR" target="_blank">Star-2pass</a>
* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
## Configuration and flags
For technical reasons, single sample pipelines, such as this mapping pipeline, do **not** take a sample config.
Input files are instead given on the command line as flags.
Command line flags for the mapping pipeline are:
| Flag (short)| Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --input_r1 | Path (**required**) | Path to input fastq file |
| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file. |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (**required**) | Name of library |
If `-R2` is given, the pipeline will assume a paired-end setup.
### Config
All other values should be provided in the config. Config values specific to the mapping pipeline are:
| Name | Type | Function |
| ---- | ---- | -------- |
| aligner | String (optional) | Which aligner to use. Defaults to `bwa-mem` |
| skipflexiprep | Boolean (optional) | Whether to skip the flexiprep QC step (default = False) |
| skipmarkduplicates | Boolean (optional) | Whether to skip the Picard Markduplicates step (default = False) |
| skipmetrics | Boolean (optional) | Whether to skip the metrics gathering step (default = False) |
| reference | Path (**required**) | Path to indexed fasta file to be used as reference |
| rgid | String (**required**) | Readgroup ID |
| rglb | String (**required**) | Readgroup Library |
| rgpl | String (**required**) | Readgroup Platform |
| rgpu | String (**required**) | Readgroup platform unit |
| rgsm | String (**required**) | Readgroup sample |
| rgcn | String (**required**) | Readgroup sequencing center |
| rgds | String (**required**) | Readgroup description |
| rgdt | ISO8601 date (**required**) | Readgroup sequencing date |
| rgpi | Integer (**required**) | Readgroup predicted insert size |
It is possible to provide any config value as a command line argument as well, using the `-cv` flag.
E.g. `-cv reference=<path/to/reference>` would set value `reference`.
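To make the `key=value` format concrete, here is a minimal Python sketch of how such arguments could be folded into a config dictionary. It is illustrative only and is not Biopet's actual parser: the function name `apply_config_values` is hypothetical, values are kept as plain strings, and the `path:path:key=value` form is deliberately not handled.

```python
def apply_config_values(config, cv_args):
    """Return a copy of `config` with simple 'key=value' overrides applied.

    Hypothetical helper for illustration; not part of Biopet.
    """
    merged = dict(config)
    for arg in cv_args:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected 'key=value', got {arg!r}")
        merged[key] = value
    return merged
```

Under these assumptions, a value given via `-cv` takes precedence over a value with the same key that is already present in the config.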
## Example
Note that one should first create the appropriate [settings config](../general/config.md).
Any supplied sample config will be ignored.
### Example config
```json
{
"reference": "<path/to/reference>",
"aligner": "bwa-mem",
"skipmetrics": true,
"rgid": "our_id",
"rglb": "our_lib",
"rgpl": "our_platform",
"rgpu": "our_unit",
"rgsm": "our_sample",
"rgcn": "our_center",
"rgds": "our_description",
"rgdt": "2015-05-28",
"rgpi": 300,
"output_dir": "<path/to/output/dir>"
}
```
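Before starting a run, it can help to check a settings config against the table of required values. The following Python sketch does so under stated assumptions: the function name `check_mapping_config` is hypothetical, the list of required keys is copied from the table in this document, and `rgdt` is assumed to be a plain calendar date (`YYYY-MM-DD`). It is not part of Biopet.

```python
import json
from datetime import date

# Keys marked as required in the config table of this document.
REQUIRED_KEYS = ["reference", "rgid", "rglb", "rgpl", "rgpu",
                 "rgsm", "rgcn", "rgds", "rgdt", "rgpi"]

def check_mapping_config(text):
    """Return a list of problems found in a JSON settings config."""
    config = json.loads(text)  # raises ValueError on malformed JSON
    problems = [f"missing required key: {key}"
                for key in REQUIRED_KEYS if key not in config]
    if "rgdt" in config:
        try:
            date.fromisoformat(config["rgdt"])
        except (TypeError, ValueError):
            problems.append("rgdt is not an ISO 8601 date")
    if "rgpi" in config:
        value = config["rgpi"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append("rgpi is not an integer")
    return problems
```

An empty list means that the config passed these checks. It does not guarantee that the pipeline will accept the config.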
### Running the pipeline
For the help menu:
~~~
java -jar </path/to/biopet.jar> pipeline mapping -h
Arguments for Mapping:
-R1,--input_r1 <input_r1> R1 fastq file
-outDir,--output_directory <output_directory> Output directory
-R2,--input_r2 <input_r2> R2 fastq file
-outputName,--outputname <outputname> Output name
-skipflexiprep,--skipflexiprep Skip flexiprep
-skipmarkduplicates,--skipmarkduplicates Skip mark duplicates
-skipmetrics,--skipmetrics Skip metrics
-ALN,--aligner <aligner> Aligner
-R,--reference <reference> Reference
-chunking,--chunking Chunking
-numberChunks,--numberchunks <numberchunks> Number of chunks, if not defined pipeline will automatically calculate the number of chunks
-RGID,--rgid <rgid> Readgroup ID
-RGLB,--rglb <rglb> Readgroup Library
-RGPL,--rgpl <rgpl> Readgroup Platform
-RGPU,--rgpu <rgpu> Readgroup platform unit
-RGSM,--rgsm <rgsm> Readgroup sample
-RGCN,--rgcn <rgcn> Readgroup sequencing center
-RGDS,--rgds <rgds> Readgroup description
-RGDT,--rgdt <rgdt> Readgroup sequencing date
-RGPI,--rgpi <rgpi> Readgroup predicted insert size
-config,--config_file <config_file> JSON config file(s)
-DSC,--disablescatterdefault Disable all scatters
-R1,--input_r1 <input_r1> R1 fastq file
-R2,--input_r2 <input_r2> R2 fastq file
-sample,--sampleid <sampleid> Sample ID
-library,--libid <libid> Library ID
-config,--config_file <config_file> JSON / YAML config file(s)
-cv,--config_value <config_value> Config values, value should be formatted like 'key=value' or
'path:path:key=value'
-DSC,--disablescatter Disable all scatters
~~~
To run the pipeline:
~~~
java -jar </path/to/biopet.jar> pipeline mapping -run --config mySettings.json \
-R1 myReads1.fastq -R2 myReads2.fastq -outDir myOutDir -OutputName myReadsOutput \
-R hg19.fasta -RGSM mySampleName -RGLB myLib1
-R1 myReads1.fastq -R2 myReads2.fastq
~~~
Note that removing `-R2` causes the pipeline to assume single-end `.fastq` files.
To perform a dry run simply remove `-run` from the commandline call.
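When the pipeline is launched from a script, the command can be assembled programmatically. The sketch below is a hypothetical Python helper (`build_mapping_command` is not part of Biopet). It uses only the flags shown in this document and assumes that `java` is on the `PATH`.

```python
def build_mapping_command(jar, config, r1, r2=None,
                          sample=None, library=None, run=False):
    """Assemble the argument list for the mapping pipeline.

    Hypothetical helper; flag names are taken from this document.
    """
    cmd = ["java", "-jar", jar, "pipeline", "mapping"]
    if run:
        cmd.append("-run")  # leaving -run out gives a dry run
    cmd += ["--config", config, "-R1", r1]
    if r2 is not None:
        cmd += ["-R2", r2]  # passing -R2 means a paired-end setup
    if sample is not None:
        cmd += ["-sample", sample]
    if library is not None:
        cmd += ["-library", library]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Leaving `run` at its default of `False` yields the dry-run form of the command.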