# Flexiprep

## Introduction
Flexiprep is our quality control pipeline. It checks for possible barcode contamination, clips adapters, trims low-quality bases and runs
the <a href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc/" target="_blank">FastQC</a> tool.
Adapter clipping is performed by <a href="https://github.com/marcelm/cutadapt" target="_blank">Cutadapt</a>.
Quality trimming is performed by <a href="https://github.com/najoshi/sickle" target="_blank">Sickle</a>.
Flexiprep works on `.fastq` files, which may be gzipped.


## Example

To get the help menu:
~~~
java -jar </path/to/biopet.jar> pipeline Flexiprep -h

Arguments for Flexiprep:
 -R1,--input_r1 <input_r1>             R1 fastq file (gzipped allowed)
 -R2,--input_r2 <input_r2>             R2 fastq file (gzipped allowed)
 -sample,--sampleid <sampleid>         Sample ID
 -library,--libid <libid>              Library ID
 -config,--config_file <config_file>   JSON config file(s)
 -DSC,--disablescatter                 Disable all scatters
~~~

Note that the pipeline also works on unpaired reads; in that case, provide only R1.


To start the pipeline (remove `-run` for a dry run):
~~~bash
java -jar Biopet-0.2.0.jar pipeline Flexiprep -run -outDir myDir \
-R1 myFirstReadPair -R2 mySecondReadPair -sample mySampleName \
-library myLibname -config mySettings.json
~~~


## Configuration and flags
For technical reasons, single-sample pipelines such as this one do **not** take a sample config.
Input files are instead given on the command line as flags.

Command line flags for Flexiprep are:

| Flag (short) | Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --input_r1 | Path (**required**) | Path to the R1 fastq file (gzipped allowed) |
| -R2 | --input_r2 | Path (optional) | Path to the R2 fastq file for paired-end data (gzipped allowed) |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (**required**) | Name of library |

If `-R2` is given, the pipeline will assume a paired-end setup.
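
The flag rules above can be sketched as a small shell helper. This is a minimal sketch and not part of Biopet: the function name `flexiprep_cmd`, the `BIOPET_JAR` variable and the argument order are assumptions made for illustration; only the flags themselves come from the table. The helper prints the command instead of running it, and it leaves out `-run`, so the printed command is a dry run.

~~~bash
#!/usr/bin/env bash
# Hypothetical helper: prints a Flexiprep command line for single- or paired-end input.
# BIOPET_JAR is an assumed variable; point it at your own Biopet jar.
BIOPET_JAR="${BIOPET_JAR:-/path/to/biopet.jar}"

flexiprep_cmd() {
    # Usage: flexiprep_cmd <outDir> <sample> <library> <R1> [R2]
    if [ "$#" -lt 4 ]; then
        echo "usage: flexiprep_cmd <outDir> <sample> <library> <R1> [R2]" >&2
        return 1
    fi
    local out_dir=$1 sample=$2 library=$3 r1=$4 r2=${5:-}

    # Required flags: -R1, -sample and -library.
    local cmd=(java -jar "$BIOPET_JAR" pipeline Flexiprep
               -outDir "$out_dir" -sample "$sample" -library "$library" -R1 "$r1")

    # -R2 is optional; adding it makes the pipeline assume a paired-end setup.
    if [ -n "$r2" ]; then
        cmd+=(-R2 "$r2")
    fi

    # Joined with single spaces; paths containing spaces would need quoting.
    echo "${cmd[*]}"
}

# Single-end, then paired-end (file names are placeholders).
flexiprep_cmd myDir mySampleName myLibname reads_R1.fastq.gz
flexiprep_cmd myDir mySampleName myLibname reads_R1.fastq.gz reads_R2.fastq.gz
~~~

Add `-run` and, if needed, `-config` to the printed command before executing it.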

### Config

All other values should be provided in the config. Config values specific to Flexiprep are:

| Name | Type | Function |
| ---- | ---- | -------- |
| skiptrim | Boolean | If true, the trimming step is skipped (default: false) |
| skipclip | Boolean | If true, the clipping step is skipped (default: false) |
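
A config file that sets both switches could be written as shown here. This is a sketch under assumptions: the file name `mySettings.json` is borrowed from the run example, and placing the keys at the top level of the JSON object is an assumption, not something this page specifies, so check it against your Biopet version.

~~~bash
# Write a minimal config that skips clipping but keeps trimming.
# Assumption: Flexiprep reads these keys from the top level of the JSON object.
cat > mySettings.json <<'EOF'
{
  "skiptrim": false,
  "skipclip": true
}
EOF
~~~

Pass the file to the pipeline with `-config mySettings.json`.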

## Result files
The pipeline produces a quality-controlled fastq file for each input read file.
It also outputs two FastQC runs: one before and one after quality control.

### Example output

~~~
.
├── mySample_01.qc.summary.json
├── mySample_01.qc.summary.json.out
├── mySample_01.R1.contams.txt
├── mySample_01.R1.fastqc
│   ├── mySample_01.R1_fastqc
│   │   ├── fastqc_data.txt
│   │   ├── fastqc_report.html
│   │   ├── Icons
│   │   │   ├── error.png
│   │   │   ├── fastqc_icon.png
│   │   │   ├── tick.png
│   │   │   └── warning.png
│   │   ├── Images
│   │   │   ├── duplication_levels.png
│   │   │   ├── kmer_profiles.png
│   │   │   ├── per_base_gc_content.png
│   │   │   ├── per_base_n_content.png
│   │   │   ├── per_base_quality.png
│   │   │   ├── per_base_sequence_content.png
│   │   │   ├── per_sequence_gc_content.png
│   │   │   ├── per_sequence_quality.png
│   │   │   └── sequence_length_distribution.png
│   │   └── summary.txt
│   └── mySample_01.R1.qc_fastqc.zip
├── mySample_01.R1.qc.fastq.gz
├── mySample_01.R1.qc.fastq.gz.md5
├── mySample_01.R2.contams.txt
├── mySample_01.R2.fastqc
│   ├── mySample_01.R2_fastqc
│   │   ├── fastqc_data.txt
│   │   ├── fastqc_report.html
│   │   ├── Icons
│   │   │   ├── error.png
│   │   │   ├── fastqc_icon.png
│   │   │   ├── tick.png
│   │   │   └── warning.png
│   │   ├── Images
│   │   │   ├── duplication_levels.png
│   │   │   ├── kmer_profiles.png
│   │   │   ├── per_base_gc_content.png
│   │   │   ├── per_base_n_content.png
│   │   │   ├── per_base_quality.png
│   │   │   ├── per_base_sequence_content.png
│   │   │   ├── per_sequence_gc_content.png
│   │   │   ├── per_sequence_quality.png
│   │   │   └── sequence_length_distribution.png
│   │   └── summary.txt
│   └── mySample_01.R2_fastqc.zip
├── mySample_01.R2.fastq.md5
├── mySample_01.R2.qc.fastqc
│   ├── mySample_01.R2.qc_fastqc
│   │   ├── fastqc_data.txt
│   │   ├── fastqc_report.html
│   │   ├── Icons
│   │   │   ├── error.png
│   │   │   ├── fastqc_icon.png
│   │   │   ├── tick.png
│   │   │   └── warning.png
│   │   ├── Images
│   │   │   ├── duplication_levels.png
│   │   │   ├── kmer_profiles.png
│   │   │   ├── per_base_gc_content.png
│   │   │   ├── per_base_n_content.png
│   │   │   ├── per_base_quality.png
│   │   │   ├── per_base_sequence_content.png
│   │   │   ├── per_sequence_gc_content.png
│   │   │   ├── per_sequence_quality.png
│   │   │   └── sequence_length_distribution.png
│   │   └── summary.txt
│   └── mySample_01.R2.qc_fastqc.zip
├── mySample_01.R2.qc.fastq.gz
├── mySample_01.R2.qc.fastq.gz.md5
└── report

~~~
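
Each unpacked FastQC directory in the tree above contains a `summary.txt`. As a minimal sketch, the helper below lists every FastQC module that did not pass, before or after quality control. The function name `fastqc_problems` is hypothetical, `myDir` is the output directory from the run example, and the sketch assumes FastQC's usual tab-separated `summary.txt` layout (status, module name, input file).

~~~bash
#!/usr/bin/env bash
# Hypothetical helper: report WARN and FAIL modules from every FastQC
# summary.txt found under a Flexiprep output directory.
fastqc_problems() {
    local out_dir=${1:-.}
    # Each summary.txt line is: STATUS<TAB>module name<TAB>input file.
    find "$out_dir" -type f -name summary.txt \
        -exec awk -F '\t' '$1 == "WARN" || $1 == "FAIL" { print FILENAME ": " $1 " " $2 }' {} +
}

# Only scan the example output directory if it exists.
if [ -d myDir ]; then
    fastqc_problems myDir
fi
~~~

An empty result means no module was flagged in any of the FastQC runs found.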