Commit 83956764 authored by npappas

Fixes based on comments

parent 111b9705
@@ -101,6 +101,12 @@ While optional settings are:
1. `aligner`: which aligner to use (`bwa` or `bowtie`)
2. `macs2`: Only the callpeak mode is implemented here, but one can set all the options from [macs2 callpeak](https://github.com/taoliu/MACS/#call-peaks) in this settings config. Note that the config value is `macs2_callpeak`.
[Gears](gears) is run automatically for the data analysed with `Carp`. There are two levels on which this can be done, or it can be disabled entirely; the choice should be specified in the [config](../general/config) file (a minimal settings snippet follows the list):

* `mapping_to_gears: unmapped` : Unmapped reads after alignment (default).
* `mapping_to_gears: all` : Trimmed and clipped reads from [Flexiprep](flexiprep).
* `mapping_to_gears: none` : Disable this functionality.
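As a minimal sketch of a settings config carrying this option (the `output_dir` value and paths are illustrative; merge the key into your existing settings file):

``` json
{
  "output_dir": "/path/to/output",
  "mapping_to_gears": "unmapped"
}
```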
## Configuration for detection of broad peaks (ATAC-seq)
Carp can do broad peak-calling by using the following config:
@@ -3,9 +3,9 @@
## Introduction
Gears (``GE``nome ``A``nnotation of ``R``esidual ``S``equences) is a metagenomics pipeline. It can be used to identify contamination in sequencing runs on either raw FastQ files or BAM files.
If a BAM file is given as input, the unaligned read (pair) sequences are extracted for analysis.
It can also be used to analyse sequencing data obtained from metagenomics samples, containing a mix of different organisms. Taxonomic labels will be assigned to the input reads and these will be reported.
The result of the analysis is reported in a [Krona graph](https://github.com/marbl/Krona/wiki), which can be viewed interactively in a web browser.
Pipeline analysis components include:
@@ -18,8 +18,11 @@ Pipeline analysis components include:
## Gears
This pipeline is used to analyse a group of samples and only accepts fastq files. The fastq files first get trimmed and clipped with [Flexiprep](flexiprep).
This can be disabled with the config flags of [Flexiprep](flexiprep). The samples can be specified with a sample config file, see [Config](../general/config).
`Gears` uses centrifuge by default as its classification engine. An indexed database, created with ```centrifuge-build```, is required and must be specified by including ```centrifuge_index: /path/to/index``` in the [config](../general/config) file.
More information on how to build centrifuge databases and indexes can be found [here](https://github.com/infphilo/centrifuge/blob/master/MANUAL.markdown#database-download-and-index-building).
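As a rough sketch following that manual (the selected domains, thread count, and the index name `my_index` are illustrative, not pipeline defaults), an index could be built like this:

``` bash
# Download the NCBI taxonomy and a selection of reference sequences
centrifuge-download -o taxonomy taxonomy
centrifuge-download -o library -d "archaea,bacteria" refseq > seqid2taxid.map

# Concatenate the downloaded sequences and build the index
cat library/*/*.fna > input-sequences.fna
centrifuge-build -p 4 --conversion-table seqid2taxid.map \
    --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
    input-sequences.fna my_index
```

The resulting index prefix (here `my_index`) is what `centrifuge_index` should point to in the config.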
On LUMC's SHARK the NCBI non-redundant database (nt) is used as the default database against which taxonomic assignments for the short input reads are made.
If you would like to use another classifier, you should specify so in the [config](../general/config) file. Multiple classifiers can be used in one run if you wish to have multiple outputs for comparison.
Note that the Centrifuge and Kraken systems can be used for any kind of input sequences (e.g. shotgun or WGS), while the QIIME system is optimized for 16S-based analyses.
@@ -39,10 +42,12 @@
To start the pipeline (remove `-run` for a dry run):
``` bash
biopet pipeline Gears -run \
-config /path/to/mySettings.json -config /path/to/samples.json
```
Note: If you are using the LUMC High Performance Computing cluster (aka SHARK), make sure to include the ```-qsub -jobParaEnv BWA -jobQueue all.q``` flags when invoking the command.
## GearsSingle
This pipeline can be used to analyse a single sample, be it fastq files or a bam file. When a bam file is provided as input, only the unmapped reads are extracted and further analysed.
@@ -52,14 +57,15 @@
To start the pipeline (remove `-run` for a dry run):
``` bash
biopet pipeline GearsSingle -run \
-R1 /path/to/myFirstReadPair -R2 /path/to/mySecondReadPair -sample mySampleName \
-library myLibname -config /path/to/mySettings.json
```
Note: If you are using the LUMC High Performance Computing cluster (aka SHARK), make sure to include the ```-qsub -jobParaEnv BWA -jobQueue all.q``` flags when invoking the command.
### Command line flags
For technical reasons, single-sample pipelines such as this one do **not** take a sample config.
Input files are instead given on the command line as flags.
Command line flags for GearsSingle are:
@@ -91,7 +97,7 @@
### Result files
The results of a `Gears` run are organised in two folders: `report` and `samples`.
In the `report` folder, one can find the html report (index.html) displaying the summarised results over all samples and providing a navigation view on the taxonomy graph and its result, per sample.
In the `samples` folder, one can find a separate folder for each sample. The individual folders follow the input sample naming and contain the results for each analysis run per sample.
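As an illustrative sketch of this layout (sample names and the set of tool folders depend on your input and on the classifiers enabled):

```
report/
    index.html
samples/
    sample_1/
        centrifuge/
        kraken/
```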
## Example
@@ -124,10 +130,18 @@
| *.centrifuge.krkn.json | krakenReportToJson | json | JSON representation of the taxonomy report |
| *.centrifuge_unique.kreport | centrifuge-kreport | tsv | Kraken-style report of the centrifuge output including taxonomy information for the reads with unique taxonomic assignment |
| *.centrifuge_unique.krkn.json | krakenReportToJson | json | JSON representation of the taxonomy report for the uniquely mapped reads |
Kraken-specific output:

| File | Generated by | Format | Description |
| ---- | ------------ | ------ | ----------- |
| *.krkn.raw | kraken | tsv | Annotation per sequence |
| *.krkn.full | kraken-report | tsv | List of all possible annotations with counts filled in for this specific sample |
| *.krkn.json | krakenReportToJson | json | JSON representation of the taxonomy report, for postprocessing |
QIIME-specific output:

| File | Generated by | Format | Description |
| ---- | ------------ | ------ | ----------- |
| *.otu_table.biom | qiime | biom | Biom file containing counts for OTUs identified in the input |
| *.otu_map.txt | qiime | tsv | Tab-separated file containing information about which samples a taxon has been identified in |
## Getting Help
For questions about this pipeline and suggestions, we have a [GitHub page](https://github.com/biopet/biopet) where you can submit your ideas and thoughts.
@@ -28,6 +28,13 @@
As with other biopet pipelines, Gentrap relies on a JSON configuration file to run its analyses. There are two important parts here: the configuration for the samples (to determine the sample layout of your experiment) and the configuration for the pipeline settings (to determine which analyses are run).
To get help creating the appropriate [configs](../general/config.md), please refer to the config page in the general section.
[Gears](gears) is run automatically for the data analysed with `Gentrap`. There are two levels on which this can be done, or it can be disabled entirely; the choice should be specified in the [config](../general/config) file:

* `mapping_to_gears: unmapped` : Unmapped reads after alignment (default).
* `mapping_to_gears: all` : Trimmed and clipped reads from [Flexiprep](flexiprep).
* `mapping_to_gears: none` : Disable this functionality.
### Sample Configuration
Samples are single experimental units whose expression you want to measure. They usually consist of a single sequencing library, but in some cases (for example, when the experiment demands each sample have a minimum library depth) a single sample may contain multiple sequencing libraries as well. All this can be configured using the correct JSON nesting, with the following pattern:
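As a rough sketch of that nesting (sample and library names and file paths are placeholders):

``` json
{
  "samples": {
    "sample_1": {
      "libraries": {
        "lib_1": {
          "R1": "/path/to/sample_1_lib_1_R1.fastq.gz",
          "R2": "/path/to/sample_1_lib_1_R2.fastq.gz"
        }
      }
    }
  }
}
```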
@@ -52,6 +52,12 @@ biopet pipeline shiva -config MySamples.json -config MySettings.json -run
A dry run can be performed by simply removing the `-run` flag from the command line call.
[Gears](gears) is run automatically for the data analysed with `Shiva`. There are two levels on which this can be done, or it can be disabled entirely; the choice should be specified in the [config](../general/config) file:

* `mapping_to_gears: unmapped` : Unmapped reads after alignment (default).
* `mapping_to_gears: all` : Trimmed and clipped reads from [Flexiprep](flexiprep).
* `mapping_to_gears: none` : Disable this functionality.
### Only variant calling
It is possible to run Shiva while only performing its variant calling steps.