Commit 47b5dd69 authored by bow

Docs updates

parent e02dc9dc
......@@ -24,7 +24,15 @@ After loading the module, you can access the biopet package by simply typing `bi
$ biopet
~~~
This will show you a list of tools and pipelines that you can use straight away. You can also execute `biopet pipeline` to show only available pipelines or `biopet tool` to show only the tools. Almost all of the pipelines have a common usage pattern with a similar set of flags, for example:
This will show you a list of tools and pipelines that you can use straight away. You can also execute `biopet pipeline` to show only available pipelines or `biopet tool` to show only the tools. Be aware that `biopet` is actually a shell function that calls `java` on the system-wide Biopet JAR file:
~~~
$ java -jar /path/to/current/biopet/release.jar
~~~
The actual path varies from version to version and is determined by the module you loaded.
Almost all of the pipelines have a common usage pattern with a similar set of flags, for example:
~~~
$ biopet pipeline shiva -config myconfig.json -qsub -jobParaEnv BWA -retry 2
......@@ -51,7 +59,7 @@ Biopet is based on the Queue framework developed by the Broad Institute as part
We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any other kind of fix! The main language we use is Scala, though the repository also contains a small bit of Python and R. Our main code repository is located at [https://git.lumc.nl/biopet/biopet](https://git.lumc.nl/biopet/biopet/issues), along with our issue tracker.
## Setting up your local development environment
## Local development setup
To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.3 are required. Please consult the Java and Maven homepages for their respective installation instructions. After you have both Java and Maven installed, you then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first.
......
## <a href="https://git.lumc.nl/biopet/biopet/tree/develop/protected/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty" target="_blank">Basty</a>
# Basty
# Introduction
## Introduction
A pipeline for aligning bacterial genomes and detecting structural variations at the level of SNPs. Basty outputs phylogenetic trees,
which make it very easy to look at the variations between certain species or strains.
## Tools for this pipeline
### Tools for this pipeline
* [GATK-pipeline](GATK-pipeline.md)
* [BastyGenerateFasta](../tools/BastyGenerateFasta.md)
* <a href="http://sco.h-its.org/exelixis/software.html" target="_blank">RAxml</a>
* <a href="https://github.com/sanger-pathogens/Gubbins" target="_blank">Gubbins</a>
## Requirements
### Requirements
To run for a specific species, please do not forget to create the proper index files.
The index files are created from the supplied reference:
......@@ -22,21 +22,21 @@ The index files are created from the supplied reference:
* ```.idxSpecificForAligner``` (depending on which aligner is used, one should create a suitable index specific to that aligner.
Each aligner has its own way of creating index files, so the options for creating them are documented with the aligner itself)
## Example
### Example
#### For the help screen:
##### For the help screen:
~~~
java -jar Biopet.0.2.0.jar pipeline basty -h
~~~
#### Run the pipeline:
##### Run the pipeline:
Note that one should first create the appropriate [configs](../general/config.md).
~~~
java -jar Biopet.0.2.0.jar pipeline basty -run -config MySamples.json -config MySettings.json -outDir myOutDir
~~~
## Result files
### Result files
The output files this pipeline produces are:
* A complete output from [Flexiprep](flexiprep.md)
......@@ -107,7 +107,7 @@ The output files this pipeline produces are:
├── multisample.ug.discovery.vcf.gz
└── multisample.ug.discovery.vcf.gz.tbi
~~~
## Best practice
### Best practice
# References
## References
# Introduction
# Gentrap
Gentrap (generic transcriptome analysis pipeline) is a general data analysis pipeline for quantifying expression levels from RNA-seq libraries generated on Illumina machines. It was designed to be flexible, providing several aligners and quantification modes to choose from, with optional steps in between. It can be used to run different experiment configurations, from single sample runs to multiple sample runs containing multiple sequencing libraries. It can also perform very simple variant calling (using VarScan).
## Introduction
Gentrap (*generic transcriptome analysis pipeline*) is a general data analysis pipeline for quantifying expression levels from RNA-seq libraries generated on Illumina machines. It was designed to be flexible, providing several aligners and quantification modes to choose from, with optional steps in between. It can be used to run different experiment configurations, from single sample runs to multiple sample runs containing multiple sequencing libraries. It can also perform very simple variant calling (using VarScan).
At the moment, Gentrap supports the following aligners:
......@@ -16,11 +18,11 @@ and the following quantification modes:
You can also provide a `.refFlat` file containing ribosomal sequence coordinates to measure how many of your libraries originate from ribosomal sequences. Then, you may optionally remove those regions as well.
# Configuration file
## Configuration File
As with other Biopet pipelines, Gentrap relies on a JSON configuration file to run its analyses. There are two important parts here: the configuration for the samples (to determine the sample layout of your experiment) and the configuration for the pipeline settings (to determine which analyses are run).
## Sample Configuration
### Sample Configuration
Samples are single experimental units whose expression you want to measure. They usually consist of a single sequencing library, but in some cases (for example, when the experiment demands that each sample have a minimum library depth) a single sample may contain multiple sequencing libraries as well. All this can be configured using the correct JSON nesting, with the following pattern:
......@@ -70,7 +72,7 @@ In the example above, there is one sample (named `sample_A`) which contains one
In this case, we have two samples (`sample_X` and `sample_Y`) and `sample_Y` has two different libraries (`lib_one` and `lib_two`). Notice that the names of the samples and libraries may change, but several keys such as `samples`, `libraries`, `R1`, and `R2` remain the same.
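The nesting described above can also be generated programmatically. The following is a minimal Python sketch, not part of Biopet itself: the helper `build_sample_config`, the FASTQ paths, and the name `lib_one` for the library of `sample_X` are all hypothetical, while the keys `samples`, `libraries`, `R1`, and `R2` are the fixed ones mentioned above.

```python
import json


def build_sample_config(layout):
    """Turn {sample: {library: (r1_path, r2_path)}} into the nested sample config.

    Hypothetical helper for illustration only; it is not a Biopet function.
    """
    return {
        "samples": {
            sample: {
                "libraries": {
                    library: {"R1": r1, "R2": r2}
                    for library, (r1, r2) in libraries.items()
                }
            }
            for sample, libraries in layout.items()
        }
    }


# Two samples; sample_Y has two libraries. All paths are placeholders.
layout = {
    "sample_X": {
        "lib_one": ("/path/to/sample_X_R1.fastq.gz", "/path/to/sample_X_R2.fastq.gz"),
    },
    "sample_Y": {
        "lib_one": ("/path/to/sample_Y_lib_one_R1.fastq.gz", "/path/to/sample_Y_lib_one_R2.fastq.gz"),
        "lib_two": ("/path/to/sample_Y_lib_two_R1.fastq.gz", "/path/to/sample_Y_lib_two_R2.fastq.gz"),
    },
}

config = build_sample_config(layout)
print(json.dumps(config, indent=2))
```

Writing the printed JSON to a file gives a sample configuration that can be passed to the pipeline with `-config`.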
## Pipeline Settings Configuration
### Pipeline Settings Configuration
For the pipeline settings, there are some values that you need to specify while some are optional. Required settings are:
......@@ -107,3 +109,28 @@ Thus, an example settings configuration is as follows:
}
}
~~~
## Running Gentrap
As with other pipelines in the Biopet suite, Gentrap can be run by specifying the pipeline after the `pipeline` subcommand:
~~~
java -jar /path/to/biopet.jar pipeline gentrap -config /path/to/config.json -qsub -jobParaEnv BWA -run
~~~
If you already have the `biopet` environment module loaded, you can also simply call `biopet`:
~~~
biopet pipeline gentrap -config /path/to/config.json -qsub -jobParaEnv BWA -run
~~~
It is also a good idea to specify retries (we recommend `-retry 3` up to `-retry 5`) so that cluster glitches do not interfere with your pipeline runs.
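If you launch Gentrap from a script, the command line including the retry flag could be assembled as in the Python sketch below. The helper `gentrap_command` is hypothetical; it only uses flags that appear in this documentation (`-config`, `-qsub`, `-jobParaEnv`, `-run`, `-retry`) and assumes that the position of `-retry` among the other flags does not matter.

```python
import shlex


def gentrap_command(config_path, retries=3, parallel_env="BWA", launcher=("biopet",)):
    """Build the argument list for a Gentrap run (hypothetical helper).

    `launcher` is ("biopet",) when the environment module is loaded; use
    ("java", "-jar", "/path/to/biopet.jar") to call the JAR directly.
    """
    if retries < 0:
        raise ValueError("retries must be zero or a positive integer")
    return [
        *launcher,
        "pipeline", "gentrap",
        "-config", config_path,
        "-qsub",
        "-jobParaEnv", parallel_env,
        "-run",
        "-retry", str(retries),
    ]


command = gentrap_command("/path/to/config.json", retries=3)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where `biopet` is available.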
## Output Files
The number and types of output files depend on your run configuration. What you can always expect, however, is that there will be a summary JSON file of your run called `gentrap.summary.json` and a PDF report in a `report` folder called `gentrap_report.pdf`. The summary file contains files and statistics specific to the current run, which is meant for cases when you wish to do further processing with your Gentrap run (for example, plotting some figures), while the PDF report provides a quick overview of your run results.
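As a starting point for further processing, the summary can be loaded with any JSON library. The Python sketch below only assumes that `gentrap.summary.json` is valid JSON with an object at the top level; the location of the file in your output directory and the keys `example_files` and `example_stats` are placeholders, since the real structure depends on your run.

```python
import json
import tempfile
from pathlib import Path


def load_summary(summary_path):
    """Read a Gentrap summary file and return it as a Python dictionary."""
    with Path(summary_path).open() as handle:
        return json.load(handle)


# Stand-in for a real run directory: write a tiny placeholder summary so the
# example is runnable. The keys below are NOT the real summary layout.
with tempfile.TemporaryDirectory() as run_dir:
    summary_path = Path(run_dir) / "gentrap.summary.json"
    summary_path.write_text(json.dumps({"example_files": {}, "example_stats": {}}))
    summary = load_summary(summary_path)

# Listing the top-level keys is a quick way to explore an unfamiliar summary.
summary_keys = sorted(summary)
print(summary_keys)
```

For a real run, point `load_summary` at the `gentrap.summary.json` of that run and inspect its keys before plotting or aggregating anything.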
## Getting Help
If you have any questions on running Gentrap, suggestions on how to improve the overall flow, or requests for your favorite RNA-seq related program to be added, feel free to post an issue to our issue tracker at [https://git.lumc.nl/biopet/biopet/issues](https://git.lumc.nl/biopet/biopet/issues).
# Release notes BioPet version 0.3.0
# Release notes Biopet version 0.3.0
Since our first release in December 2014, many new functions have been added to the pipelines, including:
- Multisample compatibility
- Copy number analysis
- Structural variants analysis
- Full RNA-seq pipeline
- Annotation pipeline (still under development)
- Summary framework
- Md5sum of all input/output files (need to be provided in the summary)
- Sequence stats
- Program stats/versions
- Mapping stats
- Tool stats (if a pipeline uses a Biopet tool, it will output the version of the tool and all other statistics that might be captured)
- GATK variantcalling has a lot of new features and is now called SHIVA
- An entirely new pipeline named Gentrap based on our previous [Makefile version](http://sasc-server.lumcnet.prod.intern/pipelines/makefile-0.6.0/gentrap/), with extra features like:
- Removal of all ribosomal reads
- A tool for building the correct annotation for read counting, etc.
- Multi-sample runs
- GATK VariantCalling has a lot of new features and is now called Shiva
- Annotation pipeline (development version)
An impressive list of tools has also been added to the updated framework:
......@@ -24,6 +21,7 @@ Also a impressive list of tools have been added to the updated framework:
- ExtractAlignedFastq
- FastqSplitter
- FastqSync
- MergeTables
- SamplesTsvToJson
- Seqstat ( this is a lift over tool based on our previous python implementation of seqstat )
- VEPNormalizer ( This normalizer enables a user to parse VEP output VCFs to the exact specs of [VCF 4.1](https://samtools.github.io/hts-specs/VCFv4.1.pdf) )
......