Commit a4fb8a3d authored by bow's avatar bow
Browse files

Update index page

parent d16ce249
# Welcome to Biopet
###### (Bio Pipeline Execution Tool)
## Introduction
Biopet is an abbreviation of ( Bio Pipeline Execution Tool ) and packages several functionalities:
1. Tools for working on sequencing data
1. Pipelines to do analysis on sequencing data
1. Running analysis on a computing cluster ( Open Grid Engine )
1. Running analysis on your local desktop computer
## Introduction
### System Requirements
Biopet (Bio Pipeline Execution Toolkit) is the main pipeline development framework of the LUMC Sequencing Analysis Support Core team. It contains our main pipelines and some of the command line tools we develop in-house. It is meant to be used in the main [SHARK](https://humgenprojects.lumc.nl/trac/shark) computing cluster. While usage outside of SHARK is technically possible, some adjustments may need to be made in order to do so.
Biopet is build on top of GATK Queue, which requires having `java 7 JVM` installed on the analysis machine(s).
For end-users:
## Quick Start
* [Java 7 JVM](http://www.oracle.com/technetwork/java/javase/downloads/index.html) or [OpenJDK 7](http://openjdk.java.net/install/)
* [Cran R 3.1.1](http://cran.r-project.org/)
* [GATK](https://www.broadinstitute.org/gatk/download)
### Running Biopet in the SHARK cluster
For developers:
Biopet is available as a JAR package in SHARK. The easiest way to start using it is to activate the `biopet` environment module, which sets useful aliases and environment variables:
* [OpenJDK 7](http://openjdk.java.net/install/)
* [Cran R 3.1.1](http://cran.r-project.org/)
* [Maven 3.2](http://maven.apache.org/download.cgi)
* [GATK + Queue](https://www.broadinstitute.org/gatk/download)
* [IntelliJ](https://www.jetbrains.com/idea/) or [Netbeans > 8.0](https://netbeans.org/)
~~~
$ module load biopet/v0.3.0
~~~
## How to use
With each Biopet release, an accompanying environment module is also released. The latest release is version 0.3.0, thus `biopet/v0.3.0` is the module you would want to load.
### Running a pipeline
After loading the module, you can access the biopet package by simply typing `biopet`:
- Help:
~~~
java -jar Biopet(version).jar (pipeline of interest) -h
~~~
- Local:
$ biopet
~~~
java -jar Biopet(version).jar (pipeline of interest) (pipeline options) -run
~~~
- Cluster:
- Note that `-qsub` is cluster specific (SunGrid Engine)
~~~
java -jar Biopet(version).jar (pipeline of interest) (pipeline options) -qsub* -jobParaEnv YoureParallelEnv -run
~~~
- DryRun:
- A dry run can be performed to see if the scheduling and creating of the pipelines jobs performs well. Nothing will be executed only the job commands are created. If this succeeds it's a good indication you actual run will be successful as well.
- Each pipeline can be found as an options inside the jar file Biopet[version].jar which is located in the target directory and can be started with `java -jar <pipelineJarFile>`
This will show you a list of tools and pipelines that you can use straight away. You can also execute `biopet pipeline` to show only available pipelines or `biopet tool` to show only the tools. Almost all of the pipelines have a common usage pattern with a similar set of flags, for example:
~~~
java -jar Biopet(version).jar (pipeline of interest) (pipeline options)
$ biopet pipeline shiva -config myconfig.json -qsub -jobParaEnv BWA -retry 2
~~~
If one performs a dry run the config report will be generated. From this config report you can identify all configurable options.
### Shark Compute Cluster specific
In the SHARK compute cluster, a module is available to load the necessary dependencies.
$ module load biopet/v0.3.0
The command above will do a *dry* run of the Shiva pipeline (one of our variant calling pipeline) using the config file `myconfig.json` as if the command would be submitted to the SHARK cluster (the `-qsub` flag) to the `BWA` parallel environment (the `-jobParaEnv BWA` flag). We also set the maximum retry of failing jobs to two times (via the `-retry 2` flag). Doing a good run is a good idea to ensure that your real run proceeds smoothly. It may not catch all the errors, but if the dry run fails you can be sure that the real run will never succeed.
Using this option, the `java -jar Biopet-<version>.jar` can be ommited and `biopet` can be started using:
If the dry run proceeds without problems, you can then do the real run by using the `-run` flag:
$ biopet
### Running pipelines
$ biopet pipeline <pipeline_name>
~~~
$ biopet pipeline shiva -config myconfig.json -qsub -jobParaEnv BWA -retry 2 -run
~~~
- [Flexiprep](pipelines/flexiprep)
- [Mapping](pipelines/mapping)
- [Gatk Variantcalling](https://git.lumc.nl/biopet/biopet/wikis/GATK-Variantcalling-Pipeline)
- [Gentrap](pipelines/gentrap)
- [Sage](pipelines/sage)
- Bam2Wig
- BamMetrics
- Carp
- ConiferPipeline
- Toucan
- Yamsvp
- Basty
Note that all pipelines are still in the experimental phase and therefore one needs to be careful interpreting the results and stability of the pipelines
It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, using `biopet` as it is is also fine. What you need to keep in mind, is that each pipeline has their own expected config layout. You can check out more about the general structure of our config files [here](general/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page.
- GatkBenchmarkGenotyping
- GatkGenotyping
- GatkPipeline
- GatkVariantRecalibration
- GatkVcfSampleCompare
- Yamsvp (Under development)
### Running Biopet in your own computer
__Note that each pipeline needs a config file written in JSON format see [config](general/config.md) & [How To! Config](https://git.lumc.nl/biopet/biopet/wikis/Config) __
At the moment, we do not provide links to download the Biopet package. If you are interested in trying out Biopet locally, please contact us as [sasc@lumc.nl](mailto:sasc@lumc.nl).
There are multiple configs that can be passed to a pipeline, for example the sample, settings and executables where from sample and settings are mandatory.
## Contributing to Biopet
- [Here](general/config.md) one can find how to create a sample and settings config
- More info can be found here: [How To! Config](https://git.lumc.nl/biopet/biopet/wikis/Config)
Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.3 release.
### Running a tool
We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any kinds of other fixes! The main language we use is Scala, though the repository also contains a small bit of Python and R. Our main code repository is located at [https://git.lumc.nl/biopet/biopet](https://git.lumc.nl/biopet/biopet/issues), along with our issue tracker.
$ biopet tool <tool_name>
## Setting up your local development environment
- AnnotateVcfWithBed
- BastyGenerateFasta
- BedToInterval
- BedtoolsCoverageToCounts
- BiopetFlagstat
- CheckAllelesVcfInBam
- ExtractAlignedFastq
- FastqSplitter
- FastqSync
- FindRepeatsPacBio
- MergeAlleles
- MpileupToVcf
- SageCountFastq
- SageCreateLibrary
- SageCreateTagCounts
- SamplesTsvToJson
- Seqstat
- VEPNormalizer
- VcfFilter
- VcfStats
- VcfToTsv
- WipeReads
To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.3 is required. Please consult the Java homepage and Maven homepage for the respective installation instruction. After you have both Java and Maven installed, you would then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first.
~~~
$ git clone https://github.com/broadgsa/gatk
$ cd gatk
$ git checkout 3.3 # the current release is based on GATK 3.3
$ mvn -U clean install
~~~
Each tool has its own help screen and can be accessed like this:
$ biopet tool <tool_name> -h
This will install all the required dependencies to your local maven repository. After this is done, you can clone our repository and test if everything builds fine:
## Developers
~~~
$ git clone git@git.lumc.nl:biopet/biopet.git
$ cd biopet
$ mvn -U clean install
~~~
### Compiling Biopet
If everything builds fine, you're good to go! Otherwise, don't hesitate to contact us or file an issue at our issue tracker.
1. Clone biopet with `git clone git@git.lumc.nl:biopet/biopet.git biopet`
2. Go to biopet directory
3. run mvn_install_queue.sh, this install queue jars into the local maven repository
4. alternatively download the `queue.jar` from the GATK website
5. run `mvn verify` to compile and package or do `mvn install` to install the jars also in local maven repository
## About
Go to the [about page](about)
## License
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment