It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, running `biopet` as-is also works. Keep in mind that each pipeline has its own expected config layout. You can read more about the general structure of our config files [here](docs/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page.
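For illustration, a `nohup` invocation might look like the sketch below; the pipeline name, config file name, and redirections are placeholders, so check the help output of your release for the exact options it accepts:

~~~
$ nohup biopet pipeline mypipeline -config settings.json -run > mypipeline.log 2>&1 &
~~~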
### Running Biopet on your own computer
...
...
## Contributing to Biopet
Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.4 release.
We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any other kind of fix! The main language we use is Scala, though the repository also contains small amounts of Python and R. Our main code repository is located at [https://github.com/biopet/biopet](https://github.com/biopet/biopet), along with our [issue tracker](https://github.com/biopet/biopet/issues).
## Local development setup
To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.4 are required. Please consult the Java and Maven homepages for their respective installation instructions. After you have both Java and Maven installed, you will need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first.
~~~
$ git clone https://github.com/broadgsa/gatk
$ cd gatk
$ git checkout 3.4 # the current release is based on GATK 3.4
$ mvn -U clean install
~~~
This will install all the required dependencies into your local Maven repository. After this is done, you can clone our repository and test whether everything builds fine:
~~~
$ git clone https://github.com/biopet/biopet.git
$ cd biopet
$ mvn -U clean install
~~~
...
...
In `biopet/public/mypipeline/src/main/scala/nl/lumc/sasc/biopet/pipelines/mypipeline` create a file named `HelloPipeline.scala` with the following contents:
```
package nl.lumc.sasc.biopet.pipelines.mypipeline
// NOTE: import paths may differ between Biopet releases
import nl.lumc.sasc.biopet.core.PipelineCommand
import nl.lumc.sasc.biopet.core.summary.SummaryQScript
import nl.lumc.sasc.biopet.extensions.Fastqc
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.queue.QScript

class HelloPipeline(val root: Configurable) extends QScript with SummaryQScript {
  def this() = this(null) // Queue needs a no-argument constructor

  /** Only required when using [[SummaryQScript]] */
  def summaryFile = new File(outputDir, "hello.summary.json")

  /** Only required when using [[SummaryQScript]] */
  def summaryFiles: Map[String, File] = Map()

  /** Only required when using [[SummaryQScript]] */
  def summarySettings = Map()

  // This method can be used to initialize some classes where needed
  def init(): Unit = {
  }

  // This method is the actual pipeline
  def biopetScript: Unit = {
    // Executing a tool like FastQC, calling the extension in `nl.lumc.sasc.biopet.extensions.Fastqc`
    val fastqc = new Fastqc(this)
    fastqc.fastqfile = config("fastqc_input")
    fastqc.output = new File(outputDir, "fastqc.txt")
    add(fastqc)
  }
}

object HelloPipeline extends PipelineCommand
```
Looking at the pipeline, you can see that it inherits from `QScript`. `QScript` is the fundamental class that gives access to the Queue scheduling system. In addition, the `SummaryQScript` trait adds another layer of functions for handling and creating summary files from pipeline output.
In `class HelloPipeline(val root: Configurable)`, our pipeline is called `HelloPipeline` and takes a `root` object carrying the configuration options passed down to Biopet via a JSON file specified on the command line (`--config`).
```
def biopetScript: Unit = {
}
```
One can start adding pipeline components in `biopetScript`; it is the programmatic equivalent of the `main` method in most popular programming languages. For example, one can add a QC tool such as `FastQC` to the pipeline, as in the example shown above.
Setting up the pipeline is done within the pipeline itself; fine-tuning is always possible by overriding tool properties in the following way:
```
val fastqc = new Fastqc(this)
fastqc.fastqfile = config("fastqc_input")
fastqc.output = new File(outputDir, "fastqc.txt")
// change the kmers setting to 9; wrap it in `Some()` because `fastqc.kmers` is an `Option` value.
fastqc.kmers = Some(9)
add(fastqc)
```
### Config setup
For our new pipeline, one should set up the (default) config options.
Since our pipeline is called `HelloPipeline`, the root of the config options will be called `hellopipeline` (lowercase).
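As a sketch of one way to do this (assuming Biopet's `defaults` override hook, whose exact signature may differ between releases), default values can be provided from within the pipeline class itself:

```
  // Hypothetical defaults for HelloPipeline: provides a fallback value for
  // the FastQC kmers setting when the user config does not specify one.
  override def defaults = Map(
    "fastqc" -> Map("kmers" -> 9)
  )
```

Values supplied in the user's JSON config then take precedence over these defaults.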
To use a new tool within Biopet, one should write an `extension` for the tool (as we also do for regular executables like `bwa-mem`).
The wrapper exposes the tool's command-line arguments to Biopet in an object-oriented format; a sketch is shown below.
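A minimal sketch for a hypothetical `mytool` executable, following the pattern of the existing extensions (the class name, import paths, and `config` key here are assumptions and may differ between Biopet releases):

```
package nl.lumc.sasc.biopet.extensions

import java.io.File

import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.{ Input, Output }

// Hypothetical wrapper exposing `mytool -i <input> -o <output>` to Biopet
class MyTool(val root: Configurable) extends BiopetCommandLineFunction {
  // Path to the executable, overridable via the config key "exe"
  executable = config("exe", default = "mytool")

  @Input(doc = "Input file for mytool")
  var input: File = null

  @Output(doc = "Output file written by mytool")
  var output: File = null

  // Builds the command line that Queue will eventually execute
  def cmdLine = required(executable) +
    required("-i", input) +
    required("-o", output)
}
```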
Note: extensions can also add functionality for collecting summary data and passing it on to Biopet.
The concept behind extension wrappers is a black-box service model: one only needs to know how to interact with the tool, without necessarily knowing its internals.