Commit d167539f authored by Peter van 't Hof

Merge branch 'develop' into feature-docs-0.5.0

parents abd19ee6 ba0cf48a
@@ -46,7 +46,7 @@ If the dry run proceeds without problems, you can then do the real run by using
$ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2 -run
~~~
-It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, using `biopet` as it is is also fine. What you need to keep in mind, is that each pipeline has their own expected config layout. You can check out more about the general structure of our config files [here](general/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page.
+It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, using `biopet` as it is is also fine. What you need to keep in mind, is that each pipeline has their own expected config layout. You can check out more about the general structure of our config files [here](docs/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page.
### Running Biopet in your own computer
@@ -55,25 +55,25 @@ At the moment, we do not provide links to download the Biopet package. If you ar
## Contributing to Biopet
-Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.3 release.
+Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.4 release.
-We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any kinds of other fixes! The main language we use is Scala, though the repository also contains a small bit of Python and R. Our main code repository is located at [https://git.lumc.nl/biopet/biopet](https://git.lumc.nl/biopet/biopet/issues), along with our issue tracker.
+We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any kinds of other fixes! The main language we use is Scala, though the repository also contains a small bit of Python and R. Our main code repository is located at [https://github.com/biopet/biopet](https://github.com/biopet/biopet/issues), along with our issue tracker.
## Local development setup
-To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.3 is required. Please consult the Java homepage and Maven homepage for the respective installation instruction. After you have both Java and Maven installed, you would then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first.
+To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.4 is required. Please consult the Java homepage and Maven homepage for the respective installation instruction. After you have both Java and Maven installed, you would then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first.
~~~
$ git clone https://github.com/broadgsa/gatk
$ cd gatk
-$ git checkout 3.3 # the current release is based on GATK 3.3
+$ git checkout 3.4 # the current release is based on GATK 3.4
$ mvn -U clean install
~~~
This will install all the required dependencies to your local maven repository. After this is done, you can clone our repository and test if everything builds fine:
~~~
-$ git clone git@git.lumc.nl:biopet/biopet.git
+$ git clone https://github.com/biopet/biopet.git
$ cd biopet
$ mvn -U clean install
~~~
@@ -83,8 +83,8 @@ If everything builds fine, you're good to go! Otherwise, don't hesitate to conta
## About
-Go to the [about page](about)
+Go to the [about page](docs/about.md)
## License
-See: [License](license.md)
+See: [License](docs/license.md)
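#!/bin/bash
# Resolve the absolute path of the directory containing this script.
DIR=$(readlink -f "$(dirname "$0")")
# Collect the src trees of all modules (<repo root>/*/*/src) into this module.
mkdir -p "$DIR/src"
cp -r "$DIR"/../*/*/src/* "$DIR/src"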
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>BiopetRoot</artifactId>
<groupId>nl.lumc.sasc</groupId>
<version>0.5.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>BiopetAggregate</artifactId>
<dependencies>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.8</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
<version>1.9.5</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.10</artifactId>
<version>2.2.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>nl.lumc.sasc</groupId>
<artifactId>BiopetProtectedPackage</artifactId>
<version>0.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>18.0</version>
</dependency>
</dependencies>
</project>
\ No newline at end of file
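#!/bin/bash
# Resolve the absolute path of the directory containing this script.
DIR=$(readlink -f "$(dirname "$0")")
# Remove the module sources that were copied into this aggregate module.
rm -r "$DIR/src/main" "$DIR/src/test"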
### System Requirements
Biopet is built on top of GATK Queue, which requires `java` to be installed on the analysis machine(s).
For end-users:
* [Java 7 JVM](http://www.oracle.com/technetwork/java/javase/downloads/index.html) or [OpenJDK 7](http://openjdk.java.net/install/)
* [CRAN R 2.15.3](http://cran.r-project.org/)
For developers:
* [OpenJDK 7](http://openjdk.java.net/install/)
* Minimum of 4 GB RAM {todo: provide more accurate estimation on building}
* Maven 3
* Compiled and installed version 3.4 of [GATK + Queue](https://github.com/broadgsa/gatk-protected/) in your local Maven repository.
* IntelliJ or NetBeans 8.0 for development
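You can check from a shell whether the required tools are available before installing Biopet. The following is a minimal sketch and is not part of Biopet. `check_commands` is a hypothetical helper, and `Rscript` is assumed to be the R command on the analysis machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Biopet): report every command in the
# argument list that cannot be found on PATH.
# Returns 0 when all commands are present, 1 otherwise.
check_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# End-users: a Java runtime and R (assumed here to provide `Rscript`).
#   check_commands java Rscript
# Developers additionally need Maven:
#   check_commands java Rscript mvn
```

This only checks that each tool is on `PATH`, not which version is installed. To compare versions against the list above, run `java -version`, `mvn -v` and `R --version`.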
../README.md
\ No newline at end of file
# Welcome to Biopet
## Introduction
Biopet (Bio Pipeline Execution Toolkit) is the main pipeline development framework of the LUMC Sequencing Analysis Support Core team. It contains our main pipelines and some of the command line tools we develop in-house. It is meant to be used in the main [SHARK](https://humgenprojects.lumc.nl/trac/shark) computing cluster. While usage outside of SHARK is technically possible, some adjustments may need to be made in order to do so.
## Quick Start
### Running Biopet in the SHARK cluster
Biopet is available as a JAR package in SHARK. The easiest way to start using it is to activate the `biopet` environment module, which sets useful aliases and environment variables:
~~~
$ module load biopet/v0.4.0
~~~
With each Biopet release, an accompanying environment module is also released. The latest release is version 0.4.0, thus `biopet/v0.4.0` is the module you would want to load.
After loading the module, you can access the biopet package by simply typing `biopet`:
~~~
$ biopet
~~~
This will show you a list of tools and pipelines that you can use straight away. You can also execute `biopet pipeline` to show only the available pipelines or `biopet tool` to show only the tools. Be aware that `biopet` is actually a shell function that calls `java` on the system-wide Biopet JAR file:
~~~
$ java -jar <path/to/current/biopet/release.jar>
~~~
The actual path varies from version to version and is determined by the module you loaded.
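The sketch below shows roughly what such a shell function does. The module's real definition may differ. In particular, `BIOPET_JAR` is a hypothetical variable name that stands for the JAR path your loaded module points to.

```shell
#!/usr/bin/env bash
# Hypothetical variable: the real module may store the JAR path differently.
BIOPET_JAR="${BIOPET_JAR:-/path/to/current/biopet/release.jar}"

# Wrapper function: forward every argument unchanged to the Biopet JAR, so
# `biopet pipeline <pipeline_name> ...` becomes
# `java -jar "$BIOPET_JAR" pipeline <pipeline_name> ...`.
biopet() {
    java -jar "$BIOPET_JAR" "$@"
}
```

Quoting `"$@"` keeps arguments that contain spaces, such as config paths, intact. `biopet` is a function, not an executable on `PATH`, so programs that start external commands (for example `nohup`) cannot call it directly. After loading the module, run `type biopet` to see the actual definition.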
Almost all of the pipelines have a common usage pattern with a similar set of flags, for example:
~~~
$ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2
~~~
The command above does a *dry* run of a pipeline using a config file, as if the jobs were submitted to the SHARK cluster (the `-qsub` flag) in the `BWA` parallel environment (the `-jobParaEnv BWA` flag). It also sets the maximum number of retries for failing jobs to two (the `-retry 2` flag). Doing a dry run first is a good way to make sure that your real run proceeds smoothly. It may not catch all errors, but if the dry run fails, you can be sure that the real run will not succeed either.
If the dry run proceeds without problems, you can then do the real run by using the `-run` flag:
~~~
$ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2 -run
~~~
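You can wrap the dry-run-then-real-run pattern in a small shell function so that both runs use identical flags. This is a sketch, not a Biopet feature. `dry_then_run` is a hypothetical name, and the function assumes that `biopet` exits with a non-zero status when the dry run fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the dry run first and start the real run only if
# the dry run succeeded. Assumes `biopet` is available (e.g. via the
# environment module) and returns a non-zero exit status on failure.
dry_then_run() {
    local pipeline="$1" config="$2"
    # Identical flags for both runs; only -run is added for the real run.
    local flags=(-config "$config" -qsub -jobParaEnv BWA -retry 2)

    if ! biopet pipeline "$pipeline" "${flags[@]}"; then
        echo "Dry run of $pipeline failed; not starting the real run." >&2
        return 1
    fi
    biopet pipeline "$pipeline" "${flags[@]}" -run
}
```

Usage: `dry_then_run <pipeline_name> <path/to/config.json>`.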
It is usually a good idea to do the real run inside `screen` or with `nohup` to prevent the job from terminating when you log out of SHARK. In practice, running `biopet` directly is also fine. Keep in mind that each pipeline expects its own config layout. You can read more about the general structure of our config files [here](general/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page.
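Below is a minimal sketch of the `nohup` variant. `biopet` is a shell function, and `nohup` cannot run shell functions, so the sketch calls `java -jar` directly. `BIOPET_JAR` and `start_real_run` are hypothetical names, and the JAR path must be the one your loaded module uses. With `screen`, you can instead start a session (for example `screen -S biopet`), load the module inside it, and run the `biopet ... -run` command as usual.

```shell
#!/usr/bin/env bash
# Hypothetical variable: set this to the JAR used by your loaded module.
BIOPET_JAR="${BIOPET_JAR:-/path/to/current/biopet/release.jar}"

# Hypothetical helper: start the real run in the background with nohup, so it
# ignores the hangup signal (SIGHUP) sent when you log out.
# Arguments: pipeline name, config file, optional log file.
# Prints the PID of the background process.
start_real_run() {
    local pipeline="$1" config="$2" log="${3:-biopet_run.log}"
    nohup java -jar "$BIOPET_JAR" pipeline "$pipeline" -config "$config" \
        -qsub -jobParaEnv BWA -retry 2 -run > "$log" 2>&1 &
    echo "$!"
}
```

Usage: `start_real_run <pipeline_name> <path/to/config.json>`, then follow progress with `tail -f biopet_run.log`.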
### Running Biopet in your own computer
At the moment, we do not provide links to download the Biopet package. If you are interested in trying out Biopet locally, please contact us at [sasc@lumc.nl](mailto:sasc@lumc.nl).
## Contributing to Biopet
Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.4 release.
We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any other kind of fix! The main language we use is Scala, though the repository also contains a small amount of Python and R. Our main code repository is located at [https://github.com/biopet/biopet](https://github.com/biopet/biopet), along with our [issue tracker](https://github.com/biopet/biopet/issues).
## Local development setup
Developing Biopet requires Java 7, Maven 3.2.2, and GATK Queue 3.4. Please consult the Java and Maven homepages for their installation instructions. Once both Java and Maven are installed, you need to install GATK Queue. Because the GATK Queue package is not yet available as an artifact in Maven Central, you have to download, compile, and install it yourself first:
~~~
$ git clone https://github.com/broadgsa/gatk
$ cd gatk
$ git checkout 3.4 # the current release is based on GATK 3.4
$ mvn -U clean install
~~~
This installs all the required dependencies into your local Maven repository. After this is done, you can clone our repository and check that everything builds:
~~~
$ git clone https://github.com/biopet/biopet.git
$ cd biopet
$ mvn -U clean install
~~~
If everything builds fine, you're good to go! Otherwise, don't hesitate to contact us or file an issue at our issue tracker.
## About
Go to the [about page](about.md)
## License
See: [License](license.md)
# Project-related
dependency-reduced-pom.xml
git.properties
# gedit
*~
# Vim
*.swp
# IntelliJ
.idea/*
*.iml
target/
public/target/
protected/target/
Test implementation of Magpie 2.0
\ No newline at end of file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<!--TODO: replace groupId -->
<groupId>org.example.group</groupId>
<!--TODO: replace artifactId -->
<artifactId>ExternalExample</artifactId>
<!--TODO: replace version, for a new pipeline it's advised to start with '0.1.0-SNAPSHOT' -->
<version>0.1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<scoverage.plugin.version>1.0.4</scoverage.plugin.version>
<sting.shade.phase>package</sting.shade.phase>
<!--
TODO: replace app.main.class; this is the class that gets executed when running the jar file.
This can be any executable that has a main method. In Biopet, any pipeline can be used directly as an executable.
Value for a direct pipeline: 'org.example.group.pipelines.SimplePipeline'
The given example extends the Biopet executable and bundles multiple pipelines in one executable.
It's also possible to write your own main function and call the main function of the pipeline with its arguments from there.
-->
<app.main.class>org.example.group.ExecutableExample</app.main.class>
</properties>
<dependencies>
<!--
Maven dependencies are placed here. When importing a Biopet pipeline, 'Biopet-Framework' is not required.
When only using the framework without a pipeline, you need to import BiopetFramework.
It's advised not to mix different versions of the pipeline and the framework.
-->
<dependency>
<groupId>nl.lumc.sasc</groupId>
<artifactId>BiopetCore</artifactId>
<!--TODO: replace the pipeline version with a fixed version -->
<version>0.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>nl.lumc.sasc</groupId>
<artifactId>BiopetExtensions</artifactId>
<version>0.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>nl.lumc.sasc</groupId>
<artifactId>Shiva</artifactId>
<!--TODO: replace the pipeline version with a fixed version -->
<version>0.5.0-SNAPSHOT</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
<testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory>
<testResources>
<testResource>
<directory>${basedir}/src/test/resources</directory>
<includes>
<include>**/*</include>
</includes>
</testResource>
</testResources>
<resources>
<resource>
<directory>${basedir}/src/main/resources</directory>
<includes>
<include>**/*</include>
</includes>
</resource>
</resources>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<configuration>
<!--suppress MavenModelInspection -->
<finalName>${project.artifactId}-${project.version}-${git.commit.id.abbrev}</finalName>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<manifestEntries>
<Main-Class>${app.main.class}</Main-Class>
<!--suppress MavenModelInspection -->
<X-Compile-Source-JDK>${maven.compile.source}</X-Compile-Source-JDK>
<!--suppress MavenModelInspection -->
<X-Compile-Target-JDK>${maven.compile.target}</X-Compile-Target-JDK>
</manifestEntries>
</transformer>
</transformers>
<filters>
</filters>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.18.1</version>
<configuration>
<forkCount>1C</forkCount>
<workingDirectory>${project.build.directory}</workingDirectory>
</configuration>
</plugin>
<plugin>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.10</version>
<executions>
<execution>
<id>copy-installed</id>
<phase>prepare-package</phase>
<goals>
<goal>list</goal>
</goals>
<configuration>
<outputFile>${project.build.outputDirectory}/dependency_list.txt</outputFile>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<id>scala-compile</id>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
<args>
<arg>-dependencyfile</arg>
<arg>${project.build.directory}/.scala_dependencies</arg>
<arg>-deprecation</arg>
<arg>-feature</arg>
</args>
</configuration>
</execution>
</executions>
<!-- ... (see other usage or goals for details) ... -->
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.5</version>
<executions>
<execution>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<showDeprecation>true</showDeprecation>
</configuration>
</plugin>
<plugin>
<groupId>org.scalariform</groupId>
<artifactId>scalariform-maven-plugin</artifactId>
<version>0.1.4</version>
<executions>
<execution>
<phase>process-sources</phase>
<goals>
<goal>format</goal>
</goals>
<configuration>
<rewriteArrowSymbols>false</rewriteArrowSymbols>
<alignParameters>true</alignParameters>
<alignSingleLineCaseStatements_maxArrowIndent>40</alignSingleLineCaseStatements_maxArrowIndent>
<alignSingleLineCaseStatements>true</alignSingleLineCaseStatements>
<compactStringConcatenation>false</compactStringConcatenation>
<compactControlReadability>false</compactControlReadability>
<doubleIndentClassDeclaration>false</doubleIndentClassDeclaration>
<formatXml>true</formatXml>
<indentLocalDefs>false</indentLocalDefs>
<indentPackageBlocks>true</indentPackageBlocks>
<indentSpaces>2</indentSpaces>
<placeScaladocAsterisksBeneathSecondAsterisk>false</placeScaladocAsterisksBeneathSecondAsterisk>
<preserveDanglingCloseParenthesis>true</preserveDanglingCloseParenthesis>
<preserveSpaceBeforeArguments>false</preserveSpaceBeforeArguments>
<spaceBeforeColon>false</spaceBeforeColon>
<spaceInsideBrackets>false</spaceInsideBrackets>
<spaceInsideParentheses>false</spaceInsideParentheses>
<spacesWithinPatternBinders>true</spacesWithinPatternBinders>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>pl.project13.maven</groupId>
<artifactId>git-commit-id-plugin</artifactId>
<version>2.1.10</version>
<executions>
<execution>
<goals>
<goal>revision</goal>
</goals>
</execution>
</executions>
<configuration>
<prefix>git</prefix>
<dateFormat>dd.MM.yyyy '@' HH:mm:ss z</dateFormat>
<verbose>false</verbose>
<!-- TODO: change this directory depending on where your .git folder is located relative to this pom.xml -->
<dotGitDirectory>${basedir}/../.git</dotGitDirectory>
<useNativeGit>true</useNativeGit>
<skipPoms>false</skipPoms>
<generateGitPropertiesFile>true</generateGitPropertiesFile>
<generateGitPropertiesFilename>src/main/resources/git.properties</generateGitPropertiesFilename>
<failOnNoGitDirectory>false</failOnNoGitDirectory>
<abbrevLength>8</abbrevLength>
<skip>false</skip>
<gitDescribe>
<skip>false</skip>
<always>false</always>
<abbrev>8</abbrev>
<dirty>-dirty</dirty>
<forceLongFormat>false</forceLongFormat>
</gitDescribe>
</configuration>
</plugin>
<plugin>
<groupId>com.mycila</groupId>
<artifactId>license-maven-plugin</artifactId>
<version>2.6</version>
<configuration>
<excludes>
<exclude>**/*git*</exclude>
<exclude>**/License*</exclude>
<exclude>**/*.bam</exclude>
<exclude>**/*.bai</exclude>
<exclude>**/*.gtf</exclude>
<exclude>**/*.fq</exclude>
<exclude>**/*.sam</exclude>
<exclude>**/*.bed</exclude>
<exclude>**/*.refFlat</exclude>
<exclude>**/*.R</exclude>
<exclude>**/*.rscript</exclude>
</excludes>
</configuration>
</plugin>
<plugin>
<groupId>org.scoverage</groupId>
<artifactId>scoverage-maven-plugin</artifactId>
<version>${scoverage.plugin.version}</version>
<configuration>
<scalaVersion>2.10.2</scalaVersion>
<!-- other parameters -->
</configuration>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scoverage</groupId>
<artifactId>scoverage-maven-plugin</artifactId>
<version>${scoverage.plugin.version}</version>
</plugin>
</plugins>
</reporting>
</project>
\ No newline at end of file
package org.example.group
import nl.lumc.sasc.biopet.utils.{ BiopetExecutable, MainCommand }
/**
* Created by pjvanthof on 30/08/15.
*/
object ExecutableExample extends BiopetExecutable {
/** This list defines the pipelines that are usable from the executable */
def pipelines: List[MainCommand] = List(
org.example.group.pipelines.MultisamplePipeline,
org.example.group.pipelines.BiopetPipeline,
org.example.group.pipelines.SimplePipeline
)
/** This list defines the (Biopet) tools that are usable from the executable */
def tools: List[MainCommand] = Nil
}
package org.example.group.pipelines
import nl.lumc.sasc.biopet.core.PipelineCommand
import nl.lumc.sasc.biopet.utils.config.Configurable
import nl.lumc.sasc.biopet.core.summary.SummaryQScript
import nl.lumc.sasc.biopet.pipelines.shiva.Shiva
import org.broadinstitute.gatk.queue.QScript
/**
* Created by pjvan_thof on 8/28/15.
*/
//TODO: Replace class Name
class BiopetPipeline(val root: Configurable) extends QScript with SummaryQScript {
def this() = this(null)
/** Only required when using [[SummaryQScript]] */
def summaryFile = new File(outputDir, "magpie.summary.json")
/** Only required when using [[SummaryQScript]] */
def summaryFiles: Map[String, File] = Map()
/** Only required when using [[SummaryQScript]] */
def summarySettings = Map()
// This method can be used to initialize some classes where needed
def init(): Unit = {
}
// This method is the actual pipeline
def biopetScript: Unit = {
// Run a Biopet pipeline (Shiva) inside this pipeline
val shiva = new Shiva(this)
shiva.init()
shiva.biopetScript()
addAll(shiva.functions)
/* Only required when using [[SummaryQScript]] */
addSummaryQScript(shiva)
// From here on, the output files of Shiva can be used as input files for other jobs
}
}
//TODO: Replace the object name; it must be the same as the class name of the pipeline
object BiopetPipeline extends PipelineCommand
package org.example.group.pipelines
import nl.lumc.sasc.biopet.core.{ PipelineCommand, MultiSampleQScript }
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.queue.QScript
/**
* Created by pjvanthof on 30/08/15.
*/
class MultisamplePipeline(val root: Configurable) extends QScript with MultiSampleQScript {
qscript =>