Commit 2d9184ce authored by Peter van 't Hof

Merge branch 'feature-dev_docs' into feature-docs-0.5.0

parents d167539f 66f93b8c
@@ -12,3 +12,4 @@ git.properties
 target/
 public/target/
 protected/target/
+site/
@@ -64,8 +64,8 @@ We welcome any kind of contribution, be it merge requests on the code base, docu
 To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.4 are required. Please consult the Java homepage and Maven homepage for the respective installation instructions. Once Java and Maven are installed, you need to install GATK Queue. Because the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install it yourself first.
 ~~~
-$ git clone https://github.com/broadgsa/gatk
-$ cd gatk
+$ git clone https://github.com/broadgsa/gatk-protected
+$ cd gatk-protected
 $ git checkout 3.4 # the current release is based on GATK 3.4
 $ mvn -U clean install
 ~~~
......
# Developer - Code style
## General rules
- Variable names should always be in *camelCase* and must **not** start with a capital letter
- Class names should always be in *CamelCase* and must **always** start with a capital letter
- Avoid using `null`; Scala's `Option` type can be used instead
- If a method/value is designed to be overridden, make it a `def` and override it with a `def`; we encourage you not to use `val`
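
The rules above can be illustrated with a minimal sketch. All names in it (`ReadGroup`, `IlluminaReadGroup`, `CodeStyleExample`) are hypothetical and do not exist in Biopet; they only show the naming conventions, `Option` in place of `null`, and `def` members overridden with `def`.

```scala
// Illustrative names only: none of these classes exist in Biopet.

// Class names: CamelCase starting with a capital letter.
abstract class ReadGroup {
  // Designed to be overridden, so it is declared as a `def`.
  def platform: String = "unknown"

  // An optional value is modelled with Option instead of null.
  def library: Option[String] = None
}

class IlluminaReadGroup(libraryName: String) extends ReadGroup {
  // Overridden with a `def`, not a `val`.
  override def platform: String = "illumina"
  override def library: Option[String] = Some(libraryName)
}

object CodeStyleExample {
  // Variable and method names: camelCase starting with a lower-case letter.
  def describe(readGroup: ReadGroup): String = {
    val libraryLabel = readGroup.library.getOrElse("no library")
    s"${readGroup.platform} ($libraryLabel)"
  }
}
```

One common reason to prefer `def` for overridable members is that an overridden `val` may still be uninitialised while the superclass constructor runs; a `def` does not have that problem.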
\ No newline at end of file
# Developer - Getting started
### Requirements
- Maven 3.3
- GATK installed in your local Maven repository (see below)
- Biopet installed in your local Maven repository (see below)
- Some knowledge of the programming language [Scala](http://www.scala-lang.org/) (the pipelines are scripted in Scala)
- We encourage using an IDE for scripting the pipeline. One that works well for us is [IntelliJ IDEA](https://www.jetbrains.com/idea/).
To start developing a Biopet pipeline, you should have the following tools installed:
* GATK
* Biopet
Make sure both tools are installed in your local Maven repository. To do this, use the commands below.
```bash
# Replace 'mvn' with the location of your Maven executable, or put it in your PATH with the export command.
git clone https://github.com/broadgsa/gatk-protected
cd gatk-protected
git checkout 3.4
# The GATK version is bound to a version of Biopet. Biopet 0.5.0 uses GATK 3.4
mvn clean install
cd ..
git clone https://github.com/biopet/biopet.git
cd biopet
git checkout 0.5.0
mvn -DskipTests=true clean install
```
### Basic components
#### Qscript (pipeline)
A basic pipeline looks like this:
```scala
package org.example.group.pipelines

import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand }
import nl.lumc.sasc.biopet.utils.config.Configurable
import nl.lumc.sasc.biopet.extensions.{ Gzip, Cat }
import org.broadinstitute.gatk.queue.QScript

//TODO: Replace class name, must be the same as the class of the pipeline
class SimplePipeline(val root: Configurable) extends QScript with BiopetQScript {
  // A constructor without arguments is needed if this pipeline is a root pipeline
  def this() = this(null)

  @Input(required = true)
  var inputFile: File = null

  /** This method can be used to initialize some classes where needed */
  def init(): Unit = {
  }

  /** This method is the actual pipeline */
  def biopetScript: Unit = {
    val cat = new Cat(this)
    cat.input :+= inputFile
    cat.output = new File(outputDir, "file.out")
    add(cat)

    val gzip = new Gzip(this)
    gzip.input :+= cat.output
    gzip.output = new File(outputDir, "file.out.gz")
    add(gzip)
  }
}

//TODO: Replace object name, must be the same as the class of the pipeline
object SimplePipeline extends PipelineCommand
```
#### Extensions (wrappers)
Wrappers have to be written for each tool used inside the pipeline. A basic wrapper (this example wraps the Linux `cat` command) looks like this:
```scala
package nl.lumc.sasc.biopet.extensions

import java.io.File

import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.{ Input, Output }

/**
 * Extension for GNU cat
 */
class Cat(val root: Configurable) extends BiopetCommandLineFunction {
  @Input(doc = "Input file", required = true)
  var input: List[File] = Nil

  @Output(doc = "Output file", required = true)
  var output: File = _

  executable = config("exe", default = "cat")

  /** return commandline to execute */
  def cmdLine = required(executable) + repeat(input) + " > " + required(output)
}
```
#### Tools (Scala programs)
Within the Biopet framework it is also possible to write your own tools in Scala. If a given functionality or script is not incorporated in the framework,
one can write a tool that does the job. Below is an example tool that automatically builds sample configs.
```scala
package nl.lumc.sasc.biopet.tools

import java.io.{ PrintWriter, File }

import nl.lumc.sasc.biopet.utils.ConfigUtils._
import nl.lumc.sasc.biopet.utils.ToolCommand

import scala.collection.mutable
import scala.io.Source

/**
 * This tool can convert a tsv to a json file
 */
object SamplesTsvToJson extends ToolCommand {
  case class Args(inputFiles: List[File] = Nil, outputFile: Option[File] = None) extends AbstractArgs

  class OptParser extends AbstractOptParser {
    opt[File]('i', "inputFiles") required () unbounded () valueName "<file>" action { (x, c) =>
      c.copy(inputFiles = x :: c.inputFiles)
    } text "Input must be a tsv file, first line is seen as header and must at least have a 'sample' column, 'library' column is optional, multiple files allowed"
    opt[File]('o', "outputFile") unbounded () valueName "<file>" action { (x, c) =>
      c.copy(outputFile = Some(x))
    }
  }

  /** Executes SamplesTsvToJson */
  def main(args: Array[String]): Unit = {
    val argsParser = new OptParser
    val commandArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1)

    val jsonString = stringFromInputs(commandArgs.inputFiles)
    commandArgs.outputFile match {
      case Some(file) => {
        val writer = new PrintWriter(file)
        writer.println(jsonString)
        writer.close()
      }
      case _ => println(jsonString)
    }
  }

  def mapFromFile(inputFile: File): Map[String, Any] = {
    val reader = Source.fromFile(inputFile)
    // Read all non-empty lines, then close the file handle
    val lines = try reader.getLines().toList.filter(!_.isEmpty) finally reader.close()
    val header = lines.head.split("\t")
    val sampleColumn = header.indexOf("sample")
    val libraryColumn = header.indexOf("library")
    if (sampleColumn == -1) throw new IllegalStateException("Sample column does not exist in: " + inputFile)

    val sampleLibCache: mutable.Set[(String, Option[String])] = mutable.Set()

    val librariesValues: List[Map[String, Any]] = for (tsvLine <- lines.tail) yield {
      val values = tsvLine.split("\t")
      require(header.length == values.length, "Number of columns is not the same as the header")
      val sample = values(sampleColumn)
      val library = if (libraryColumn != -1) Some(values(libraryColumn)) else None

      //FIXME: this is a workaround, should be removed after fixing #180
      // `exists` rather than `forall`: the check must not fire when there is no library column
      if (sample.head.isDigit || library.exists(_.head.isDigit))
        throw new IllegalStateException("Sample or library may not start with a number")

      if (sampleLibCache.contains((sample, library)))
        throw new IllegalStateException(s"Combination of $sample ${library.map("and " + _).getOrElse("")} is found multiple times")
      else sampleLibCache.add((sample, library))
      val valuesMap = (for (
        t <- 0 until values.size if !values(t).isEmpty && t != sampleColumn && t != libraryColumn
      ) yield header(t) -> values(t)).toMap
      library match {
        case Some(lib) => Map("samples" -> Map(sample -> Map("libraries" -> Map(lib -> valuesMap))))
        case _         => Map("samples" -> Map(sample -> valuesMap))
      }
    }
    librariesValues.foldLeft(Map[String, Any]())((acc, kv) => mergeMaps(acc, kv))
  }

  def stringFromInputs(inputs: List[File]): String = {
    val map = inputs.map(f => mapFromFile(f)).foldLeft(Map[String, Any]())((acc, kv) => mergeMaps(acc, kv))
    mapToJson(map).spaces2
  }
}
```
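
To make the shape of the tool's output easier to see, here is a simplified, self-contained sketch of the same row-to-nested-map idea. It is not the Biopet implementation: `SampleTsvSketch`, `deepMerge`, `rowToMap` and `fromLines` are hypothetical names, `deepMerge` only stands in for what `mergeMaps` is assumed to do (merge nested maps recursively), and the validation and empty-value filtering of the real tool are left out.

```scala
object SampleTsvSketch {
  // Hypothetical stand-in for Biopet's mergeMaps: nested maps are merged
  // recursively, and for any other clash the value from `right` wins.
  def deepMerge(left: Map[String, Any], right: Map[String, Any]): Map[String, Any] =
    right.foldLeft(left) {
      case (acc, (key, rightValue)) =>
        val merged = (acc.get(key), rightValue) match {
          case (Some(l: Map[_, _]), r: Map[_, _]) =>
            deepMerge(l.asInstanceOf[Map[String, Any]], r.asInstanceOf[Map[String, Any]])
          case _ => rightValue
        }
        acc + (key -> merged)
    }

  // Turns one TSV row into the nested structure that mapFromFile builds per row.
  def rowToMap(header: Array[String], row: Array[String]): Map[String, Any] = {
    val fields = header.zip(row).toMap
    val sample = fields("sample")
    val extra: Map[String, Any] = fields - "sample" - "library"
    fields.get("library") match {
      case Some(lib) => Map("samples" -> Map(sample -> Map("libraries" -> Map(lib -> extra))))
      case None      => Map("samples" -> Map(sample -> extra))
    }
  }

  // First line is the header; every following line is merged into one map.
  def fromLines(lines: List[String]): Map[String, Any] = {
    val header = lines.head.split("\t")
    lines.tail
      .map(line => rowToMap(header, line.split("\t")))
      .foldLeft(Map.empty[String, Any])(deepMerge)
  }
}
```

For example, two rows for sample `s1` with libraries `lib1` and `lib2` and a `bam` column end up under a single `samples -> s1 -> libraries` entry, with one `bam` value per library.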
\ No newline at end of file
@@ -64,10 +64,10 @@ We welcome any kind of contribution, be it merge requests on the code base, docu
 To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.4 are required. Please consult the Java homepage and Maven homepage for the respective installation instructions. Once Java and Maven are installed, you need to install GATK Queue. Because the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install it yourself first.
 ~~~
-$ git clone https://github.com/broadgsa/gatk
-$ cd gatk
+$ git clone https://github.com/broadgsa/gatk-protected
+$ cd gatk-protected
 $ git checkout 3.4 # the current release is based on GATK 3.4
-$ mvn -U clean install
+$ mvn clean install
 ~~~
 This will install all the required dependencies to your local maven repository. After this is done, you can clone our repository and test if everything builds fine:
@@ -75,7 +75,7 @@ This will install all the required dependencies to your local maven repository.
 ~~~
 $ git clone https://github.com/biopet/biopet.git
 $ cd biopet
-$ mvn -U clean install
+$ mvn clean install
 ~~~
 If everything builds fine, you're good to go! Otherwise, don't hesitate to contact us or file an issue at our issue tracker.
@@ -83,8 +83,8 @@ If everything builds fine, you're good to go! Otherwise, don't hesitate to conta
 ## About
-Go to the [about page](about.md)
+Go to the [about page](general/about.md)
 ## License
-See: [License](license.md)
+See: [License](general/license.md)
@@ -4,6 +4,8 @@ pages:
 - General:
 - Config: 'general/config.md'
 - Requirements: 'general/requirements.md'
+- About: 'general/about.md'
+- License: 'general/license.md'
 - Pipelines:
 - Basty: 'pipelines/basty.md'
 - Bam2Wig: 'pipelines/bam2wig.md'
@@ -35,8 +37,11 @@ pages:
 - 0.3.2: 'release_notes_0.3.2.md'
 - 0.3.1: 'release_notes_0.3.1.md'
 - 0.3.0: 'release_notes_0.3.0.md'
-- About: 'about.md'
-- License: 'license.md'
+- Developer:
+- Getting Started: 'developer/getting-started.md'
+- Code Style: 'developer/code-style.md'
+- Scala docs:
+- 0.4.0: 'developer/code-style.md'
 #- ['developing/Setup.md', 'Developing', 'Setting up your local development environment']
 #theme: readthedocs
 repo_url: https://github.com/biopet/biopet