Merge branch 'develop' into biopet-bios

7aace72b · Peter van 't Hof · f3dc2736 · d36878da
Commit 7aace72b authored 8 years ago by Peter van 't Hof
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@

 Biopet (Bio Pipeline Execution Toolkit) is the main pipeline development framework of the LUMC Sequencing Analysis Support Core team. It contains our main pipelines and some of the command line tools we develop in-house. It is meant to be used in the main [SHARK](https://humgenprojects.lumc.nl/trac/shark) computing cluster. While usage outside of SHARK is technically possible, some adjustments may need to be made in order to do so.

+Full documentation is here: [Biopet documentation](http://biopet-docs.readthedocs.io/en/latest/)

 ## Quick Start

@@ -48,9 +49,9 @@ $ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEn

 It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, running `biopet` as it is also works. Keep in mind that each pipeline has its own expected config layout. You can read more about the general structure of our config files [here](docs/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page.

-### Running Biopet in your own computer
+## Testing

-At the moment, we do not provide links to download the Biopet package. If you are interested in trying out Biopet locally, please contact us as [sasc@lumc.nl](mailto:sasc@lumc.nl).
+Our code is tested on our local Jenkins installation for every change, using the [Jenkinsfile](Jenkinsfile) in our repository.


 ## Contributing to Biopet
@@ -59,27 +60,7 @@ Biopet is based on the Queue framework developed by the Broad Institute as part

 We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any other kind of fix! The main language we use is Scala, though the repository also contains a small amount of Python and R. Our main code repository is located at [https://github.com/biopet/biopet](https://github.com/biopet/biopet), along with our [issue tracker](https://github.com/biopet/biopet/issues).

-## Local development setup
-
-To develop Biopet, Java 7, Maven 3.3.3, and GATK Queue 3.5 is required. Please consult the Java homepage and Maven homepage for the respective installation instruction. After you have both Java and Maven installed, you would then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first.
-
-~~~
-$ git clone https://github.com/broadgsa/gatk-protected
-$ cd gatk
-$ git checkout 3.5                              # the current release is based on GATK 3.5
-$ mvn -U clean install
-~~~
-
-This will install all the required dependencies to your local maven repository. After this is done, you can clone our repository and test if everything builds fine:
-
-~~~
-$ git clone https://github.com/biopet/biopet.git
-$ cd biopet
-$ mvn -U clean install
-~~~
-
-If everything builds fine, you're good to go! Otherwise, don't hesitate to contact us or file an issue at our issue tracker.
-
+For more information, please go to our [Developer documentation](http://biopet-docs.readthedocs.io/en/develop/developer/getting-started/).

 ## About


--- a/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala
+++ b/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala
@@ -19,7 +19,6 @@ import java.io.File
 import nl.lumc.sasc.biopet.core.summary.{ SummaryQScript, WriteSummary }
 import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.report.ReportBuilderExtension
-import nl.lumc.sasc.biopet.core.workaround.BiopetQCommandLine
 import nl.lumc.sasc.biopet.utils.Logging
 import org.broadinstitute.gatk.queue.{ QScript, QSettings }
 import org.broadinstitute.gatk.queue.function.QFunction
@@ -118,11 +117,10 @@ trait BiopetQScript extends Configurable with GatkLogging { qscript: QScript =>
    }

    functions.filter(_.jobOutputFile == null).foreach(f => {
-      try {
-        val className = if (f.getClass.isAnonymousClass) f.getClass.getSuperclass.getSimpleName else f.getClass.getSimpleName
-        f.jobOutputFile = new File(f.firstOutput.getAbsoluteFile.getParent, "." + f.firstOutput.getName + "." + className + ".out")
-      } catch {
-        case e: NullPointerException => logger.warn(s"Can't generate a jobOutputFile for $f")
+      val className = if (f.getClass.isAnonymousClass) f.getClass.getSuperclass.getSimpleName else f.getClass.getSimpleName
+      BiopetQScript.safeOutputs(f) match {
+        case Some(o) => f.jobOutputFile = new File(o.head.getAbsoluteFile.getParent, "." + f.firstOutput.getName + "." + className + ".out")
+        case _ => f.jobOutputFile = new File("./stdout") // Fallback when no outputs are available, e.g. in tests
      }
    })

@@ -159,7 +157,7 @@ trait BiopetQScript extends Configurable with GatkLogging { qscript: QScript =>
      case that: BiopetQScript =>
        that.init()
        that.biopetScript()
-      case _ => subPipeline.script
+      case _ => subPipeline.script()
    }
    addAll(subPipeline.functions)
  }
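
Review note: the hunk above replaces a `try`/`catch` on `NullPointerException` with a match on an `Option`. The following is a minimal sketch of that pattern, not the Biopet implementation. `FakeFunction` and this `safeOutputs` are hypothetical stand-ins for the Queue function type and the `BiopetQScript.safeOutputs` helper, whose real behaviour is not shown in this diff.

```scala
import java.io.File

// Hypothetical stand-in for a Queue function; not the real QFunction API.
final case class FakeFunction(name: String, outputs: Seq[File])

object JobOutputSketch {
  // Assumed behaviour of a safeOutputs-style helper:
  // None when the outputs are null or empty, Some(outputs) otherwise.
  def safeOutputs(f: FakeFunction): Option[Seq[File]] =
    Option(f.outputs).filter(_.nonEmpty)

  // Build a hidden ".<output>.<name>.out" file next to the first output,
  // or fall back to ./stdout when there is no output to anchor on.
  def jobOutputFile(f: FakeFunction): File = safeOutputs(f) match {
    case Some(outs) =>
      val first = outs.head
      new File(first.getAbsoluteFile.getParent, "." + first.getName + "." + f.name + ".out")
    case None => new File("./stdout")
  }
}
```

In this sketch the missing-output case is an ordinary branch of the match, so no exception handling is needed to reach the fallback.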

--- a/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/WriteDependencies.scala
+++ b/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/WriteDependencies.scala
@@ -101,6 +101,9 @@ object WriteDependencies extends Logging with Configurable {
        file.addOutputJob(function)
        files += output -> file
      }
+      val file = files.getOrElse(function.jobOutputFile, QueueFile(function.jobOutputFile))
+      file.addOutputJob(function)
+      files += function.jobOutputFile -> file
    }

    val jobs = functionNames.par.map {
@@ -116,7 +119,7 @@ object WriteDependencies extends Logging with Configurable {
          "depends_on_intermediate" -> BiopetQScript.safeOutputs(f).getOrElse(Seq()).exists(files(_).isIntermediate),
          "depends_on_jobs" -> BiopetQScript.safeOutputs(f).getOrElse(Seq()).toList.flatMap(files(_).outputJobNames).distinct,
          "output_used_by_jobs" -> BiopetQScript.safeOutputs(f).getOrElse(Seq()).toList.flatMap(files(_).inputJobNames).distinct,
-          "outputs" -> BiopetQScript.safeOutputs(f).getOrElse(Seq()).toList,
+          "outputs" -> (f.jobOutputFile :: BiopetQScript.safeOutputs(f).getOrElse(Seq()).toList),
          "inputs" -> BiopetQScript.safeOutputs(f).getOrElse(Seq()).toList,
          "done_files" -> BiopetQScript.safeDoneFiles(f).getOrElse(Seq()).toList,
          "fail_files" -> BiopetQScript.safeFailFiles(f).getOrElse(Seq()).toList,
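
Review note: the added lines in the first hunk register each job's `jobOutputFile` in the shared `files` map using a get-or-create step. A minimal sketch of that step is given here, under the assumption that files are keyed by path and jobs are identified by name. `QueueFileSketch` and `RegisterOutputsSketch` are hypothetical names, not the `QueueFile` class from `WriteDependencies`.

```scala
import scala.collection.mutable

// Hypothetical simplification of a per-file record that tracks producing jobs.
final class QueueFileSketch(val path: String) {
  private val outputJobs = mutable.ListBuffer.empty[String]
  def addOutputJob(job: String): Unit = outputJobs += job
  def outputJobNames: List[String] = outputJobs.toList
}

object RegisterOutputsSketch {
  // Reuse the existing record for this path if there is one, otherwise create it,
  // then record the job and store the record back in the map.
  def register(files: mutable.Map[String, QueueFileSketch], path: String, job: String): Unit = {
    val file = files.getOrElse(path, new QueueFileSketch(path))
    file.addOutputJob(job)
    files += path -> file
  }
}
```

Registering the same path twice keeps a single record and appends the second job name to it.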

--- a/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala
+++ b/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala
@@ -40,10 +40,10 @@ class VariantEffectPredictor(val root: Configurable) extends BiopetCommandLineFu
  var vepScript: String = config("vep_script")

  @Input(doc = "input VCF", required = true)
-  var input: File = null
+  var input: File = _

  @Output(doc = "output file", required = true)
-  var output: File = null
+  var output: File = _

  override def subPath = {
    if (vepVersion.isSet) super.subPath ++ List("vep_settings") ++ vepVersion()
@@ -160,7 +160,7 @@ class VariantEffectPredictor(val root: Configurable) extends BiopetCommandLineFu
  override def defaultCoreMemory = 4.0

  @Output
-  private var _summary: File = null
+  private var _summary: File = _

  override def beforeGraph(): Unit = {
    super.beforeGraph()
@@ -312,11 +312,11 @@ class VariantEffectPredictor(val root: Configurable) extends BiopetCommandLineFu

    (for ((header, headerIndex) <- headers) yield {
      val name = header.stripPrefix("[").stripSuffix("]")
-      name.replaceAll(" ", "_") -> (contents.drop(headerIndex + 1).takeWhile(!isHeader(_)).flatMap { line =>
+      name.replaceAll(" ", "_") -> contents.drop(headerIndex + 1).takeWhile(!isHeader(_)).flatMap { line =>
        val values = line.split("\t", 2)
        if (values.last.isEmpty || values.last == "-") None
        else Some(values.head.replaceAll(" ", "_") -> tryToParseNumber(values.last).getOrElse(values.last))
-      }.toMap)
+      }.toMap
    }).toMap
  }
 }
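
Review note: the last hunk only drops redundant parentheses, but the expression it touches is dense. The sketch below isolates the same section-parsing idea so it can be read on its own. The input lines in the usage example are made up for illustration and are not real VEP output, `SectionParseSketch` is a hypothetical name, and values are kept as strings instead of going through `tryToParseNumber`.

```scala
object SectionParseSketch {
  // A header is a line of the form "[Some title]".
  private def isHeader(line: String): Boolean =
    line.startsWith("[") && line.endsWith("]")

  // Turn "[Header]" sections followed by tab-separated key/value lines
  // into a nested map: section name -> (key -> value).
  def parse(contents: List[String]): Map[String, Map[String, String]] = {
    val headers = contents.zipWithIndex.filter { case (line, _) => isHeader(line) }
    (for ((header, headerIndex) <- headers) yield {
      val name = header.stripPrefix("[").stripSuffix("]")
      name.replaceAll(" ", "_") -> contents.drop(headerIndex + 1).takeWhile(l => !isHeader(l)).flatMap { line =>
        val values = line.split("\t", 2)
        // Skip blank lines and entries whose value is empty or "-".
        if (values.last.isEmpty || values.last == "-") None
        else Some(values.head.replaceAll(" ", "_") -> values.last)
      }.toMap
    }).toMap
  }
}
```

A possible call, using invented input:

```scala
val parsed = SectionParseSketch.parse(List(
  "[General statistics]", "Lines of input read\t10", "Novel variants\t-",
  "[Variant classes]", "SNV\t7"))
```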
--- a/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/UtilsTest.scala
+++ b/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/UtilsTest.scala
@@ -6,6 +6,7 @@ import org.testng.annotations.Test

 /**
 * Created by Sander Bollen on 12-10-16.
+ * Here we test utils
 */
 class UtilsTest extends TestNGSuite with Matchers {


--- a/biopet-utils/src/test/scala/VcfUtilsTest.scala
+++ b/biopet-utils/src/test/scala/VcfUtilsTest.scala
-import htsjdk.variant.variantcontext.{ Allele, Genotype, GenotypeBuilder }
+package nl.lumc.sasc.biopet.utils
+
+import htsjdk.variant.variantcontext.{ Allele, GenotypeBuilder }
 import org.scalatest.Matchers
 import org.scalatest.testng.TestNGSuite
 import org.testng.annotations.Test

 import scala.collection.JavaConversions._

-import nl.lumc.sasc.biopet.utils.VcfUtils
-
 /**
 * Created by Sander Bollen on 4-10-16.
 */

--- a/docs/developer/getting-started.md
+++ b/docs/developer/getting-started.md
@@ -2,6 +2,7 @@

 ### Requirements
 - Maven 3.3
+- Java 8
 - Installed Gatk to maven local repository (see below)
 - Installed Biopet to maven local repository (see below)
 - Some knowledge of the programming language [Scala](http://www.scala-lang.org/) (The pipelines are scripted using Scala)
@@ -16,17 +17,9 @@ Make sure both tools are installed in your local maven repository. To do this on

 ```bash
 # Replace 'mvn' with the location of your Maven executable, or add it to your PATH with the export command.
-git clone https://github.com/broadgsa/gatk
-cd gatk
-git checkout 3.6
-# The GATK version is bound to a version of Biopet. Biopet 0.7.0 uses Gatk 3.6
-mvn clean install

-cd ..
-
-git clone https://github.com/biopet/biopet.git
+git clone --recursive https://github.com/biopet/biopet.git
 cd biopet
-git checkout 0.7.0
 mvn -DskipTests=true clean install
 ```


--- a/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/MultisampleMappingReport.scala
+++ b/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/MultisampleMappingReport.scala
@@ -50,7 +50,7 @@ trait MultisampleMappingReportTrait extends MultisampleReportBuilder {
    val wgsExecuted = summary.getSampleValues("bammetrics", "stats", "wgs").values.exists(_.isDefined)
    val rnaExecuted = summary.getSampleValues("bammetrics", "stats", "rna").values.exists(_.isDefined)
    val insertsizeExecuted = summary.getSampleValues("bammetrics", "stats", "CollectInsertSizeMetrics", "metrics").values.exists(_ != Some(None))
-    val mappingExecuted = summary.getLibraryValues("mapping").nonEmpty
+    val mappingExecuted = summary.getLibraryValues("mapping").exists(_._2.isDefined)
    val pairedFound = !mappingExecuted || summary.getLibraryValues("mapping", "settings", "paired").exists(_._2 == Some(true))
    val flexiprepExecuted = summary.getLibraryValues("flexiprep")
      .exists { case ((sample, lib), value) => value.isDefined }
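
Review note: the change to `mappingExecuted` matters when the summary returns an entry for every library even if no value was stored. The sketch below contrasts the two checks. It assumes that `getLibraryValues` yields a map from `(sample, library)` to an optional value, which matches how the surrounding lines use it but is not confirmed by this diff. `MappingExecutedSketch` and `LibraryValues` are hypothetical names.

```scala
object MappingExecutedSketch {
  // Assumed shape of the summary lookup result: (sample, library) -> optional value.
  type LibraryValues = Map[(String, String), Option[Any]]

  // Old check: true as soon as any library is listed, even when its value is None.
  def executedByNonEmpty(values: LibraryValues): Boolean = values.nonEmpty

  // New check: true only when at least one library actually has a value.
  def executedByDefined(values: LibraryValues): Boolean = values.exists(_._2.isDefined)
}
```

For a map whose only entry has the value `None`, the first check returns `true` and the second returns `false`.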