diff --git a/README.md b/README.md index a34cd136c8acddbac64ad19fbb6d2e63ae911c80..06afe292ff4b01756841094444d7e020ee394014 100755 --- a/README.md +++ b/README.md @@ -46,7 +46,7 @@ If the dry run proceeds without problems, you can then do the real run by using $ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2 -run ~~~ -It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, using `biopet` as it is is also fine. What you need to keep in mind, is that each pipeline has their own expected config layout. You can check out more about the general structure of our config files [here](general/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page. +It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, using `biopet` as it is is also fine. What you need to keep in mind, is that each pipeline has their own expected config layout. You can check out more about the general structure of our config files [here](docs/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page. ### Running Biopet in your own computer @@ -55,25 +55,25 @@ At the moment, we do not provide links to download the Biopet package. If you ar ## Contributing to Biopet -Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.3 release. +Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. The current Biopet release is based on the GATK 3.4 release. -We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any kinds of other fixes! 
The main language we use is Scala, though the repository also contains a small bit of Python and R. Our main code repository is located at [https://git.lumc.nl/biopet/biopet](https://git.lumc.nl/biopet/biopet/issues), along with our issue tracker. +We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any kinds of other fixes! The main language we use is Scala, though the repository also contains a small bit of Python and R. Our main code repository is located at [https://github.com/biopet/biopet](https://github.com/biopet/biopet/issues), along with our issue tracker. ## Local development setup -To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.3 is required. Please consult the Java homepage and Maven homepage for the respective installation instruction. After you have both Java and Maven installed, you would then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first. +To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.4 is required. Please consult the Java homepage and Maven homepage for the respective installation instruction. After you have both Java and Maven installed, you would then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first. ~~~ $ git clone https://github.com/broadgsa/gatk $ cd gatk -$ git checkout 3.3 # the current release is based on GATK 3.3 +$ git checkout 3.4 # the current release is based on GATK 3.4 $ mvn -U clean install ~~~ This will install all the required dependencies to your local maven repository. 
After this is done, you can clone our repository and test if everything builds fine: ~~~ -$ git clone git@git.lumc.nl:biopet/biopet.git +$ git clone https://github.com/biopet/biopet.git $ cd biopet $ mvn -U clean install ~~~ @@ -83,8 +83,8 @@ If everything builds fine, you're good to go! Otherwise, don't hesitate to conta ## About -Go to the [about page](about) +Go to the [about page](docs/about.md) ## License -See: [License](license.md) +See: [License](docs/license.md) diff --git a/biopet-aggregate/copy-src.sh b/biopet-aggregate/copy-src.sh new file mode 100755 index 0000000000000000000000000000000000000000..fdfefa48461afe7f1f8e1b4aad226ab818caee25 --- /dev/null +++ b/biopet-aggregate/copy-src.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +DIR=`readlink -f \`dirname $0\`` + +cp -r $DIR/../*/*/src/* $DIR/src + diff --git a/biopet-aggregate/pom.xml b/biopet-aggregate/pom.xml new file mode 100644 index 0000000000000000000000000000000000000000..4b89651970b5b9c05bf599a6b14d44b161793663 --- /dev/null +++ b/biopet-aggregate/pom.xml @@ -0,0 +1,46 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>BiopetRoot</artifactId> + <groupId>nl.lumc.sasc</groupId> + <version>0.5.0-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>BiopetAggregate</artifactId> + + <dependencies> + <dependency> + <groupId>org.testng</groupId> + <artifactId>testng</artifactId> + <version>6.8</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.mockito</groupId> + <artifactId>mockito-all</artifactId> + <version>1.9.5</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.scalatest</groupId> + <artifactId>scalatest_2.10</artifactId> + <version>2.2.1</version> + <scope>test</scope> + </dependency> + <dependency> + 
<groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetProtectedPackage</artifactId> + <version>0.5.0-SNAPSHOT</version> + </dependency> + <dependency> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> + <version>18.0</version> + </dependency> + + </dependencies> + +</project> \ No newline at end of file diff --git a/biopet-aggregate/rm-src.sh b/biopet-aggregate/rm-src.sh new file mode 100755 index 0000000000000000000000000000000000000000..f0a2e2b9307150913a9705bd237f226dc157157e --- /dev/null +++ b/biopet-aggregate/rm-src.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +DIR=`readlink -f \`dirname $0\`` + +rm -r $DIR/src/main $DIR/src/test + diff --git a/biopet-aggregate/src/.gitignore b/biopet-aggregate/src/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..59e4d6049a3756e3d0c5414611e042fcc5c1bce6 --- /dev/null +++ b/biopet-aggregate/src/.gitignore @@ -0,0 +1,2 @@ +main +test diff --git a/docs/general/requirements.md b/docs/general/requirements.md new file mode 100644 index 0000000000000000000000000000000000000000..0105f7ccc29dcbd5def4b6b49a6bb1235031d858 --- /dev/null +++ b/docs/general/requirements.md @@ -0,0 +1,17 @@ +### System Requirements + +Biopet is built on top of GATK Queue, which requires having `java` installed on the analysis machine(s). + +For end-users: + + * [Java 7 JVM](http://www.oracle.com/technetwork/java/javase/downloads/index.html) or [OpenJDK 7](http://openjdk.java.net/install/) + * [Cran R 2.15.3](http://cran.r-project.org/) + +For developers: + + * [OpenJDK 7](http://openjdk.java.net/install/) + * Minimum of 4 GB RAM {todo: provide more accurate estimation on building} + * Maven 3 + * Compiled and installed version 3.4 of [GATK + Queue](https://github.com/broadgsa/gatk-protected/) in your maven repository. 
+ * IntelliJ or Netbeans 8.0 for development + diff --git a/docs/index.md b/docs/index.md deleted file mode 120000 index 32d46ee883b58d6a383eed06eb98f33aa6530ded..0000000000000000000000000000000000000000 --- a/docs/index.md +++ /dev/null @@ -1 +0,0 @@ -../README.md \ No newline at end of file diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000000000000000000000000000000000000..f55a39193eabc688fca65dc34bb6332c01be6a3b --- /dev/null +++ b/docs/index.md @@ -0,0 +1,90 @@ +# Welcome to Biopet + + +## Introduction + +Biopet (Bio Pipeline Execution Toolkit) is the main pipeline development framework of the LUMC Sequencing Analysis Support Core team. It contains our main pipelines and some of the command line tools we develop in-house. It is meant to be used in the main [SHARK](https://humgenprojects.lumc.nl/trac/shark) computing cluster. While usage outside of SHARK is technically possible, some adjustments may need to be made in order to do so. + + +## Quick Start + +### Running Biopet in the SHARK cluster + +Biopet is available as a JAR package in SHARK. The easiest way to start using it is to activate the `biopet` environment module, which sets useful aliases and environment variables: + +~~~ +$ module load biopet/v0.4.0 +~~~ + +With each Biopet release, an accompanying environment module is also released. The latest release is version 0.4.0, thus `biopet/v0.4.0` is the module you would want to load. + +After loading the module, you can access the biopet package by simply typing `biopet`: + +~~~ +$ biopet +~~~ + +This will show you a list of tools and pipelines that you can use straight away. You can also execute `biopet pipeline` to show only available pipelines or `biopet tool` to show only the tools. What you should be aware of, is that this is actually a shell function that calls `java` on the system-wide available Biopet JAR file. 
+ +~~~ +$ java -jar <path/to/current/biopet/release.jar> +~~~ + +The actual path will vary from version to version, which is controlled by which module you loaded. + +Almost all of the pipelines have a common usage pattern with a similar set of flags, for example: + +~~~ +$ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2 +~~~ + +The command above will do a *dry* run of a pipeline using a config file as if the command would be submitted to the SHARK cluster (the `-qsub` flag) to the `BWA` parallel environment (the `-jobParaEnv BWA` flag). We also set the maximum retry of failing jobs to two times (via the `-retry 2` flag). Doing a dry run is a good idea to ensure that your real run proceeds smoothly. It may not catch all the errors, but if the dry run fails you can be sure that the real run will never succeed. + +If the dry run proceeds without problems, you can then do the real run by using the `-run` flag: + +~~~ +$ biopet pipeline <pipeline_name> -config <path/to/config.json> -qsub -jobParaEnv BWA -retry 2 -run +~~~ + +It is usually a good idea to do the real run using `screen` or `nohup` to prevent the job from terminating when you log out of SHARK. In practice, using `biopet` as it is is also fine. What you need to keep in mind, is that each pipeline has their own expected config layout. You can check out more about the general structure of our config files [here](general/config.md). For the specific structure that each pipeline accepts, please consult the respective pipeline page. + +### Running Biopet in your own computer + +At the moment, we do not provide links to download the Biopet package. If you are interested in trying out Biopet locally, please contact us at [sasc@lumc.nl](mailto:sasc@lumc.nl). + + +## Contributing to Biopet + +Biopet is based on the Queue framework developed by the Broad Institute as part of their Genome Analysis Toolkit (GATK) framework. 
The current Biopet release is based on the GATK 3.4 release. + +We welcome any kind of contribution, be it merge requests on the code base, documentation updates, or any kinds of other fixes! The main language we use is Scala, though the repository also contains a small bit of Python and R. Our main code repository is located at [https://github.com/biopet/biopet](https://github.com/biopet/biopet/issues), along with our issue tracker. + +## Local development setup + +To develop Biopet, Java 7, Maven 3.2.2, and GATK Queue 3.4 is required. Please consult the Java homepage and Maven homepage for the respective installation instruction. After you have both Java and Maven installed, you would then need to install GATK Queue. However, as the GATK Queue package is not yet available as an artifact in Maven Central, you will need to download, compile, and install GATK Queue first. + +~~~ +$ git clone https://github.com/broadgsa/gatk +$ cd gatk +$ git checkout 3.4 # the current release is based on GATK 3.4 +$ mvn -U clean install +~~~ + +This will install all the required dependencies to your local maven repository. After this is done, you can clone our repository and test if everything builds fine: + +~~~ +$ git clone https://github.com/biopet/biopet.git +$ cd biopet +$ mvn -U clean install +~~~ + +If everything builds fine, you're good to go! Otherwise, don't hesitate to contact us or file an issue at our issue tracker. 
+ + +## About + +Go to the [about page](about.md) + +## License + +See: [License](license.md) diff --git a/external-example/.gitignore b/external-example/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..77146859b6e12b32e90cc6f6d388ebc878d548d6 --- /dev/null +++ b/external-example/.gitignore @@ -0,0 +1,14 @@ +# Project-related +dependency-reduced-pom.xml +git.properties + +# gedit +*~ +# Vim +*.swp +# IntelliJ +.idea/* +*.iml +target/ +public/target/ +protected/target/ diff --git a/external-example/README.md b/external-example/README.md new file mode 100644 index 0000000000000000000000000000000000000000..956c6ccf2746212b45f3650ef917147bdd9ed800 --- /dev/null +++ b/external-example/README.md @@ -0,0 +1 @@ +Test implementation of Magpie 2.0 \ No newline at end of file diff --git a/external-example/pom.xml b/external-example/pom.xml new file mode 100644 index 0000000000000000000000000000000000000000..da0d2258fc90fe3cc9a49b76504be5388c683fa1 --- /dev/null +++ b/external-example/pom.xml @@ -0,0 +1,294 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> + + <!--TODO: replace groupId --> + <groupId>org.example.group</groupId> + + <!--TODO: replace artifactId --> + <artifactId>ExternalExample</artifactId> + + <!--TODO: replace version, for a new pipeline it's advised to start with '0.1.0-SNAPSHOT' --> + <version>0.1.0-SNAPSHOT</version> + + <properties> + <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> + <scoverage.plugin.version>1.0.4</scoverage.plugin.version> + <sting.shade.phase>package</sting.shade.phase> + + <!-- + TODO: replace app.main.class, this is the class that get executed when running the jar file + This can be any executable that have a main method. 
In Biopet any pipeline can be used as direct executable. + Value for direct pipeline: 'org.example.group.pipelines.SimplePipeline' + In the given example is an extension of the biopet executable. In this example there are multiple pipelines in 1 executable. + + It's also possible to make your own main function and call the main function with it's argument of the pipeline from there. + --> + <app.main.class>org.example.group.ExecutableExample</app.main.class> + </properties> + + <dependencies> + <!-- + In here maven dependencies can be placed, when importing a biopet pipeline 'Biopet-Framework' is not required. + When only using the framework without pipeline you need to import BiopetFramework. + It's advised to not use different versions of the pipeline and the framework. + --> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetCore</artifactId> + + <!--TODO: replace version of pipeline to a fixed version --> + <version>0.5.0-SNAPSHOT</version> + </dependency> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetExtensions</artifactId> + <version>0.5.0-SNAPSHOT</version> + </dependency> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>Shiva</artifactId> + + <!--TODO: replace version of pipeline to a fixed version --> + <version>0.5.0-SNAPSHOT</version> + </dependency> + </dependencies> + + <build> + <sourceDirectory>${basedir}/src/main/scala</sourceDirectory> + <testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory> + <testResources> + <testResource> + <directory>${basedir}/src/test/resources</directory> + <includes> + <include>**/*</include> + </includes> + </testResource> + </testResources> + <resources> + <resource> + <directory>${basedir}/src/main/resources</directory> + <includes> + <include>**/*</include> + </includes> + </resource> + </resources> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-shade-plugin</artifactId> + <version>2.4.1</version> + 
<configuration> + <!--suppress MavenModelInspection --> + <finalName>${project.artifactId}-${project.version}-${git.commit.id.abbrev}</finalName> + <transformers> + <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> + <manifestEntries> + <Main-Class>${app.main.class}</Main-Class> + <!--suppress MavenModelInspection --> + <X-Compile-Source-JDK>${maven.compile.source}</X-Compile-Source-JDK> + <!--suppress MavenModelInspection --> + <X-Compile-Target-JDK>${maven.compile.target}</X-Compile-Target-JDK> + </manifestEntries> + </transformer> + </transformers> + <filters> + </filters> + </configuration> + <executions> + <execution> + <phase>package</phase> + <goals> + <goal>shade</goal> + </goals> + </execution> + </executions> + </plugin> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-surefire-plugin</artifactId> + <version>2.18.1</version> + <configuration> + <forkCount>1C</forkCount> + <workingDirectory>${project.build.directory}</workingDirectory> + </configuration> + </plugin> + <plugin> + <artifactId>maven-dependency-plugin</artifactId> + <version>2.10</version> + <executions> + <execution> + <id>copy-installed</id> + <phase>prepare-package</phase> + <goals> + <goal>list</goal> + </goals> + <configuration> + <outputFile>${project.build.outputDirectory}/dependency_list.txt</outputFile> + </configuration> + </execution> + </executions> + </plugin> + <plugin> + <groupId>net.alchim31.maven</groupId> + <artifactId>scala-maven-plugin</artifactId> + <version>3.2.0</version> + <executions> + <execution> + <id>scala-compile</id> + <goals> + <goal>compile</goal> + <goal>testCompile</goal> + </goals> + <configuration> + <args> + <arg>-dependencyfile</arg> + <arg>${project.build.directory}/.scala_dependencies</arg> + <arg>-deprecation</arg> + <arg>-feature</arg> + </args> + </configuration> + </execution> + </executions> + <!-- ... (see other usage or goals for details) ... 
--> + </plugin> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-jar-plugin</artifactId> + <version>2.5</version> + <executions> + <execution> + <goals> + <goal>test-jar</goal> + </goals> + </execution> + </executions> + <configuration> + <archive> + <manifest> + <addDefaultImplementationEntries>true</addDefaultImplementationEntries> + <addDefaultSpecificationEntries>true</addDefaultSpecificationEntries> + </manifest> + </archive> + </configuration> + </plugin> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-compiler-plugin</artifactId> + <version>2.3.2</version> + <configuration> + <showDeprecation>true</showDeprecation> + </configuration> + </plugin> + <plugin> + <groupId>org.scalariform</groupId> + <artifactId>scalariform-maven-plugin</artifactId> + <version>0.1.4</version> + <executions> + <execution> + <phase>process-sources</phase> + <goals> + <goal>format</goal> + </goals> + <configuration> + <rewriteArrowSymbols>false</rewriteArrowSymbols> + <alignParameters>true</alignParameters> + <alignSingleLineCaseStatements_maxArrowIndent>40 + </alignSingleLineCaseStatements_maxArrowIndent> + <alignSingleLineCaseStatements>true</alignSingleLineCaseStatements> + <compactStringConcatenation>false</compactStringConcatenation> + <compactControlReadability>false</compactControlReadability> + <doubleIndentClassDeclaration>false</doubleIndentClassDeclaration> + <formatXml>true</formatXml> + <indentLocalDefs>false</indentLocalDefs> + <indentPackageBlocks>true</indentPackageBlocks> + <indentSpaces>2</indentSpaces> + <placeScaladocAsterisksBeneathSecondAsterisk>false + </placeScaladocAsterisksBeneathSecondAsterisk> + <preserveDanglingCloseParenthesis>true</preserveDanglingCloseParenthesis> + <preserveSpaceBeforeArguments>false</preserveSpaceBeforeArguments> + <rewriteArrowSymbols>false</rewriteArrowSymbols> + <spaceBeforeColon>false</spaceBeforeColon> + <spaceInsideBrackets>false</spaceInsideBrackets> + 
<spaceInsideParentheses>false</spaceInsideParentheses> + <spacesWithinPatternBinders>true</spacesWithinPatternBinders> + </configuration> + </execution> + </executions> + </plugin> + <plugin> + <groupId>pl.project13.maven</groupId> + <artifactId>git-commit-id-plugin</artifactId> + <version>2.1.10</version> + <executions> + <execution> + <goals> + <goal>revision</goal> + </goals> + </execution> + </executions> + <configuration> + <prefix>git</prefix> + <dateFormat>dd.MM.yyyy '@' HH:mm:ss z</dateFormat> + <verbose>false</verbose> + <!-- TODO: This directory need to be changed depening where your .git folder is relative from this pom.xml --> + <dotGitDirectory>${basedir}/../.git</dotGitDirectory> + <useNativeGit>true</useNativeGit> + <skipPoms>false</skipPoms> + <generateGitPropertiesFile>true</generateGitPropertiesFile> + <generateGitPropertiesFilename>src/main/resources/git.properties</generateGitPropertiesFilename> + <failOnNoGitDirectory>false</failOnNoGitDirectory> + <abbrevLength>8</abbrevLength> + <skip>false</skip> + <gitDescribe> + <skip>false</skip> + <always>false</always> + <abbrev>8</abbrev> + <dirty>-dirty</dirty> + <forceLongFormat>false</forceLongFormat> + </gitDescribe> + </configuration> + </plugin> + <plugin> + <groupId>com.mycila</groupId> + <artifactId>license-maven-plugin</artifactId> + <version>2.6</version> + <configuration> + <excludes> + <exclude>**/*git*</exclude> + <exclude>**/License*</exclude> + <exclude>**/*.bam</exclude> + <exclude>**/*.bai</exclude> + <exclude>**/*.gtf</exclude> + <exclude>**/*.fq</exclude> + <exclude>**/*.sam</exclude> + <exclude>**/*.bed</exclude> + <exclude>**/*.refFlat</exclude> + <exclude>**/*.R</exclude> + <exclude>**/*.rscript</exclude> + </excludes> + </configuration> + </plugin> + <plugin> + <groupId>org.scoverage</groupId> + <artifactId>scoverage-maven-plugin</artifactId> + <version>${scoverage.plugin.version}</version> + <configuration> + <scalaVersion>2.10.2</scalaVersion> + <!-- other parameters --> + 
</configuration> + </plugin> + </plugins> + </build> + <reporting> + <plugins> + <plugin> + <groupId>org.scoverage</groupId> + <artifactId>scoverage-maven-plugin</artifactId> + <version>${scoverage.plugin.version}</version> + </plugin> + </plugins> + </reporting> +</project> \ No newline at end of file diff --git a/external-example/src/main/scala/org/example/group/ExecutableExample.scala b/external-example/src/main/scala/org/example/group/ExecutableExample.scala new file mode 100644 index 0000000000000000000000000000000000000000..fe0aaa13d57d5380f8eb14bf984145992ce297e8 --- /dev/null +++ b/external-example/src/main/scala/org/example/group/ExecutableExample.scala @@ -0,0 +1,19 @@ +package org.example.group + +import nl.lumc.sasc.biopet.utils.{ BiopetExecutable, MainCommand } + +/** + * Created by pjvanthof on 30/08/15. + */ +object ExecutableExample extends BiopetExecutable { + + /** This list defines the pipeline that are usable from the executable */ + def pipelines: List[MainCommand] = List( + org.example.group.pipelines.MultisamplePipeline, + org.example.group.pipelines.BiopetPipeline, + org.example.group.pipelines.SimplePipeline + ) + + /** This list defines the (biopet)tools that are usable from the executable */ + def tools: List[MainCommand] = Nil +} diff --git a/external-example/src/main/scala/org/example/group/pipelines/BiopetPipeline.scala b/external-example/src/main/scala/org/example/group/pipelines/BiopetPipeline.scala new file mode 100644 index 0000000000000000000000000000000000000000..6099047a6e5153c15df565d3e70179006ef4ceac --- /dev/null +++ b/external-example/src/main/scala/org/example/group/pipelines/BiopetPipeline.scala @@ -0,0 +1,47 @@ +package org.example.group.pipelines + +import nl.lumc.sasc.biopet.core.PipelineCommand +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.core.summary.SummaryQScript +import nl.lumc.sasc.biopet.pipelines.shiva.Shiva +import nl.lumc.sasc.biopet.utils.config.Configurable +import 
org.broadinstitute.gatk.queue.QScript + +/** + * Created by pjvan_thof on 8/28/15. + */ +//TODO: Replace class Name +class BiopetPipeline(val root: Configurable) extends QScript with SummaryQScript { + def this() = this(null) + + /** Only required when using [[SummaryQScript]] */ + def summaryFile = new File(outputDir, "magpie.summary.json") + + /** Only required when using [[SummaryQScript]] */ + def summaryFiles: Map[String, File] = Map() + + /** Only required when using [[SummaryQScript]] */ + def summarySettings = Map() + + // This method can be used to initialize some classes where needed + def init(): Unit = { + } + + // This method is the actual pipeline + def biopetScript: Unit = { + + // Executing a biopet pipeline inside + val shiva = new Shiva(this) + shiva.init() + shiva.biopetScript() + addAll(shiva.functions) + + /* Only required when using [[SummaryQScript]] */ + addSummaryQScript(shiva) + + // From here you can use the output files of shiva as input file of other jobs + } +} + +//TODO: Replace object Name, must be the same as the class of the pipeline +object BiopetPipeline extends PipelineCommand diff --git a/external-example/src/main/scala/org/example/group/pipelines/MultisamplePipeline.scala b/external-example/src/main/scala/org/example/group/pipelines/MultisamplePipeline.scala new file mode 100644 index 0000000000000000000000000000000000000000..ee66d89663c22868958720615a83c5d72f85e8f7 --- /dev/null +++ b/external-example/src/main/scala/org/example/group/pipelines/MultisamplePipeline.scala @@ -0,0 +1,64 @@ +package org.example.group.pipelines + +import nl.lumc.sasc.biopet.core.{ PipelineCommand, MultiSampleQScript } +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.queue.QScript + +/** + * Created by pjvanthof on 30/08/15. 
+ */ +class MultisamplePipeline(val root: Configurable) extends QScript with MultiSampleQScript { + qscript => + def this() = this(null) + + def init: Unit = { + } + + def biopetScript: Unit = { + addSamplesJobs() // This executes jobs for all samples + } + + def addMultiSampleJobs: Unit = { + // this code will be executed after all code of all samples is executed + } + + def summaryFile: File = new File(outputDir, "MultisamplePipeline.summary.json") + + //TODO: Add summary + def summaryFiles: Map[String, File] = Map() + + //TODO: Add summary + def summarySettings: Map[String, Any] = Map() + + def makeSample(id: String) = new Sample(id) + class Sample(sampleId: String) extends AbstractSample(sampleId) { + + def makeLibrary(id: String) = new Library(id) + class Library(libId: String) extends AbstractLibrary(libId) { + //TODO: Add summary + def summaryFiles: Map[String, File] = Map() + + //TODO: Add summary + def summaryStats: Map[String, Any] = Map() + + def addJobs: Unit = { + //TODO: add library specific jobs + } + } + + //TODO: Add summary + def summaryFiles: Map[String, File] = Map() + + //TODO: Add summary + def summaryStats: Map[String, Any] = Map() + + def addJobs: Unit = { + addPerLibJobs() // This add jobs for each library + //TODO: add sample specific jobs + } + } + +} + +object MultisamplePipeline extends PipelineCommand \ No newline at end of file diff --git a/external-example/src/main/scala/org/example/group/pipelines/SimplePipeline.scala b/external-example/src/main/scala/org/example/group/pipelines/SimplePipeline.scala new file mode 100644 index 0000000000000000000000000000000000000000..f24b0f6152a03979d74c2f7337760cdc06c9e3be --- /dev/null +++ b/external-example/src/main/scala/org/example/group/pipelines/SimplePipeline.scala @@ -0,0 +1,38 @@ +package org.example.group.pipelines + +import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand } +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.extensions.{ Gzip, Cat } 
+import org.broadinstitute.gatk.queue.QScript + +/** + * Created by pjvanthof on 30/08/15. + */ +//TODO: Replace class name, must be the same as the class of the pipeline +class SimplePipeline(val root: Configurable) extends QScript with BiopetQScript { + // A constructor without arguments is needed if this pipeline is a root pipeline + def this() = this(null) + + @Input(required = true) + var inputFile: File = null + + /** This method can be used to initialize some classes where needed */ + def init(): Unit = { + } + + /** This method is the actual pipeline */ + def biopetScript: Unit = { + val cat = new Cat(this) + cat.input :+= inputFile + cat.output = new File(outputDir, "file.out") + add(cat) + + val gzip = new Gzip(this) + gzip.input :+= cat.output + gzip.output = new File(outputDir, "file.out.gz") + add(gzip) + } +} + +//TODO: Replace object name, must be the same as the class of the pipeline +object SimplePipeline extends PipelineCommand diff --git a/mkdocs.yml b/mkdocs.yml index b5cb8fdb5a2eb8564edf286f868faf57fc58ceca..871f41fbe83b44214aebc395eda59acd30bc5d5b 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -3,6 +3,7 @@ pages: - Home: 'index.md' - General: - Config: 'general/config.md' + - Requirements: 'general/requirements.md' - Pipelines: - Basty: 'pipelines/basty.md' - Bam2Wig: 'pipelines/bam2wig.md' @@ -38,4 +39,4 @@ pages: - License: 'license.md' #- ['developing/Setup.md', 'Developing', 'Setting up your local development environment'] #theme: readthedocs -repo_url: https://git.lumc.nl/biopet/biopet +repo_url: https://github.com/biopet/biopet diff --git a/pom.xml b/pom.xml index 0be8684de6d993fb943fe59061105d1831d82cdc..964da3f4aaf87824c6b9fade63d5bc607fe0e0aa 100644 --- a/pom.xml +++ b/pom.xml @@ -9,12 +9,14 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>public</relativePath> </parent> <modules> <module>public</module> <module>protected</module> 
+ <module>external-example</module> + <!--<module>biopet-aggregate</module>--> </modules> </project> diff --git a/protected/biopet-gatk-extensions/pom.xml b/protected/biopet-gatk-extensions/pom.xml index 024f618a48ce58d5727cddad82937a950cb48a44..7560fbe54364e42f059b114298134fa300726cd2 100644 --- a/protected/biopet-gatk-extensions/pom.xml +++ b/protected/biopet-gatk-extensions/pom.xml @@ -15,7 +15,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>BiopetGatk</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -25,7 +25,7 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/AnalyzeCovariates.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/AnalyzeCovariates.scala index 4679a3f7ad5cb542820ca229a1aa64ec5800dd1f..277390751529e743644dce7a2d9396b8d10b1228 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/AnalyzeCovariates.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/AnalyzeCovariates.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class AnalyzeCovariates(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.AnalyzeCovariates with GatkGeneral { } diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/ApplyRecalibration.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/ApplyRecalibration.scala index 
fbf4ad4e79866354f5244277290f9c553e747dd4..8b8ea7d5a97ca2095c981b1b11a3bae6496583a8 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/ApplyRecalibration.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/ApplyRecalibration.scala @@ -7,15 +7,17 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class ApplyRecalibration(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.ApplyRecalibration with GatkGeneral { scatterCount = config("scattercount", default = 0) - override def beforeGraph() { - super.beforeGraph() + override val defaultThreads = 3 - nt = Option(getThreads(3)) + override def freezeFieldValues() { + super.freezeFieldValues() + + nt = Option(getThreads) memoryLimit = Option(nt.getOrElse(1) * 2) import org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorArgumentCollection.Mode diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/BaseRecalibrator.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/BaseRecalibrator.scala index 6c24a8e80431b131e870d70c677408cea05689b8..d51a28375b07aea9d3462f9d2b5f185fc2f4b629 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/BaseRecalibrator.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/BaseRecalibrator.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class BaseRecalibrator(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.BaseRecalibrator with GatkGeneral { if 
(config.contains("scattercount")) scatterCount = config("scattercount", default = 1) @@ -20,7 +20,6 @@ object BaseRecalibrator { val br = new BaseRecalibrator(root) br.input_file :+= input br.out = output - br.beforeGraph() br } } \ No newline at end of file diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineGVCFs.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineGVCFs.scala index 9384c20c379beffa41ab89b259ec1570f9b45645..138067f1679f3adbeae3d3dc366ea3ecf6355df6 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineGVCFs.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineGVCFs.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class CombineGVCFs(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.CombineGVCFs with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineVariants.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineVariants.scala index 943f149926613be116a91ab57738540c89bbd447..b811327b9cb154277b3efb7487a1a2085ae4b9d4 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineVariants.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/CombineVariants.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class 
CombineVariants(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.CombineVariants with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GatkGeneral.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GatkGeneral.scala index 1d8de140b277d349095ae459f700ed5437888b16..f0073f0116b99a92fcace99ec1979a907e29eb14 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GatkGeneral.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GatkGeneral.scala @@ -5,32 +5,39 @@ */ package nl.lumc.sasc.biopet.extensions.gatk.broad -import nl.lumc.sasc.biopet.core.{ BiopetJavaCommandLineFunction, Reference } +import nl.lumc.sasc.biopet.core.{ CommandLineResources, Reference, BiopetJavaCommandLineFunction } +import org.broadinstitute.gatk.engine.phonehome.GATKRunReport import org.broadinstitute.gatk.queue.extensions.gatk.CommandLineGATK -trait GatkGeneral extends CommandLineGATK with BiopetJavaCommandLineFunction with Reference { +trait GatkGeneral extends CommandLineGATK with CommandLineResources with Reference { memoryLimit = Option(3) override def subPath = "gatk" :: super.subPath jarFile = config("gatk_jar") + reference_sequence = referenceFasta() + override def defaultCoreMemory = 4.0 override def faiRequired = true if (config.contains("intervals")) intervals = config("intervals").asFileList if (config.contains("exclude_intervals")) excludeIntervals = config("exclude_intervals").asFileList + + Option(config("et").value) match { + case Some("NO_ET") => et = GATKRunReport.PhoneHomeOption.NO_ET + case Some("AWS") => et = GATKRunReport.PhoneHomeOption.AWS + case Some("STDOUT") => et = GATKRunReport.PhoneHomeOption.STDOUT + case Some(x) => throw new 
IllegalArgumentException(s"Unknown et option for gatk: $x") + case _ => + } + if (config.contains("gatk_key")) gatk_key = config("gatk_key") if (config.contains("pedigree")) pedigree = config("pedigree") - override def versionRegex = """(.*)""".r - override def versionExitcode = List(0, 1) - override def versionCommand = executable + " -jar " + jarFile + " -version" - - override def getVersion = super.getVersion.collect { case v => "Gatk " + v } + //override def versionRegex = """(.*)""".r + //override def versionExitcode = List(0, 1) + //override def versionCommand = executable + " -jar " + jarFile + " -version" - override def beforeGraph(): Unit = { - super.beforeGraph() - if (reference_sequence == null) reference_sequence = referenceFasta() - } + //override def getVersion = super.getVersion.collect { case v => "Gatk " + v } } diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GenotypeGVCFs.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GenotypeGVCFs.scala index e6cfbbd98d20b73f37171c6133959bb065f2796b..5bd36585d1de2f62c50cb46cedacc8f3a9bcf16d 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GenotypeGVCFs.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/GenotypeGVCFs.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class GenotypeGVCFs(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.GenotypeGVCFs with GatkGeneral { annotation ++= config("annotation", default = Seq("FisherStrand", "QualByDepth", "ChromosomeCounts")).asStringList diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/HaplotypeCaller.scala 
b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/HaplotypeCaller.scala index b805baa356814f03828123d7730c77e0baaa1c10..514879f9d2b96088a6e5b3e30eb969d44740934b 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/HaplotypeCaller.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/HaplotypeCaller.scala @@ -5,10 +5,13 @@ */ package nl.lumc.sasc.biopet.extensions.gatk.broad -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.variant.GATKVCFIndexType class HaplotypeCaller(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.HaplotypeCaller with GatkGeneral { + + override val defaultThreads = 1 + min_mapping_quality_score = config("minMappingQualityScore", default = 20) scatterCount = config("scattercount", default = 1) if (config.contains("dbsnp")) this.dbsnp = config("dbsnp") @@ -35,13 +38,13 @@ class HaplotypeCaller(val root: Configurable) extends org.broadinstitute.gatk.qu stand_emit_conf = config("stand_emit_conf", default = 0) } - override def beforeGraph() { - super.beforeGraph() + override def freezeFieldValues() { + super.freezeFieldValues() if (bamOutput != null && nct.getOrElse(1) > 1) { - threads = 1 logger.warn("BamOutput is on, nct/threads is forced to set on 1, this option is only for debug") + nCoresRequest = Some(1) } - nct = Some(getThreads(1)) + nct = Some(getThreads) memoryLimit = Option(memoryLimit.getOrElse(2.0) * nct.getOrElse(1)) } diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/IndelRealigner.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/IndelRealigner.scala index 97105fbfff1fc6dbaadfd769d81f5fbdf11c95fe..44a3eac6607a6165920ab4d4564c9777119a1ca9 100644 --- 
a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/IndelRealigner.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/IndelRealigner.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class IndelRealigner(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.IndelRealigner with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/PrintReads.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/PrintReads.scala index fc9e02b36a5bcf3d401ae660456505e7b682bdd5..554208c3af1d791c71d97a17087f02321657de8c 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/PrintReads.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/PrintReads.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class PrintReads(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.PrintReads with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/RealignerTargetCreator.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/RealignerTargetCreator.scala index 66c688ebdaf09ff4cf9aa5c1fa190d75bf09ebc5..a884e83781227ce2b39b3f98573c3d89f8129d1b 100644 --- 
a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/RealignerTargetCreator.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/RealignerTargetCreator.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class RealignerTargetCreator(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.RealignerTargetCreator with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/SelectVariants.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/SelectVariants.scala index f1f5631b8c1f248f4d452200671d370b097ac066..abb27c5fc34d73ab62ffae928e622d6cda64c4d9 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/SelectVariants.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/SelectVariants.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class SelectVariants(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.SelectVariants with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/UnifiedGenotyper.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/UnifiedGenotyper.scala index 5da68c1d94bae7b27e6b6994095aa8a5391087a1..70d988f4b057572bc1c110ad597c232d6093a2cc 100644 --- 
a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/UnifiedGenotyper.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/UnifiedGenotyper.scala @@ -5,7 +5,7 @@ */ package nl.lumc.sasc.biopet.extensions.gatk.broad -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class UnifiedGenotyper(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.UnifiedGenotyper with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") @@ -26,11 +26,13 @@ class UnifiedGenotyper(val root: Configurable) extends org.broadinstitute.gatk.q } } - override def beforeGraph() { - super.beforeGraph() + override val defaultThreads = 1 + + override def freezeFieldValues() { + super.freezeFieldValues() genotype_likelihoods_model = org.broadinstitute.gatk.tools.walkers.genotyper.GenotypeLikelihoodsCalculationModel.Model.BOTH - nct = Some(getThreads(1)) + nct = Some(getThreads) memoryLimit = Option(nct.getOrElse(1) * memoryLimit.getOrElse(2.0)) } } diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantAnnotator.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantAnnotator.scala index 9aadadf49331f93d9b42ff3a1d9b35b7e9ddaffe..b26549622f2ee5131538baf8cc2882416bf3ae33 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantAnnotator.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantAnnotator.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class VariantAnnotator(val root: Configurable) extends 
org.broadinstitute.gatk.queue.extensions.gatk.VariantAnnotator with GatkGeneral { if (config.contains("scattercount")) scatterCount = config("scattercount") diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantEval.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantEval.scala index 13f5027a3f9546e58e9ee084ea5429511785331d..fdb6e7e27b08793f44fb94d82c5ceef0033d4ed5 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantEval.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantEval.scala @@ -7,12 +7,9 @@ package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable class VariantEval(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.VariantEval with GatkGeneral { - override def beforeGraph() { - super.beforeGraph() - } } object VariantEval { @@ -22,7 +19,6 @@ object VariantEval { vareval.eval = Seq(sample) vareval.comp = Seq(compareWith) vareval.out = output - vareval.beforeGraph() vareval } @@ -36,7 +32,6 @@ object VariantEval { vareval.ST = ST vareval.noEV = true vareval.EV = EV - vareval.beforeGraph() vareval } diff --git a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantRecalibrator.scala b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantRecalibrator.scala index d4240f01a1970d84dfd9a5e7a329e85ef670b8d6..11560ea25e216a49515b618f03bbb9604accc731 100644 --- a/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantRecalibrator.scala +++ b/protected/biopet-gatk-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/broad/VariantRecalibrator.scala @@ -7,11 +7,13 @@ 
package nl.lumc.sasc.biopet.extensions.gatk.broad import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.queue.extensions.gatk.TaggedFile class VariantRecalibrator(val root: Configurable) extends org.broadinstitute.gatk.queue.extensions.gatk.VariantRecalibrator with GatkGeneral { - nt = Option(getThreads(4)) + override val defaultThreads = 4 + + nt = Option(getThreads) memoryLimit = Option(nt.getOrElse(1) * 2) if (config.contains("dbsnp")) resource :+= new TaggedFile(config("dbsnp").asString, "known=true,training=false,truth=false,prior=2.0") diff --git a/protected/biopet-gatk-pipelines/pom.xml b/protected/biopet-gatk-pipelines/pom.xml index eb29d48b5ed228f17ecc3c8ace0b228c073f0b4b..90fbbf942bfd68c1b25a21174c60341634c78c92 100644 --- a/protected/biopet-gatk-pipelines/pom.xml +++ b/protected/biopet-gatk-pipelines/pom.xml @@ -15,7 +15,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>BiopetGatk</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -25,7 +25,7 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Basty.scala b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Basty.scala index ac17efd3c676b9fe5ed34bb2342fdf234347fec9..ecf4ccf901f8339db7da75c499bfae17a657eb16 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Basty.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Basty.scala @@ -6,7 +6,7 @@ package nl.lumc.sasc.biopet.pipelines.gatk import nl.lumc.sasc.biopet.core.PipelineCommand -import 
nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.pipelines.basty.BastyTrait import org.broadinstitute.gatk.queue.QScript diff --git a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkBenchmarkGenotyping.scala b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkBenchmarkGenotyping.scala index 47d0525a13447531024d6568f63d2dd7619d27f1..e489c4afdf4a30b8ca0b2a965242eae7811ad24c 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkBenchmarkGenotyping.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkBenchmarkGenotyping.scala @@ -5,7 +5,7 @@ */ package nl.lumc.sasc.biopet.pipelines.gatk -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand } import org.broadinstitute.gatk.queue.QScript diff --git a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkGenotyping.scala b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkGenotyping.scala index 9c6abbb01bcc830a0ba0cf2ce946d2dd778c26cf..2f54cbbc70bc7c2666ef9c043017603c0b1c4b9f 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkGenotyping.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkGenotyping.scala @@ -5,7 +5,7 @@ */ package nl.lumc.sasc.biopet.pipelines.gatk -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand } import nl.lumc.sasc.biopet.extensions.gatk.broad.{ GenotypeGVCFs, SelectVariants } import org.broadinstitute.gatk.queue.QScript diff --git 
a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkPipeline.scala b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkPipeline.scala index 9cde0a1f89ae94aeadebac6c58a5ac3887188ad8..3707ec2751cd21c82193f65c47281d294a764778 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkPipeline.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkPipeline.scala @@ -7,7 +7,7 @@ package nl.lumc.sasc.biopet.pipelines.gatk import htsjdk.samtools.SamReaderFactory import nl.lumc.sasc.biopet.core.{ MultiSampleQScript, PipelineCommand } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.summary.SummaryQScript import nl.lumc.sasc.biopet.extensions.gatk.broad.{ CombineGVCFs, CombineVariants } import nl.lumc.sasc.biopet.extensions.picard.{ AddOrReplaceReadGroups, SamToFastq } diff --git a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantRecalibration.scala b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantRecalibration.scala index 232502bc7e43c17af0c0aea5b8f7e67569a4683e..772aa6887d7a7e4983059f84ea3ac7b455880c4a 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantRecalibration.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantRecalibration.scala @@ -6,7 +6,7 @@ package nl.lumc.sasc.biopet.pipelines.gatk import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.extensions.gatk.broad.{ ApplyRecalibration, VariantAnnotator, VariantRecalibrator } import org.broadinstitute.gatk.queue.QScript diff --git 
a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantcalling.scala b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantcalling.scala index abc05e06cf8836e4d30954c07ed58071b975134b..750cbee21adea6c059d939aad518ca9aed0eb06e 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantcalling.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/GatkVariantcalling.scala @@ -7,12 +7,12 @@ package nl.lumc.sasc.biopet.pipelines.gatk import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand } import nl.lumc.sasc.biopet.extensions.Ln import nl.lumc.sasc.biopet.extensions.gatk.broad._ import nl.lumc.sasc.biopet.extensions.picard.MarkDuplicates -import nl.lumc.sasc.biopet.tools.{ MergeAlleles, MpileupToVcf, VcfFilter, VcfStats } +import nl.lumc.sasc.biopet.extensions.tools.{ MergeAlleles, MpileupToVcf, VcfFilter, VcfStats } import nl.lumc.sasc.biopet.utils.ConfigUtils import org.broadinstitute.gatk.queue.QScript import org.broadinstitute.gatk.queue.extensions.gatk.TaggedFile @@ -153,11 +153,11 @@ class GatkVariantcalling(val root: Configurable) extends QScript with BiopetQScr scriptOutput.rawVcfFile = m2v.output val vcfFilter = new VcfFilter(this) { - override def defaults = ConfigUtils.mergeMaps(Map("min_sample_depth" -> 8, + override def defaults = Map("min_sample_depth" -> 8, "min_alternate_depth" -> 2, "min_samples_pass" -> 1, "filter_ref_calls" -> true - ), super.defaults) + ) } vcfFilter.inputVcf = m2v.output vcfFilter.outputVcf = swapExt(outputDir, m2v.output, ".vcf", ".filter.vcf.gz") diff --git a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Shiva.scala 
b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Shiva.scala index 3ab885710ea89d177689640579924a3eb6b657e4..cf5aa84c5bf75623b78ab3ad696a3d75300bd7fb 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Shiva.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/Shiva.scala @@ -6,7 +6,7 @@ package nl.lumc.sasc.biopet.pipelines.gatk import nl.lumc.sasc.biopet.core.PipelineCommand -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.extensions.gatk.broad._ import nl.lumc.sasc.biopet.pipelines.shiva.{ ShivaTrait, ShivaVariantcallingTrait } import org.broadinstitute.gatk.queue.QScript diff --git a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcalling.scala b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcalling.scala index 5981fd56aa6f22fb508e593aa740b87a48e0e7aa..1878d86fd150af451ab46fb69234ccbb66b05ec2 100644 --- a/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcalling.scala +++ b/protected/biopet-gatk-pipelines/src/main/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcalling.scala @@ -6,7 +6,7 @@ package nl.lumc.sasc.biopet.pipelines.gatk import nl.lumc.sasc.biopet.core.PipelineCommand -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.extensions.gatk.broad.GenotypeGVCFs import nl.lumc.sasc.biopet.pipelines.shiva.ShivaVariantcallingTrait import org.broadinstitute.gatk.queue.QScript diff --git a/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaTest.scala b/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaTest.scala index 
634a3fb17945144093f26dbcd11427b65474f7b6..f1b326b7017b129eaf6edf0d44014af4f9507b7b 100644 --- a/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaTest.scala +++ b/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaTest.scala @@ -8,11 +8,11 @@ package nl.lumc.sasc.biopet.pipelines.gatk import java.io.{ File, FileOutputStream } import com.google.common.io.Files -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.extensions.bwa.BwaMem import nl.lumc.sasc.biopet.extensions.gatk.broad._ import nl.lumc.sasc.biopet.extensions.picard.{ MarkDuplicates, SortSam } -import nl.lumc.sasc.biopet.tools.VcfStats +import nl.lumc.sasc.biopet.extensions.tools.VcfStats import nl.lumc.sasc.biopet.utils.ConfigUtils import org.broadinstitute.gatk.queue.QSettings import org.scalatest.Matchers @@ -39,31 +39,27 @@ class ShivaTest extends TestNGSuite with Matchers { val bool = Array(true, false) for ( - s1 <- bool; s2 <- bool; s3 <- bool; multi <- bool; single <- bool; - library <- bool; dbsnp <- bool; covariates <- bool; realign <- bool; baseRecalibration <- bool - ) yield Array("", s1, s2, s3, multi, single, library, dbsnp, covariates, realign, baseRecalibration) + s1 <- bool; s2 <- bool; multi <- bool; + dbsnp <- bool; realign <- bool; baseRecalibration <- bool + ) yield Array("", s1, s2, multi, dbsnp, realign, baseRecalibration) } @Test(dataProvider = "shivaOptions") - def testShiva(f: String, sample1: Boolean, sample2: Boolean, sample3: Boolean, - multi: Boolean, single: Boolean, library: Boolean, dbsnp: Boolean, - covariates: Boolean, realign: Boolean, baseRecalibration: Boolean): Unit = { + def testShiva(f: String, sample1: Boolean, sample2: Boolean, + multi: Boolean, dbsnp: Boolean, + realign: Boolean, baseRecalibration: Boolean): Unit = { val map = { var m: Map[String, Any] = ShivaTest.config if (sample1) m = 
ConfigUtils.mergeMaps(ShivaTest.sample1, m) if (sample2) m = ConfigUtils.mergeMaps(ShivaTest.sample2, m) - if (sample3) m = ConfigUtils.mergeMaps(ShivaTest.sample3, m) if (dbsnp) m = ConfigUtils.mergeMaps(Map("dbsnp" -> "test"), m) ConfigUtils.mergeMaps(Map("multisample_variantcalling" -> multi, - "single_sample_variantcalling" -> single, - "library_variantcalling" -> library, - "use_analyze_covariates" -> covariates, "use_indel_realigner" -> realign, "use_base_recalibration" -> baseRecalibration), m) } - if (!sample1 && !sample2 && !sample3) { // When no samples + if (!sample1 && !sample2) { // When no samples intercept[IllegalArgumentException] { initPipeline(map).script() } @@ -71,28 +67,30 @@ class ShivaTest extends TestNGSuite with Matchers { val pipeline = initPipeline(map) pipeline.script() - val numberLibs = (if (sample1) 1 else 0) + (if (sample2) 1 else 0) + (if (sample3) 2 else 0) - val numberSamples = (if (sample1) 1 else 0) + (if (sample2) 1 else 0) + (if (sample3) 1 else 0) + val numberLibs = (if (sample1) 1 else 0) + (if (sample2) 2 else 0) + val numberSamples = (if (sample1) 1 else 0) + (if (sample2) 1 else 0) - pipeline.functions.count(_.isInstanceOf[BwaMem]) shouldBe numberLibs - pipeline.functions.count(_.isInstanceOf[SortSam]) shouldBe numberLibs - pipeline.functions.count(_.isInstanceOf[MarkDuplicates]) shouldBe (numberLibs + (if (sample3) 1 else 0)) + pipeline.functions.count(_.isInstanceOf[MarkDuplicates]) shouldBe (numberLibs + (if (sample2) 1 else 0)) // Gatk preprocess - pipeline.functions.count(_.isInstanceOf[IndelRealigner]) shouldBe (numberLibs + (if (sample3) 1 else 0)) * (if (realign) 1 else 0) - pipeline.functions.count(_.isInstanceOf[RealignerTargetCreator]) shouldBe (numberLibs + (if (sample3) 1 else 0)) * (if (realign) 1 else 0) - pipeline.functions.count(_.isInstanceOf[BaseRecalibrator]) shouldBe (if (dbsnp && baseRecalibration) numberLibs else 0) * (if (covariates) 2 else 1) - 
pipeline.functions.count(_.isInstanceOf[AnalyzeCovariates]) shouldBe (if (dbsnp && covariates && baseRecalibration) numberLibs else 0) + pipeline.functions.count(_.isInstanceOf[IndelRealigner]) shouldBe (numberLibs * (if (realign) 1 else 0) + (if (sample2 && realign) 1 else 0)) + pipeline.functions.count(_.isInstanceOf[RealignerTargetCreator]) shouldBe (numberLibs * (if (realign) 1 else 0) + (if (sample2 && realign) 1 else 0)) + pipeline.functions.count(_.isInstanceOf[BaseRecalibrator]) shouldBe (if (dbsnp && baseRecalibration) numberLibs else 0) pipeline.functions.count(_.isInstanceOf[PrintReads]) shouldBe (if (dbsnp && baseRecalibration) numberLibs else 0) - pipeline.functions.count(_.isInstanceOf[VcfStats]) shouldBe (if (multi) 2 else 0) + - (if (single) numberSamples * 2 else 0) + (if (library) numberLibs * 2 else 0) + pipeline.functions.count(_.isInstanceOf[VcfStats]) shouldBe (if (multi) 2 else 0) } } } object ShivaTest { val outputDir = Files.createTempDir() + new File(outputDir, "input").mkdirs() + def inputTouch(name: String): String = { + val file = new File(outputDir, "input" + File.separator + name) + Files.touch(file) + file.getAbsolutePath + } private def copyFile(name: String): Unit = { val is = getClass.getResourceAsStream("/" + name) @@ -111,7 +109,6 @@ object ShivaTest { "dir" -> "test", "vep_script" -> "test", "output_dir" -> outputDir, - "reference" -> (outputDir + File.separator + "ref.fa"), "reference_fasta" -> (outputDir + File.separator + "ref.fa"), "gatk_jar" -> "test", "samtools" -> Map("exe" -> "test"), @@ -136,30 +133,21 @@ object ShivaTest { val sample1 = Map( "samples" -> Map("sample1" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> "1_1_R1.fq", - "R2" -> "1_1_R2.fq" + "R1" -> inputTouch("1_1_R1.fq"), + "R2" -> inputTouch("1_1_R2.fq") ) ) ))) val sample2 = Map( - "samples" -> Map("sample2" -> Map("libraries" -> Map( - "lib1" -> Map( - "R1" -> "2_1_R1.fq", - "R2" -> "2_1_R2.fq" - ) - ) - ))) - - val sample3 = Map( "samples" -> 
Map("sample3" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> "3_1_R1.fq", - "R2" -> "3_1_R2.fq" + "R1" -> inputTouch("2_1_R1.fq"), + "R2" -> inputTouch("2_1_R2.fq") ), "lib2" -> Map( - "R1" -> "3_2_R1.fq", - "R2" -> "3_2_R2.fq" + "R1" -> inputTouch("2_2_R1.fq"), + "R2" -> inputTouch("2_2_R2.fq") ) ) ))) diff --git a/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcallingTest.scala b/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcallingTest.scala index 1609f5c31bb4a6b648a9a5aa4bb5a7de531e4714..1740b33ac108a4c20205973a2410aabbeb4dbd83 100644 --- a/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcallingTest.scala +++ b/protected/biopet-gatk-pipelines/src/test/scala/nl/lumc/sasc/biopet/pipelines/gatk/ShivaVariantcallingTest.scala @@ -8,10 +8,10 @@ package nl.lumc.sasc.biopet.pipelines.gatk import java.io.{ File, FileOutputStream } import com.google.common.io.Files -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.extensions.gatk.CombineVariants import nl.lumc.sasc.biopet.extensions.gatk.broad.{ HaplotypeCaller, UnifiedGenotyper } -import nl.lumc.sasc.biopet.tools.{ MpileupToVcf, VcfFilter, VcfStats } +import nl.lumc.sasc.biopet.extensions.tools.{ MpileupToVcf, VcfFilter, VcfStats } import nl.lumc.sasc.biopet.utils.ConfigUtils import org.apache.commons.io.FileUtils import org.broadinstitute.gatk.queue.QSettings @@ -73,7 +73,7 @@ class ShivaVariantcallingTest extends TestNGSuite with Matchers { val map = Map("variantcallers" -> callers.toList) val pipeline = initPipeline(map) - pipeline.inputBams = (for (n <- 1 to bams) yield new File("bam_" + n + ".bam")).toList + pipeline.inputBams = (for (n <- 1 to bams) yield ShivaVariantcallingTest.inputTouch("bam_" + n + ".bam")).toList val illegalArgumentException = pipeline.inputBams.isEmpty || (!raw && !bcftools && 
@@ -107,6 +107,12 @@ class ShivaVariantcallingTest extends TestNGSuite with Matchers {
 object ShivaVariantcallingTest {
   val outputDir = Files.createTempDir()
+  new File(outputDir, "input").mkdirs()
+  def inputTouch(name: String): File = {
+    val file = new File(outputDir, "input" + File.separator + name).getAbsoluteFile
+    Files.touch(file)
+    file
+  }
   private def copyFile(name: String): Unit = {
     val is = getClass.getResourceAsStream("/" + name)
@@ -125,7 +131,6 @@ object ShivaVariantcallingTest {
     "cache" -> true,
     "dir" -> "test",
     "vep_script" -> "test",
-    "reference" -> (outputDir + File.separator + "ref.fa"),
     "reference_fasta" -> (outputDir + File.separator + "ref.fa"),
     "gatk_jar" -> "test",
     "samtools" -> Map("exe" -> "test"),
diff --git a/protected/biopet-protected-package/pom.xml b/protected/biopet-protected-package/pom.xml
index 83fc4124d421ec508631f3eefd17bb36186bbc01..c5d31a083c6e328bae807e9c1001510689a25d67 100644
--- a/protected/biopet-protected-package/pom.xml
+++ b/protected/biopet-protected-package/pom.xml
@@ -15,7 +15,7 @@
     <parent>
         <groupId>nl.lumc.sasc</groupId>
         <artifactId>BiopetGatk</artifactId>
-        <version>0.5.0-DEV</version>
+        <version>0.5.0-SNAPSHOT</version>
         <relativePath>../</relativePath>
     </parent>
@@ -24,13 +24,13 @@
     <properties>
         <sting.shade.phase>package</sting.shade.phase>
-        <app.main.class>nl.lumc.sasc.biopet.core.BiopetExecutableProtected</app.main.class>
+        <app.main.class>nl.lumc.sasc.biopet.BiopetExecutableProtected</app.main.class>
     </properties>
     <dependencies>
         <dependency>
             <groupId>nl.lumc.sasc</groupId>
-            <artifactId>BiopetFramework</artifactId>
+            <artifactId>BiopetCore</artifactId>
             <version>${project.version}</version>
         </dependency>
         <dependency>
diff --git a/protected/biopet-protected-package/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutableProtected.scala b/protected/biopet-protected-package/src/main/scala/nl/lumc/sasc/biopet/BiopetExecutableProtected.scala
similarity index 73%
rename from protected/biopet-protected-package/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutableProtected.scala
rename to protected/biopet-protected-package/src/main/scala/nl/lumc/sasc/biopet/BiopetExecutableProtected.scala
index 7b22399d3c33f1c71883b30c5aaeb7150324b4f5..9155e7dbacf4fd624694ac06a1ddf24c69071afe 100644
--- a/protected/biopet-protected-package/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutableProtected.scala
+++ b/protected/biopet-protected-package/src/main/scala/nl/lumc/sasc/biopet/BiopetExecutableProtected.scala
@@ -3,10 +3,12 @@
  * LUMC. Please refer to https://git.lumc.nl/biopet/biopet/wikis/home for instructions
  * on how to use this protected part of biopet or contact us at sasc@lumc.nl
  */
-package nl.lumc.sasc.biopet.core
+package nl.lumc.sasc.biopet
+
+import nl.lumc.sasc.biopet.utils.{ BiopetExecutable, MainCommand }
 object BiopetExecutableProtected extends BiopetExecutable {
-  def pipelines: List[MainCommand] = BiopetExecutablePublic.pipelines ::: List(
+  def pipelines: List[MainCommand] = BiopetExecutablePublic.publicPipelines ::: List(
     nl.lumc.sasc.biopet.pipelines.gatk.Shiva,
     nl.lumc.sasc.biopet.pipelines.gatk.ShivaVariantcalling,
     nl.lumc.sasc.biopet.pipelines.gatk.Basty)
diff --git a/protected/pom.xml b/protected/pom.xml
index 66da8ccb8fcaed65c0417f920fdca204d8df1e24..15bbc1b4af960f9f238608ce50f6361df86b1e42 100644
--- a/protected/pom.xml
+++ b/protected/pom.xml
@@ -11,7 +11,7 @@
     <parent>
         <groupId>nl.lumc.sasc</groupId>
         <artifactId>BiopetRoot</artifactId>
-        <version>0.5.0-DEV</version>
+        <version>0.5.0-SNAPSHOT</version>
         <relativePath>../</relativePath>
     </parent>
     <artifactId>BiopetGatk</artifactId>
diff --git a/public/bam2wig/pom.xml b/public/bam2wig/pom.xml
index 2df9f9bd3df430d75dcece7f9dfbc6ff04292756..4ee5fd2681e87e9a999c4cfad22413dbd67a0bab 100644
--- a/public/bam2wig/pom.xml
+++ b/public/bam2wig/pom.xml
@@ -27,7 +27,7 @@
     <parent>
         <groupId>nl.lumc.sasc</groupId>
         <artifactId>Biopet</artifactId>
-        <version>0.5.0-DEV</version>
+        <version>0.5.0-SNAPSHOT</version>
         <relativePath>../</relativePath>
     </parent>
@@ -37,7 +37,12 @@
     <dependencies>
         <dependency>
             <groupId>nl.lumc.sasc</groupId>
-            <artifactId>BiopetFramework</artifactId>
+            <artifactId>BiopetCore</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>nl.lumc.sasc</groupId>
+            <artifactId>BiopetExtensions</artifactId>
             <version>${project.version}</version>
         </dependency>
     </dependencies>
diff --git a/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/Bam2Wig.scala b/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/Bam2Wig.scala
index 50c44889853bfc684d84bb1082cfca38e3fa14cb..fb9e39611fd861f9158503581f17d4bd12e6cb30 100644
--- a/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/Bam2Wig.scala
+++ b/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/Bam2Wig.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.pipelines.bamtobigwig
 import java.io.File
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand }
 import nl.lumc.sasc.biopet.extensions.WigToBigWig
 import nl.lumc.sasc.biopet.extensions.igvtools.IGVToolsCount
@@ -35,6 +35,7 @@ class Bam2Wig(val root: Configurable) extends QScript with BiopetQScript {
   var bamFile: File = null
   def init(): Unit = {
+    inputFiles :+= new InputFile(bamFile)
   }
   def biopetScript(): Unit = {
diff --git a/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/BamToChromSizes.scala b/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/BamToChromSizes.scala
index 405fdbbb00f1bcbbeeb7286a7b2edfd6dbadb0e0..6e9d6f658042d9179bf868ace85e234135adf4fc 100644
--- a/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/BamToChromSizes.scala
+++ b/public/bam2wig/src/main/scala/nl/lumc/sasc/biopet/pipelines/bamtobigwig/BamToChromSizes.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.pipelines.bamtobigwig
 import java.io.{ File, PrintWriter }
 import htsjdk.samtools.SamReaderFactory
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.queue.function.InProcessFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
diff --git a/public/bammetrics/pom.xml b/public/bammetrics/pom.xml
index 94e6d79619d43bcd289cef4d950303fdb1994fe9..00a9b094e48a83b83105ce348060d94743a4d75c 100644
--- a/public/bammetrics/pom.xml
+++ b/public/bammetrics/pom.xml
@@ -25,7 +25,7 @@
     <parent>
         <groupId>nl.lumc.sasc</groupId>
         <artifactId>Biopet</artifactId>
-        <version>0.5.0-DEV</version>
+        <version>0.5.0-SNAPSHOT</version>
         <relativePath>../</relativePath>
     </parent>
@@ -35,7 +35,12 @@
     <dependencies>
         <dependency>
             <groupId>nl.lumc.sasc</groupId>
-            <artifactId>BiopetFramework</artifactId>
+            <artifactId>BiopetCore</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>nl.lumc.sasc</groupId>
+            <artifactId>BiopetToolsExtensions</artifactId>
             <version>${project.version}</version>
         </dependency>
         <dependency>
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/alignmentSummary.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/alignmentSummary.ssp
index 19daa26e625d0445459f26e50a7f5e8918e191d9..e2ad934d400e15283a8e94c428171e3e777e2172 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/alignmentSummary.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/alignmentSummary.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport)
 #import(java.io.File)
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamMetricsFront.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamMetricsFront.ssp
index 4967f648483b5fbd68d93387ca6f6c8bcf7402e3..2d871d2663f5c17e0a1ea66e3434d267c01104d8 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamMetricsFront.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamMetricsFront.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 <%@ var summary: Summary %>
 <%@ var rootPath: String %>
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamStats.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamStats.ssp
index c6ea6f5db5084bc3644cd6dacd22728821b024be..aab83e541800b520b4b871689536aff6fba84d0c 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamStats.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bamStats.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 <%@ var summary: Summary %>
 <%@ var sampleId: Option[String] %>
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bammetricsInputFile.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bammetricsInputFile.ssp
index 6497546891055ccdc882cbf2be5b609a7c1ae229..fd7820273f6164034df2f97f8d8d871fb0647be4 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bammetricsInputFile.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/bammetricsInputFile.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(java.io.File)
 <%@ var summary: Summary %>
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsMultiTable.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsMultiTable.ssp
index 294b7ba010b36d9673848d91f13d46b376fdefaa..f1bae9632f6bb046360496b0bf924515b791f794 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsMultiTable.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsMultiTable.ssp
@@ -1,5 +1,5 @@
 #import(nl.lumc.sasc.biopet.utils.IoUtils)
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(org.apache.commons.io.FileUtils)
 #import(java.io.File)
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsPlot.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsPlot.ssp
index ddb6817dd9e8dc90345881517637cbac53fcea56..8aea9602a65fd2282e38909b170722151ac13603 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsPlot.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/covstatsPlot.ssp
@@ -1,5 +1,5 @@
 #import(nl.lumc.sasc.biopet.utils.IoUtils)
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(org.apache.commons.io.FileUtils)
 #import(java.io.File)
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/insertSize.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/insertSize.ssp
index c5534f78cbeec347dc6bcbdf71fbceb70bba2c02..ba6692dd7af137c57e51d023438995f136d2e477 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/insertSize.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/insertSize.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport)
 #import(java.io.File)
@@ -42,7 +42,7 @@
 #end
 #if (showPlot)
-    #{ BammetricsReport.insertSizePlot(outputDir, "insertsize", summary, !sampleLevel, sampleId = sampleId) }#
+    #{ BammetricsReport.insertSizePlot(outputDir, "insertsize", summary, !sampleLevel, sampleId = sampleId, libId = libId) }#
     <div class="panel-body">
     <img src="insertsize.png" class="img-responsive" />
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/scripts/bedtools_cov_stats.py b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/scripts/bedtools_cov_stats.py
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/scripts/bedtools_cov_stats.py
rename to public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/scripts/bedtools_cov_stats.py
diff --git a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/wgsHistogram.ssp b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/wgsHistogram.ssp
index a0af2ced54e981db95952165d5053ee575367237..e900774e0788d03fc5180444fbbb6bd875ad253e 100644
--- a/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/wgsHistogram.ssp
+++ b/public/bammetrics/src/main/resources/nl/lumc/sasc/biopet/pipelines/bammetrics/wgsHistogram.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport)
 #import(java.io.File)
@@ -36,7 +36,7 @@
 #end
 #if (showPlot)
-    #{ BammetricsReport.wgsHistogramPlot(outputDir, "wgs", summary, !sampleLevel, sampleId = sampleId) }#
+    #{ BammetricsReport.wgsHistogramPlot(outputDir, "wgs", summary, !sampleLevel, sampleId = sampleId, libId = libId) }#
     <div class="panel-body">
     <img src="wgs.png" class="img-responsive" />
diff --git a/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetrics.scala b/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetrics.scala
index bd546b41f3e24f8b9ad3e74cc7d547eeede3778b..f6a6dc090defdb238f76b68f21189faf38fd207c 100644
--- a/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetrics.scala
+++ b/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetrics.scala
@@ -17,14 +17,14 @@ package nl.lumc.sasc.biopet.pipelines.bammetrics
 import java.io.File
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.SummaryQScript
 import nl.lumc.sasc.biopet.core.{ PipelineCommand, SampleLibraryTag }
 import nl.lumc.sasc.biopet.extensions.bedtools.{ BedtoolsCoverage, BedtoolsIntersect }
 import nl.lumc.sasc.biopet.extensions.picard._
 import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsFlagstat
-import nl.lumc.sasc.biopet.scripts.CoverageStats
-import nl.lumc.sasc.biopet.tools.BiopetFlagstat
+import nl.lumc.sasc.biopet.pipelines.bammetrics.scripts.CoverageStats
+import nl.lumc.sasc.biopet.extensions.tools.BiopetFlagstat
 import org.broadinstitute.gatk.queue.QScript
 class BamMetrics(val root:
Configurable) extends QScript with SummaryQScript with SampleLibraryTag { @@ -71,7 +71,8 @@ class BamMetrics(val root: Configurable) extends QScript with SummaryQScript wit } /** executed before script */ - def init() { + def init(): Unit = { + inputFiles :+= new InputFile(inputBam) } /** Script to add jobs */ @@ -186,8 +187,13 @@ class BamMetrics(val root: Configurable) extends QScript with SummaryQScript wit object BamMetrics extends PipelineCommand { /** Make default implementation of BamMetrics and runs script already */ - def apply(root: Configurable, bamFile: File, outputDir: File): BamMetrics = { + def apply(root: Configurable, + bamFile: File, outputDir: File, + sampleId: Option[String] = None, + libId: Option[String] = None): BamMetrics = { val bamMetrics = new BamMetrics(root) + bamMetrics.sampleId = sampleId + bamMetrics.libId = libId bamMetrics.inputBam = bamFile bamMetrics.outputDir = outputDir diff --git a/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BammetricsReport.scala b/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BammetricsReport.scala index 395eebf1a615b0b8a1e5fc3405bb294d8b680475..babc43f721b9a1bd4d03f6930a97e4cb454f4187 100644 --- a/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BammetricsReport.scala +++ b/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BammetricsReport.scala @@ -17,10 +17,10 @@ package nl.lumc.sasc.biopet.pipelines.bammetrics import java.io.{ File, PrintWriter } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.report.{ ReportBuilderExtension, ReportBuilder, ReportPage, ReportSection } -import nl.lumc.sasc.biopet.core.summary.{ Summary, SummaryValue } -import nl.lumc.sasc.biopet.extensions.rscript.{ StackedBarPlot, LinePlot } +import nl.lumc.sasc.biopet.utils.summary.{ Summary, SummaryValue } +import 
nl.lumc.sasc.biopet.utils.rscript.{ StackedBarPlot, LinePlot } class BammetricsReport(val root: Configurable) extends ReportBuilderExtension { val builder = BammetricsReport @@ -156,14 +156,15 @@ object BammetricsReport extends ReportBuilder { prefix: String, summary: Summary, libraryLevel: Boolean = false, - sampleId: Option[String] = None): Unit = { + sampleId: Option[String] = None, + libId: Option[String] = None): Unit = { val tsvFile = new File(outputDir, prefix + ".tsv") val pngFile = new File(outputDir, prefix + ".png") val tsvWriter = new PrintWriter(tsvFile) if (libraryLevel) { tsvWriter.println((for ( sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample; - lib <- summary.libraries(sample) + lib <- summary.libraries(sample) if libId.isEmpty || libId.get == lib ) yield s"$sample-$lib") .mkString("library\t", "\t", "")) } else { @@ -198,7 +199,7 @@ object BammetricsReport extends ReportBuilder { if (libraryLevel) { for ( sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample; - lib <- summary.libraries(sample) + lib <- summary.libraries(sample) if libId.isEmpty || libId.get == lib ) fill(sample, Some(lib)) } else if (sampleId.isDefined) fill(sampleId.get, None) else summary.samples.foreach(fill(_, None)) @@ -208,7 +209,7 @@ object BammetricsReport extends ReportBuilder { if (libraryLevel) { for ( sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample; - lib <- summary.libraries(sample) + lib <- summary.libraries(sample) if libId.isEmpty || libId.get == lib ) tsvWriter.print("\t" + counts.getOrElse(s"$sample-$lib", "0")) } else { for (sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample) { @@ -243,14 +244,15 @@ object BammetricsReport extends ReportBuilder { prefix: String, summary: Summary, libraryLevel: Boolean = false, - sampleId: Option[String] = None): Unit = { + sampleId: Option[String] = None, + libId: Option[String] = None): Unit = { val tsvFile = new File(outputDir, prefix + 
".tsv") val pngFile = new File(outputDir, prefix + ".png") val tsvWriter = new PrintWriter(tsvFile) if (libraryLevel) { tsvWriter.println((for ( sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample; - lib <- summary.libraries(sample) + lib <- summary.libraries(sample) if libId.isEmpty || libId.get == lib ) yield s"$sample-$lib") .mkString("library\t", "\t", "")) } else { @@ -285,7 +287,7 @@ object BammetricsReport extends ReportBuilder { if (libraryLevel) { for ( sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample; - lib <- summary.libraries(sample) + lib <- summary.libraries(sample) if libId.isEmpty || libId.get == lib ) fill(sample, Some(lib)) } else if (sampleId.isDefined) fill(sampleId.get, None) else summary.samples.foreach(fill(_, None)) @@ -295,8 +297,10 @@ object BammetricsReport extends ReportBuilder { if (libraryLevel) { for ( sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample; - lib <- summary.libraries(sample) - ) tsvWriter.print("\t" + counts.getOrElse(s"$sample-$lib", "0")) + lib <- summary.libraries(sample) if libId.isEmpty || libId.get == lib + ) { + tsvWriter.print("\t" + counts.getOrElse(s"$sample-$lib", "0")) + } } else { for (sample <- summary.samples if sampleId.isEmpty || sampleId.get == sample) { tsvWriter.print("\t" + counts.getOrElse(sample, "0")) diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/scripts/CoverageStats.scala b/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/scripts/CoverageStats.scala similarity index 91% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/scripts/CoverageStats.scala rename to public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/scripts/CoverageStats.scala index 1ca92e3c62fcfe34ac8b289cdc50a93c8fbb0829..212724776c3dad98a1c514d17d6c66623c169127 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/scripts/CoverageStats.scala +++ 
b/public/bammetrics/src/main/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/scripts/CoverageStats.scala @@ -13,13 +13,13 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.scripts +package nl.lumc.sasc.biopet.pipelines.bammetrics.scripts import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.summary.Summarizable -import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction import nl.lumc.sasc.biopet.utils.ConfigUtils import org.broadinstitute.gatk.utils.commandline.{ Input, Output } diff --git a/public/bammetrics/src/test/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetricsTest.scala b/public/bammetrics/src/test/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetricsTest.scala index d457009d9f3384f6278fd7702ee71eea8924531d..33304cb198807f46d3801ddfc9b05eaa7d555fe4 100644 --- a/public/bammetrics/src/test/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetricsTest.scala +++ b/public/bammetrics/src/test/scala/nl/lumc/sasc/biopet/pipelines/bammetrics/BamMetricsTest.scala @@ -18,12 +18,12 @@ package nl.lumc.sasc.biopet.pipelines.bammetrics import java.io.{ File, FileOutputStream } import com.google.common.io.Files -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.extensions.bedtools.{ BedtoolsCoverage, BedtoolsIntersect } import nl.lumc.sasc.biopet.extensions.picard._ import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsFlagstat -import nl.lumc.sasc.biopet.scripts.CoverageStats -import nl.lumc.sasc.biopet.tools.BiopetFlagstat +import nl.lumc.sasc.biopet.pipelines.bammetrics.scripts.CoverageStats +import nl.lumc.sasc.biopet.extensions.tools.BiopetFlagstat import nl.lumc.sasc.biopet.utils.ConfigUtils 
import org.apache.commons.io.FileUtils import org.broadinstitute.gatk.queue.QSettings @@ -69,7 +69,7 @@ class BamMetricsTest extends TestNGSuite with Matchers { Map("regions_of_interest" -> (1 to rois).map("roi_" + _ + ".bed").toList) val bammetrics: BamMetrics = initPipeline(map) - bammetrics.inputBam = new File("input.bam") + bammetrics.inputBam = BamMetricsTest.bam bammetrics.sampleId = Some("1") bammetrics.libId = Some("1") bammetrics.script() @@ -98,6 +98,10 @@ class BamMetricsTest extends TestNGSuite with Matchers { object BamMetricsTest { val outputDir = Files.createTempDir() + new File(outputDir, "input").mkdirs() + + val bam = new File(outputDir, "input" + File.separator + "bla.bam") + Files.touch(bam) private def copyFile(name: String): Unit = { val is = getClass.getResourceAsStream("/" + name) diff --git a/public/basty/pom.xml b/public/basty/pom.xml index 22ab5950db4975e372b08312db59ecdac3426c67..4dd6e5ef31827ea51eb1fc1c6976404570a9cd19 100644 --- a/public/basty/pom.xml +++ b/public/basty/pom.xml @@ -32,7 +32,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -42,7 +42,7 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/Basty.scala b/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/Basty.scala index 54fa824fc898e6cde1bc8ee1fecdf5e7ad40a50e..8476d1bbc56270b2da0d1df6eae20e19a6886bcc 100644 --- a/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/Basty.scala +++ b/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/Basty.scala @@ -16,7 +16,7 @@ package nl.lumc.sasc.biopet.pipelines.basty import nl.lumc.sasc.biopet.core.PipelineCommand -import 
nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.queue.QScript /** diff --git a/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/BastyTrait.scala b/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/BastyTrait.scala index d931860744eff28c2e730071af614ab37a0dfb11..d9085369f0316b6a0474fe02cd15ef07c683713a 100644 --- a/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/BastyTrait.scala +++ b/public/basty/src/main/scala/nl/lumc/sasc/biopet/pipelines/basty/BastyTrait.scala @@ -25,7 +25,7 @@ import java.io.File import nl.lumc.sasc.biopet.core.MultiSampleQScript import nl.lumc.sasc.biopet.extensions.{ Cat, Raxml, RunGubbins } import nl.lumc.sasc.biopet.pipelines.shiva.{ Shiva, ShivaTrait } -import nl.lumc.sasc.biopet.tools.BastyGenerateFasta +import nl.lumc.sasc.biopet.extensions.tools.BastyGenerateFasta import nl.lumc.sasc.biopet.utils.ConfigUtils trait BastyTrait extends MultiSampleQScript { @@ -35,10 +35,10 @@ trait BastyTrait extends MultiSampleQScript { def variantcallers = List("freebayes") - override def defaults = ConfigUtils.mergeMaps(Map( + override def defaults = Map( "ploidy" -> 1, "variantcallers" -> variantcallers - ), super.defaults) + ) lazy val shiva: ShivaTrait = new Shiva(qscript) @@ -89,6 +89,8 @@ trait BastyTrait extends MultiSampleQScript { addAll(shiva.functions) addSummaryQScript(shiva) + inputFiles :::= shiva.inputFiles + addSamplesJobs() } @@ -135,7 +137,6 @@ trait BastyTrait extends MultiSampleQScript { val numBoot = config("boot_runs", default = 100, submodule = "raxml").asInt val bootList = for (t <- 0 until numBoot) yield { val raxmlBoot = new Raxml(this) - raxmlBoot.threads = 1 raxmlBoot.input = variants raxmlBoot.m = config("raxml_ml_model", default = "GTRGAMMAX") raxmlBoot.p = Some(seed) diff --git a/public/biopet-core/pom.xml b/public/biopet-core/pom.xml new file mode 100644 index 
0000000000000000000000000000000000000000..1b38b9f501d68479a60e52d40418367dd44358b3
--- /dev/null
+++ b/public/biopet-core/pom.xml
@@ -0,0 +1,63 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>Biopet</artifactId>
+        <groupId>nl.lumc.sasc</groupId>
+        <version>0.5.0-SNAPSHOT</version>
+        <relativePath>../</relativePath>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>BiopetCore</artifactId>
+    <packaging>jar</packaging>
+
+    <dependencies>
+        <dependency>
+            <groupId>nl.lumc.sasc</groupId>
+            <artifactId>BiopetUtils</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.testng</groupId>
+            <artifactId>testng</artifactId>
+            <version>6.8</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.mockito</groupId>
+            <artifactId>mockito-all</artifactId>
+            <version>1.9.5</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.scalatest</groupId>
+            <artifactId>scalatest_2.10</artifactId>
+            <version>2.2.1</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.broadinstitute.gatk</groupId>
+            <artifactId>gatk-queue</artifactId>
+            <version>3.4</version>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.broadinstitute.gatk</groupId>
+                    <artifactId>gsalib</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+        <dependency>
+            <groupId>org.broadinstitute.gatk</groupId>
+            <artifactId>gatk-queue-extensions-public</artifactId>
+            <version>3.4</version>
+        </dependency>
+        <dependency>
+            <groupId>org.scalatra.scalate</groupId>
+            <artifactId>scalate-core_2.10</artifactId>
+            <version>1.7.0</version>
+        </dependency>
+    </dependencies>
+
+</project>
\ No newline at end of file
diff --git a/public/biopet-framework/src/main/resources/log4j.properties b/public/biopet-core/src/main/resources/log4j.properties
similarity index 100%
rename from public/biopet-framework/src/main/resources/log4j.properties
rename to public/biopet-core/src/main/resources/log4j.properties
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/License.txt b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/License.txt
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/License.txt
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/License.txt
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/executables.ssp b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/executables.ssp
similarity index 95%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/executables.ssp
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/executables.ssp
index e3eaba475acffa68c38293624e09e436e8976f62..0dcb32c20659700440cef5f214a6d0f7e6e5a0c4 100644
--- a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/executables.ssp
+++ b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/executables.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 <%@ var summary: Summary %>
 <%@ var rootPath: String %>
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap-theme.min.css b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap-theme.min.css
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap-theme.min.css
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap-theme.min.css
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap.min.css b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap.min.css
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap.min.css
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap.min.css
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap_dashboard.css b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap_dashboard.css
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap_dashboard.css
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/bootstrap_dashboard.css
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/sortable-theme-bootstrap.css b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/sortable-theme-bootstrap.css
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/sortable-theme-bootstrap.css
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/css/sortable-theme-bootstrap.css
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.ttf b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.ttf
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.ttf
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.ttf
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff2 b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff2
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff2
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/fonts/glyphicons-halflings-regular.woff2
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/bootstrap.min.js b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/bootstrap.min.js
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/bootstrap.min.js
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/bootstrap.min.js
diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/jquery.min.js b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/jquery.min.js
similarity index 100%
rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/jquery.min.js
rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/jquery.min.js
diff --git
a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/sortable.min.js b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/sortable.min.js similarity index 100% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/sortable.min.js rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/ext/js/sortable.min.js diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/librariesList.ssp b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/librariesList.ssp similarity index 88% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/librariesList.ssp rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/librariesList.ssp index 49e7a18980f7efe03e57b0b4af763e4d90e845ea..442958b2c5ca531b87d40dedde06b112e7f7bec4 100644 --- a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/librariesList.ssp +++ b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/librariesList.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) <%@ var summary: Summary %> <%@ var rootPath: String %> diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/main.ssp b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/main.ssp similarity index 99% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/main.ssp rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/main.ssp index fe56a9907bb51a322d382f7f9d6a957479fe6c50..b706ac45804afd61411115fbd47ca93bfbaa1fb9 100644 --- a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/main.ssp +++ 
b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/main.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) <%@ var summary: Summary %> <%@ var indexPage: ReportPage %> diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/reference.ssp b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/reference.ssp similarity index 95% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/reference.ssp rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/reference.ssp index 530d217358475f76461b79fc1fea83aee2746fc2..b12db503c8217bacf150c52a0e8c32b9949a4f6b 100644 --- a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/reference.ssp +++ b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/reference.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) <%@ var summary: Summary %> <%@ var rootPath: String %> diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/samplesList.ssp b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/samplesList.ssp similarity index 89% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/samplesList.ssp rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/samplesList.ssp index 4769c64c19ce843ea2c0f340fcd964e2e813f2d4..20f6945618a17055380a9289d082c1655939f872 100644 --- a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/core/report/samplesList.ssp +++ b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/core/report/samplesList.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) 
+#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) <%@ var summary: Summary %> <%@ var rootPath: String %> diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/tools/plotHeatmap.R b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/tools/plotHeatmap.R similarity index 100% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/tools/plotHeatmap.R rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/tools/plotHeatmap.R diff --git a/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotScatter.R b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotScatter.R new file mode 100644 index 0000000000000000000000000000000000000000..a1959a262cf868d9949b0320f57c9d54b7c50860 --- /dev/null +++ b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotScatter.R @@ -0,0 +1,40 @@ +library(reshape2) +library(ggplot2) +library(argparse) + +parser <- ArgumentParser(description='Process some integers') +parser$add_argument('--input', dest='input', type='character', help='Input tsv file', required=TRUE) +parser$add_argument('--output', dest='output', type='character', help='Output png file', required=TRUE) +parser$add_argument('--width', dest='width', type='integer', default = 500) +parser$add_argument('--height', dest='height', type='integer', default = 500) +parser$add_argument('--xlabel', dest='xlabel', type='character') +parser$add_argument('--ylabel', dest='ylabel', type='character', required=TRUE) +parser$add_argument('--llabel', dest='llabel', type='character') +parser$add_argument('--title', dest='title', type='character') +parser$add_argument('--removeZero', dest='removeZero', type='character', default="false") + +arguments <- parser$parse_args() + +png(filename = arguments$output, width = arguments$width, height = arguments$height) + +DF <- read.table(arguments$input, header=TRUE) + +if 
(is.null(arguments$xlabel)) xlab <- colnames(DF)[1] else xlab <- arguments$xlabel + +colnames(DF)[1] <- "Rank" + +DF1 <- melt(DF, id.var="Rank") + +if (arguments$removeZero == "true") DF1 <- DF1[DF1$value > 0, ] +if (arguments$removeZero == "true") print("Removed 0 values") + +ggplot(DF1, aes(x = Rank, y = value, group = variable, color = variable)) + + xlab(xlab) + + ylab(arguments$ylabel) + + guides(fill=guide_legend(title=arguments$llabel)) + + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) + + ggtitle(arguments$title) + + theme_bw() + + geom_point() + +dev.off() diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/extensions/rscript/plotXY.R b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotXY.R similarity index 100% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/extensions/rscript/plotXY.R rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotXY.R diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/extensions/rscript/stackedBar.R b/public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/stackedBar.R similarity index 100% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/extensions/rscript/stackedBar.R rename to public/biopet-core/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/stackedBar.R diff --git a/public/biopet-framework/src/main/resources/org/broadinstitute/gatk/queue/util/queueJobReport.R b/public/biopet-core/src/main/resources/org/broadinstitute/gatk/queue/util/queueJobReport.R similarity index 100% rename from public/biopet-framework/src/main/resources/org/broadinstitute/gatk/queue/util/queueJobReport.R rename to public/biopet-core/src/main/resources/org/broadinstitute/gatk/queue/util/queueJobReport.R diff --git a/public/biopet-core/src/main/resources/picard/analysis/baseDistributionByCycle.R 
b/public/biopet-core/src/main/resources/picard/analysis/baseDistributionByCycle.R new file mode 100644 index 0000000000000000000000000000000000000000..e4430689199353dc39d175d14533f4636df3a2bb --- /dev/null +++ b/public/biopet-core/src/main/resources/picard/analysis/baseDistributionByCycle.R @@ -0,0 +1,52 @@ +# Script to generate a chart of the base distribution by cycle +# @author Nils Homer + +# Parse the arguments +args <- commandArgs(trailing=T); +metricsFile <- args[1]; +outputFile <- args[2]; +bamFile <- args[3]; +subtitle <- ifelse(length(args) < 4, "", args[4]); + + +# Figure out where the metrics and the histogram are in the file and parse them out +startFinder <- scan(metricsFile, what="character", sep="\n", quiet=TRUE, blank.lines.skip=FALSE); + +firstBlankLine=0; + +for (i in 1:length(startFinder)) { + if (startFinder[i] == "") { + if (firstBlankLine==0) { + firstBlankLine=i+1; + } else { + secondBlankLine=i+1; + break; + } + } +} + +metrics <- read.table(metricsFile, header=T, sep="\t", skip=firstBlankLine); + +# Then plot the histogram as a PDF +pdf(outputFile); + +plot(x=c(1, 20+nrow(metrics)), + y=c(0, max(metrics[,3:7])), + main=paste("Base Distribution by Cycle\nin file ",bamFile," ",ifelse(subtitle == "","",paste("(",subtitle,")",sep="")),sep=""), + xlab="Cycle", + ylab="Base Percentage", + type="n"); + +colors = c("red", "orange", "blue", "purple", "black"); + +for (i in 1:5) { + lines(x=1:nrow(metrics), + y=metrics[,2+i], + col=colors[i], + type="l", + lty=1); +} + +legend("bottomright", lwd=1, legend=c("PCT_A", "PCT_C", "PCT_G", "PCT_T", "PCT_N"), col=colors); + +dev.off(); diff --git a/public/biopet-core/src/main/resources/picard/analysis/gcBias.R b/public/biopet-core/src/main/resources/picard/analysis/gcBias.R new file mode 100644 index 0000000000000000000000000000000000000000..1563cdf36dc0e1cb167a77a988c133e29e76139e --- /dev/null +++ b/public/biopet-core/src/main/resources/picard/analysis/gcBias.R @@ -0,0 +1,77 @@ +# Script to generate a 
chart to display GC bias based upon read starts observed +# in windows along the genome. +# +# @author Tim Fennell + +# Parse the arguments +args <- commandArgs(trailing=T) +metricsFile <- args[1] +outputFile <- args[2] +datasetName <- args[3] +subtitle <- args[4] +windowSize <- args[5] + +# Figure out where the metrics and the histogram are in the file and parse them out +startFinder <- scan(metricsFile, what="character", sep="\n", quiet=TRUE, blank.lines.skip=FALSE) + +firstBlankLine=0 + +for (i in 1:length(startFinder)) { + if (startFinder[i] == "") { + if (firstBlankLine==0) { + firstBlankLine=i+1 + } else { + secondBlankLine=i+1 + break + } + } +} + +metrics <- read.table(metricsFile, header=T, sep="\t", skip=firstBlankLine) +pdf(outputFile) + +# Some constants that are used below +Y_AXIS_LIM = 2; +MAX_QUALITY_SCORE = 40; +COLORS = c("royalblue", "#FFAAAA", "palegreen3"); + +# Adjust to give more margin on the right hand side +par(mar = c(5, 4, 4, 4)); + +# Do the main plot of the normalized coverage by GC +plot(type="p", x=metrics$GC, y=metrics$NORMALIZED_COVERAGE, + xlab=paste(c("GC% of", windowSize, "base windows"), sep=" ", collapse=" "), + ylab="Fraction of normalized coverage", + xlim=c(0,100), + ylim=c(0, Y_AXIS_LIM), + col=COLORS[1], + main=paste(datasetName, "GC Bias Plot", "\n", subtitle) + ); + +# Add lines at the 50% GC and coverage=1 +abline(h=1, v=50, col="lightgrey"); + +# Add error bars +arrows(metrics$GC, + metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, + metrics$GC, + metrics$NORMALIZED_COVERAGE + metrics$ERROR_BAR_WIDTH, + code = 3, angle = 90, length = 0.05, col="grey"); + +# Plot count of windows as a separate series near the bottom +window_ratio = 0.5 / max(metrics$WINDOWS); +scaled_windows = metrics$WINDOWS * window_ratio; +lines(metrics$GC, scaled_windows, type="h", col=COLORS[2], lwd=3); + +# Plot the quality series +lines(metrics$GC, metrics$MEAN_BASE_QUALITY * Y_AXIS_LIM / MAX_QUALITY_SCORE, type="l", col=COLORS[3]); +axis(4, 
+ at=c(0, Y_AXIS_LIM/4, Y_AXIS_LIM/4*2, Y_AXIS_LIM/4*3, Y_AXIS_LIM), + labels=c(0, MAX_QUALITY_SCORE/4, MAX_QUALITY_SCORE/4*2, MAX_QUALITY_SCORE/4*3, MAX_QUALITY_SCORE) + ); +mtext("Mean base quality", side=4, line=2.5); + +# And finally add a legend +legend("topleft", pch=c(1,15, 45), legend=c("Normalized Coverage", "Windows at GC%", "Base Quality at GC%"), col=COLORS) + +dev.off(); \ No newline at end of file diff --git a/public/biopet-core/src/main/resources/picard/analysis/insertSizeHistogram.R b/public/biopet-core/src/main/resources/picard/analysis/insertSizeHistogram.R new file mode 100644 index 0000000000000000000000000000000000000000..a2cdd32011ed2e5c15da71205589a23e5f7be39a --- /dev/null +++ b/public/biopet-core/src/main/resources/picard/analysis/insertSizeHistogram.R @@ -0,0 +1,98 @@ +## script to generate histogram of insert sizes from metrics file +## expecting 3 arguments: +## first is the metrics file with the histogram info +## second is the output file +## third is a name for the plot + +args <- commandArgs(trailing=TRUE) +metricsFile <- args[1] +pdfFile <- args[2] +bamName <- args[3] +histoWidth <- ifelse(length(args) < 4, 0, as.numeric(args[4])) + +startFinder <- scan(metricsFile, what="character", sep="\n", quiet=TRUE, blank.lines.skip=FALSE) + +firstBlankLine=0 + +for (i in 1:length(startFinder)) { + if (startFinder[i] == "") { + if (firstBlankLine==0) { + firstBlankLine=i+1 + } else { + secondBlankLine=i+1 + break + } + } +} + +histogram <- read.table(metricsFile, header=TRUE, sep="\t", skip=secondBlankLine, comment.char="", quote='', check.names=FALSE) + +## The histogram has a fr_count/rf_count/tandem_count for each metric "level" +## This code parses out the distinct levels so we can output one graph per level +headers <- sapply(sub(".fr_count","",names(histogram),fixed=TRUE), "[[" ,1) +headers <- sapply(sub(".rf_count","",headers,fixed=TRUE), "[[" ,1) +headers <- sapply(sub(".tandem_count","",headers,fixed=TRUE), "[[" ,1) + +## Duplicated 
header names cause this to barf. KT & Yossi report that this is going to be extremely difficult to +## resolve and it's unlikely that anyone cares anyways. Trap this situation and avoid the PDF so it won't cause +## the workflow to fail +if (any(duplicated(headers))) { + print(paste("Not creating insert size PDF as there are duplicated header names:", headers[which(duplicated(headers))])) +} else { + levels <- c() + for (i in 2:length(headers)) { + if (!(headers[i] %in% levels)) { + levels[length(levels)+1] <- headers[i] + } + } + + pdf(pdfFile) + + for (i in 1:length(levels)) { + ## Reconstitutes the histogram column headers for this level + fr <- paste(levels[i], "fr_count", sep=".") + rf <- paste(levels[i], "rf_count", sep=".") + tandem <- paste(levels[i], "tandem_count", sep=".") + + frrange = ifelse(fr %in% names(histogram), max(histogram[fr]), 0) + rfrange = ifelse(rf %in% names(histogram), max(histogram[rf]), 0) + tandemrange = ifelse(tandem %in% names(histogram), max(histogram[tandem]), 0) + + yrange <- max(frrange, rfrange, tandemrange) + xrange <- ifelse(histoWidth > 0, histoWidth, max(histogram$insert_size)) + + plot(x=NULL, y=NULL, + type="n", + main=paste("Insert Size Histogram for", levels[i], "\nin file", bamName), + xlab="Insert Size", + ylab="Count", + xlim=range(0, xrange), + ylim=range(0, yrange)) + + colors <- c() + labels <- c() + + if (fr %in% names(histogram) ) { + lines(histogram$insert_size, as.matrix(histogram[fr]), type="h", col="red") + colors <- c(colors, "red") + labels <- c(labels, "FR") + } + if (rf %in% names(histogram)) { + lines(histogram$insert_size, as.matrix(histogram[rf]), type="h", col="blue") + colors <- c(colors, "blue") + labels <- c(labels, "RF") + } + + if (tandem %in% names(histogram)) { + lines(histogram$insert_size, as.matrix(histogram[tandem]), type="h", col="orange") + colors <- c(colors, "orange") + labels <- c(labels, "TANDEM") + } + + ## Create the legend + legend("topright", labels, fill=colors, col=colors, 
cex=0.7) + } + + dev.off() +} + diff --git a/public/biopet-core/src/main/resources/picard/analysis/meanQualityByCycle.R b/public/biopet-core/src/main/resources/picard/analysis/meanQualityByCycle.R new file mode 100755 index 0000000000000000000000000000000000000000..679e2b42d4e12ab507b18e652bc43224d5473da0 --- /dev/null +++ b/public/biopet-core/src/main/resources/picard/analysis/meanQualityByCycle.R @@ -0,0 +1,59 @@ +# Script to generate a chart of mean quality by cycle from a BAM file +# @author Tim Fennell + +# Parse the arguments +args <- commandArgs(trailing=T) +metricsFile <- args[1] +outputFile <- args[2] +bamFile <- args[3] +subtitle <- ifelse(length(args) < 4, "", args[4]) + + +# Figure out where the metrics and the histogram are in the file and parse them out +startFinder <- scan(metricsFile, what="character", sep="\n", quiet=TRUE, blank.lines.skip=FALSE) + +firstBlankLine=0 + +for (i in 1:length(startFinder)) +{ + if (startFinder[i] == "") { + if (firstBlankLine==0) { + firstBlankLine=i+1 + } else { + secondBlankLine=i+1 + break + } + } +} + +metrics <- read.table(metricsFile, header=T, nrows=1, sep="\t", skip=firstBlankLine) +histogram <- read.table(metricsFile, header=T, sep="\t", skip=secondBlankLine) + +# Then plot the histogram as a PDF +pdf(outputFile) + +plot(histogram$CYCLE, + histogram$MEAN_QUALITY, + type="n", + main=paste("Quality by Cycle\nin file ",bamFile," ",ifelse(subtitle == "","",paste("(",subtitle,")",sep="")),sep=""), + xlab="Cycle", + ylab="Mean Quality", + ylim=range(0,50)) + +qColor <- "darkblue" +oqColor <- rgb(1, 0.25, 0.25, 0.75) + +# Plot OQ first so that it's "behind" the regular qualities +if (!is.null(histogram$MEAN_ORIGINAL_QUALITY)) { + lines(histogram$CYCLE, histogram$MEAN_ORIGINAL_QUALITY, type="l", col=oqColor, lty=1); +} + +# Then plot the regular qualities +lines(histogram$CYCLE, histogram$MEAN_QUALITY, type="h", col=qColor, lty=1); + +# And add a legend +legend("topleft", pch=c(15,15), legend=c("Mean Quality", "Mean 
Original Quality"), col=c(qColor, oqColor)) + + +dev.off() + diff --git a/public/biopet-core/src/main/resources/picard/analysis/qualityScoreDistribution.R b/public/biopet-core/src/main/resources/picard/analysis/qualityScoreDistribution.R new file mode 100755 index 0000000000000000000000000000000000000000..ced1f8c0ab82842afe00f0abab84c3d575d18d1d --- /dev/null +++ b/public/biopet-core/src/main/resources/picard/analysis/qualityScoreDistribution.R @@ -0,0 +1,57 @@ +# Script to generate a chart of quality score distribution in a file +# @author Tim Fennell + +# Parse the arguments +args <- commandArgs(trailing=T) +metricsFile <- args[1] +outputFile <- args[2] +bamFile <- args[3] +subtitle <- ifelse(length(args) < 4, "", args[4]) + +# Figure out where the metrics and the histogram are in the file and parse them out +startFinder <- scan(metricsFile, what="character", sep="\n", quiet=TRUE, blank.lines.skip=FALSE) + +firstBlankLine=0 + +for (i in 1:length(startFinder)) +{ + if (startFinder[i] == "") { + if (firstBlankLine==0) { + firstBlankLine=i+1 + } else { + secondBlankLine=i+1 + break + } + } +} + +metrics <- read.table(metricsFile, header=T, nrows=1, sep="\t", skip=firstBlankLine) +histogram <- read.table(metricsFile, header=T, sep="\t", skip=secondBlankLine) + +# Then plot the histogram as a PDF +pdf(outputFile) + +plot(histogram$QUALITY, + histogram$COUNT_OF_Q, + type="n", + main=paste("Quality Score Distribution\nin file ",bamFile," ",ifelse(subtitle == "","",paste("(",subtitle,")",sep="")),sep=""), + xlab="Quality Score", + ylab="Observations") + +qColor <- "blue" +oqColor <- "lightcyan2" +width <- 5 + +# Plot OQ first so that it's "behind" the regular qualities +if (!is.null(histogram$COUNT_OF_OQ)) { + lines(histogram$QUALITY+0.25, histogram$COUNT_OF_OQ, type="h", col=oqColor, lty=1, lwd=width, lend="square"); +} + +# Then plot the regular qualities +lines(histogram$QUALITY, histogram$COUNT_OF_Q, type="h", col=qColor, lty=1, lwd=width, lend="square"); + +# And 
add a legend +legend("topleft", pch=c(15,15), legend=c("Quality Scores", "Original Quality Scores"), col=c(qColor, oqColor)) + +dev.off() + diff --git a/public/biopet-core/src/main/resources/picard/analysis/rnaSeqCoverage.R b/public/biopet-core/src/main/resources/picard/analysis/rnaSeqCoverage.R new file mode 100644 index 0000000000000000000000000000000000000000..1cf004ee3802e8b7e485d3c2b14dbbc8e8cf7fee --- /dev/null +++ b/public/biopet-core/src/main/resources/picard/analysis/rnaSeqCoverage.R @@ -0,0 +1,70 @@ +# Script to generate a normalized coverage vs. position along transcript plot. +# +# @author Tim Fennell + +# Parse the arguments +args <- commandArgs(trailing = TRUE) +metricsFile <- args[1] +outputFile <- args[2] +bamName <- args[3] +subtitle <- ifelse(length(args) < 4, "", args[4]) + +# Figure out where the metrics and the histogram are in the file and parse them out +startFinder <- scan(metricsFile, what="character", sep="\n", quiet=TRUE, blank.lines.skip=FALSE) + +firstBlankLine=0 + +for (i in 1:length(startFinder)) { + if (startFinder[i] == "") { + if (firstBlankLine==0) { + firstBlankLine=i+1 + } else { + secondBlankLine=i+1 + break + } + } +} + +data <- read.table(metricsFile, header=T, sep="\t", skip=secondBlankLine, check.names=FALSE) + +# The histogram has a normalized_position and normalized_coverage column for each metric "level" +# This code parses out the distinct levels so we can output one graph per level +headers <- sapply(sub(".normalized_coverage","",names(data),fixed=TRUE), "[[" ,1) + +## Duplicated header names cause this to barf. KT & Yossi report that this is going to be extremely difficult to +## resolve and it's unlikely that anyone cares anyways. 
Trap this situation and avoid the PDF so it won't cause +## the workflow to fail +if (any(duplicated(headers))) { + print(paste("Not creating RNA-Seq coverage PDF as there are duplicated header names:", headers[which(duplicated(headers))])) +} else { + pdf(outputFile) + levels <- c() + for (i in 2:length(headers)) { + if (!(headers[i] %in% levels)) { + levels[length(levels)+1] <- headers[i] + } + } + + # Some constants that are used below + COLORS = c("royalblue", "#FFAAAA", "palegreen3"); + + # For each level, plot the normalized coverage along the transcript + for (i in 1:length(levels)) { + + # Reconstitutes the histogram column header for this level + nc <- paste(levels[i], "normalized_coverage", sep=".") + + plot(x=data$normalized_position, y=as.matrix(data[nc]), + type="o", + xlab="Normalized Distance Along Transcript", + ylab="Normalized Coverage", + xlim=c(0, 100), + ylim=range(0, max(data[nc])), + col="royalblue", + main=paste("RNA-Seq Coverage vs. Transcript Position\n", levels[i], " ", ifelse(subtitle=="", "", paste("(", subtitle, ")", sep="")), "\nin file ", bamName,sep="")) + + # Add a horizontal line at coverage=1 + abline(h=1, col="lightgrey"); + } + dev.off(); +} \ No newline at end of file diff --git a/public/biopet-core/src/main/resources/picard/analysis/rrbsQc.R b/public/biopet-core/src/main/resources/picard/analysis/rrbsQc.R new file mode 100644 index 0000000000000000000000000000000000000000..c7d7abeee2c88223faddf2621f0757bdfe6123e0 --- /dev/null +++ b/public/biopet-core/src/main/resources/picard/analysis/rrbsQc.R @@ -0,0 +1,109 @@ +## +## The MIT License +## +## Copyright (c) 2013 The Broad Institute +## +## Permission is hereby granted, free of charge, to any person obtaining a copy +## of this software and associated documentation files (the "Software"), to deal +## in the Software without restriction, including without limitation the rights +## to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +## copies of the Software, and to permit
persons to whom the Software is +## furnished to do so, subject to the following conditions: +## +## The above copyright notice and this permission notice shall be included in +## all copies or substantial portions of the Software. +## +## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +## OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +## THE SOFTWARE. +## + +args = commandArgs(trailingOnly=TRUE) +opt = list(details.fn=args[1], summary.fn=args[2], output.fn=args[3]) + +read_metrics_file = function(metrics.fn) { + contents = read.delim(metrics.fn, comment.char="#", stringsAsFactors=FALSE) + return(contents) +} + +equals_or_is_na = function(x1, x2) { + if (is.na(x1)) { + return(is.na(x2)) + } else { + return(x1 == x2) + } +} + +details = read_metrics_file(opt$details.fn) +summary = read_metrics_file(opt$summary.fn) + +pdf(opt$output.fn) +par(mfrow=c(2,2), oma=c(0,0,2,0)) + +for (i in seq_len(nrow(summary))) { + cur_summary = summary[i, ] + cur_sample = cur_summary[1, "SAMPLE"] + cur_library = cur_summary[1, "LIBRARY"] + cur_read_group = cur_summary[1, "READ_GROUP"] + cur_details = details[which((equals_or_is_na(cur_library, details[, "LIBRARY"]) & + (equals_or_is_na(cur_sample, details[, "SAMPLE"])) & + (equals_or_is_na(cur_read_group, details[, "READ_GROUP"])))), ] + + + ## Plot conversion rates + cpg.converted = sum(cur_details$CONVERTED_SITES) + cpg.seen = sum(cur_details$TOTAL_SITES) + cpg.conversion = cpg.converted / cpg.seen + total.conversion = (cpg.converted + cur_summary$NON_CPG_CONVERTED_BASES) / (cpg.seen + cur_summary$NON_CPG_BASES) + + 
barplot(c("non-CpG"=cur_summary$PCT_NON_CPG_BASES_CONVERTED, "Combined"=total.conversion, "CpG"=cpg.conversion), + ylim=c(0.95, 1), ylab="% Conversion", xlab="Distribution", main="Bisulfite Conversion Rate", + col="blue", xpd=FALSE) + abline(h=0.995, col="grey") + + ## Plot histogram of CpG counts by conversion rate + hist(cur_details$PCT_CONVERTED, 10, xlab="Conversion Rate Of CpGs", ylab="# CpGs", + main="CpG Conversion Rate Distribution", col="blue") + + ## Plot pie chart showing distribution of CpG coverage + coverage_breaks = c(0, 1, 5, 10, 25, 50, 100, Inf) + coverage_cut = cut(cur_details$TOTAL_SITES, coverage_breaks) + cpg_coverage = split(cur_details$TOTAL_SITES, coverage_cut) + coverages = sapply(cpg_coverage, length)[2:7] + names(coverages) = paste(">=", c(1, 5, 10, 25, 50, 100), sep="") + ## If we have 0s all across the pie chart will be effectively meaningless but put in a 100% >= 0 field instead + ## to avoid an error on pie(). Normally it'd just be a pain to see these, but ... 
+ if (all(coverages == 0)) { + coverages = c("No Coverage"=1) + } + color_ramp = colorRampPalette(c("white", "#538ED5", "blue"), bias=1, space="Lab") + colors = color_ramp(length(coverages))[2:length(coverages)] + pie(coverages, main="Distribution Of CpGs By Coverage", col=colors, clockwise=TRUE) + + discards = log10(c("Mismatches"=cur_summary$READS_IGNORED_MISMATCHES, "Size"=cur_summary$READS_IGNORED_SHORT)) + ## Protect against -Inf in the case where we had 0 discards + discards = ifelse(is.finite(discards), discards, 0) + barplot(discards, ylab="Number Discarded (log10)", xlab="Reason", + main="Reads Discarded", col="blue", ylim=c(0, ceiling(max(discards)))) + + header_txt = character() + if (!is.na(cur_sample) && cur_sample != "") { + header_txt = paste(header_txt, " SAMPLE=", cur_sample, sep="") + } + if (!is.na(cur_library) && cur_library != "") { + header_txt = paste(header_txt, " LIBRARY=", cur_library, sep="") + } + if (!is.na(cur_read_group) && cur_read_group != "") { + header_txt = paste(header_txt, " READ GROUP=", cur_read_group, sep="") + } + if (length(header_txt) > 0) { + mtext(header_txt, outer=TRUE, line=1) + } +} + +dev.off() + diff --git a/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunction.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunction.scala new file mode 100644 index 0000000000000000000000000000000000000000..254e783cb9e31718a8b0759e7ce0d55c152d203d --- /dev/null +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunction.scala @@ -0,0 +1,290 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. 
+ * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.core + +import java.io.{ PrintWriter, File, FileInputStream } +import java.security.MessageDigest + +import nl.lumc.sasc.biopet.utils.Logging +import org.broadinstitute.gatk.utils.commandline.{ Output, Input } +import org.broadinstitute.gatk.utils.runtime.ProcessSettings +import org.ggf.drmaa.JobTemplate + +import scala.collection.mutable +import scala.io.Source +import scala.sys.process.{ Process, ProcessLogger } +import scala.util.matching.Regex +import scala.collection.JavaConversions._ + +/** Biopet command line trait to auto check executable and cluster values */ +trait BiopetCommandLineFunction extends CommandLineResources { biopetFunction => + analysisName = configName + + @Input(doc = "deps", required = false) + var deps: List[File] = Nil + + @Output + var outputFiles: List[File] = Nil + + var executable: String = _ + + /** This is the default shell for drmaa jobs */ + def defaultRemoteCommand = "bash" + private val remoteCommand: String = config("remote_command", default = defaultRemoteCommand) + + private def changeScript(file: File): Unit = { + val lines = Source.fromFile(file).getLines().toList + val writer = new PrintWriter(file) + writer.println("set -eubf") + writer.println("set -o pipefail") + lines.foreach(writer.println(_)) + writer.close() + } + + // This overrides the default "sh" from queue. 
For Biopet the default is "bash" + updateJobRun = { + case jt: JobTemplate => { + changeScript(new File(jt.getArgs.head.toString)) + jt.setRemoteCommand(remoteCommand) + } + case ps: ProcessSettings => { + changeScript(new File(ps.getCommand.tail.head)) + ps.setCommand(Array(remoteCommand) ++ ps.getCommand.tail) + } + } + + /** + * Can override this method. This is executed just before the job is ready to run. + * Runtime files from the pipeline can be checked here + */ + def beforeCmd() {} + + /** Can override this method. This is executed after the script is done and Queue starts to generate the graph */ + def beforeGraph() {} + + override def freezeFieldValues() { + preProcessExecutable() + beforeGraph() + internalBeforeGraph() + + super.freezeFieldValues() + } + + /** Set default output file, threads and vmem for current job */ + final def internalBeforeGraph(): Unit = { + + pipesJobs.foreach(_.beforeGraph()) + pipesJobs.foreach(_.internalBeforeGraph()) + + } + + /** Can override this value if the executable should not be converted to CanonicalPath */ + val executableToCanonicalPath = true + + /** + * Checks executable.
Follow full CanonicalPath, checks if it is existing and do a md5sum on it to store in job report + */ + protected[core] def preProcessExecutable() { + if (!BiopetCommandLineFunction.executableMd5Cache.contains(executable)) { + if (executable != null) { + if (!BiopetCommandLineFunction.executableCache.contains(executable)) { + try { + val oldExecutable = executable + val buffer = new StringBuffer() + val cmd = Seq("which", executable) + val process = Process(cmd).run(ProcessLogger(buffer.append(_))) + if (process.exitValue == 0) { + executable = buffer.toString + val file = new File(executable) + if (executableToCanonicalPath) executable = file.getCanonicalPath + else executable = file.getAbsolutePath + } else Logging.addError("executable: '" + executable + "' not found, please check config") + BiopetCommandLineFunction.executableCache += oldExecutable -> executable + BiopetCommandLineFunction.executableCache += executable -> executable + } catch { + case ioe: java.io.IOException => + logger.warn(s"Could not use 'which' on '$executable', check on executable skipped: " + ioe) + } + } else executable = BiopetCommandLineFunction.executableCache(executable) + + if (!BiopetCommandLineFunction.executableMd5Cache.contains(executable)) { + if (new File(executable).exists()) { + val is = new FileInputStream(executable) + val cnt = is.available + val bytes = Array.ofDim[Byte](cnt) + is.read(bytes) + is.close() + val temp = MessageDigest.getInstance("MD5").digest(bytes).map("%02X".format(_)).mkString.toLowerCase + BiopetCommandLineFunction.executableMd5Cache += executable -> temp + } else BiopetCommandLineFunction.executableMd5Cache += executable -> "file_does_not_exist" + } + } + } + val md5 = BiopetCommandLineFunction.executableMd5Cache.get(executable) + addJobReportBinding("md5sum_exe", md5.getOrElse("None")) + } + + /** executes checkExecutable method and fill job report */ + final protected def preCmdInternal() { + preProcessExecutable() + beforeCmd() + + 
addJobReportBinding("cores", nCoresRequest match { + case Some(n) if n > 0 => n + case _ => 1 + }) + addJobReportBinding("version", getVersion) + } + + /** Command to get version of executable */ + protected[core] def versionCommand: String = null + + /** Regex to get version from version command output */ + protected[core] def versionRegex: Regex = null + + /** Allowed exit codes for the version command */ + protected[core] def versionExitcode = List(0) + + /** Executes the version command */ + private[core] def getVersionInternal: Option[String] = { + if (versionCommand == null || versionRegex == null) None + else getVersionInternal(versionCommand, versionRegex) + } + + /** Executes the version command */ + private[core] def getVersionInternal(versionCommand: String, versionRegex: Regex): Option[String] = { + if (versionCommand == null || versionRegex == null) return None + val exe = new File(versionCommand.trim.split(" ")(0)) + if (!exe.exists()) return None + val stdout = new StringBuffer() + val stderr = new StringBuffer() + def outputLog = "Version command: \n" + versionCommand + + "\n output log: \n stdout: \n" + stdout.toString + + "\n stderr: \n" + stderr.toString + val process = Process(versionCommand).run(ProcessLogger(stdout append _ + "\n", stderr append _ + "\n")) + if (!versionExitcode.contains(process.exitValue())) { + logger.warn("getVersion give exit code " + process.exitValue + ", version not found \n" + outputLog) + return None + } + for (line <- stdout.toString.split("\n") ++ stderr.toString.split("\n")) { + line match { + case versionRegex(m) => return Some(m) + case _ => + } + } + logger.warn("getVersion give a exit code " + process.exitValue + " but no version was found, executable correct? 
\n" + outputLog) + None + } + + /** Get version from cache otherwise execute the version command */ + def getVersion: Option[String] = { + if (!BiopetCommandLineFunction.executableCache.contains(executable)) + preProcessExecutable() + if (!BiopetCommandLineFunction.versionCache.contains(versionCommand)) + getVersionInternal match { + case Some(version) => BiopetCommandLineFunction.versionCache += versionCommand -> version + case _ => + } + BiopetCommandLineFunction.versionCache.get(versionCommand) + } + + private[core] var _inputAsStdin = false + def inputAsStdin = _inputAsStdin + private[core] var _outputAsStdout = false + def outputAsStsout = _outputAsStdout + + /** + * This operator sends stdout to `that` and combine this into 1 command line function + * @param that Function that will read from stdin + * @return BiopetPipe function + */ + def |(that: BiopetCommandLineFunction): BiopetCommandLineFunction = { + this._outputAsStdout = true + that._inputAsStdin = true + this.beforeGraph() + this.internalBeforeGraph() + that.beforeGraph() + that.internalBeforeGraph() + this match { + case p: BiopetPipe => { + p.commands.last._outputAsStdout = true + new BiopetPipe(p.commands ::: that :: Nil) + } + case _ => new BiopetPipe(List(this, that)) + } + } + + /** + * This operator can be used to give a program a file as stdin + * @param file File that will become stdin for this program + * @return It's own class + */ + def :<:(file: File): BiopetCommandLineFunction = { + this._inputAsStdin = true + this.stdinFile = Some(file) + this + } + + /** + * This operator can be used to give a program a file write it's atdout + * @param file File that will become stdout for this program + * @return It's own class + */ + def >(file: File): BiopetCommandLineFunction = { + this._outputAsStdout = true + this.stdoutFile = Some(file) + this + } + + @Output(required = false) + private[core] var stdoutFile: Option[File] = None + + @Input(required = false) + private[core] var stdinFile: 
Option[File] = None + + /** + * This function needs to be implemented to define the command that is executed + * @return Command to run + */ + protected[core] def cmdLine: String + + /** + * implementing a final version of the commandLine from org.broadinstitute.gatk.queue.function.CommandLineFunction + * User needs to implement cmdLine instead + * @return Command to run + */ + override final def commandLine: String = { + preCmdInternal() + val cmd = cmdLine + + stdinFile.map(file => " < " + required(file.getAbsoluteFile)).getOrElse("") + + stdoutFile.map(file => " > " + required(file.getAbsoluteFile)).getOrElse("") + addJobReportBinding("command", cmd) + cmd + } + + private[core] var pipesJobs: List[BiopetCommandLineFunction] = Nil + def addPipeJob(job: BiopetCommandLineFunction) { + pipesJobs :+= job + pipesJobs = pipesJobs.distinct + } +} + +/** stores global caches */ +object BiopetCommandLineFunction { + private[core] val versionCache: mutable.Map[String, String] = mutable.Map() + private[core] val executableMd5Cache: mutable.Map[String, String] = mutable.Map() + private[core] val executableCache: mutable.Map[String, String] = mutable.Map() +} diff --git a/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetFifoPipe.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetFifoPipe.scala new file mode 100644 index 0000000000000000000000000000000000000000..ee06edb0a666a543c108c891092d5ac8ca7e23a6 --- /dev/null +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetFifoPipe.scala @@ -0,0 +1,150 @@ +package nl.lumc.sasc.biopet.core + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable + +/** + * Created by pjvan_thof on 9/29/15. 
+ */ +class BiopetFifoPipe(val root: Configurable, + protected var commands: List[BiopetCommandLineFunction]) extends BiopetCommandLineFunction { + + def fifos: List[File] = { + val outputs: Map[BiopetCommandLineFunction, Seq[File]] = try { + commands.map(x => x -> x.outputs).toMap + } catch { + case e: NullPointerException => Map() + } + + val inputs: Map[BiopetCommandLineFunction, Seq[File]] = try { + commands.map(x => x -> x.inputs).toMap + } catch { + case e: NullPointerException => Map() + } + + for ( + cmdOutput <- commands; + cmdInput <- commands if cmdOutput != cmdInput && outputs.contains(cmdOutput); + outputFile <- outputs(cmdOutput) if inputs.contains(cmdInput); + inputFile <- inputs(cmdInput) if outputFile == inputFile + ) yield outputFile + } + + override def beforeGraph(): Unit = { + val outputs: Map[BiopetCommandLineFunction, Seq[File]] = try { + commands.map(x => x -> x.outputs).toMap + } catch { + case e: NullPointerException => Map() + } + + val inputs: Map[BiopetCommandLineFunction, Seq[File]] = try { + commands.map(x => x -> x.inputs).toMap + } catch { + case e: NullPointerException => Map() + } + + val fifoFiles = fifos + + outputFiles :::= outputs.values.toList.flatten.filter(!fifoFiles.contains(_)) + outputFiles = outputFiles.distinct + + deps :::= inputs.values.toList.flatten.filter(!fifoFiles.contains(_)) + deps = deps.distinct + } + + override def beforeCmd(): Unit = { + commands.foreach { cmd => + cmd.beforeGraph() + cmd.internalBeforeGraph() + cmd.beforeCmd() + } + } + + def cmdLine = { + val fifosFiles = this.fifos + fifosFiles.filter(_.exists()).map(required("rm", _)).mkString("\n\n", " \n", " \n\n") + + fifosFiles.map(required("mkfifo", _)).mkString("\n\n", "\n", "\n\n") + + commands.map(_.commandLine).mkString("\n\n", " & \n", " & \n\n") + + BiopetFifoPipe.waitScript + + fifosFiles.map(required("rm", _)).mkString("\n\n", " \n", " \n\n") + + BiopetFifoPipe.endScript + } + + override def setResources(): Unit = { + 
combineResources(commands) + } + + override def setupRetry(): Unit = { + super.setupRetry() + commands.foreach(_.setupRetry()) + combineResources(commands) + } + + override def freezeFieldValues(): Unit = { + super.freezeFieldValues() + commands.foreach(_.qSettings = qSettings) + } +} + +object BiopetFifoPipe { + val waitScript = + """ + | + |allJobs=`jobs -p` + |jobs=$allJobs + | + |echo [`date`] pids: $jobs + | + |FAIL="0" + | + |while echo $jobs | grep -e "\w" > /dev/null + |do + | for job in $jobs + | do + | if ps | grep "$job " | grep -v grep > /dev/null + | then + | echo [`date`] $job still running > /dev/null + | else + | jobs=`echo $jobs | sed "s/${job}//"` + | wait $job || FAIL=$? + | if echo $FAIL | grep -ve "^0$" > /dev/null + | then + | echo [`date`] $job fails with exitcode: $FAIL + | break + | fi + | echo [`date`] $job done + | fi + | done + | if echo $FAIL | grep -ve "^0$" > /dev/null + | then + | break + | fi + | sleep 1 + |done + | + |if echo $FAIL | grep -ve "^0$" > /dev/null + |then + | echo [`date`] kill other pids: $jobs + | kill $jobs + |fi + | + |echo [`date`] Done + | + | + """.stripMargin + + val endScript = + """ + | + |if [ "$FAIL" == "0" ]; + |then + |echo [`date`] "BiopetFifoPipe Done" + |else + |echo [`date`] BiopetFifoPipe "FAIL! 
($FAIL)" + |exit $FAIL + |fi + | + | + """.stripMargin +} \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetJavaCommandLineFunction.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetJavaCommandLineFunction.scala similarity index 67% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetJavaCommandLineFunction.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetJavaCommandLineFunction.scala index d51442e3a25fbd030c16e3e126802e58cc6c2064..3f75cb92128b26bfb2681a2a25cfb5436c9ac5f6 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetJavaCommandLineFunction.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetJavaCommandLineFunction.scala @@ -18,38 +18,49 @@ package nl.lumc.sasc.biopet.core import org.broadinstitute.gatk.queue.function.JavaCommandLineFunction /** Biopet commandline class for java based programs */ -trait BiopetJavaCommandLineFunction extends JavaCommandLineFunction with BiopetCommandLineFunctionTrait { +trait BiopetJavaCommandLineFunction extends JavaCommandLineFunction with BiopetCommandLineFunction { executable = config("java", default = "java", submodule = "java", freeVar = false) javaGCThreads = config("java_gc_threads") javaGCHeapFreeLimit = config("java_gc_heap_freelimit") javaGCTimeLimit = config("java_gc_timelimit") - override protected def defaultVmemFactor: Double = 2.0 + override def defaultVmemFactor: Double = 2.0 /** Constructs java opts, this adds scala threads */ override def javaOpts = super.javaOpts + - optional("-Dscala.concurrent.context.numThreads=", threads, spaceSeparated = false, escape = false) + optional("-Dscala.concurrent.context.numThreads=", threads, spaceSeparated = false) + + override def beforeGraph(): Unit = { + setResources() + if (javaMemoryLimit.isEmpty && memoryLimit.isDefined) + javaMemoryLimit = memoryLimit + + if 
(javaMainClass != null && javaClasspath.isEmpty) + javaClasspath = JavaCommandLineFunction.currentClasspath + + //threads = getThreads(defaultThreads) + } /** Creates command to execute extension */ - override def commandLine: String = { + def cmdLine: String = { preCmdInternal() - val cmd = super.commandLine - val finalCmd = executable + cmd.substring(cmd.indexOf(" ")) - cmd + required(executable) + + javaOpts + + javaExecutable } def javaVersionCommand: String = executable + " -version" def getJavaVersion: Option[String] = { - if (!BiopetCommandLineFunctionTrait.executableCache.contains(executable)) + if (!BiopetCommandLineFunction.executableCache.contains(executable)) preProcessExecutable() - if (!BiopetCommandLineFunctionTrait.versionCache.contains(javaVersionCommand)) + if (!BiopetCommandLineFunction.versionCache.contains(javaVersionCommand)) getVersionInternal(javaVersionCommand, """java version "(.*)"""".r) match { - case Some(version) => BiopetCommandLineFunctionTrait.versionCache += javaVersionCommand -> version + case Some(version) => BiopetCommandLineFunction.versionCache += javaVersionCommand -> version case _ => } - BiopetCommandLineFunctionTrait.versionCache.get(javaVersionCommand) + BiopetCommandLineFunction.versionCache.get(javaVersionCommand) } override def setupRetry(): Unit = { diff --git a/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetPipe.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetPipe.scala new file mode 100644 index 0000000000000000000000000000000000000000..8b5e9fbaefe84175828c9a3f844c2120bb9ba259 --- /dev/null +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetPipe.scala @@ -0,0 +1,74 @@ +package nl.lumc.sasc.biopet.core + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +/** + * This class can pipe multiple BiopetCommandFunctions to 1 job + * + * Created by pjvanthof on 
08/09/15. + */ +class BiopetPipe(val commands: List[BiopetCommandLineFunction]) extends BiopetCommandLineFunction { + + @Input + lazy val input: List[File] = try { + commands.flatMap(_.inputs) + } catch { + case e: Exception => Nil + } + + @Output + lazy val output: List[File] = try { + commands.flatMap(_.outputs) + } catch { + case e: Exception => Nil + } + + pipesJobs :::= commands + + override def beforeGraph() { + super.beforeGraph() + + stdoutFile = stdoutFile.map(_.getAbsoluteFile) + stdinFile = stdinFile.map(_.getAbsoluteFile) + + if (stdoutFile.isDefined || _outputAsStdout) { + commands.last.stdoutFile = None + commands.last._outputAsStdout = true + } + + if (commands.head.stdinFile.isDefined) commands.head._inputAsStdin = true + + val inputOutput = input.filter(x => output.contains(x)) + require(inputOutput.isEmpty, "File found as input and output in the same job, files: " + inputOutput.mkString(", ")) + } + + override def setResources(): Unit = { + combineResources(pipesJobs) + } + + override def setupRetry(): Unit = { + super.setupRetry() + commands.foreach(_.setupRetry()) + combineResources(commands) + } + + override def defaultCoreMemory = 0.0 + override def defaultThreads = 0 + + val root: Configurable = commands.head.root + override def configName = commands.map(_.configName).mkString("-") + def cmdLine: String = { + "(" + commands.head.cmdLine + (if (commands.head.stdinFile.isDefined) { + " < " + required(commands.head.stdinFile.map(_.getAbsoluteFile)) + } else "") + " | " + commands.tail.map(_.cmdLine).mkString(" | ") + + (if (commands.last.stdoutFile.isDefined) " > " + required(commands.last.stdoutFile.map(_.getAbsoluteFile)) else "") + ")" + } + + override def freezeFieldValues(): Unit = { + super.freezeFieldValues() + commands.foreach(_.qSettings = qSettings) + } +} \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala 
b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala similarity index 76% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala index 8bcb354588c52400cc2c6677839e5970149be191..1109f30da9fcb53cd622935b8451d6ab551d0205 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/BiopetQScript.scala @@ -17,8 +17,9 @@ package nl.lumc.sasc.biopet.core import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.report.ReportBuilderExtension +import nl.lumc.sasc.biopet.utils.Logging import org.broadinstitute.gatk.queue.QSettings import org.broadinstitute.gatk.queue.function.QFunction import org.broadinstitute.gatk.queue.function.scattergather.ScatterGatherableFunction @@ -47,6 +48,10 @@ trait BiopetQScript extends Configurable with GatkLogging { var outputFiles: Map[String, File] = Map() + type InputFile = BiopetQScript.InputFile + + var inputFiles: List[InputFile] = Nil + /** Get implemented from org.broadinstitute.gatk.queue.QScript */ var qSettings: QSettings @@ -74,20 +79,30 @@ trait BiopetQScript extends Configurable with GatkLogging { case _ => } for (function <- functions) function match { - case f: BiopetCommandLineFunctionTrait => + case f: BiopetCommandLineFunction => f.preProcessExecutable() f.beforeGraph() + f.internalBeforeGraph() f.commandLine case _ => } if (outputDir.getParentFile.canWrite || (outputDir.exists && outputDir.canWrite)) globalConfig.writeReport(qSettings.runName, new File(outputDir, ".log/" + qSettings.runName)) - else BiopetQScript.addError("Parent of output dir: '" + outputDir.getParent + "' is not writeable, outputdir can not be created") + else Logging.addError("Parent of output dir: '" 
+ outputDir.getParent + "' is not writeable, outputdir can not be created") - reportClass.foreach(add(_)) + inputFiles.foreach { i => + if (!i.file.exists()) Logging.addError(s"Input file does not exist: ${i.file}") + else if (!i.file.canRead()) Logging.addError(s"Input file can not be read: ${i.file}") + } + + this match { + case q: MultiSampleQScript if q.onlySamples.nonEmpty && !q.samples.forall(x => q.onlySamples.contains(x._1)) => + logger.info("Write report is skipped because sample flag is used") + case _ => reportClass.foreach(add(_)) + } - BiopetQScript.checkErrors() + Logging.checkErrors() } /** Get implemented from org.broadinstitute.gatk.queue.QScript */ @@ -103,27 +118,6 @@ trait BiopetQScript extends Configurable with GatkLogging { } } -object BiopetQScript extends Logging { - private val errors: ListBuffer[Exception] = ListBuffer() - - def addError(error: String, debug: String = null): Unit = { - val msg = error + (if (debug != null && logger.isDebugEnabled) "; " + debug else "") - errors.append(new Exception(msg)) - } - - protected def checkErrors(): Unit = { - if (errors.nonEmpty) { - logger.error("*************************") - logger.error("Biopet found some errors:") - if (logger.isDebugEnabled) { - for (e <- errors) { - logger.error(e.getMessage) - logger.debug(e.getStackTrace.mkString("Stack trace:\n", "\n", "\n")) - } - } else { - errors.map(_.getMessage).sorted.distinct.foreach(logger.error(_)) - } - throw new IllegalStateException("Biopet found errors") - } - } +object BiopetQScript { + protected case class InputFile(file: File, md5: Option[String] = None) } diff --git a/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/CommandLineResources.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/CommandLineResources.scala new file mode 100644 index 0000000000000000000000000000000000000000..0fdc946e31808db3b41eb7e81bd5377c3ec563f3 --- /dev/null +++ 
b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/CommandLineResources.scala @@ -0,0 +1,100 @@ +package nl.lumc.sasc.biopet.core + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.queue.function.CommandLineFunction + +/** + * Created by pjvanthof on 01/10/15. + */ +trait CommandLineResources extends CommandLineFunction with Configurable { + + def defaultThreads = 1 + final def threads = nCoresRequest match { + case Some(i) => i + case _ => { + val t = getThreads + nCoresRequest = Some(t) + t + } + } + + var vmem: Option[String] = config("vmem") + def defaultCoreMemory: Double = 1.0 + def defaultVmemFactor: Double = 1.4 + var vmemFactor: Double = config("vmem_factor", default = defaultVmemFactor) + + var residentFactor: Double = config("resident_factor", default = 1.2) + + private var _coreMemory: Double = 2.0 + def coreMemeory = _coreMemory + + var retry = 0 + + override def freezeFieldValues(): Unit = { + setResources() + if (vmem.isDefined) jobResourceRequests :+= "h_vmem=" + vmem.get + super.freezeFieldValues() + } + + def getThreads: Int = getThreads(defaultThreads) + + /** + * Get threads from config + * @param default default when not found in config + * @return number of threads + */ + private def getThreads(default: Int): Int = { + val maxThreads: Int = config("maxthreads", default = 24) + val threads: Int = config("threads", default = default) + if (maxThreads > threads) threads + else maxThreads + } + + def setResources(): Unit = { + val firstOutput = try { + this.firstOutput + } catch { + case e: NullPointerException => null + } + + if (jobOutputFile == null && firstOutput != null) + jobOutputFile = new File(firstOutput.getAbsoluteFile.getParent, "." + firstOutput.getName + "." 
+ configName + ".out") + + nCoresRequest = Option(threads) + + _coreMemory = config("core_memory", default = defaultCoreMemory).asDouble + + (0.5 * retry) + + if (config.contains("memory_limit")) memoryLimit = config("memory_limit") + else memoryLimit = Some(_coreMemory * threads) + + if (config.contains("resident_limit")) residentLimit = config("resident_limit") + else residentLimit = Some((_coreMemory + (0.5 * retry)) * residentFactor) + + if (!config.contains("vmem")) vmem = Some((_coreMemory * (vmemFactor + (0.5 * retry))) + "G") + jobName = configName + ":" + (if (firstOutput != null) firstOutput.getName else jobOutputFile) + } + + override def setupRetry(): Unit = { + super.setupRetry() + if (vmem.isDefined) jobResourceRequests = jobResourceRequests.filterNot(_.contains("h_vmem=")) + logger.info("Auto raise memory on retry") + retry += 1 + this.freeze() + } + + var threadsCorrection = 0 + + protected def combineResources(commands: List[CommandLineResources]): Unit = { + commands.foreach(_.setResources()) + nCoresRequest = Some(commands.map(_.threads).sum + threadsCorrection) + + _coreMemory = commands.map(cmd => cmd.coreMemeory * (cmd.threads.toDouble / threads.toDouble)).sum + memoryLimit = Some(_coreMemory * threads) + residentLimit = Some((_coreMemory + (0.5 * retry)) * residentFactor) + vmem = Some((_coreMemory * (vmemFactor + (0.5 * retry))) + "G") + } + +} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/MultiSampleQScript.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/MultiSampleQScript.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/MultiSampleQScript.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/MultiSampleQScript.scala index b449fa2f6abf98c3c31ce8cee8ac09c58b7c7a3b..17631709f12db8a3e3b25253afc49fadd3820e32 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/MultiSampleQScript.scala +++ 
b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/MultiSampleQScript.scala @@ -26,7 +26,7 @@ trait MultiSampleQScript extends SummaryQScript { qscript => @Argument(doc = "Only Sample", shortName = "s", required = false, fullName = "sample") - private val onlySamples: List[String] = Nil + private[core] val onlySamples: List[String] = Nil require(globalConfig.map.contains("samples"), "No Samples found in config") @@ -131,7 +131,7 @@ trait MultiSampleQScript extends SummaryQScript { /** Runs addAndTrackJobs method for each sample */ final def addSamplesJobs() { - if (onlySamples.isEmpty) { + if (onlySamples.isEmpty || samples.forall(x => onlySamples.contains(x._1))) { samples.foreach { case (sampleId, sample) => sample.addAndTrackJobs() } addMultiSampleJobs() } else onlySamples.foreach(sampleId => samples.get(sampleId) match { @@ -152,7 +152,7 @@ trait MultiSampleQScript extends SummaryQScript { private var currentLib: Option[String] = None /** Prefix full path with sample and library for jobs that's are created in current state */ - override protected[core] def configFullPath: List[String] = { + override def configFullPath: List[String] = { val sample = currentSample match { case Some(s) => "samples" :: s :: Nil case _ => Nil diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/PipelineCommand.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/PipelineCommand.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/PipelineCommand.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/PipelineCommand.scala index 612a3753d3cf6d49a86e7c577bf51da0c5880948..dcede52573eb901026c7e62deb9d071d486df9fb 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/PipelineCommand.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/PipelineCommand.scala @@ -17,8 +17,9 @@ package nl.lumc.sasc.biopet.core import java.io.{ File, 
PrintWriter } -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.core.workaround.BiopetQCommandLine +import nl.lumc.sasc.biopet.utils.{ MainCommand, Logging } import org.apache.log4j.{ PatternLayout, WriterAppender } import org.broadinstitute.gatk.queue.util.{ Logging => GatkLogging } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/Reference.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/Reference.scala similarity index 92% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/Reference.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/Reference.scala index d3ae0023adada50480a9006a55c6f2cb8e5e346e..be479a8fa972fe896fb8d45cd74708944ba4ba64 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/Reference.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/Reference.scala @@ -18,7 +18,8 @@ package nl.lumc.sasc.biopet.core import java.io.File import htsjdk.samtools.reference.IndexedFastaSequenceFile -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.Logging +import nl.lumc.sasc.biopet.utils.config.Configurable import scala.collection.JavaConversions._ @@ -73,8 +74,8 @@ trait Reference extends Configurable { val fai = new File(file.getAbsolutePath + ".fai") this match { - case c: BiopetCommandLineFunctionTrait => c.deps :::= dict :: fai :: Nil - case _ => + case c: BiopetCommandLineFunction => c.deps :::= dict :: fai :: Nil + case _ => } file @@ -99,7 +100,7 @@ trait Reference extends Configurable { /** Check fasta file if file exist and index file are there */ def checkFasta(file: File): Unit = { if (!Reference.checked.contains(file)) { - if (!file.exists()) BiopetQScript.addError(s"Reference not found: $file, species: $referenceSpecies, name: $referenceName, configValue: " + config("reference_fasta")) + if (!file.exists()) 
Logging.addError(s"Reference not found: $file, species: $referenceSpecies, name: $referenceName, configValue: " + config("reference_fasta")) if (dictRequired) Reference.requireDict(file) if (faiRequired) Reference.requireFai(file) diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/SampleLibraryTag.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/SampleLibraryTag.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/SampleLibraryTag.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/SampleLibraryTag.scala index 996a51ce0e04280d4fe49367cdda3793ce2e8b4a..a3317faf604a9ae80c02ad0c3d9751fbc65849b9 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/SampleLibraryTag.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/SampleLibraryTag.scala @@ -15,7 +15,7 @@ */ package nl.lumc.sasc.biopet.core -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.Argument /** diff --git a/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/ToolCommandFuntion.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/ToolCommandFuntion.scala new file mode 100644 index 0000000000000000000000000000000000000000..81220f9cb9d56703dfd7bebcddc0c14fa9966e0d --- /dev/null +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/ToolCommandFuntion.scala @@ -0,0 +1,22 @@ +package nl.lumc.sasc.biopet.core + +import nl.lumc.sasc.biopet.FullVersion + +/** + * Created by pjvanthof on 11/09/15. 
+ */
+trait ToolCommandFuntion extends BiopetJavaCommandLineFunction {
+  def toolObject: Object
+
+  override def getVersion = Some("Biopet " + FullVersion)
+
+  override def beforeGraph(): Unit = {
+    javaMainClass = toolObject.getClass.getName.takeWhile(_ != '$')
+    super.beforeGraph()
+  }
+
+  override def freezeFieldValues(): Unit = {
+    javaMainClass = toolObject.getClass.getName.takeWhile(_ != '$')
+    super.freezeFieldValues()
+  }
+}
diff --git a/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/CheckChecksum.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/CheckChecksum.scala
new file mode 100644
index 0000000000000000000000000000000000000000..0ae2587f7928bb8d8cfe3e157f79fec7afff031a
--- /dev/null
+++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/CheckChecksum.scala
@@ -0,0 +1,40 @@
+package nl.lumc.sasc.biopet.core.extensions
+
+import java.io.File
+
+import nl.lumc.sasc.biopet.core.summary.WriteSummary
+import org.broadinstitute.gatk.queue.function.InProcessFunction
+import org.broadinstitute.gatk.utils.commandline.{ Argument, Input }
+
+/**
+ * This class checks md5sums and give an exit code 1 when md5sum is not the same
+ *
+ * Created by pjvanthof on 16/08/15.
+ */
+class CheckChecksum extends InProcessFunction {
+  @Input(required = true)
+  var inputFile: File = _
+
+  @Input(required = true)
+  var checksumFile: File = _
+
+  @Argument(required = true)
+  var checksum: String = _
+
+  override def freezeFieldValues(): Unit = {
+    super.freezeFieldValues()
+    jobOutputFile = new File(checksumFile.getParentFile, checksumFile.getName + ".check.out")
+  }
+
+  /** Exits whenever the input md5sum is not the same as the output md5sum */
+  def run: Unit = {
+    val outputChecksum = WriteSummary.parseChecksum(checksumFile).toLowerCase
+
+    if (outputChecksum != checksum.toLowerCase) {
+      logger.error(s"Input file: '$inputFile' md5sum is not as expected, aborting pipeline")
+
+      // 130 Simulates a ctr-C
+      Runtime.getRuntime.halt(130)
+    }
+  }
+}
\ No newline at end of file
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Md5sum.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/Md5sum.scala
similarity index 95%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Md5sum.scala
rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/Md5sum.scala
index 8ad2a31cbf782f77f00dd1b7b0415d84bc264a2b..90e577e01804ac1884d71de9613e3416d837fc92 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Md5sum.scala
+++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/Md5sum.scala
@@ -13,12 +13,12 @@
  * license; For commercial users or users who do not want to follow the AGPL
  * license, please contact us to obtain a separate license.
  */
-package nl.lumc.sasc.biopet.extensions
+package nl.lumc.sasc.biopet.core.extensions
 
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for md5sum */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/PythonCommandLineFunction.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/PythonCommandLineFunction.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/PythonCommandLineFunction.scala
rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/PythonCommandLineFunction.scala
index 59e4ed04c39f66e88e2f02fc0fde9b954d3efa5e..044b43676cd66cc68bcaeb00ade521def63c62de 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/PythonCommandLineFunction.scala
+++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/PythonCommandLineFunction.scala
@@ -13,7 +13,7 @@
  * license; For commercial users or users who do not want to follow the AGPL
  * license, please contact us to obtain a separate license.
  */
-package nl.lumc.sasc.biopet.extensions
+package nl.lumc.sasc.biopet.core.extensions
 
 import java.io.{ File, FileOutputStream }
 
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/Logging.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/RscriptCommandLineFunction.scala
similarity index 58%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/Logging.scala
rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/RscriptCommandLineFunction.scala
index 4566cd4f1b26f08f2af84e31735b6514dfd10d6a..c773de6155b5a771f242dbbe83a4a21f98089eaa 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/Logging.scala
+++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/extensions/RscriptCommandLineFunction.scala
@@ -13,24 +13,27 @@
  * license; For commercial users or users who do not want to follow the AGPL
  * license, please contact us to obtain a separate license.
  */
-package nl.lumc.sasc.biopet.core
+package nl.lumc.sasc.biopet.core.extensions
 
-import org.apache.log4j.Logger
+import java.io.{ File, FileOutputStream }
 
-/**
- * Trait to implement logger function on local class/object
- */
-trait Logging {
-  /**
-   *
-   * @return Global biopet logger
-   */
-  def logger = Logging.logger
-}
+import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
+import nl.lumc.sasc.biopet.utils.rscript.Rscript
+
+import scala.sys.process._
 
 /**
- * Logger object, has a global logger
+ * General rscript extension
+ *
+ * Created by wyleung on 17-2-15.
  */
-object Logging {
-  val logger = Logger.getRootLogger
-}
\ No newline at end of file
+trait RscriptCommandLineFunction extends BiopetCommandLineFunction with Rscript {
+
+  executable = rscriptExecutable
+
+  override def beforeGraph(): Unit = {
+    checkScript(Some(jobTempDir))
+  }
+
+  def cmdLine: String = repeat(cmd)
+}
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/MultisampleReportBuilder.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/MultisampleReportBuilder.scala
similarity index 100%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/MultisampleReportBuilder.scala
rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/MultisampleReportBuilder.scala
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportBuilder.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportBuilder.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportBuilder.scala
rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportBuilder.scala
index e52bb1f5a9074f779f74d643cc3238ed22443d38..ae7c08e8eb1d20e68d238a5fe4bb3a1f48306626 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportBuilder.scala
+++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportBuilder.scala
@@ -16,10 +16,9 @@
 package nl.lumc.sasc.biopet.core.report
 
 import java.io._
-
-import nl.lumc.sasc.biopet.core.summary.Summary
-import nl.lumc.sasc.biopet.core.{ Logging, ToolCommand, ToolCommandFuntion }
-import nl.lumc.sasc.biopet.utils.IoUtils
+import nl.lumc.sasc.biopet.core.ToolCommandFuntion
+import nl.lumc.sasc.biopet.utils.summary.Summary
+import nl.lumc.sasc.biopet.utils.{ ToolCommand, Logging, IoUtils }
 import org.broadinstitute.gatk.utils.commandline.Input
 import org.fusesource.scalate.{ TemplateEngine,
TemplateSource } import scala.collection.mutable @@ -34,6 +33,8 @@ trait ReportBuilderExtension extends ToolCommandFuntion { /** Report builder object */ val builder: ReportBuilder + def toolObject = builder + @Input(required = true) var summaryFile: File = _ @@ -53,8 +54,8 @@ trait ReportBuilderExtension extends ToolCommandFuntion { } /** Command to generate the report */ - override def commandLine: String = { - super.commandLine + + override def cmdLine: String = { + super.cmdLine + required("--summary", summaryFile) + required("--outputDir", outputDir) + args.map(x => required("-a", x._1 + "=" + x._2)).mkString @@ -230,6 +231,7 @@ object ReportBuilder { case Some(template) => template case _ => val tempFile = File.createTempFile("ssp-template", new File(location).getName) + tempFile.deleteOnExit() IoUtils.copyStreamToFile(getClass.getResourceAsStream(location), tempFile) templateCache += location -> tempFile tempFile diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportPage.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportPage.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportPage.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportPage.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportSection.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportSection.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportSection.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/report/ReportSection.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/Summarizable.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/Summarizable.scala similarity index 100% rename from 
public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/Summarizable.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/Summarizable.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryQScript.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryQScript.scala similarity index 83% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryQScript.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryQScript.scala index c59438133af902b84edf83b87a5e02e03f0dc58a..ab2f64546a7f79e8432ec1ac7ff0ab19427cccfc 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryQScript.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryQScript.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.core.summary import java.io.File import nl.lumc.sasc.biopet.core._ -import nl.lumc.sasc.biopet.extensions.Md5sum +import nl.lumc.sasc.biopet.core.extensions.{ CheckChecksum, Md5sum } import scala.collection.mutable @@ -27,7 +27,7 @@ import scala.collection.mutable * * Created by pjvan_thof on 2/14/15. 
*/ -trait SummaryQScript extends BiopetQScript { +trait SummaryQScript extends BiopetQScript { qscript => /** Key is sample/library, None is sample or library is not applicable */ private[summary] var summarizables: Map[(String, Option[String], Option[String]), List[Summarizable]] = Map() @@ -116,10 +116,24 @@ trait SummaryQScript extends BiopetQScript { //TODO: add more checksums types } + for (inputFile <- inputFiles) { + inputFile.md5 match { + case Some(checksum) => { + val checkMd5 = new CheckChecksum + checkMd5.inputFile = inputFile.file + require(SummaryQScript.md5sumCache.contains(inputFile.file), "Md5 job is not executed, checksum file can't be found") + checkMd5.checksumFile = SummaryQScript.md5sumCache(inputFile.file) + checkMd5.checksum = checksum + add(checkMd5) + } + case _ => + } + } + for ((_, summarizableList) <- summarizables; summarizable <- summarizableList) { summarizable match { - case f: BiopetCommandLineFunctionTrait => f.beforeGraph() - case _ => + case f: BiopetCommandLineFunction => f.beforeGraph() + case _ => } } @@ -135,7 +149,11 @@ trait SummaryQScript extends BiopetQScript { for ((_, file) <- this.summaryFiles) addChecksum(file) - add(writeSummary) + this match { + case q: MultiSampleQScript if q.onlySamples.nonEmpty && !q.samples.forall(x => q.onlySamples.contains(x._1)) => + logger.info("Write summary is skipped because sample flag is used") + case _ => add(writeSummary) + } } } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/WriteSummary.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/WriteSummary.scala similarity index 93% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/WriteSummary.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/WriteSummary.scala index 0dad01e5bb8e46355f150fc406914b423391a72a..6e7f8248c0693ed0df4c3a372b8885e1908187a7 100644 --- 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/WriteSummary.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/summary/WriteSummary.scala @@ -17,8 +17,8 @@ package nl.lumc.sasc.biopet.core.summary import java.io.{ File, PrintWriter } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, BiopetCommandLineFunctionTrait, BiopetJavaCommandLineFunction, SampleLibraryTag } +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, BiopetJavaCommandLineFunction, SampleLibraryTag } import nl.lumc.sasc.biopet.utils.ConfigUtils import nl.lumc.sasc.biopet.{ LastCommitHash, Version } import org.broadinstitute.gatk.queue.function.{ InProcessFunction, QFunction } @@ -71,16 +71,16 @@ class WriteSummary(val root: Configurable) extends InProcessFunction with Config val files = parseFiles(qscript.summaryFiles) val settings = qscript.summarySettings val executables: Map[String, Any] = { - (for (f <- qscript.functions if f.isInstanceOf[BiopetCommandLineFunctionTrait]) yield { + (for (f <- qscript.functions if f.isInstanceOf[BiopetCommandLineFunction]) yield { f match { case f: BiopetJavaCommandLineFunction => f.configName -> Map("version" -> f.getVersion.getOrElse(None), - "java_md5" -> BiopetCommandLineFunctionTrait.executableMd5Cache.getOrElse(f.executable, None), + "java_md5" -> BiopetCommandLineFunction.executableMd5Cache.getOrElse(f.executable, None), "java_version" -> f.getJavaVersion, "jar_path" -> f.jarFile) case f: BiopetCommandLineFunction => f.configName -> Map("version" -> f.getVersion.getOrElse(None), - "md5" -> BiopetCommandLineFunctionTrait.executableMd5Cache.getOrElse(f.executable, None), + "md5" -> BiopetCommandLineFunction.executableMd5Cache.getOrElse(f.executable, None), "path" -> f.executable) case _ => throw new IllegalStateException("This should not be possible") } @@ -153,10 +153,11 @@ class 
WriteSummary(val root: Configurable) extends InProcessFunction with Config def parseFile(file: File): Map[String, Any] = { val map: mutable.Map[String, Any] = mutable.Map() map += "path" -> file.getAbsolutePath - if (md5sum) map += "md5" -> parseChecksum(SummaryQScript.md5sumCache(file)) + if (md5sum) map += "md5" -> WriteSummary.parseChecksum(SummaryQScript.md5sumCache(file)) map.toMap } - +} +object WriteSummary { /** Retrive checksum from file */ def parseChecksum(checksumFile: File): String = { Source.fromFile(checksumFile).getLines().toList.head.split(" ")(0) diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/workaround/BiopetQCommandLine.scala b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/workaround/BiopetQCommandLine.scala similarity index 99% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/workaround/BiopetQCommandLine.scala rename to public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/workaround/BiopetQCommandLine.scala index 2092ea7324cb62997debf195189169744de26281..a2c4b8c2507e95d83618fc54e3f238ccfa009769 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/workaround/BiopetQCommandLine.scala +++ b/public/biopet-core/src/main/scala/nl/lumc/sasc/biopet/core/workaround/BiopetQCommandLine.scala @@ -50,7 +50,7 @@ import java.util import java.util.ResourceBundle import nl.lumc.sasc.biopet.FullVersion -import nl.lumc.sasc.biopet.core.Logging +import nl.lumc.sasc.biopet.utils.Logging import org.broadinstitute.gatk.queue.engine.{ QGraph, QGraphSettings } import org.broadinstitute.gatk.queue.util.{ Logging => GatkLogging, ScalaCompoundArgumentTypeDescriptor, ClassFieldCache } import org.broadinstitute.gatk.queue.{ QCommandPlugin, QScript, QScriptManager } diff --git a/public/biopet-framework/src/test/resources/log4j.properties b/public/biopet-core/src/test/resources/log4j.properties similarity index 100% rename from 
public/biopet-framework/src/test/resources/log4j.properties rename to public/biopet-core/src/test/resources/log4j.properties diff --git a/public/biopet-core/src/test/scala/nl/lumc/sasc/biopet/core/BiopetPipeTest.scala b/public/biopet-core/src/test/scala/nl/lumc/sasc/biopet/core/BiopetPipeTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..bac6932ddecd94dcbeb6ff6648edd5461ddced13 --- /dev/null +++ b/public/biopet-core/src/test/scala/nl/lumc/sasc/biopet/core/BiopetPipeTest.scala @@ -0,0 +1,45 @@ +package nl.lumc.sasc.biopet.core + +import org.scalatest.Matchers +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +/** + * Created by pjvanthof on 09/09/15. + */ +class BiopetPipeTest extends TestNGSuite with Matchers { + class Pipe1 extends BiopetCommandLineFunction { + val root = null + def cmdLine = "pipe1" + + (if (!inputAsStdin) " input1 " else "") + + (if (!outputAsStsout) " output1 " + "") + } + + class Pipe2 extends BiopetCommandLineFunction { + val root = null + def cmdLine = "pipe2" + + (if (!inputAsStdin) " input2 " else "") + + (if (!outputAsStsout) " output2 " + "") + } + + @Test def testPipeCommands: Unit = { + val pipe1 = new Pipe1 + val pipe2 = new Pipe2 + pipe1.commandLine.contains("pipe1") shouldBe true + pipe1.commandLine.contains("input1") shouldBe true + pipe1.commandLine.contains("output1") shouldBe true + pipe2.commandLine.contains("pipe2") shouldBe true + pipe2.commandLine.contains("input2") shouldBe true + pipe2.commandLine.contains("output2") shouldBe true + } + + @Test def testPipe: Unit = { + val pipe = new Pipe1 | new Pipe2 + pipe.commandLine.contains("pipe1") shouldBe true + pipe.commandLine.contains("input1") shouldBe true + pipe.commandLine.contains("output1") shouldBe false + pipe.commandLine.contains("pipe2") shouldBe true + pipe.commandLine.contains("input2") shouldBe false + pipe.commandLine.contains("output2") shouldBe true + } +} diff --git a/public/biopet-extensions/pom.xml 
b/public/biopet-extensions/pom.xml new file mode 100644 index 0000000000000000000000000000000000000000..cb5afbe9b02186ef0cf213d1ebb46f29a4148de4 --- /dev/null +++ b/public/biopet-extensions/pom.xml @@ -0,0 +1,34 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>Biopet</artifactId> + <groupId>nl.lumc.sasc</groupId> + <version>0.5.0-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>BiopetExtensions</artifactId> + + <dependencies> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetCore</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + <groupId>org.testng</groupId> + <artifactId>testng</artifactId> + <version>6.8</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.scalatest</groupId> + <artifactId>scalatest_2.10</artifactId> + <version>2.2.1</version> + <scope>test</scope> + </dependency> + </dependencies> + +</project> \ No newline at end of file diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/extensions/breakdancer/breakdancer2vcf.py b/public/biopet-extensions/src/main/resources/nl/lumc/sasc/biopet/extensions/breakdancer/breakdancer2vcf.py similarity index 100% rename from public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/extensions/breakdancer/breakdancer2vcf.py rename to public/biopet-extensions/src/main/resources/nl/lumc/sasc/biopet/extensions/breakdancer/breakdancer2vcf.py diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Bgzip.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Bgzip.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Bgzip.scala rename to 
public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Bgzip.scala index 2429702c83aa40193f25be6e6fe0f4b8b05fb62e..321cb8b9c5960936da9d8f5bcac0d2fdb9937627 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Bgzip.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Bgzip.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** Wrapper for the bgzip command */ diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Bowtie.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Bowtie.scala similarity index 67% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Bowtie.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Bowtie.scala index 6cdea7db15affd6cf871f393d7de2cf0be3cc2fb..fc2f95e4e33f54dca689568bb48cb8489ed3e65d 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Bowtie.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Bowtie.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Reference } import org.broadinstitute.gatk.utils.commandline.{ Input, Output } @@ -47,7 +47,7 @@ class Bowtie(val root: Configurable) extends BiopetCommandLineFunction with Refe override def defaultCoreMemory = 4.0 override def defaultThreads = 8 - var sam: Boolean = config("sam", default = true) + var sam: Boolean = config("sam", default = false) var sam_RG: Option[String] = 
config("sam-RG") var seedlen: Option[Int] = config("seedlen") var seedmms: Option[Int] = config("seedmms") @@ -58,34 +58,37 @@ class Bowtie(val root: Configurable) extends BiopetCommandLineFunction with Refe var strata: Boolean = config("strata", default = false) var maqerr: Option[Int] = config("maqerr") var maxins: Option[Int] = config("maxins") + var largeIndex: Boolean = config("large-index", default = false) override def beforeGraph() { super.beforeGraph() if (reference == null) reference = referenceFasta() + val basename = reference.getName.stripSuffix(".fasta").stripSuffix(".fa") + if (reference.getParentFile.list().toList.filter(_.startsWith(basename)).exists(_.endsWith(".ebwtl"))) + largeIndex = config("large-index", default = true) } /** return commandline to execute */ - def cmdLine = { - required(executable) + - optional("--threads", threads) + - conditional(sam, "--sam") + - conditional(best, "--best") + - conditional(strata, "--strata") + - optional("--sam-RG", sam_RG) + - optional("--seedlen", seedlen) + - optional("--seedmms", seedmms) + - optional("-k", k) + - optional("-m", m) + - optional("--maxbts", maxbts) + - optional("--maqerr", maqerr) + - optional("--maxins", maxins) + - required(reference) + - (R2 match { - case Some(r2) => - required("-1", R1) + - optional("-2", r2) - case _ => required(R1) - }) + - " > " + required(output) - } + def cmdLine = required(executable) + + optional("--threads", threads) + + conditional(sam, "--sam") + + conditional(largeIndex, "--large-index") + + conditional(best, "--best") + + conditional(strata, "--strata") + + optional("--sam-RG", sam_RG) + + optional("--seedlen", seedlen) + + optional("--seedmms", seedmms) + + optional("-k", k) + + optional("-m", m) + + optional("--maxbts", maxbts) + + optional("--maqerr", maqerr) + + optional("--maxins", maxins) + + required(reference.getAbsolutePath.stripSuffix(".fa").stripSuffix(".fasta")) + + (R2 match { + case Some(r2) => + required("-1", R1) + + optional("-2", r2) + 
case _ => required(R1) + }) + + " > " + required(output) } \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cat.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cat.scala similarity index 86% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cat.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cat.scala index 7f2493d570f3575fc62b47e1850e5c4b15772a81..695b5ca3ae5e6943a4da481714a6bf1c7aa7c855 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cat.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cat.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** @@ -34,13 +34,17 @@ class Cat(val root: Configurable) extends BiopetCommandLineFunction { executable = config("exe", default = "cat") /** return commandline to execute */ - def cmdLine = required(executable) + repeat(input) + " > " + required(output) + def cmdLine = required(executable) + + (if (inputAsStdin) "" else repeat(input)) + + (if (outputAsStsout) "" else " > " + required(output)) } /** * Object for constructors for cat */ object Cat { + def apply(root: Configurable): Cat = new Cat(root) + /** * Basis constructor * @param root root object for config diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cufflinks.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cufflinks.scala similarity index 99% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cufflinks.scala rename to 
public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cufflinks.scala index 64db7b8b7d85c6a66fe6088a793c8ec762d391cd..30a1ca0f418ac0e5f3495ef75ba7478a28f5adc7 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cufflinks.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cufflinks.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cuffquant.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cuffquant.scala similarity index 99% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cuffquant.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cuffquant.scala index 3f33b314852c0e0f6c1e1524395083a238df0a28..ffadc5d2dfc6fd8bbb4c22ea099d96be344b8cdf 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cuffquant.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cuffquant.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cutadapt.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cutadapt.scala similarity index 95% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cutadapt.scala rename to 
public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cutadapt.scala
index 181b92c80d835ef2b33d92ea9afc3c4288004812..43dd6826912c4390f0ac30148081536ac3ac7d05 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Cutadapt.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Cutadapt.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.Summarizable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
@@ -33,7 +33,7 @@ class Cutadapt(val root: Configurable) extends BiopetCommandLineFunction with Su
   @Input(doc = "Input fastq file")
   var fastq_input: File = _
 
-  @Output(doc = "Output fastq file")
+  @Output
   var fastq_output: File = _
 
   @Output(doc = "Output statistics file")
@@ -63,8 +63,8 @@ class Cutadapt(val root: Configurable) extends BiopetCommandLineFunction with Su
     optional("-M", opt_maximum_length) +
     // input / output
     required(fastq_input) +
-    required("--output", fastq_output) +
-    " > " + required(stats_output)
+    (if (outputAsStsout) "" else required("--output", fastq_output) +
+      " > " + required(stats_output))
 
   /** Output summary stats */
   def summaryStats: Map[String, Any] = {
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Fastqc.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Fastqc.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Fastqc.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Fastqc.scala
index 9e2838f21aa38c0a56ab87e3823a8ba9b83bc4dc..6f0eea34bde472b7be68ba275244681974776b64 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Fastqc.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Fastqc.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /**
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Freebayes.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Freebayes.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Freebayes.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Freebayes.scala
index bf43610e42767b1c875f714c9f86470a58c1d161..0e7cc1077a5a14f6ce8f829f6fb9009bd8930e40 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Freebayes.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Freebayes.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Reference }
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Gsnap.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Gsnap.scala
similarity index 99%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Gsnap.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Gsnap.scala
index ec3d03615ca8c6d6efa74e256d40beb7f7c2c348..8604639800a0bf93620c9fe790a1cbccd5b32874 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Gsnap.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Gsnap.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Reference }
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Gzip.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Gzip.scala
similarity index 81%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Gzip.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Gzip.scala
index 8513e207342b7ea618406fe3f259074ef50bebe0..98bd00339c83e2a955372da3344c7a1f5debb257 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Gzip.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Gzip.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 class Gzip(val root: Configurable) extends BiopetCommandLineFunction {
@@ -28,15 +28,19 @@ class Gzip(val root: Configurable) extends BiopetCommandLineFunction {
   @Output(doc = "Unzipped file", required = true)
   var output: File = _
 
-  executable = config("exe", default = "gzip")
+  executable = config("exe", default = "gzip", freeVar = false)
 
   override def versionRegex = """gzip (.*)""".r
   override def versionCommand = executable + " --version"
 
-  def cmdLine = required(executable) + " -c " + repeat(input) + " > " + required(output)
+  def cmdLine = required(executable) + " -c " +
+    (if (inputAsStdin) "" else repeat(input)) +
+    (if (outputAsStsout) "" else " > " + required(output))
 }
 
 object Gzip {
+  def apply(root: Configurable): Gzip = new Gzip(root)
+
   def apply(root: Configurable, input: List[File], output: File): Gzip = {
     val gzip = new Gzip(root)
     gzip.input = input
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/HtseqCount.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/HtseqCount.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/HtseqCount.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/HtseqCount.scala
index 829b86a7354ff4c120522a375e10776938c24e37..adf25ba3cf1caf2b159950c0457c85d2ac4c71f0 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/HtseqCount.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/HtseqCount.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /**
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Ln.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Ln.scala
similarity index 91%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Ln.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Ln.scala
index dfb3d699e7fe678364b80da2f28e9b212aeebd74..c066b976f21808e982900f9e8a78fcd651cfe8f6 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Ln.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Ln.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.queue.function.InProcessFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
@@ -51,20 +51,14 @@ class Ln(val root: Configurable) extends InProcessFunction with Configurable {
   lazy val cmd: String = {
     lazy val inCanonical: String = {
       // need to remove "/~" to correctly expand path with tilde
-      input.getCanonicalPath.replace("/~", "")
+      input.getAbsolutePath.replace("/~", "")
     }
 
-    lazy val outCanonical: String = {
-      output.getCanonicalPath.replace("/~", "")
-    }
+    lazy val outCanonical: String = output.getAbsolutePath.replace("/~", "")
 
-    lazy val inToks: Array[String] = {
-      inCanonical.split(File.separator)
-    }
+    lazy val inToks: Array[String] = inCanonical.split(File.separator)
 
-    lazy val outToks: Array[String] = {
-      outCanonical.split(File.separator)
-    }
+    lazy val outToks: Array[String] = outCanonical.split(File.separator)
 
     lazy val commonPrefixLength: Int = {
       val maxLength = scala.math.min(inToks.length, outToks.length)
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Pbzip2.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Pbzip2.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Pbzip2.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Pbzip2.scala
index 2b042e35e95bd6c83022142aac4f51a38e313dbf..9943886d53127eca520fbd351c9cc66f9f3bba58 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Pbzip2.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Pbzip2.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for pbzip2 */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Raxml.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Raxml.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Raxml.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Raxml.scala
index 71b4ec736449004a2644ec03c5403e6ebb70df63..f2f6eb21b9eaa2dc9956429906723f72f4400ee4 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Raxml.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Raxml.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 import scalaz.std.boolean.option
@@ -74,7 +74,6 @@ class Raxml(val root: Configurable) extends BiopetCommandLineFunction {
   /** Sets correct output files to job */
   override def beforeGraph() {
     require(w != null)
-    if (threads == 0) threads = getThreads(defaultThreads)
     executable = if (threads > 1 && executableThreads.isDefined) executableThreads.get else executableNonThreads
     super.beforeGraph()
     out :::= List(Some(getInfoFile), getBestTreeFile, getBootstrapFile, getBipartitionsFile).flatten
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/RunGubbins.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/RunGubbins.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/RunGubbins.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/RunGubbins.scala
index 0f86bf810791c5c3e5106ebab025f6938e7a3b88..5334ab9cfe64a98f2d4c81745404c11006884044 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/RunGubbins.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/RunGubbins.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /**
@@ -34,9 +34,6 @@ class RunGubbins(val root: Configurable) extends BiopetCommandLineFunction {
   @Input(doc = "Fasta file", shortName = "FQ")
   var fastafile: File = _
 
-  @Output(doc = "Output", shortName = "out")
-  var outputFiles: List[File] = Nil
-
   @Argument(required = true)
   var outputDirectory: File = null
 
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Sha1sum.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Sha1sum.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Sha1sum.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Sha1sum.scala
index 9b21f0afade3e8915ca0717f9510560e9e575a3f..ecc4a52e5d167bb48967f091a356955b91f09975 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Sha1sum.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Sha1sum.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for sha1sum */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Sickle.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Sickle.scala
similarity index 92%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Sickle.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Sickle.scala
index fffa435a7db5829da05b78736b62ca8d284b6be3..fe88be1adc2bda814157a0578ff9e4bfa622fb0a 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Sickle.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Sickle.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.Summarizable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
@@ -36,7 +36,7 @@ class Sickle(val root: Configurable) extends BiopetCommandLineFunction with Summ
   @Input(doc = "R2 input", required = false)
   var input_R2: File = _
 
-  @Output(doc = "R1 output")
+  @Output(doc = "R1 output", required = false)
   var output_R1: File = _
 
   @Output(doc = "R2 output", required = false)
@@ -48,8 +48,6 @@ class Sickle(val root: Configurable) extends BiopetCommandLineFunction with Summ
   @Output(doc = "stats output")
   var output_stats: File = _
 
-  var fastqc: Fastqc = _
-
   executable = config("exe", default = "sickle", freeVar = false)
   var qualityType: Option[String] = config("qualitytype")
   var qualityThreshold: Option[Int] = config("qualityThreshold")
@@ -76,15 +74,15 @@ class Sickle(val root: Configurable) extends BiopetCommandLineFunction with Summ
       required("-s", output_singles)
     } else cmd += required("se")
     cmd +
-      required("-f", input_R1) +
+      (if (inputAsStdin) required("-f", new File("/dev/stdin")) else required("-f", input_R1)) +
       required("-t", qualityType) +
-      required("-o", output_R1) +
+      (if (outputAsStsout) required("-o", new File("/dev/stdout")) else required("-o", output_R1)) +
       optional("-q", qualityThreshold) +
       optional("-l", lengthThreshold) +
       conditional(noFiveprime, "-x") +
       conditional(discardN, "-n") +
-      conditional(quiet, "--quiet") +
-      " > " + required(output_stats)
+      conditional(quiet || outputAsStsout, "--quiet") +
+      (if (outputAsStsout) "" else " > " + required(output_stats))
   }
 
   /** returns stats map for summary */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Stampy.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Stampy.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Stampy.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Stampy.scala
index 4db91af20df5293b400313f811904c40e8aff5b4..0ae011df060922f88fd22c22e20c45bdcee13801 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Stampy.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Stampy.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Reference }
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Star.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Star.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Star.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Star.scala
index e908a72704edc0d7acc5981c80e5a5de43a1f50d..84ec59eb9f5bc83b3c6e6b9980c4f29121a2d12d 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Star.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Star.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Reference }
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
@@ -63,7 +63,7 @@ class Star(val root: Configurable) extends BiopetCommandLineFunction with Refere
   var outFileNamePrefix: String = _
   var runThreadN: Option[Int] = config("runThreadN")
 
-  override def defaultCoreMemory = 4.0
+  override def defaultCoreMemory = 6.0
   override def defaultThreads = 8
 
   /** Sets output files for the graph */
@@ -72,7 +72,7 @@ class Star(val root: Configurable) extends BiopetCommandLineFunction with Refere
     if (reference == null) reference = referenceFasta()
     genomeDir = config("genomeDir", new File(reference.getAbsoluteFile.getParent, "star"))
     if (outFileNamePrefix != null && !outFileNamePrefix.endsWith(".")) outFileNamePrefix += "."
-    val prefix = if (outFileNamePrefix != null) outputDir + outFileNamePrefix else outputDir
+    val prefix = if (outFileNamePrefix != null) outputDir + File.separator + outFileNamePrefix else outputDir + File.separator
     if (runmode == null) {
       outputSam = new File(prefix + "Aligned.out.sam")
       outputTab = new File(prefix + "SJ.out.tab")
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Tabix.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Tabix.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Tabix.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Tabix.scala
index 3aee7ba2f859fed7fdd2f990454da2b9cf7b5e40..4a80600c52cb602c579c5b09ba27931c09f2c338 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Tabix.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Tabix.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /**
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Tophat.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Tophat.scala
similarity index 95%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Tophat.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Tophat.scala
index ed4ae3bf9d3880174bf36085967ce4088e549f6d..98379193eec3008a2545bd2dc3001713b1df0083 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Tophat.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Tophat.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.{ Reference, BiopetCommandLineFunction }
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /**
@@ -264,6 +264,16 @@ class Tophat(val root: Configurable) extends BiopetCommandLineFunction with Refe
 
   var rg_platform: Option[String] = config("rg_platform")
 
+  override def beforeGraph: Unit = {
+    super.beforeGraph
+    if (bowtie1 && !new File(bowtie_index).getParentFile.list().toList
+      .filter(_.startsWith(new File(bowtie_index).getName)).exists(_.endsWith(".ebwt")))
+      throw new IllegalArgumentException("No bowtie1 index found for tophat")
+    else if (!new File(bowtie_index).getParentFile.list().toList
+      .filter(_.startsWith(new File(bowtie_index).getName)).exists(_.endsWith(".bt2")))
+      throw new IllegalArgumentException("No bowtie2 index found for tophat")
+  }
+
   def cmdLine: String = required(executable) +
     optional("-o", output_dir) +
     conditional(bowtie1, "--bowtie1") +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala
similarity index 99%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala
index b124a08aa5944beebff8507c09646618c4bc0ab7..7a9efb0ff7087fcca7462dd6efc9d0ac7432f1b4 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/VariantEffectPredictor.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Reference }
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/WigToBigWig.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/WigToBigWig.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/WigToBigWig.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/WigToBigWig.scala
index 34d5f899ea29599ac2a2ede30452593dc72f6c84..eaef86a39e2d02e22b49cb2a0a9a12073b841934 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/WigToBigWig.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/WigToBigWig.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /**
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Zcat.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Zcat.scala
similarity index 70%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Zcat.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Zcat.scala
index 96da2a82483746f628fff42d2beea4a06673164d..be5eb6700c6c1610c33c0e1f0fa132089db81bfc 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/Zcat.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/Zcat.scala
@@ -18,15 +18,15 @@ package nl.lumc.sasc.biopet.extensions
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for zcat */
 class Zcat(val root: Configurable) extends BiopetCommandLineFunction {
-  @Input(doc = "Zipped file")
-  var input: File = _
+  @Input(doc = "Zipped file", required = true)
+  var input: List[File] = _
 
-  @Output(doc = "Unzipped file")
+  @Output(doc = "Unzipped file", required = true)
   var output: File = _
 
   executable = config("exe", default = "zcat")
@@ -35,12 +35,24 @@ class Zcat(val root: Configurable) extends BiopetCommandLineFunction {
   override def versionCommand = executable + " --version"
 
   /** Returns command to execute */
-  def cmdLine = required(executable) + required(input) + " > " + required(output)
+  def cmdLine = required(executable) +
+    (if (inputAsStdin) "" else repeat(input)) +
+    (if (outputAsStsout) "" else " > " + required(output))
 }
 
 object Zcat {
   /** Returns a default zcat */
+  def apply(root: Configurable): Zcat = new Zcat(root)
+
+  /** Returns Zcat with input and output files */
   def apply(root: Configurable, input: File, output: File): Zcat = {
+    val zcat = new Zcat(root)
+    zcat.input = input :: Nil
+    zcat.output = output
+    zcat
+  }
+
+  def apply(root: Configurable, input: List[File], output: File): Zcat = {
     val zcat = new Zcat(root)
     zcat.input = input
     zcat.output = output
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/Bcftools.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/Bcftools.scala
similarity index 100%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/Bcftools.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/Bcftools.scala
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/BcftoolsCall.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/BcftoolsCall.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/BcftoolsCall.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/BcftoolsCall.scala
index f79d0fb562e30cfc9ebeb533feb038d854f56caa..537cd377daa4c3f731541717e5378159a9f4cdc7 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/BcftoolsCall.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bcftools/BcftoolsCall.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.bcftools
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** This extension is based on bcftools 1.1-134 */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/Bedtools.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/Bedtools.scala
similarity index 100%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/Bedtools.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/Bedtools.scala
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsCoverage.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsCoverage.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsCoverage.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsCoverage.scala
index 04a1525d767e6665b776ff4c7e08910fa1eb9dec..98024e79e25aa0c98624e2ff0e01783d37e35537 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsCoverage.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsCoverage.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.bedtools
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /** Extension for bedtools coverage */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsGroupby.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsGroupby.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsGroupby.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsGroupby.scala
index 6a7e7be7f0751f958f3d09d3c4d798f523fc100d..067a6a951b190a8fec019379a82ab7068d48aaeb 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsGroupby.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsGroupby.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.bedtools
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /**
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsIntersect.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsIntersect.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsIntersect.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsIntersect.scala
index 2f01ca14d557493ad7bf22e9733ddb5171c5314d..75f179ad326939d65f01031a4e909c59565e4906 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsIntersect.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bedtools/BedtoolsIntersect.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.bedtools
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /** Extension for bedtools intersect */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/Breakdancer.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/Breakdancer.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/Breakdancer.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/Breakdancer.scala
index 9ccf70fa9a4d0310e65a3933933ae0150e16763b..9f662e3d7830f542a7361c11c5803385afef44ac 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/Breakdancer.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/Breakdancer.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.breakdancer
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ Reference, BiopetQScript, PipelineCommand }
 import org.broadinstitute.gatk.queue.QScript
 
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerCaller.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerCaller.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerCaller.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerCaller.scala
index 4b2efd150947d8d49c21f4ee526bc26bb0177fd0..a760c10b1be47f7e5414dee9a2cf7f7aa9e4a416 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerCaller.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerCaller.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.breakdancer
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 class BreakdancerCaller(val root: Configurable) extends BiopetCommandLineFunction {
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerConfig.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerConfig.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerConfig.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerConfig.scala
index 6a67b3714527683ecc94505693c566164b71d55d..2b310aaf8c6b38933f4c11badedfbf7d57084bef 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerConfig.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerConfig.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.breakdancer
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 class BreakdancerConfig(val root: Configurable) extends BiopetCommandLineFunction {
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerVCF.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerVCF.scala
similarity index 92%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerVCF.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerVCF.scala
index ce756711ac16e728f9c31a985eeb10c8623b330a..5174f66af523bf5fd3b84ade4d0cc49ce1903946 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerVCF.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/breakdancer/BreakdancerVCF.scala
@@ -17,8 +17,8 @@ package nl.lumc.sasc.biopet.extensions.breakdancer
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
-import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction
+import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 class BreakdancerVCF(val root: Configurable) extends PythonCommandLineFunction {
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/Bwa.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/Bwa.scala
similarity index 100%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/Bwa.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/Bwa.scala
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaAln.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaAln.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaAln.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaAln.scala
index de4dbe5d34752cc42199818a19a946e9fedf013f..b4f69dd845db9487f85a5258deaf618cb687de46 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaAln.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaAln.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.bwa
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.Reference
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /**
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaMem.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaMem.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaMem.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaMem.scala
index 9ba2f49c70676cc2287debfc62fc52f0f02e0a6e..5cbd73852e6a084a212a52052eebe1525bac3730 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaMem.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaMem.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.bwa
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.Reference
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /**
@@ -111,6 +111,6 @@ class
BwaMem(val root: Configurable) extends Bwa with Reference { required(reference) + required(R1) + optional(R2) + - " > " + required(output) + (if (outputAsStsout) "" else " > " + required(output)) } } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSampe.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSampe.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSampe.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSampe.scala index c6afc21d0e26953632758f054d0d37f349ee0ab1..46632e010dd8ffd66ec295201a2a1d25c315f76a 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSampe.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSampe.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.bwa import java.io.File import nl.lumc.sasc.biopet.core.Reference -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSamse.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSamse.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSamse.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSamse.scala index a2d979ca078257a4fb85f3d3591d89f6a49a74db..73ec165e16ad4f797ec5d2219c87d2a80b8f8697 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSamse.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/bwa/BwaSamse.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.bwa import java.io.File import 
nl.lumc.sasc.biopet.core.Reference -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/clever/CleverCaller.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/clever/CleverCaller.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/clever/CleverCaller.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/clever/CleverCaller.scala index bb71bd5351231ec230c947d093e0adfd1ca4162c..02a0afec29287b3cda576842b36db9316ad0b401 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/clever/CleverCaller.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/clever/CleverCaller.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.clever import java.io.File import nl.lumc.sasc.biopet.core.{ Reference, BiopetCommandLineFunction } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } class CleverCaller(val root: Configurable) extends BiopetCommandLineFunction with Reference { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/Conifer.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/Conifer.scala similarity index 94% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/Conifer.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/Conifer.scala index 7d061ed819bc91c8d0fa641783ff89a50e58bd6f..1488cabc468071702f4b2c748e2a2363dac9d810 100644 --- 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/Conifer.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/Conifer.scala @@ -15,7 +15,7 @@ */ package nl.lumc.sasc.biopet.extensions.conifer -import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction +import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction abstract class Conifer extends PythonCommandLineFunction { override def subPath = "conifer" :: super.subPath diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferAnalyze.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferAnalyze.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferAnalyze.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferAnalyze.scala index d181d99b561c721b79f7f3b484d29aac07b58f93..284d0e059dea542f2550db22b086b4a4db9837da 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferAnalyze.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferAnalyze.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.conifer import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } class ConiferAnalyze(val root: Configurable) extends Conifer { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferCall.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferCall.scala similarity index 95% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferCall.scala rename to 
public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferCall.scala index e016fedc00f509ddb3cc2bd05da1d96274f4054a..7450ed1d31b184b690ae31b8c5071da2747d3bb2 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferCall.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferCall.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.conifer import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } class ConiferCall(val root: Configurable) extends Conifer { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferExport.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferExport.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferExport.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferExport.scala index afc22bd865cb8ac2d0fc82044ce471158468ebad..abc690ccd5d09881d9daf2161be8ca1b406f6934 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferExport.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferExport.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.conifer import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } class ConiferExport(val root: Configurable) extends Conifer { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferRPKM.scala 
b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferRPKM.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferRPKM.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferRPKM.scala index ae4e1684e6f451ca7435a364227515cb7f67ac37..915b171d82cfba083cd7696ad487f5cebc1fe0ec 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferRPKM.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/conifer/ConiferRPKM.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.conifer import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } class ConiferRPKM(val root: Configurable) extends Conifer { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/Delly.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/Delly.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/Delly.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/Delly.scala index bac2fd131d016155b023eb059f57c3bfc4c2c8d2..b5fabe35e56555ba1723ab9270c04578b6bde23e 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/Delly.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/Delly.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.delly import java.io.File import nl.lumc.sasc.biopet.core.{ Reference, BiopetQScript, PipelineCommand } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.extensions.Ln import 
org.broadinstitute.gatk.queue.QScript import org.broadinstitute.gatk.queue.extensions.gatk.CatVariants diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/DellyCaller.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/DellyCaller.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/DellyCaller.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/DellyCaller.scala index da8815abb930aaff02bd2939d08ca9d5c338f1e2..8863baa50114a0c33c8d4e1d3df5c07101168284 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/DellyCaller.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/delly/DellyCaller.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.delly import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } class DellyCaller(val root: Configurable) extends BiopetCommandLineFunction { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/CombineVariants.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/CombineVariants.scala similarity index 95% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/CombineVariants.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/CombineVariants.scala index 1d2c90a5fae95869c89cd7caf8a49a83724884e1..144a91f21ae0479438fdb78f9432632d22584318 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/CombineVariants.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/CombineVariants.scala @@ -17,7 
+17,7 @@ package nl.lumc.sasc.biopet.extensions.gatk import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** @@ -55,7 +55,7 @@ class CombineVariants(val root: Configurable) extends Gatk { } } - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + (for (file <- inputFiles) yield { inputMap.get(file) match { case Some(name) => required("-V:" + name, file) diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/Gatk.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/Gatk.scala similarity index 95% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/Gatk.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/Gatk.scala index a738a17100d211623eb70648ca96081532432e9c..108253762bab534ab7740512da5ec692158dd0d4 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/Gatk.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/gatk/Gatk.scala @@ -49,6 +49,8 @@ abstract class Gatk extends BiopetJavaCommandLineFunction with Reference { @Input(required = false) var pedigree: List[File] = config("pedigree", default = Nil) + var et: Option[String] = config("et") + override def versionRegex = """(.*)""".r override def versionExitcode = List(0, 1) override def versionCommand = executable + " -jar " + jarFile + " -version" @@ -61,10 +63,11 @@ abstract class Gatk extends BiopetJavaCommandLineFunction with Reference { if (reference == null) reference = referenceFasta() } - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + required("-T", analysisType) + required("-R", reference) + optional("-K", gatkKey) + + optional("-et", et) + repeat("-I", intervals) + repeat("-XL", 
excludeIntervals) + repeat("-ped", pedigree) diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVTools.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVTools.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVTools.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVTools.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVToolsCount.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVToolsCount.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVToolsCount.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVToolsCount.scala index cbca06001123644f490eb4fc598ab41151620194..7fff0430095b9ccc20123a71380eaff31c0e11c8 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVToolsCount.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/igvtools/IGVToolsCount.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.igvtools import java.io.{ File, FileNotFoundException } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/Kraken.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/Kraken.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/Kraken.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/Kraken.scala index 
da53b6841c64345420465d60edfd7d2a08fabeee..ae475e2ca9fe24775af0d017ca7df42e1905cd09 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/Kraken.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/Kraken.scala @@ -19,7 +19,7 @@ package nl.lumc.sasc.biopet.extensions.kraken import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** Extension for Kraken */ diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/KrakenReport.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/KrakenReport.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/KrakenReport.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/KrakenReport.scala index a07e315606a171eb5f3063d8fe900067b042cb61..01d0cb731e361c35c98ea515e2fbcb749025d4b4 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/KrakenReport.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/kraken/KrakenReport.scala @@ -19,7 +19,7 @@ package nl.lumc.sasc.biopet.extensions.kraken import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** Extension for Kraken */ diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2.scala similarity index 100% rename from 
public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2CallPeak.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2CallPeak.scala similarity index 99% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2CallPeak.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2CallPeak.scala index f96b9b80f452fead71e18598fcb871fe14a1c72b..10fb47713bff4ef480159bcfe4fe3eb787751d96 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2CallPeak.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/macs2/Macs2CallPeak.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.macs2 import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** Extension for macs2*/ diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/AddOrReplaceReadGroups.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/AddOrReplaceReadGroups.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/AddOrReplaceReadGroups.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/AddOrReplaceReadGroups.scala index 0df7dffef9fe020014e2395bc90a1f5002791ce6..c9b30feeeb796c1b932c7aa61d4ccbd8a2c41837 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/AddOrReplaceReadGroups.scala +++ 
b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/AddOrReplaceReadGroups.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.picard import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } /** Extension for picard AddOrReplaceReadGroups */ @@ -64,7 +64,7 @@ class AddOrReplaceReadGroups(val root: Configurable) extends Picard { var RGPI: Option[Int] = _ /** Returns command to execute */ - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + required("INPUT=", input, spaceSeparated = false) + required("OUTPUT=", output, spaceSeparated = false) + required("SORT_ORDER=", sortOrder, spaceSeparated = false) + diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BedToIntervalList.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BedToIntervalList.scala similarity index 94% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BedToIntervalList.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BedToIntervalList.scala index 6562eb3483859e03d89783fef55d9d095919666c..5f3513fd7c22f0b81db0c739f6bb9b70237dc8d7 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BedToIntervalList.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BedToIntervalList.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.picard import java.io.File import nl.lumc.sasc.biopet.core.Reference -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** @@ -38,7 +38,7 @@ class BedToIntervalList(val root: Configurable) extends Picard with 
Reference { @Output(doc = "Output interval list", required = true) var output: File = null - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + required("SEQUENCE_DICTIONARY=", dict, spaceSeparated = false) + required("INPUT=", input, spaceSeparated = false) + required("OUTPUT=", output, spaceSeparated = false) diff --git a/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BuildBamIndex.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BuildBamIndex.scala new file mode 100644 index 0000000000000000000000000000000000000000..392169043c9fa8e10852b4ff61694892c7531b3c --- /dev/null +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/BuildBamIndex.scala @@ -0,0 +1,24 @@ +package nl.lumc.sasc.biopet.extensions.picard + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Output, Input } + +/** + * Created by sajvanderzeeuw on 6-10-15. 
+ */ +class BuildBamIndex(val root: Configurable) extends Picard { + + javaMainClass = new picard.sam.BuildBamIndex().getClass.getName + + @Input(doc = "The input SAM or BAM files to analyze.", required = true) + var input: File = _ + + @Output(doc = "The output file to write the BAM index to", required = true) + var output: File = _ + + override def cmdLine = super.cmdLine + + required("INPUT=", input, spaceSeparated = false) + + required("OUTPUT=", output, spaceSeparated = false) +} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CalculateHsMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CalculateHsMetrics.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CalculateHsMetrics.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CalculateHsMetrics.scala index ac69069e314999124488c18a4de7285c92c04d7b..ea7c37b5aef79dc59c942db3ae9127e90e853396 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CalculateHsMetrics.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CalculateHsMetrics.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.picard import java.io.File import nl.lumc.sasc.biopet.core.Reference -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.summary.Summarizable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } @@ -56,7 +56,7 @@ class CalculateHsMetrics(val root: Configurable) extends Picard with Summarizabl } /** Returns command to execute */ - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + required("INPUT=", input, spaceSeparated = false) + required("OUTPUT=", output, spaceSeparated = false) + optional("REFERENCE_SEQUENCE=", reference,
spaceSeparated = false) + diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetrics.scala similarity index 91% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetrics.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetrics.scala index 3a456cede39d40a886619b6c74788426b8e1e91d..b2fb2097efefb107ab4f00c3cadde5e9285a2568 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetrics.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetrics.scala @@ -17,12 +17,13 @@ package nl.lumc.sasc.biopet.extensions.picard import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.core.Reference +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.summary.Summarizable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } /** Extension for picard CollectAlignmentSummaryMetrics */ -class CollectAlignmentSummaryMetrics(val root: Configurable) extends Picard with Summarizable { +class CollectAlignmentSummaryMetrics(val root: Configurable) extends Picard with Summarizable with Reference { javaMainClass = new picard.analysis.CollectAlignmentSummaryMetrics().getClass.getName @Input(doc = "The input SAM or BAM files to analyze. 
Must be coordinate sorted.", required = true) @@ -41,7 +42,7 @@ class CollectAlignmentSummaryMetrics(val root: Configurable) extends Picard with var output: File = _ @Argument(doc = "Reference file", required = false) - var reference: File = config("reference") + var reference: File = _ @Argument(doc = "ASSUME_SORTED", required = false) var assumeSorted: Boolean = config("assumeSorted", default = true) @@ -52,8 +53,13 @@ class CollectAlignmentSummaryMetrics(val root: Configurable) extends Picard with @Argument(doc = "STOP_AFTER", required = false) var stopAfter: Option[Long] = config("stopAfter") + override def beforeGraph(): Unit = { + super.beforeGraph() + if (reference == null) reference = referenceFasta() + } + /** Returns command to execute */ - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + required("INPUT=", input, spaceSeparated = false) + required("OUTPUT=", output, spaceSeparated = false) + optional("REFERENCE_SEQUENCE=", reference, spaceSeparated = false) + diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectGcBiasMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectGcBiasMetrics.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectGcBiasMetrics.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectGcBiasMetrics.scala index 167aeca9b5515c3001f242f10ea7e2637c5ba383..0479bbb754d62a13e4e7d99bf6ac949b8cd69aec 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectGcBiasMetrics.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectGcBiasMetrics.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.picard import java.io.File import nl.lumc.sasc.biopet.core.Reference -import nl.lumc.sasc.biopet.core.config.Configurable 
+import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.summary.Summarizable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } @@ -53,6 +53,8 @@ class CollectGcBiasMetrics(val root: Configurable) extends Picard with Summariza @Argument(doc = "IS_BISULFITE_SEQUENCED", required = false) var isBisulfiteSequinced: Option[Boolean] = config("isbisulfitesequinced") + override def defaultCoreMemory = 8.0 + override def beforeGraph() { super.beforeGraph() if (outputChart == null) outputChart = new File(output + ".pdf") @@ -60,7 +62,7 @@ class CollectGcBiasMetrics(val root: Configurable) extends Picard with Summariza } /** Returns command to execute */ - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + repeat("INPUT=", input, spaceSeparated = false) + required("OUTPUT=", output, spaceSeparated = false) + optional("CHART_OUTPUT=", outputChart, spaceSeparated = false) + diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetrics.scala similarity index 93% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetrics.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetrics.scala index c5f4c5a8a91059efe094581bb77c262c61ccc269..88d2eab6091ef4dafa02b46395ac1c78188ee3c2 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetrics.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetrics.scala @@ -17,14 +17,15 @@ package nl.lumc.sasc.biopet.extensions.picard import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.core.Reference +import 
nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.Summarizable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 import scala.collection.immutable.Nil
 
 /** Extension for picard CollectInsertSizeMetrics */
-class CollectInsertSizeMetrics(val root: Configurable) extends Picard with Summarizable {
+class CollectInsertSizeMetrics(val root: Configurable) extends Picard with Summarizable with Reference {
   javaMainClass = new picard.analysis.CollectInsertSizeMetrics().getClass.getName
 
   @Input(doc = "The input SAM or BAM files to analyze. Must be coordinate sorted.", required = true)
@@ -37,7 +38,7 @@ class CollectInsertSizeMetrics(val root: Configurable) extends Picard with Summa
   protected var outputHistogram: File = null
 
   @Argument(doc = "Reference file", required = false)
-  var reference: File = config("reference")
+  var reference: File = _
 
   @Argument(doc = "DEVIATIONS", required = false)
   var deviations: Option[Double] = config("deviations")
@@ -59,10 +60,11 @@ class CollectInsertSizeMetrics(val root: Configurable) extends Picard with Summa
 
   override def beforeGraph() {
     outputHistogram = new File(output + ".pdf")
+    if (reference == null) reference = referenceFasta()
   }
 
   /** Returns command to execute */
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     required("INPUT=", input, spaceSeparated = false) +
     required("OUTPUT=", output, spaceSeparated = false) +
     optional("HISTOGRAM_FILE=", outputHistogram, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectMultipleMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectMultipleMetrics.scala
similarity index 94%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectMultipleMetrics.scala
rename to
public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectMultipleMetrics.scala
index dd420840fba426c3db051ea13dd777c1db638cdb..fed464a7391ad3f618e3755c2b8addc3cfb608b1 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectMultipleMetrics.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectMultipleMetrics.scala
@@ -18,7 +18,8 @@ package nl.lumc.sasc.biopet.extensions.picard
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.{ Reference, BiopetQScript }
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.Logging
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.{ Summarizable, SummaryQScript }
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
@@ -32,7 +33,7 @@ class CollectMultipleMetrics(val root: Configurable) extends Picard with Summari
 
   javaMainClass = new picard.analysis.CollectMultipleMetrics().getClass.getName
 
-  override def defaultCoreMemory = 6.0
+  override def defaultCoreMemory = 8.0
 
   @Input(doc = "The input SAM or BAM files to analyze", required = true)
   var input: File = null
@@ -53,9 +54,6 @@ class CollectMultipleMetrics(val root: Configurable) extends Picard with Summari
   @Argument(doc = "Stop after processing N reads", required = false)
   var stopAfter: Option[Long] = config("stop_after")
 
-  @Output
-  protected var outputFiles: List[File] = Nil
-
   override def beforeGraph(): Unit = {
     super.beforeGraph()
     if (reference == null) reference = referenceFasta()
@@ -74,11 +72,11 @@ class CollectMultipleMetrics(val root: Configurable) extends Picard with Summari
       case p if p == Programs.CollectBaseDistributionByCycle.toString =>
         outputFiles :+= new File(outputName + ".base_distribution_by_cycle_metrics")
         outputFiles :+= new File(outputName + ".base_distribution_by_cycle.pdf")
-      case p => BiopetQScript.addError("Program '" + p + "' does not exist for
'CollectMultipleMetrics'")
+      case p => Logging.addError("Program '" + p + "' does not exist for 'CollectMultipleMetrics'")
     }
   }
 
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     required("INPUT=", input, spaceSeparated = false) +
     required("OUTPUT=", outputName, spaceSeparated = false) +
     conditional(assumeSorted, "ASSUME_SORTED=true") +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectRnaSeqMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectRnaSeqMetrics.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectRnaSeqMetrics.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectRnaSeqMetrics.scala
index 334741278863473708c1df29f858483d2db9c78c..2aca96da6ebe993381766f50fc699c1c04233b0d 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectRnaSeqMetrics.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectRnaSeqMetrics.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.Summarizable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 import picard.analysis.directed.RnaSeqMetricsCollector.StrandSpecificity
@@ -90,7 +90,7 @@ class CollectRnaSeqMetrics(val root: Configurable) extends Picard with Summariza
     "metrics" -> Picard.getMetrics(output).getOrElse(Map()),
     "histogram" -> Picard.getHistogram(output).getOrElse(Map()))
 
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     required("INPUT=", input, spaceSeparated = false) +
     required("REF_FLAT=", refFlat, spaceSeparated = false) +
     required("OUTPUT=",
output, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectTargetedPcrMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectTargetedPcrMetrics.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectTargetedPcrMetrics.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectTargetedPcrMetrics.scala
index 07f90c676b508f2324bae8c88c8053eeb1b57e67..7ec9598e98326dd1477a9d079bf4fe09b31e75af 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectTargetedPcrMetrics.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectTargetedPcrMetrics.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.Reference
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.Summarizable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
@@ -60,7 +60,7 @@ class CollectTargetedPcrMetrics(val root: Configurable) extends Picard with Summ
     if (reference == null) reference = referenceFasta()
   }
 
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     required("INPUT=", input, spaceSeparated = false) +
     required("OUTPUT=", output, spaceSeparated = false) +
     required("REFERENCE_SEQUENCE=", reference, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectWgsMetrics.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectWgsMetrics.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectWgsMetrics.scala
rename to
public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectWgsMetrics.scala
index 0f05a37a0fb3808b01bccade7c13d969e30abef9..6756f736623693564c344ca11da4a864d1c27977 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectWgsMetrics.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/CollectWgsMetrics.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.Reference
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.Summarizable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
@@ -62,7 +62,7 @@ class CollectWgsMetrics(val root: Configurable) extends Picard with Summarizable
     if (reference == null) reference = referenceFasta()
   }
 
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     required("INPUT=", input, spaceSeparated = false) +
     required("OUTPUT=", output, spaceSeparated = false) +
     required("REFERENCE_SEQUENCE=", reference, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/GatherBamFiles.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/GatherBamFiles.scala
similarity index 92%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/GatherBamFiles.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/GatherBamFiles.scala
index 5603024007ce18a4e979c31b1087e897820732d4..514c9394e8e6d8e663402cc03267fa70c6a2dc73 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/GatherBamFiles.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/GatherBamFiles.scala
@@ -17,7 +17,7 @@ package
nl.lumc.sasc.biopet.extensions.picard
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 class GatherBamFiles(val root: Configurable) extends Picard {
@@ -30,7 +30,7 @@ class GatherBamFiles(val root: Configurable) extends Picard {
   @Output(doc = "The output file to bam file to", required = true)
   var output: File = _
 
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     repeat("INPUT=", input, spaceSeparated = false) +
     required("OUTPUT=", output, spaceSeparated = false)
 }
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicates.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicates.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicates.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicates.scala
index 33c2a0be9a6b8e65eec264c49919983397924bc4..04a61ae0f9af329ac49252f93ccad829c60b7088 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicates.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicates.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.Summarizable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
@@ -80,7 +80,7 @@ class MarkDuplicates(val root: Configurable) extends Picard with Summarizable {
   }
 
   /** Returns command to execute */
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     repeat("INPUT=", input, spaceSeparated =
false) +
     required("OUTPUT=", output, spaceSeparated = false) +
     required("METRICS_FILE=", outputMetrics, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MergeSamFiles.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MergeSamFiles.scala
similarity index 89%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MergeSamFiles.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MergeSamFiles.scala
index 19122634b79337d0ef67642f85700b8a75c0ad5a..5e90a5a625c992436a340a874bf60015c19b44ab 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MergeSamFiles.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/MergeSamFiles.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /** Extension for picard MergeSamFiles */
@@ -45,8 +45,16 @@ class MergeSamFiles(val root: Configurable) extends Picard {
   @Argument(doc = "COMMENT", required = false)
   var comment: Option[String] = config("comment")
 
+  @Output(doc = "Bam Index", required = true)
+  private var outputIndex: File = _
+
+  override def beforeGraph() {
+    super.beforeGraph()
+    if (createIndex) outputIndex = new File(output.getAbsolutePath.stripSuffix(".bam") + ".bai")
+  }
+
   /** Returns command to execute */
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     repeat("INPUT=", input, spaceSeparated = false) +
     required("OUTPUT=", output, spaceSeparated = false) +
     required("SORT_ORDER=", sortOrder, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/Picard.scala
b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/Picard.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/Picard.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/Picard.scala
index d5f3b2078b706a269e6ef4754ce35db447b39841..02678aac229d0113b19b8e2d3dfbb4eca53793ad 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/Picard.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/Picard.scala
@@ -17,8 +17,8 @@ package nl.lumc.sasc.biopet.extensions.picard
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.{ BiopetJavaCommandLineFunction, Logging }
-import nl.lumc.sasc.biopet.utils.tryToParseNumber
+import nl.lumc.sasc.biopet.core.BiopetJavaCommandLineFunction
+import nl.lumc.sasc.biopet.utils.{ Logging, tryToParseNumber }
 import org.broadinstitute.gatk.utils.commandline.Argument
 
 import scala.io.Source
@@ -68,7 +68,7 @@ abstract class Picard extends BiopetJavaCommandLineFunction {
     else super.getVersion
   }
 
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     required("TMP_DIR=" + jobTempDir) +
     optional("VERBOSITY=", verbosity, spaceSeparated = false) +
     conditional(quiet, "QUIET=TRUE") +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/ReorderSam.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/ReorderSam.scala
similarity index 95%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/ReorderSam.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/ReorderSam.scala
index 2acc114d04e5384906692967031db604687a72bd..39a4fd184fdb73ea853e0b4a0419151773a377dc 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/ReorderSam.scala
+++
b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/ReorderSam.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.Reference
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 class ReorderSam(val root: Configurable) extends Picard with Reference {
@@ -45,7 +45,7 @@ class ReorderSam(val root: Configurable) extends Picard with Reference {
     if (reference == null) reference = referenceFasta()
   }
 
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     conditional(allowIncompleteDictConcordance, "ALLOW_INCOMPLETE_DICT_CONCORDANCE=TRUE") +
     conditional(allowContigLengthDiscordance, "ALLOW_CONTIG_LENGTH_DISCORDANCE=TRUE") +
     required("REFERENCE=", reference, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SamToFastq.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SamToFastq.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SamToFastq.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SamToFastq.scala
index 31445cf9ca1545318d77a4e021a9771d87fa15fb..b686e80225de8203af40b765764a0c1021e4446d 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SamToFastq.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SamToFastq.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /** Extension for picard SamToFastq */
@@ -73,7
+73,7 @@ class SamToFastq(val root: Configurable) extends Picard {
   var includeNonPrimaryAlignments: Boolean = config("includeNonPrimaryAlignments", default = false)
 
   /** Returns command to execute */
-  override def commandLine = super.commandLine +
+  override def cmdLine = super.cmdLine +
     required("INPUT=", input, spaceSeparated = false) +
     required("FASTQ=", fastqR1, spaceSeparated = false) +
     optional("SECOND_END_FASTQ=", fastqR2, spaceSeparated = false) +
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SortSam.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SortSam.scala
similarity index 75%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SortSam.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SortSam.scala
index 918ad656168c537644bc9a30c1b288fcdf2e38df..d9094ee5fa15c8caacd9a2c4823ccda7734c97e6 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SortSam.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/picard/SortSam.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.picard
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output }
 
 /** Extension for picard SortSam */
@@ -31,31 +31,32 @@ class SortSam(val root: Configurable) extends Picard {
   var output: File = _
 
   @Argument(doc = "Sort order of output file Required.
Possible values: {unsorted, queryname, coordinate} ", required = true)
-  var sortOrder: String = _
+  var sortOrder: String = config("sort_order", default = "coordinate")
 
   @Output(doc = "Bam Index", required = true)
   private var outputIndex: File = _
 
   override def beforeGraph() {
     super.beforeGraph()
+    if (outputAsStsout) createIndex = false
     if (createIndex) outputIndex = new File(output.getAbsolutePath.stripSuffix(".bam") + ".bai")
   }
 
   /** Returns command to execute */
-  override def commandLine = super.commandLine +
-    required("INPUT=", input, spaceSeparated = false) +
-    required("OUTPUT=", output, spaceSeparated = false) +
+  override def cmdLine = super.cmdLine +
+    (if (inputAsStdin) required("INPUT=", new File("/dev/stdin"), spaceSeparated = false)
+    else required("INPUT=", input, spaceSeparated = false)) +
+    (if (outputAsStsout) required("OUTPUT=", new File("/dev/stdout"), spaceSeparated = false)
+    else required("OUTPUT=", output, spaceSeparated = false)) +
     required("SORT_ORDER=", sortOrder, spaceSeparated = false)
 }
 
 object SortSam {
   /** Returns default SortSam */
-  def apply(root: Configurable, input: File, output: File, sortOrder: String = null): SortSam = {
+  def apply(root: Configurable, input: File, output: File): SortSam = {
     val sortSam = new SortSam(root)
     sortSam.input = input
     sortSam.output = output
-    if (sortOrder == null) sortSam.sortOrder = "coordinate"
-    else sortSam.sortOrder = sortOrder
     sortSam
   }
 }
\ No newline at end of file
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/Sambamba.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/Sambamba.scala
similarity index 100%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/Sambamba.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/Sambamba.scala
diff --git
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaFlagstat.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaFlagstat.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaFlagstat.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaFlagstat.scala
index c79ca9bd313015bd3ba845d9ed26abe134f2438a..e6ee2e0c1c369ea99f58a41bf273406d04fdbc33 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaFlagstat.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaFlagstat.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.sambamba
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for sambemba flagstat */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaIndex.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaIndex.scala
similarity index 97%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaIndex.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaIndex.scala
index 9100127039f55ea12b4e59fbf6b5b6d12216170b..7be1ce5272bead43f9300a63e05834355e998325 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaIndex.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaIndex.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.sambamba
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import
nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for sambemba index */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMarkdup.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMarkdup.scala
similarity index 95%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMarkdup.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMarkdup.scala
index be27127bf32525c65f34fe5968de9897d8d1d76d..2f89774db35e5f9d6f22518aa2ff3588f70d053f 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMarkdup.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMarkdup.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.sambamba
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for sambemba markdup */
@@ -60,7 +60,7 @@ object SambambaMarkdup {
   }
 
   def apply(root: Configurable, input: File): SambambaMarkdup = {
-    apply(root, input, new File(swapExtension(input.getCanonicalPath)))
+    apply(root, input, new File(swapExtension(input.getAbsolutePath)))
   }
 
   private def swapExtension(inputFile: String) = inputFile.stripSuffix(".bam") + ".dedup.bam"
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMerge.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMerge.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMerge.scala
rename to
public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMerge.scala
index 7f3f567b30b908123da709861c36bb14e16da3fe..83464fa4972e6f1aa3b9f74733ff8589985e91d8 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMerge.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaMerge.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.sambamba
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for sambemba merge */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaView.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaView.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaView.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaView.scala
index ee1dca3bd4a32d7f7f7ad0c5d934e2186614b1d1..4a012d22950898d662c6285ddf47fbec06e6bce1 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaView.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/sambamba/SambambaView.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.sambamba
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for sambamba flagstat */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/Samtools.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/Samtools.scala
similarity index 100%
rename from
public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/Samtools.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/Samtools.scala
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsFlagstat.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsFlagstat.scala
similarity index 96%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsFlagstat.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsFlagstat.scala
index 4a86970d6c93b05d0313f694f71e7b56469c7e22..2035cbdb6231b01fe611310928abbf2053e4d04e 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsFlagstat.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsFlagstat.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.samtools
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for samtools flagstat */
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsMpileup.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsMpileup.scala
similarity index 98%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsMpileup.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsMpileup.scala
index efb851cca152a4fd76d1f7fa4502e774f47b4d66..449b49cf0fbc3e1620a804347ce3d61790210183 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsMpileup.scala
+++
b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsMpileup.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.samtools
 import java.io.File
 
 import nl.lumc.sasc.biopet.core.Reference
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for samtools mpileup */
diff --git a/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsSort.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsSort.scala
new file mode 100644
index 0000000000000000000000000000000000000000..5df2a3a7b47f5771dcdbd033ef2c273ba70ed89f
--- /dev/null
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsSort.scala
@@ -0,0 +1,37 @@
+package nl.lumc.sasc.biopet.extensions.samtools
+
+import java.io.File
+
+import nl.lumc.sasc.biopet.utils.config.Configurable
+import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
+
+/**
+ * Created by pjvanthof on 22/09/15.
+ */
+class SamtoolsSort(val root: Configurable) extends Samtools {
+
+  @Input(required = true)
+  var input: File = _
+
+  @Output
+  var output: File = _
+
+  var compresion: Option[Int] = config("l")
+  var outputFormat: Option[String] = config("O")
+  var sortByName: Boolean = config("sort_by_name", default = false)
+  var prefix: String = _
+
+  override def beforeGraph(): Unit = {
+    super.beforeGraph()
+    prefix = config("prefix", default = new File(System.getProperty("java.io.tmpdir"), output.getName))
+  }
+
+  def cmdLine = required(executable) + required("sort") +
+    optional("-m", (coreMemeory + "G")) +
+    optional("-@", threads) +
+    optional("-O", outputFormat) +
+    required("-T", prefix) +
+    conditional(sortByName, "-n") +
+    (if (outputAsStsout) "" else required("-o", output)) +
+    (if (inputAsStdin) "" else required(input))
+}
diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsView.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsView.scala
similarity index 94%
rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsView.scala
rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsView.scala
index f340dec710db4ad5297cb4ea8168e4896f44b97e..09e8bff4212602d0689ea4f43eeee947b29c870e 100644
--- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsView.scala
+++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/samtools/SamtoolsView.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.samtools
 
 import java.io.File
 
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 
 /** Extension for samtools view */
@@ -28,7 +28,7 @@ class SamtoolsView(val root: Configurable) extends Samtools {
   @Output(doc = "output File")
var output: File = null - var quality: Option[Int] = config("quality") + var q: Option[Int] = config("q") var b: Boolean = config("b", default = false) var h: Boolean = config("h", default = false) var f: List[String] = config("f", default = List.empty[String]) @@ -36,7 +36,7 @@ class SamtoolsView(val root: Configurable) extends Samtools { def cmdBase = required(executable) + required("view") + - optional("-q", quality) + + optional("-q", q) + repeat("-f", f) + repeat("-F", F) + conditional(b, "-b") + diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/Seqtk.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/Seqtk.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/Seqtk.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/Seqtk.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/SeqtkSeq.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/SeqtkSeq.scala similarity index 91% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/SeqtkSeq.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/SeqtkSeq.scala index 04dfce893586d731a7fb07adf2d86e13822824cd..d69faa092303665a1c493686f3a1752010a9ef2a 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/SeqtkSeq.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/seqtk/SeqtkSeq.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.seqtk import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.summary.Summarizable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } @@ -25,14 +25,14 @@ import 
org.broadinstitute.gatk.utils.commandline.{ Input, Output } * Wrapper for the seqtk seq subcommand. * Written based on seqtk version 1.0-r63-dirty. */ -class SeqtkSeq(val root: Configurable) extends Seqtk with Summarizable { +class SeqtkSeq(val root: Configurable) extends Seqtk { /** input file */ - @Input(doc = "Input file (FASTQ or FASTA)") + @Input(doc = "Input file (FASTQ or FASTA)", required = true) var input: File = _ /** output file */ - @Output(doc = "Output file") + @Output(doc = "Output file", required = true) var output: File = _ /** mask bases with quality lower than INT [0] */ @@ -106,8 +106,8 @@ class SeqtkSeq(val root: Configurable) extends Seqtk with Summarizable { conditional(flag1, "-1") + conditional(flag2, "-2") + conditional(V, "-V") + - required(input) + - " > " + required(output) + (if (inputAsStdin) "" else required(input)) + + (if (outputAsStsout) "" else " > " + required(output)) } /** diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Mpileup2cns.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Mpileup2cns.scala similarity index 94% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Mpileup2cns.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Mpileup2cns.scala index 0bfd114d9c7c22faad81cdd57232de6ff08a695e..0379c36d9ace680b7833bf226912504bb619f8e2 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Mpileup2cns.scala +++ b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Mpileup2cns.scala @@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.extensions.varscan import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } class Mpileup2cns(val root: Configurable) extends Varscan { @@ -47,8 
+47,8 @@ class Mpileup2cns(val root: Configurable) extends Varscan { variants.foreach { case v => require(validValues.contains(v), "variants value must be either 0 or 1") } } - override def commandLine = { - val baseCommand = super.commandLine + required("mpileup2cns") + + override def cmdLine = { + val baseCommand = super.cmdLine + required("mpileup2cns") + required("", input) + required("--min-coverage", minCoverage) + required("--min-reads2", minReads2) + diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Varscan.scala b/public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Varscan.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Varscan.scala rename to public/biopet-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/varscan/Varscan.scala diff --git a/public/yamsvp/src/test/resources/log4j.properties b/public/biopet-extensions/src/test/resources/log4j.properties similarity index 100% rename from public/yamsvp/src/test/resources/log4j.properties rename to public/biopet-extensions/src/test/resources/log4j.properties diff --git a/public/biopet-framework/src/test/resources/picard.alignmentMetrics b/public/biopet-extensions/src/test/resources/nl/lumc/sasc/biopet/extensions/picard/picard.alignmentMetrics similarity index 100% rename from public/biopet-framework/src/test/resources/picard.alignmentMetrics rename to public/biopet-extensions/src/test/resources/nl/lumc/sasc/biopet/extensions/picard/picard.alignmentMetrics diff --git a/public/biopet-framework/src/test/resources/picard.dedup.metrics b/public/biopet-extensions/src/test/resources/nl/lumc/sasc/biopet/extensions/picard/picard.dedup.metrics similarity index 100% rename from public/biopet-framework/src/test/resources/picard.dedup.metrics rename to public/biopet-extensions/src/test/resources/nl/lumc/sasc/biopet/extensions/picard/picard.dedup.metrics diff --git 
a/public/biopet-framework/src/test/resources/picard.insertsizemetrics b/public/biopet-extensions/src/test/resources/nl/lumc/sasc/biopet/extensions/picard/picard.insertsizemetrics similarity index 100% rename from public/biopet-framework/src/test/resources/picard.insertsizemetrics rename to public/biopet-extensions/src/test/resources/nl/lumc/sasc/biopet/extensions/picard/picard.insertsizemetrics diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/GsnapTest.scala b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/GsnapTest.scala similarity index 75% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/GsnapTest.scala rename to public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/GsnapTest.scala index 1ffd257e0a53c23ce37812601e8b63464cc2dd37..8dceeb52502cf5b150bc3fa4e41ccdb9cb6aebae 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/GsnapTest.scala +++ b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/GsnapTest.scala @@ -15,7 +15,7 @@ */ package nl.lumc.sasc.biopet.extensions -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import org.scalatest.Matchers import org.scalatest.testng.TestNGSuite import org.testng.SkipException @@ -25,17 +25,10 @@ import scala.sys.process.{ Process, ProcessLogger } class GsnapTest extends TestNGSuite with Matchers { - private def setConfig(key: String, value: String): Map[String, Any] = { - val oldMap: Map[String, Any] = Config.global.map - Config.global.map += (key -> value) - oldMap - } - - private def restoreConfig(oldMap: Map[String, Any]): Unit = Config.global.map = oldMap - @BeforeClass def checkExecutable() = { - val oldMap = setConfig("db", "mock") - val wrapper = new Gsnap(null) + val wrapper = new Gsnap(null) { + override def globalConfig = new Config(Map("db" -> "mock")) + } val proc = Process(wrapper.versionCommand) val exitCode 
= try { @@ -47,13 +40,12 @@ class GsnapTest extends TestNGSuite with Matchers { } if (exitCode != 0) throw new SkipException("Skipping GSNAP test because the executable can not be found") - restoreConfig(oldMap) } @Test(description = "GSNAP version number capture from executable") def testVersion() = { - val oldMap = setConfig("db", "mock") - new Gsnap(null).getVersion should not be "N/A" - restoreConfig(oldMap) + new Gsnap(null) { + override def globalConfig = new Config(Map("db" -> "mock")) + }.getVersion should not be "N/A" } } diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/HtseqCountTest.scala b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/HtseqCountTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/HtseqCountTest.scala rename to public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/HtseqCountTest.scala diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/LnTest.scala b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/LnTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/LnTest.scala rename to public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/LnTest.scala diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetricsTest.scala b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetricsTest.scala similarity index 92% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetricsTest.scala rename to public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetricsTest.scala index 49edf90b2eaf52f3c48757c060a705a9a72d2f77..48a6bade41d2fdd65faacceb40c6dc750cac841f 100644 --- 
a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetricsTest.scala +++ b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectAlignmentSummaryMetricsTest.scala @@ -31,7 +31,7 @@ class CollectAlignmentSummaryMetricsTest extends TestNGSuite with Matchers { @Test def summaryData(): Unit = { - val file = new File(Paths.get(getClass.getResource("/picard.alignmentMetrics").toURI).toString) + val file = new File(Paths.get(getClass.getResource("picard.alignmentMetrics").toURI).toString) val job = new CollectAlignmentSummaryMetrics(null) job.output = file diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetricsTest.scala b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetricsTest.scala similarity index 92% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetricsTest.scala rename to public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetricsTest.scala index 52a0f2d3f39ad72151b9ff4d445aa609e82bbfc2..88f2b330f7e70d2da8f48c222408500d2cb83ea0 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetricsTest.scala +++ b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/CollectInsertSizeMetricsTest.scala @@ -31,7 +31,7 @@ class CollectInsertSizeMetricsTest extends TestNGSuite with Matchers { @Test def summaryData(): Unit = { - val file = new File(Paths.get(getClass.getResource("/picard.insertsizemetrics").toURI).toString) + val file = new File(Paths.get(getClass.getResource("picard.insertsizemetrics").toURI).toString) val job = new CollectInsertSizeMetrics(null) job.output = file diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicatesTest.scala 
b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicatesTest.scala similarity index 92% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicatesTest.scala rename to public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicatesTest.scala index 9b6d9b575ea5d0a2a5a69cb77f2fa575926a2294..23cfcddc5f3ff58a15e867a853e760684794d4c9 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicatesTest.scala +++ b/public/biopet-extensions/src/test/scala/nl/lumc/sasc/biopet/extensions/picard/MarkDuplicatesTest.scala @@ -31,7 +31,7 @@ class MarkDuplicatesTest extends TestNGSuite with Matchers { @Test def summaryData(): Unit = { - val file = new File(Paths.get(getClass.getResource("/picard.dedup.metrics").toURI).toString) + val file = new File(Paths.get(getClass.getResource("picard.dedup.metrics").toURI).toString) val job = new MarkDuplicates(null) job.outputMetrics = file diff --git a/public/biopet-framework/.gitignore b/public/biopet-framework/.gitignore deleted file mode 100644 index a6f89c2da7a029afa02b6e7a2bf80ad34958a311..0000000000000000000000000000000000000000 --- a/public/biopet-framework/.gitignore +++ /dev/null @@ -1 +0,0 @@ -/target/ \ No newline at end of file diff --git a/public/biopet-framework/README.md b/public/biopet-framework/README.md deleted file mode 100644 index 3dc04d5cb35ab582ccae7d07a8bf73abbb99db8b..0000000000000000000000000000000000000000 --- a/public/biopet-framework/README.md +++ /dev/null @@ -1,9 +0,0 @@ -Biopet Framework -======================= -Framework to build pipelines with - - -License -=== - -A dual licensing mode is applied. The source code within this project is freely available for non-commercial use under an AGPL license; For commercial users or users who do not want to follow the AGPL license, please contact sasc@lumc.nl to purchase a separate license. 
diff --git a/public/biopet-framework/examples/bam-metrics.json b/public/biopet-framework/examples/bam-metrics.json deleted file mode 100644 index 87ee73e04005c5847e414b544e9179f02a742eab..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/bam-metrics.json +++ /dev/null @@ -1,6 +0,0 @@ -{ - "reference": "bla", - "bedtools": { "exe": "test"}, - "samtools": { "exe": "test"}, - "targetBed": ["target_1", "target_2"] -} diff --git a/public/biopet-framework/examples/biopet-defaults.json b/public/biopet-framework/examples/biopet-defaults.json deleted file mode 100644 index 1f505ce90551039e629943430b55cae644b30517..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/biopet-defaults.json +++ /dev/null @@ -1,22 +0,0 @@ -{ - "genotypegvcfs": { "scattercount": 100 }, - "variantannotator": { "scattercount": 10 }, - "realignertargetcreator": { "scattercount": 30 }, - "combinevariants": { "scattercount": 10 }, - "printreads_temp": { "scattercount": 30 }, - "indelrealigner": { "scattercount": 30 }, - "haplotypecaller": { "scattercount": 100 }, - "unifiedgenotyper": { "scattercount": 100 }, - "baserecalibrator": { "scattercount": 30 }, - "basty": { - "haplotypecaller": { "scattercount": 20 }, - "unifiedgenotyper": { "scattercount": 1 }, - "multisample": { "unifiedgenotyper": { "scattercount": 100 } }, - "baserecalibrator": { "scattercount": 1 }, - "indelrealigner": { "scattercount": 1 }, - "printreads_temp": { "scattercount": 1 }, - "realignertargetcreator": { "scattercount": 1 }, - "genotypegvcfs": { "scattercount": 1 }, - "combinevariants": { "scattercount": 1 } - } -} diff --git a/public/biopet-framework/examples/flexiprep.json b/public/biopet-framework/examples/flexiprep.json deleted file mode 100644 index 177cfdd255b702361588cadd76215939d0ef6372..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/flexiprep.json +++ /dev/null @@ -1,8 +0,0 @@ -{ - "fastqc": { "exe": 
"/data/DIV5/SASC/common/programs/FastQC/fastqc_v0.11.2/fastqc" }, - "flexiprep": { - "seqtk": {"exe":"/data/DIV5/SASC/common/programs/seqtk/seqtk/seqtk"}, - "cutadapt": {"exe":"/home/pjvan_thof/.local/bin/cutadapt"}, - "sickle": {"exe":"/data/DIV5/SASC/pjvan_thof/bin/sickle"} - } -} diff --git a/public/biopet-framework/examples/gatk-benchmark-genotyping.json b/public/biopet-framework/examples/gatk-benchmark-genotyping.json deleted file mode 100644 index 89d0d7d92a4cb8fc86a9d0c8d2995227cf7876b0..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/gatk-benchmark-genotyping.json +++ /dev/null @@ -1,9 +0,0 @@ -{ - "gvcffiles": ["test4.vcf", "test5.vcf"], - "reference" : "/data/DIV5/SASC/common/gatk_bundle_2.8/hg19/ucsc.hg19.fasta", - "dbsnp": "bla", - "haplotypecaller": { - "stand_call_conf": 20, - "stand_emit_conf": 20 - } -} diff --git a/public/biopet-framework/examples/gatk-genotypeing.json b/public/biopet-framework/examples/gatk-genotypeing.json deleted file mode 100644 index 083b938251d1ef0601736b9f1d02a60aa119bff3..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/gatk-genotypeing.json +++ /dev/null @@ -1,8 +0,0 @@ -{ - "reference" : "/data/DIV5/SASC/common/gatk_bundle_2.8/hg19/ucsc.hg19.fasta", - "dbsnp": "bla", - "haplotypecaller": { - "stand_call_conf": 20, - "stand_emit_conf": 20 - } -} diff --git a/public/biopet-framework/examples/gatk-pipeline.json b/public/biopet-framework/examples/gatk-pipeline.json deleted file mode 100644 index b9b1ef2b6190bddedd5ae4c94570834ab4cb7fc5..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/gatk-pipeline.json +++ /dev/null @@ -1,40 +0,0 @@ -{ - "reference" : "/data/DIV5/SASC/common/gatk_bundle_2.8/hg19/ucsc.hg19.fasta", - "reference" : "/data/DIV5/SASC/common/gatk_bundle_2.8/hg19/ucsc.hg19.fasta", - "samtools": { "exe": "test"}, - "fastqc": { "exe": "/home/pjvan_thof/Downloads/FastQC/fastqc" }, - "flexiprep": { - "seqtk": 
{"exe":"/data/DIV5/SASC/common/programs/seqtk/seqtk/seqtk"}, - "cutadapt": {"exe":"/home/pjvan_thof/.local/bin/cutadapt"}, - "sickle": {"exe":"/data/DIV5/SASC/pjvan_thof/bin/sickle"} - }, - "star" : {"exe":"test"}, - "bwa" : {"exe":"test"}, - "gatk": { - "mapping": { - "flexiprep": { - } - }, - "reference" : "/data/DIV5/SASC/common/gatk_bundle_2.8/hg19/ucsc.hg19.fasta", - "dbsnp": "test", - "hapmap": "test", - "omni": "test", - "1000G": "test", - "mills": "test", - "haplotypecaller": { - "stand_call_conf": 20, - "stand_emit_conf": 20 - } - }, - "cutadapt": {"exe":"test"}, - "samples": { - "test": { - "libraries": { - "3" : { - "bam" : "/data/DIV5/SASC/project-072-vcf_Comparison/analysis/runs/01/losekoot_redo_FC59b_L5_I12_S41/run_01/losekoot_redo_FC59b_L5_I12_S41-01.dedup.bam" - } - } - } - }, - "correct_readgroups": true -} diff --git a/public/biopet-framework/examples/gatk-variantcalling.json b/public/biopet-framework/examples/gatk-variantcalling.json deleted file mode 100644 index 0cab6e62f5971859232cdb867cf33d5f2893fe8d..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/gatk-variantcalling.json +++ /dev/null @@ -1,9 +0,0 @@ -{ - "reference" : "/data/DIV5/SASC/common/gatk_bundle_2.8/hg19/ucsc.hg19.fasta", - "dbsnp": "bla", - "haplotypecaller": { - "stand_call_conf": 20, - "stand_emit_conf": 20 - }, - "scattercount": 10 -} diff --git a/public/biopet-framework/examples/gatk-vcf-sample-compare.json b/public/biopet-framework/examples/gatk-vcf-sample-compare.json deleted file mode 100644 index 89d0d7d92a4cb8fc86a9d0c8d2995227cf7876b0..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/gatk-vcf-sample-compare.json +++ /dev/null @@ -1,9 +0,0 @@ -{ - "gvcffiles": ["test4.vcf", "test5.vcf"], - "reference" : "/data/DIV5/SASC/common/gatk_bundle_2.8/hg19/ucsc.hg19.fasta", - "dbsnp": "bla", - "haplotypecaller": { - "stand_call_conf": 20, - "stand_emit_conf": 20 - } -} diff --git 
a/public/biopet-framework/examples/mapping.json b/public/biopet-framework/examples/mapping.json deleted file mode 100644 index 5b675451afec13dd438ae1283f1b98dd252e6a4f..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/mapping.json +++ /dev/null @@ -1,12 +0,0 @@ -{ - "samtools": {"exe": "test"}, - "reference": "/blabla/blabla.fa", - "fastqc": { "exe": "/home/pjvan_thof/Downloads/FastQC/fastqc" }, - "bwa": { "exe": "test" }, - "flexiprep": { - "fastqc": { "exe": "/home/pjvan_thof/Downloads/FastQC/fastqc" }, - "seqtk": {"exe":"/data/DIV5/SASC/common/programs/seqtk/seqtk/seqtk"}, - "cutadapt": {"exe":"/home/pjvan_thof/.local/bin/cutadapt"}, - "sickle": {"exe":"/data/DIV5/SASC/pjvan_thof/bin/sickle"} - } -} diff --git a/public/biopet-framework/examples/sage.json b/public/biopet-framework/examples/sage.json deleted file mode 100644 index 44d57a292f77d1a0dbdc659b1fab36543cfa3017..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/sage.json +++ /dev/null @@ -1,23 +0,0 @@ -{ - "bedtools": {"exe": "test"}, - "samtools": {"exe": "test"}, - "reference": "/blabla/blabla.fa", - "fastqc": { "exe": "/home/pjvan_thof/Downloads/FastQC/fastqc" }, - "bwa": { "exe": "test" }, - "flexiprep": { - "fastqc": { "exe": "/home/pjvan_thof/Downloads/FastQC/fastqc" }, - "seqtk": {"exe":"/data/DIV5/SASC/common/programs/seqtk/seqtk/seqtk"}, - "cutadapt": {"exe":"/home/pjvan_thof/.local/bin/cutadapt"}, - "sickle": {"exe":"/data/DIV5/SASC/pjvan_thof/bin/sickle"} - }, - "samples": { - "test": { - "libraries": { - "1": { - "R1": "test.fastq" - } - } - } - }, - "bowtie": {"exe": "test"} -} diff --git a/public/biopet-framework/examples/shark_apps.json b/public/biopet-framework/examples/shark_apps.json deleted file mode 100644 index d5bd63e020b98f0a01e6e13fc91f556b7578038d..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/shark_apps.json +++ /dev/null @@ -1,36 +0,0 @@ -{ - "bwa": { - "exe": 
"/usr/local/bwa/bwa-0.7.10/bwa" - }, - "seqtk": { - "exe":"/data/DIV5/SASC/common/programs/seqtk/seqtk/seqtk" - }, - "sickle": { - "exe":"/data/DIV5/SASC/common/programs/sickle/sickle-1.33/sickle" - }, - "clever": { - "exe": "/data/DIV5/SASC/common/programs/clever/clever-toolkit-v2.0rc3/bin/clever", - "version_exe": "/data/DIV5/SASC/common/programs/clever/clever-toolkit-v2.0rc3/bin/ctk-version" - }, - "pindel": { - "exe": "/data/DIV5/SASC/common/programs/pindel/pindel-0.2.5/pindel" - }, - "breakdancerconfig": { - "exe": "/data/DIV5/SASC/common/programs/breakdancer/breakdancer-v1.4.4/lib/breakdancer-max1.4.4/bam2cfg.pl" - }, - "breakdancercaller": { - "exe": "/data/DIV5/SASC/common/programs/breakdancer/breakdancer-v1.4.4/bin/breakdancer-max" - }, - "fastqc": { - "exe": "/usr/local/FastQC/FastQC_v0.10.1/fastqc" - }, - "seqstat": { - "exe": "/data/DIV5/SASC/common/programs/dQual/fastq-seqstat" - }, - "stampy": { - "exe": "/usr/local/stampy/stampy-1.0.23/stampy.py" - }, - "sambamba": { - "exe": "/data/DIV5/SASC/common/programs/sambamba/sambamba-0.4.7/build/sambamba" - } -} diff --git a/public/biopet-framework/examples/summaryformat.json b/public/biopet-framework/examples/summaryformat.json deleted file mode 100644 index be47260b2aec4da6c2b04c46006c6509fa6b8a09..0000000000000000000000000000000000000000 --- a/public/biopet-framework/examples/summaryformat.json +++ /dev/null @@ -1,33 +0,0 @@ -{ - "_meta": {}, - "stats": {}, - "resources": { - "res_key": { - "uri:": "", - "md5sum": "", - "sha256sum": "", - "adler32sum": "" - } - }, - "samples" :{ - "SampleID": { - "stats": {}, - "resources": {}, - "libraries": { - "libraryID": { - "stats": {}, - "resources": {} - } - } - }, "SampleID2": { - "stats": {}, - "resources": {}, - "libraries": { - "libraryID": { - "stats": {}, - "resources": {} - } - } - } - } -} diff --git a/public/biopet-framework/pom.xml b/public/biopet-framework/pom.xml deleted file mode 100644 index 
e329f6137a600ff733179603a791aa2967759d80..0000000000000000000000000000000000000000 --- a/public/biopet-framework/pom.xml +++ /dev/null @@ -1,123 +0,0 @@ -<!-- - - Biopet is built on top of GATK Queue for building bioinformatic - pipelines. It is mainly intended to support LUMC SHARK cluster which is running - SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - should also be able to execute Biopet tools and pipelines. - - Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - - Contact us at: sasc@lumc.nl - - A dual licensing mode is applied. The source code within this project that are - not part of GATK Queue is freely available for non-commercial use under an AGPL - license; For commercial users or users who do not want to follow the AGPL - license, please contact us to obtain a separate license. - ---> -<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" - xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - <modelVersion>4.0.0</modelVersion> - - <artifactId>BiopetFramework</artifactId> - <packaging>jar</packaging> - - <parent> - <groupId>nl.lumc.sasc</groupId> - <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> - <relativePath>../</relativePath> - </parent> - - <inceptionYear>2014</inceptionYear> - <name>BiopetFramework</name> - - <repositories> - <repository> - <id>biojava-maven-repo</id> - <name>BioJava repository</name> - <url>http://www.biojava.org/download/maven/</url> - </repository> - </repositories> - <dependencies> - <dependency> - <groupId>org.testng</groupId> - <artifactId>testng</artifactId> - <version>6.8</version> - <scope>test</scope> - </dependency> - <dependency> - <groupId>org.mockito</groupId> - <artifactId>mockito-all</artifactId> - <version>1.9.5</version> - <scope>test</scope> - </dependency> - <dependency> - <groupId>org.scalatest</groupId> - 
<artifactId>scalatest_2.10</artifactId> - <version>2.2.1</version> - <scope>test</scope> - </dependency> - <dependency> - <groupId>org.scala-lang</groupId> - <artifactId>scala-library</artifactId> - <version>2.10.2</version> - </dependency> - <dependency> - <groupId>org.broadinstitute.gatk</groupId> - <artifactId>gatk-queue</artifactId> - <version>3.4</version> - </dependency> - <dependency> - <groupId>org.broadinstitute.gatk</groupId> - <artifactId>gatk-queue-extensions-public</artifactId> - <version>3.4</version> - </dependency> - <dependency> - <groupId>org.broadinstitute.gatk</groupId> - <artifactId>gatk-utils</artifactId> - <version>3.4</version> - <exclusions> - <exclusion> - <groupId>org.broadinstitute.gatk</groupId> - <artifactId>gsalib</artifactId> - </exclusion> - </exclusions> - </dependency> - <dependency> - <groupId>io.argonaut</groupId> - <artifactId>argonaut_2.10</artifactId> - <version>6.1-M4</version> - </dependency> - <dependency> - <groupId>org.biojava</groupId> - <artifactId>biojava3-core</artifactId> - <version>3.1.0</version> - </dependency> - <dependency> - <groupId>org.biojava</groupId> - <artifactId>biojava3-sequencing</artifactId> - <version>3.1.0</version> - </dependency> - <dependency> - <groupId>com.google.guava</groupId> - <artifactId>guava</artifactId> - <version>18.0</version> - </dependency> - <dependency> - <groupId>com.github.scopt</groupId> - <artifactId>scopt_2.10</artifactId> - <version>3.3.0</version> - </dependency> - <dependency> - <groupId>org.scalatra.scalate</groupId> - <artifactId>scalate-core_2.10</artifactId> - <version>1.7.0</version> - </dependency> - <dependency> - <groupId>org.yaml</groupId> - <artifactId>snakeyaml</artifactId> - <version>1.15</version> - </dependency> - </dependencies> -</project> diff --git a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/scripts/bed_squish.py b/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/scripts/bed_squish.py deleted file mode 100755 index 
4ed536df31db98596f97be0d88836b75d77cf597..0000000000000000000000000000000000000000 --- a/public/biopet-framework/src/main/resources/nl/lumc/sasc/biopet/scripts/bed_squish.py +++ /dev/null @@ -1,229 +0,0 @@ -#!/usr/bin/env python2 -# -# Biopet is built on top of GATK Queue for building bioinformatic -# pipelines. It is mainly intended to support LUMC SHARK cluster which is running -# SGE. But other types of HPC that are supported by GATK Queue (such as PBS) -# should also be able to execute Biopet tools and pipelines. -# -# Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center -# -# Contact us at: sasc@lumc.nl -# -# A dual licensing mode is applied. The source code within this project that are -# not part of GATK Queue is freely available for non-commercial use under an AGPL -# license; For commercial users or users who do not want to follow the AGPL -# license, please contact us to obtain a separate license. -# - - -""" -Overlapping regions removal in a single BED file. - - -The script will adjust feature coordinates so that no overlaps are present. If -a feature is enveloped entirely by another feature, the smaller feature will be -removed and the enveloping feature split into two. - -Input BED files must be position-sorted and only contain the first six fields. -Strands are taken into account when removing regions. 
- -Requirements: - * Python == 2.7.x - * track >= 1.1.0 <http://xapple.github.io/track/> - -Copyright (c) 2013 Wibowo Arindrarto <w.arindrarto@lumc.nl> -Copyright (c) 2013 LUMC Sequencing Analysis Support Core <sasc@lumc.nl> -MIT License <http://opensource.org/licenses/MIT> - -""" - -import argparse -import os - -import track - - -class BEDCoord(object): - - """Class representing a BED feature coordinate.""" - - __slots__ = ('feature', 'kind', 'point', 'start', 'strand') - - def __init__(self, feature, kind, point, start, strand): - """ - - :param feature: name of BED feature - :type feature: str - :param kind: type of coordinate, 'start' or 'end' - :type kind: str - :param point: coordinate point - :type point: int - :param start: start coordinate of the coordinate with this feature - :type start: int - :param strand: strand of the feature, 1 or -1 - :type strand: int - - """ - self.feature = feature - assert kind in ('start', 'end') - self.kind = kind - self.point = point - self.start = start - self.strand = strand - - def __repr__(self): - return '{0}{1}'.format(self.point, self.kind[0].upper()) - - def __gt__(self, other): - if self.point == other.point: - return self.start > other.start - return self.point > other.point - - def __lt__(self, other): - if self.point == other.point: - return self.start < other.start - return self.point < other.point - - def __ge__(self, other): - return self.point >= other.point - - def __le__(self, other): - return self.point <= other.point - - -def squish_track_records(chrom_recs): - """Given an iterator for `track` records, yield squished `track` features. - - :param chrom_feats: iterator returning `track` records for one chromosome - :type chrom_feats: iterator - :returns: (generator) single `track` records - :rtype: `track.pyrow.SuperRow` - - """ - # Algorithm: - # 1. Flatten all coordinate points into a single list - # 2. 
Sort by point, resolve same point by comparing feature starts - # (already defined in BEDCoord's `__lt__` and `__gt__`) - # 3. Walk through the sorted points while keeping track of overlaps using - # a level' counter for each strand - # 4. Start coordinates increase level counters, end coordinates decrease - # them - # 5. Start coordinates of the squished features are: - # * start coordinates in the array when level == 1 - # * end coordinates in the array when level == 1 - # 6. End coordinates of the squished features are: - # * start coordinates in the array when level == 2 - # * end coordinates in the array when level == 0 - # 7. As additional checks, make sure that: - # * when yielding a record, its start coordinate <= its end coordinate - # * the level counter never falls below 0 (this doesn't make sense) - # * after all iterations are finished, the level counter == 0 - # Assumes: - # 1. Input BED file is position-sorted - # 2. Coordinate points all denote closed intervals (this is handled by - # `track` for BED files already) - flat_coords = [] - for rec in chrom_recs: - flat_coords.append(BEDCoord(rec[2], 'start', rec[0], rec[0], rec[4])) - flat_coords.append(BEDCoord(rec[2], 'end', rec[1], rec[0], rec[4])) - - flat_coords.sort() - - plus_level, minus_level = 0, 0 - plus_row = [0, 0, "", 0, 1] - minus_row = [0, 0, "", 0, -1] - - for coord in flat_coords: - - if coord.strand == 1: - - if coord.kind == 'start': - plus_level += 1 - if plus_level == 1: - plus_row[0] = coord.point - plus_row[2] = coord.feature - elif plus_level == 2: - plus_row[1] = coord.point - # track uses closed coordinates already - assert plus_row[0] <= plus_row[1] - yield plus_row - else: - plus_level -= 1 - if plus_level == 0: - plus_row[1] = coord.point - plus_row[2] = coord.feature - assert plus_row[0] <= plus_row[1] - yield plus_row - elif plus_level == 1: - plus_row[0] = coord.point - - assert plus_level >= 0, 'Unexpected feature level: {0}'.format( - plus_level) - - elif coord.strand 
== -1: - - if coord.kind == 'start': - minus_level += 1 - if minus_level == 1: - minus_row[0] = coord.point - minus_row[2] = coord.feature - elif minus_level == 2: - minus_row[1] = coord.point - assert minus_row[0] <= minus_row[1] - yield minus_row - else: - minus_level -= 1 - if minus_level == 0: - minus_row[1] = coord.point - minus_row[2] = coord.feature - assert minus_row[0] <= minus_row[1] - yield minus_row - elif minus_level == 1: - minus_row[0] = coord.point - - assert minus_level >= 0, 'Unexpected feature level: {0}'.format( - minus_level) - - assert plus_level == 0, 'Unexpected end plus feature level: ' \ - '{0}'.format(plus_level) - assert minus_level == 0, 'Unexpected end minus feature level: ' \ - '{0}'.format(minus_level) - - -def squish_bed(in_file, out_file): - """Removes all overlapping regions in the input BED file, writing to the - output BED file. - - :param in_file: path to input BED file - :type in_file: str - :param out_file: path to output BED file - :type out_file: str - - """ - # check for input file presence, remove output file if it already exists - assert os.path.exists(in_file), 'Required input file {0} does not ' \ - 'exist'.format(in_file) - if os.path.exists(out_file): - os.unlink(out_file) - - with track.load(in_file, readonly=True) as in_track, \ - track.new(out_file, format='bed') as out_track: - - for chrom in in_track.chromosomes: - chrom_rec = in_track.read(chrom) - out_track.write(chrom, squish_track_records(chrom_rec)) - - -if __name__ == '__main__': - - usage = __doc__.split('\n\n\n') - parser = argparse.ArgumentParser( - formatter_class=argparse.RawDescriptionHelpFormatter, - description=usage[0], epilog=usage[1]) - - parser.add_argument('input', type=str, help='Path to input BED file') - parser.add_argument('output', type=str, help='Path to output BED file') - - args = parser.parse_args() - - squish_bed(args.input, args.output) diff --git 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunction.scala b/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunction.scala deleted file mode 100644 index 30c400bc52d8ab439089f238864858fa32ef6832..0000000000000000000000000000000000000000 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunction.scala +++ /dev/null @@ -1,39 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. 
- */ -package nl.lumc.sasc.biopet.core - -/** - * This class is for commandline programs where the executable is a non JVM based program - */ -abstract class BiopetCommandLineFunction extends BiopetCommandLineFunctionTrait { - /** - * This function needs to be implemented to define the command that is executed - * @return Command to run - */ - protected def cmdLine: String - - /** - * implementing a final version of the commandLine from org.broadinstitute.gatk.queue.function.CommandLineFunction - * User needs to implement cmdLine instead - * @return Command to run - */ - final def commandLine: String = { - preCmdInternal() - val cmd = cmdLine - addJobReportBinding("command", cmd) - cmd - } -} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunctionTrait.scala b/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunctionTrait.scala deleted file mode 100644 index edb16ee762338f170cfe1ae3b9f081c7b226a0d0..0000000000000000000000000000000000000000 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetCommandLineFunctionTrait.scala +++ /dev/null @@ -1,233 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. 
- */ -package nl.lumc.sasc.biopet.core - -import java.io.{ File, FileInputStream } -import java.security.MessageDigest - -import nl.lumc.sasc.biopet.core.config.Configurable -import org.broadinstitute.gatk.queue.function.CommandLineFunction -import org.broadinstitute.gatk.utils.commandline.Input - -import scala.collection.mutable -import scala.sys.process.{ Process, ProcessLogger } -import scala.util.matching.Regex - -/** Biopet command line trait to auto check executable and cluster values */ -trait BiopetCommandLineFunctionTrait extends CommandLineFunction with Configurable { - analysisName = configName - - @Input(doc = "deps", required = false) - var deps: List[File] = Nil - - var threads = 0 - def defaultThreads = 1 - - var vmem: Option[String] = config("vmem") - protected def defaultCoreMemory: Double = 1.0 - protected def defaultVmemFactor: Double = 1.4 - var vmemFactor: Double = config("vmem_factor", default = defaultVmemFactor) - - var residentFactor: Double = config("resident_factor", default = 1.2) - - private var coreMemory: Double = _ - - var executable: String = _ - - /** - * Can override this method. This is executed just before the job is ready to run. - * Can check on run time files from pipeline here - */ - protected[core] def beforeCmd() {} - - /** Can override this method. This is executed after the script is done en queue starts to generate the graph */ - protected[core] def beforeGraph() {} - - /** Set default output file, threads and vmem for current job */ - override def freezeFieldValues() { - preProcessExecutable() - beforeGraph() - if (jobOutputFile == null) jobOutputFile = new File(firstOutput.getAbsoluteFile.getParent, "." + firstOutput.getName + "." 
+ configName + ".out") - - if (threads == 0) threads = getThreads(defaultThreads) - if (threads > 1) nCoresRequest = Option(threads) - - coreMemory = config("core_memory", default = defaultCoreMemory).asDouble + (0.5 * retry) - - if (config.contains("memory_limit")) memoryLimit = config("memory_limit") - else memoryLimit = Some(coreMemory * threads) - - if (config.contains("resident_limit")) residentLimit = config("resident_limit") - else residentLimit = Some((coreMemory + (0.5 * retry)) * residentFactor) - - if (!config.contains("vmem")) vmem = Some((coreMemory * (vmemFactor + (0.5 * retry))) + "G") - if (vmem.isDefined) jobResourceRequests :+= "h_vmem=" + vmem.get - jobName = configName + ":" + (if (firstOutput != null) firstOutput.getName else jobOutputFile) - - super.freezeFieldValues() - } - - var retry = 0 - - override def setupRetry(): Unit = { - super.setupRetry() - if (vmem.isDefined) jobResourceRequests = jobResourceRequests.filterNot(_.contains("h_vmem=")) - logger.info("Auto raise memory on retry") - retry += 1 - this.freeze() - } - - /** can override this value is executable may not be converted to CanonicalPath */ - val executableToCanonicalPath = true - - /** - * Checks executable. 
Follow full CanonicalPath, checks if it is existing and do a md5sum on it to store in job report - */ - protected[core] def preProcessExecutable() { - if (!BiopetCommandLineFunctionTrait.executableMd5Cache.contains(executable)) { - try if (executable != null) { - if (!BiopetCommandLineFunctionTrait.executableCache.contains(executable)) { - val oldExecutable = executable - val buffer = new StringBuffer() - val cmd = Seq("which", executable) - val process = Process(cmd).run(ProcessLogger(buffer.append(_))) - if (process.exitValue == 0) { - executable = buffer.toString - val file = new File(executable) - if (executableToCanonicalPath) executable = file.getCanonicalPath - else executable = file.getAbsolutePath - } else { - BiopetQScript.addError("executable: '" + executable + "' not found, please check config") - } - BiopetCommandLineFunctionTrait.executableCache += oldExecutable -> executable - BiopetCommandLineFunctionTrait.executableCache += executable -> executable - } else { - executable = BiopetCommandLineFunctionTrait.executableCache(executable) - } - - if (!BiopetCommandLineFunctionTrait.executableMd5Cache.contains(executable)) { - val is = new FileInputStream(executable) - val cnt = is.available - val bytes = Array.ofDim[Byte](cnt) - is.read(bytes) - is.close() - val temp = MessageDigest.getInstance("MD5").digest(bytes).map("%02X".format(_)).mkString.toLowerCase - BiopetCommandLineFunctionTrait.executableMd5Cache += executable -> temp - } - } catch { - case ioe: java.io.IOException => logger.warn("Could not use 'which', check on executable skipped: " + ioe) - } - } - val md5 = BiopetCommandLineFunctionTrait.executableMd5Cache.get(executable) - addJobReportBinding("md5sum_exe", md5.getOrElse("None")) - } - - /** executes checkExecutable method and fill job report */ - final protected def preCmdInternal() { - preProcessExecutable() - beforeCmd() - - addJobReportBinding("cores", nCoresRequest match { - case Some(n) if n > 0 => n - case _ => 1 - }) - 
addJobReportBinding("version", getVersion) - } - - /** Command to get version of executable */ - protected def versionCommand: String = null - - /** Regex to get version from version command output */ - protected def versionRegex: Regex = null - - /** Allowed exit codes for the version command */ - protected def versionExitcode = List(0) - - /** Executes the version command */ - private[core] def getVersionInternal: Option[String] = { - if (versionCommand == null || versionRegex == null) None - else getVersionInternal(versionCommand, versionRegex) - } - - /** Executes the version command */ - private[core] def getVersionInternal(versionCommand: String, versionRegex: Regex): Option[String] = { - if (versionCommand == null || versionRegex == null) return None - val exe = new File(versionCommand.trim.split(" ")(0)) - if (!exe.exists()) return None - val stdout = new StringBuffer() - val stderr = new StringBuffer() - def outputLog = "Version command: \n" + versionCommand + - "\n output log: \n stdout: \n" + stdout.toString + - "\n stderr: \n" + stderr.toString - val process = Process(versionCommand).run(ProcessLogger(stdout append _ + "\n", stderr append _ + "\n")) - if (!versionExitcode.contains(process.exitValue())) { - logger.warn("getVersion give exit code " + process.exitValue + ", version not found \n" + outputLog) - return None - } - for (line <- stdout.toString.split("\n") ++ stderr.toString.split("\n")) { - line match { - case versionRegex(m) => return Some(m) - case _ => - } - } - logger.warn("getVersion give a exit code " + process.exitValue + " but no version was found, executable correct? 
\n" + outputLog) - None - } - - /** Get version from cache otherwise execute the version command */ - def getVersion: Option[String] = { - if (!BiopetCommandLineFunctionTrait.executableCache.contains(executable)) - preProcessExecutable() - if (!BiopetCommandLineFunctionTrait.versionCache.contains(versionCommand)) - getVersionInternal match { - case Some(version) => BiopetCommandLineFunctionTrait.versionCache += versionCommand -> version - case _ => - } - BiopetCommandLineFunctionTrait.versionCache.get(versionCommand) - } - - /** - * Get threads from config - * @param default default when not found in config - * @return number of threads - */ - def getThreads(default: Int): Int = { - val maxThreads: Int = config("maxthreads", default = 8) - val threads: Int = config("threads", default = default) - if (maxThreads > threads) threads - else maxThreads - } - - /** - * Get threads from config - * @param default default when not found in config - * @param module Module when this is difrent from default - * @return number of threads - */ - def getThreads(default: Int, module: String): Int = { - val maxThreads: Int = config("maxthreads", default = 8, submodule = module) - val threads: Int = config("threads", default = default, submodule = module) - if (maxThreads > threads) threads - else maxThreads - } -} - -/** stores global caches */ -object BiopetCommandLineFunctionTrait { - private[core] val versionCache: mutable.Map[String, String] = mutable.Map() - private[core] val executableMd5Cache: mutable.Map[String, String] = mutable.Map() - private[core] val executableCache: mutable.Map[String, String] = mutable.Map() -} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/RscriptCommandLineFunction.scala b/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/RscriptCommandLineFunction.scala deleted file mode 100644 index cd45ff92b557b7be4f5fa8ca76030f8b8a2babaf..0000000000000000000000000000000000000000 --- 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/RscriptCommandLineFunction.scala +++ /dev/null @@ -1,92 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. - */ -package nl.lumc.sasc.biopet.extensions - -import java.io.{ File, FileOutputStream } - -import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction - -import scala.sys.process._ - -/** - * General rscript extension - * - * Created by wyleung on 17-2-15. 
- */ -trait RscriptCommandLineFunction extends BiopetCommandLineFunction { - - protected var script: File - - executable = config("exe", default = "Rscript", submodule = "Rscript") - - override def beforeGraph(): Unit = { - checkScript() - } - - /** - * If script not exist in file system it try to copy it from the jar - * @param local if true it use File.createTempFile instead of ".queue/tmp/" - */ - protected def checkScript(local: Boolean = false): Unit = { - if (script.exists()) { - script = script.getAbsoluteFile - } else { - val rScript: File = { - if (local) File.createTempFile(script.getName, ".R") - else new File(".queue/tmp/" + script) - } - if (!rScript.getParentFile.exists) rScript.getParentFile.mkdirs - - val is = getClass.getResourceAsStream(script.getPath) - val os = new FileOutputStream(rScript) - - org.apache.commons.io.IOUtils.copy(is, os) - os.close() - - script = rScript - } - } - - /** - * Execute rscript on local system - * @param logger How to handle stdout and stderr - */ - def runLocal(logger: ProcessLogger): Unit = { - checkScript(local = true) - - this.logger.info(cmdLine) - - val cmd = cmdLine.stripPrefix(" '").stripSuffix("' ").split("' *'") - - this.logger.info(cmd.mkString(" ")) - - val process = Process(cmd.toSeq).run(logger) - this.logger.info(process.exitValue()) - } - - /** - * Execute rscript on local system - * Stdout and stderr will go to biopet logger - */ - def runLocal(): Unit = { - runLocal(ProcessLogger(logger.info(_))) - } - - def cmdLine: String = { - required(executable) + - required(script) - } -} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/pipelines/MultisamplePipelineTemplate.scala b/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/pipelines/MultisamplePipelineTemplate.scala deleted file mode 100644 index 1d6390b257b27e896966734fdcbe1335a0026642..0000000000000000000000000000000000000000 --- 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/pipelines/MultisamplePipelineTemplate.scala +++ /dev/null @@ -1,81 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. - */ -package nl.lumc.sasc.biopet.pipelines - -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ MultiSampleQScript, PipelineCommand } -import org.broadinstitute.gatk.queue.QScript - -/** Template for a multisample pipeline */ -class MultisamplePipelineTemplate(val root: Configurable) extends QScript with MultiSampleQScript { - def this() = this(null) - - /** Location of summary file */ - def summaryFile: File = new File(outputDir, "MultisamplePipelineTemplate.summary.json") - - /** File to add to the summary */ - def summaryFiles: Map[String, File] = Map() - - /** Pipeline settings to add to the summary */ - def summarySettings: Map[String, Any] = Map() - - /** Function to make a sample */ - def makeSample(id: String) = new Sample(id) - - /** This class will contain jobs and libraries for a sample */ - class Sample(sampleId: String) extends AbstractSample(sampleId) { - /** Sample specific files for summary */ - def summaryFiles: Map[String, File] = Map() - - /** Sample specific stats for summary */ - def summaryStats: Map[String, Any] = Map() - - /** Function to make a 
library */ - def makeLibrary(id: String) = new Library(id) - - /** This class will contain all jobs for a library */ - class Library(libId: String) extends AbstractLibrary(libId) { - /** Library specific files for summary */ - def summaryFiles: Map[String, File] = Map() - - /** Library specific stats for summary */ - def summaryStats: Map[String, Any] = Map() - - /** Method to add library jobs */ - protected def addJobs(): Unit = { - } - } - - /** Method to add sample jobs */ - protected def addJobs(): Unit = { - } - } - - /** Method where multisample jobs are added */ - def addMultiSampleJobs(): Unit = { - } - - /** This is executed before the script starts */ - def init(): Unit = { - } - - /** Method where jobs must be added */ - def biopetScript() { - } -} - -/** Object to let to generate a main method */ -object MultisamplePipelineTemplate extends PipelineCommand \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJson.scala b/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJson.scala deleted file mode 100644 index 46cd2b40cd8645a432aee54081f54e9dd0ec3351..0000000000000000000000000000000000000000 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJson.scala +++ /dev/null @@ -1,70 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. 
The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. - */ -package nl.lumc.sasc.biopet.tools - -import java.io.File - -import nl.lumc.sasc.biopet.core.ToolCommand -import nl.lumc.sasc.biopet.utils.ConfigUtils._ - -import scala.io.Source - -/** - * This tool can convert a tsv to a json file - */ -object SamplesTsvToJson extends ToolCommand { - case class Args(inputFiles: List[File] = Nil) extends AbstractArgs - - class OptParser extends AbstractOptParser { - opt[File]('i', "inputFiles") required () unbounded () valueName "<file>" action { (x, c) => - c.copy(inputFiles = x :: c.inputFiles) - } text "Input must be a tsv file, first line is seen as header and must at least have a 'sample' column, 'library' column is optional, multiple files allowed" - } - - /** Executes SamplesTsvToJson */ - def main(args: Array[String]): Unit = { - val argsParser = new OptParser - val commandArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1) - - val fileMaps = for (inputFile <- commandArgs.inputFiles) yield { - val reader = Source.fromFile(inputFile) - val lines = reader.getLines().toList.filter(!_.isEmpty) - val header = lines.head.split("\t") - val sampleColumn = header.indexOf("sample") - val libraryColumn = header.indexOf("library") - if (sampleColumn == -1) throw new IllegalStateException("sample column does not exist in: " + inputFile) - - val librariesValues: List[Map[String, Any]] = for (tsvLine <- lines.tail) yield { - val values = tsvLine.split("\t") - val sample = values(sampleColumn) - val library = if (libraryColumn != -1) values(libraryColumn) else null - val valuesMap = (for ( - t <- 0 until values.size if !values(t).isEmpty && t != sampleColumn && t != libraryColumn - ) yield header(t) -> values(t)).toMap - val map: Map[String, Any] = if 
(library != null) { - Map("samples" -> Map(sample -> Map("libraries" -> Map(library -> valuesMap)))) - } else { - Map("samples" -> Map(sample -> valuesMap)) - } - map - } - librariesValues.foldLeft(Map[String, Any]())((acc, kv) => mergeMaps(acc, kv)) - } - val map = fileMaps.foldLeft(Map[String, Any]())((acc, kv) => mergeMaps(acc, kv)) - val json = mapToJson(map) - println(json.spaces2) - } -} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfFilterTest.scala b/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfFilterTest.scala deleted file mode 100644 index 2b47405a46b9971aab59ff2c4eaa088efea7f052..0000000000000000000000000000000000000000 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfFilterTest.scala +++ /dev/null @@ -1,84 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. - */ -package nl.lumc.sasc.biopet.tools - -import java.io.File -import java.nio.file.Paths - -import htsjdk.variant.variantcontext.GenotypeType -import htsjdk.variant.vcf.VCFFileReader -import org.scalatest.Matchers -import org.scalatest.mock.MockitoSugar -import org.scalatest.testng.TestNGSuite -import org.testng.annotations.Test - -import scala.util.Random - -/** - * Test class for [[VcfFilter]] - * - * Created by ahbbollen on 9-4-15. 
- */ -class VcfFilterTest extends TestNGSuite with MockitoSugar with Matchers { - - import VcfFilter._ - private def resourcePath(p: String): String = { - Paths.get(getClass.getResource(p).toURI).toString - } - - val vepped_path = resourcePath("/VEP_oneline.vcf") - val vepped = new File(vepped_path) - val rand = new Random() - - @Test def testOutputTypeVcf() = { - val tmp_path = "/tmp/VcfFilter_" + rand.nextString(10) + ".vcf" - val arguments: Array[String] = Array("-I", vepped_path, "-o", tmp_path) - main(arguments) - } - - @Test def testOutputTypeBcf() = { - val tmp_path = "/tmp/VcfFilter_" + rand.nextString(10) + ".bcf" - val arguments: Array[String] = Array("-I", vepped_path, "-o", tmp_path) - main(arguments) - } - - @Test def testOutputTypeVcfGz() = { - val tmp_path = "/tmp/VcfFilter_" + rand.nextString(10) + ".vcf.gz" - val arguments: Array[String] = Array("-I", vepped_path, "-o", tmp_path) - main(arguments) - } - - @Test def testHasGenotype() = { - val reader = new VCFFileReader(vepped, false) - val record = reader.iterator().next() - - hasGenotype(record, List(("Child_7006504", GenotypeType.HET))) shouldBe true - hasGenotype(record, List(("Child_7006504", GenotypeType.HOM_VAR))) shouldBe false - hasGenotype(record, List(("Child_7006504", GenotypeType.HOM_REF))) shouldBe false - hasGenotype(record, List(("Child_7006504", GenotypeType.NO_CALL))) shouldBe false - hasGenotype(record, List(("Child_7006504", GenotypeType.MIXED))) shouldBe false - - hasGenotype(record, List(("Mother_7006508", GenotypeType.HET))) shouldBe false - hasGenotype(record, List(("Mother_7006508", GenotypeType.HOM_VAR))) shouldBe false - hasGenotype(record, List(("Mother_7006508", GenotypeType.HOM_REF))) shouldBe true - hasGenotype(record, List(("Mother_7006508", GenotypeType.NO_CALL))) shouldBe false - hasGenotype(record, List(("Mother_7006508", GenotypeType.MIXED))) shouldBe false - - hasGenotype(record, List(("Mother_7006508", GenotypeType.HOM_REF), ("Child_7006504", GenotypeType.HET))) 
shouldBe true - hasGenotype(record, List(("Mother_7006508", GenotypeType.HET), ("Child_7006504", GenotypeType.HOM_REF))) shouldBe false - } - -} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfStatsTest.scala b/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfStatsTest.scala deleted file mode 100644 index 0ffe4713b117f8797be7b075e5a948bb0f709722..0000000000000000000000000000000000000000 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfStatsTest.scala +++ /dev/null @@ -1,119 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. - */ -package nl.lumc.sasc.biopet.tools - -import htsjdk.variant.variantcontext.Allele -import nl.lumc.sasc.biopet.tools.VcfStats._ -import org.scalatest.Matchers -import org.scalatest.testng.TestNGSuite -import org.testng.annotations.Test - -import scala.collection.mutable - -/** - * Test class for [[VcfStats]] - * - * Created by pjvan_thof on 2/5/15. 
- */ -class VcfStatsTest extends TestNGSuite with Matchers { - - @Test - def testSampleToSampleStats(): Unit = { - val s1 = SampleToSampleStats() - val s2 = SampleToSampleStats() - s1.alleleOverlap shouldBe 0 - s1.genotypeOverlap shouldBe 0 - s2.alleleOverlap shouldBe 0 - s2.genotypeOverlap shouldBe 0 - - s1 += s2 - s1.alleleOverlap shouldBe 0 - s1.genotypeOverlap shouldBe 0 - s2.alleleOverlap shouldBe 0 - s2.genotypeOverlap shouldBe 0 - - s2.alleleOverlap = 2 - s2.genotypeOverlap = 3 - - s1 += s2 - s1.alleleOverlap shouldBe 2 - s1.genotypeOverlap shouldBe 3 - s2.alleleOverlap shouldBe 2 - s2.genotypeOverlap shouldBe 3 - - s1 += s2 - s1.alleleOverlap shouldBe 4 - s1.genotypeOverlap shouldBe 6 - s2.alleleOverlap shouldBe 2 - s2.genotypeOverlap shouldBe 3 - } - - @Test - def testSampleStats(): Unit = { - val s1 = SampleStats() - val s2 = SampleStats() - - s1.sampleToSample += "s1" -> SampleToSampleStats() - s1.sampleToSample += "s2" -> SampleToSampleStats() - s2.sampleToSample += "s1" -> SampleToSampleStats() - s2.sampleToSample += "s2" -> SampleToSampleStats() - - s1.sampleToSample("s1").alleleOverlap = 1 - s2.sampleToSample("s2").alleleOverlap = 2 - - val bla1 = s1.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) += "1" -> mutable.Map(1 -> 1) - s1.genotypeStats += "chr" -> bla1 - val bla2 = s2.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) += "2" -> mutable.Map(2 -> 2) - s2.genotypeStats += "chr" -> bla2 - - val ss1 = SampleToSampleStats() - val ss2 = SampleToSampleStats() - - s1 += s2 - s1.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) shouldBe mutable.Map("1" -> mutable.Map(1 -> 1), "2" -> mutable.Map(2 -> 2)) - ss1.alleleOverlap = 1 - ss2.alleleOverlap = 2 - s1.sampleToSample shouldBe mutable.Map("s1" -> ss1, "s2" -> ss2) - - s1 += s2 - s1.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) shouldBe mutable.Map("1" -> mutable.Map(1 -> 1), "2" -> 
mutable.Map(2 -> 4)) - - s1 += s1 - s1.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) shouldBe mutable.Map("1" -> mutable.Map(1 -> 2), "2" -> mutable.Map(2 -> 8)) - } - - @Test - def testAlleleOverlap(): Unit = { - - val a1 = Allele.create("G") - val a2 = Allele.create("A") - - alleleOverlap(List(a1, a1), List(a1, a1)) shouldBe 2 - alleleOverlap(List(a2, a2), List(a2, a2)) shouldBe 2 - alleleOverlap(List(a1, a2), List(a1, a2)) shouldBe 2 - alleleOverlap(List(a1, a2), List(a2, a1)) shouldBe 2 - alleleOverlap(List(a2, a1), List(a1, a2)) shouldBe 2 - alleleOverlap(List(a2, a1), List(a2, a1)) shouldBe 2 - - alleleOverlap(List(a1, a2), List(a1, a1)) shouldBe 1 - alleleOverlap(List(a2, a1), List(a1, a1)) shouldBe 1 - alleleOverlap(List(a1, a1), List(a1, a2)) shouldBe 1 - alleleOverlap(List(a1, a1), List(a2, a1)) shouldBe 1 - - alleleOverlap(List(a1, a1), List(a2, a2)) shouldBe 0 - alleleOverlap(List(a2, a2), List(a1, a1)) shouldBe 0 - } -} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfWithVcfTest.scala b/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfWithVcfTest.scala deleted file mode 100644 index 9996dfed3befed4e8913ee1377d04953607dfbc9..0000000000000000000000000000000000000000 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfWithVcfTest.scala +++ /dev/null @@ -1,62 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. 
The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. - */ -package nl.lumc.sasc.biopet.tools - -import java.io.File -import java.nio.file.Paths - -import org.scalatest.Matchers -import org.scalatest.mock.MockitoSugar -import org.scalatest.testng.TestNGSuite -import org.testng.annotations.Test - -import scala.util.Random - -/** - * Test class for [[VcfWithVcfTest]] - * - * Created by ahbbollen on 10-4-15. - */ -class VcfWithVcfTest extends TestNGSuite with MockitoSugar with Matchers { - import VcfWithVcf._ - - private def resourcePath(p: String): String = { - Paths.get(getClass.getResource(p).toURI).toString - } - - val veppedPath = resourcePath("/VEP_oneline.vcf.gz") - val unveppedPath = resourcePath("/unvepped.vcf.gz") - val rand = new Random() - - @Test def testOutputTypeVcf() = { - val tmpPath = File.createTempFile("VcfWithVcf_", ".vcf").getAbsolutePath - val arguments = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpPath, "-f", "CSQ") - main(arguments) - } - - @Test def testOutputTypeVcfGz() = { - val tmpPath = File.createTempFile("VcfWithVcf_", ".vcf").getAbsolutePath - val arguments = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpPath, "-f", "CSQ") - main(arguments) - } - - @Test def testOutputTypeBcf() = { - val tmpPath = File.createTempFile("VcfWithVcf_", ".vcf").getAbsolutePath - val arguments = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpPath, "-f", "CSQ") - main(arguments) - } - -} diff --git a/public/biopet-public-package/pom.xml b/public/biopet-public-package/pom.xml index 5741d4cebab9106a91739c83d202c87dcda0f17c..aeed0caa9f6639b0a08f71a3fd0ac6ca646c506c 100644 --- a/public/biopet-public-package/pom.xml +++ b/public/biopet-public-package/pom.xml @@ -25,7 +25,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> 
<artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -35,13 +35,13 @@ <properties> <sting.shade.phase>package</sting.shade.phase> - <app.main.class>nl.lumc.sasc.biopet.core.BiopetExecutablePublic</app.main.class> + <app.main.class>nl.lumc.sasc.biopet.BiopetExecutablePublic</app.main.class> </properties> <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/public/biopet-public-package/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutablePublic.scala b/public/biopet-public-package/src/main/scala/nl/lumc/sasc/biopet/BiopetExecutablePublic.scala similarity index 94% rename from public/biopet-public-package/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutablePublic.scala rename to public/biopet-public-package/src/main/scala/nl/lumc/sasc/biopet/BiopetExecutablePublic.scala index 40012aa2759279307589ef9309474f9841d34921..b4109a45e70b0dd111463cddb3df7f2ebf74d14f 100644 --- a/public/biopet-public-package/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutablePublic.scala +++ b/public/biopet-public-package/src/main/scala/nl/lumc/sasc/biopet/BiopetExecutablePublic.scala @@ -13,7 +13,9 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. 
*/ -package nl.lumc.sasc.biopet.core +package nl.lumc.sasc.biopet + +import nl.lumc.sasc.biopet.utils.{ BiopetExecutable, MainCommand } object BiopetExecutablePublic extends BiopetExecutable { def publicPipelines: List[MainCommand] = List( @@ -25,8 +27,7 @@ object BiopetExecutablePublic extends BiopetExecutable { nl.lumc.sasc.biopet.pipelines.bamtobigwig.Bam2Wig, nl.lumc.sasc.biopet.pipelines.carp.Carp, nl.lumc.sasc.biopet.pipelines.toucan.Toucan, - nl.lumc.sasc.biopet.pipelines.shiva.ShivaSvCalling, - nl.lumc.sasc.biopet.pipelines.gears.Gears + nl.lumc.sasc.biopet.pipelines.shiva.ShivaSvCalling ) def pipelines: List[MainCommand] = List( diff --git a/public/biopet-tools-extensions/pom.xml b/public/biopet-tools-extensions/pom.xml new file mode 100644 index 0000000000000000000000000000000000000000..e8ef9d9a7b4f21a2dada57c3cb1cfcfe0299dd12 --- /dev/null +++ b/public/biopet-tools-extensions/pom.xml @@ -0,0 +1,32 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>Biopet</artifactId> + <groupId>nl.lumc.sasc</groupId> + <version>0.5.0-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>BiopetToolsExtensions</artifactId> + + <dependencies> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetCore</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetExtensions</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetTools</artifactId> + <version>${project.version}</version> + </dependency> + </dependencies> + +</project> \ No newline at end of file diff --git 
a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BastyGenerateFasta.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BastyGenerateFasta.scala new file mode 100644 index 0000000000000000000000000000000000000000..bc1d672269421ce7be6d64d15d0f8841c81332a1 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BastyGenerateFasta.scala @@ -0,0 +1,70 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.core.{ Reference, ToolCommandFuntion } +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +class BastyGenerateFasta(val root: Configurable) extends ToolCommandFuntion with Reference { + def toolObject = nl.lumc.sasc.biopet.tools.BastyGenerateFasta + + @Input(doc = "Input vcf file", required = false) + var inputVcf: File = _ + + @Input(doc = "Bam File", required = false) + var bamFile: File = _ + + @Input(doc = "reference", required = false) + var reference: File = _ + + @Output(doc = "Output fasta, variants only", required = false) + var outputVariants: File = _ + + @Output(doc = "Output fasta, consensus", required = false) + var outputConsensus: File = _ + + @Output(doc = "Output fasta, consensus with variants", required = false) + var outputConsensusVariants: File = _ + + var snpsOnly: Boolean = config("snps_only", default = false) + var sampleName: String = _ + var minAD: Int = config("min_ad", default = 8) + var minDepth: Int = config("min_depth", default = 8) + var outputName: String = _ + + override def defaultCoreMemory = 4.0 + + override def beforeGraph(): Unit = { + super.beforeGraph() + reference = referenceFasta() + } + + override def cmdLine = super.cmdLine + + optional("--inputVcf", inputVcf) + + optional("--bamFile", bamFile) + + optional("--outputVariants", outputVariants) + + optional("--outputConsensus", outputConsensus) + + optional("--outputConsensusVariants", outputConsensusVariants) + + conditional(snpsOnly, "--snpsOnly") + + optional("--sampleName", sampleName) + + required("--outputName", outputName) + + optional("--minAD", minAD) + + optional("--minDepth", minDepth) + + optional("--reference", reference) +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BedToInterval.scala
b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BedToInterval.scala new file mode 100644 index 0000000000000000000000000000000000000000..f7f00d0f94d43dcc45724c3ed4b61715072b88c1 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BedToInterval.scala @@ -0,0 +1,52 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +/** + * @deprecated Use picard.util.BedToIntervalList instead + */ +class BedToInterval(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.BedToInterval + + @Input(doc = "Input Bed file", required = true) + var input: File = _ + + @Input(doc = "Bam File", required = true) + var bamFile: File = _ + + @Output(doc = "Output interval list", required = true) + var output: File = _ + + override def defaultCoreMemory = 1.0 + + override def cmdLine = super.cmdLine + required("-I", input) + required("-b", bamFile) + required("-o", output) +} + +object BedToInterval { + def apply(root: Configurable, inputBed: File, inputBam: File, output: File): BedToInterval = { + val bedToInterval = new BedToInterval(root) + bedToInterval.input = inputBed + bedToInterval.bamFile = inputBam + bedToInterval.output = output + bedToInterval + } +} \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/scripts/SquishBed.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BedtoolsCoverageToCounts.scala similarity index 56% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/scripts/SquishBed.scala rename to public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BedtoolsCoverageToCounts.scala index b033fb45ed88fc8cea258d7ca19db3a2553e4e5c..00a77a5e9d9ac4ec1e5c2d02ed350cdf53b10d5d 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/scripts/SquishBed.scala +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BedtoolsCoverageToCounts.scala @@ -13,33 +13,26 @@ * license; For commercial users or users who do not want to 
follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.scripts +package nl.lumc.sasc.biopet.extensions.tools import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Input, Output } -class SquishBed(val root: Configurable) extends PythonCommandLineFunction { - setPythonScript("bed_squish.py") +class BedtoolsCoverageToCounts(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.BedtoolsCoverageToCounts - @Input(doc = "Input file") + @Input(doc = "Input fasta", shortName = "input", required = true) var input: File = _ - @Output(doc = "output File") + @Output(doc = "Output tag library", shortName = "output", required = true) var output: File = _ - def cmdLine = getPythonCommand + - required(input) + - required(output) -} + override def defaultCoreMemory = 3.0 -object SquishBed { - def apply(root: Configurable, input: File, outputDir: File): SquishBed = { - val squishBed = new SquishBed(root) - squishBed.input = input - squishBed.output = new File(outputDir, input.getName.stripSuffix(".bed") + ".squish.bed") - squishBed - } + override def cmdLine = super.cmdLine + + required("-I", input) + + required("-o", output) } diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BiopetFlagstat.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BiopetFlagstat.scala new file mode 100644 index 0000000000000000000000000000000000000000..5dc69cb07a34f04ed2cc32e7069875649c3aa32b --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/BiopetFlagstat.scala @@ -0,0 +1,57 @@ +/** + * Biopet is built on top of GATK Queue for building 
bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.core.summary.Summarizable +import nl.lumc.sasc.biopet.utils.ConfigUtils +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +class BiopetFlagstat(val root: Configurable) extends ToolCommandFuntion with Summarizable { + def toolObject = nl.lumc.sasc.biopet.tools.BiopetFlagstat + + @Input(doc = "Input bam", shortName = "input", required = true) + var input: File = _ + + @Output(doc = "Output flagstat", shortName = "output", required = true) + var output: File = _ + + @Output(doc = "summary output file", shortName = "output", required = false) + var summaryFile: File = _ + + override def defaultCoreMemory = 6.0 + + override def cmdLine = super.cmdLine + required("-I", input) + required("-s", summaryFile) + " > " + required(output) + + def summaryFiles: Map[String, File] = Map() + + def summaryStats: Map[String, Any] = { + ConfigUtils.fileToConfigMap(summaryFile) + } +} + +object BiopetFlagstat { + def apply(root: Configurable, input: File, outputDir: File): BiopetFlagstat = { + val flagstat = new BiopetFlagstat(root) + flagstat.input = input + flagstat.output = new File(outputDir, 
input.getName.stripSuffix(".bam") + ".biopetflagstat") + flagstat.summaryFile = new File(outputDir, input.getName.stripSuffix(".bam") + ".biopetflagstat.json") + flagstat + } +} \ No newline at end of file diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/FastqSplitter.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/FastqSplitter.scala new file mode 100644 index 0000000000000000000000000000000000000000..7348048447ca6c614b3b43ba915b228e042e9a37 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/FastqSplitter.scala @@ -0,0 +1,43 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +/** + * Queue extension for the FastqSplitter + * @param root Parent object + */ +class FastqSplitter(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.FastqSplitter + + @Input(doc = "Input fastq", shortName = "input", required = true) + var input: File = _ + + @Output(doc = "Output fastq files", shortName = "output", required = true) + var output: List[File] = Nil + + override def defaultCoreMemory = 4.0 + + /** * Generate command to execute */ + override def cmdLine = super.cmdLine + + required("-I", input) + + repeat("-o", output) +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/FastqSync.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/FastqSync.scala new file mode 100644 index 0000000000000000000000000000000000000000..f7829bb8e96cf7f09c318f61a59695d5c11a3a47 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/FastqSync.scala @@ -0,0 +1,104 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. 
The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, ToolCommandFuntion } +import nl.lumc.sasc.biopet.core.summary.Summarizable +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +import scala.io.Source +import scala.util.matching.Regex + +/** + * FastqSync function class for usage in Biopet pipelines + * + * @param root Configuration object for the pipeline + */ +class FastqSync(val root: Configurable) extends ToolCommandFuntion with Summarizable { + + def toolObject = nl.lumc.sasc.biopet.tools.FastqSync + + /** Original FASTQ file (read 1 or 2) */ + @Input(required = true) + var refFastq: File = null + + /** Input read 1 FASTQ file */ + @Input(required = true) + var inputFastq1: File = null + + /** Input read 2 FASTQ file */ + @Input(required = true) + var inputFastq2: File = null + + /** Output read 1 FASTQ file */ + @Output(required = true) + var outputFastq1: File = null + + /** Output read 2 FASTQ file */ + @Output(required = true) + var outputFastq2: File = null + + /** Sync statistics */ + @Output(required = true) + var outputStats: File = null + + override def defaultCoreMemory = 4.0 + + override def cmdLine = + super.cmdLine + + required("-r", refFastq) + + required("-i", inputFastq1) + + required("-j", inputFastq2) + + required("-o", outputFastq1) + + required("-p", outputFastq2) + " > " + + required(outputStats) + + def summaryFiles: Map[String, File] = Map() + + def summaryStats: Map[String, Any] = { + val regex = new Regex("""Filtered (\d*) reads from first read file. + |Filtered (\d*) reads from second read file.
+ |Synced read files contain (\d*) reads.""".stripMargin, + "R1", "R2", "RL") + + val (countFilteredR1, countFilteredR2, countRLeft) = + if (outputStats.exists) { + val text = Source + .fromFile(outputStats) + .getLines() + .mkString("\n") + regex.findFirstMatchIn(text) match { + case None => (0, 0, 0) + case Some(rmatch) => (rmatch.group("R1").toInt, rmatch.group("R2").toInt, rmatch.group("RL").toInt) + } + } else (0, 0, 0) + + Map("num_reads_discarded_R1" -> countFilteredR1, + "num_reads_discarded_R2" -> countFilteredR2, + "num_reads_kept" -> countRLeft + ) + } + + override def resolveSummaryConflict(v1: Any, v2: Any, key: String): Any = { + (v1, v2) match { + case (v1: Int, v2: Int) => v1 + v2 + case _ => v1 + } + } +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MergeAlleles.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MergeAlleles.scala new file mode 100644 index 0000000000000000000000000000000000000000..504c7cb7c830cf8d1fa067b9048e7f23f4e3fe06 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MergeAlleles.scala @@ -0,0 +1,59 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +class MergeAlleles(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.MergeAlleles + + @Input(doc = "Input vcf files", shortName = "input", required = true) + var input: List[File] = Nil + + @Output(doc = "Output vcf file", shortName = "output", required = true) + var output: File = _ + + @Output(doc = "Output vcf file index", shortName = "output", required = true) + private var outputIndex: File = _ + + var reference: File = config("reference") + + override def defaultCoreMemory = 1.0 + + override def beforeGraph() { + super.beforeGraph() + if (output.getName.endsWith(".gz")) outputIndex = new File(output.getAbsolutePath + ".tbi") + if (output.getName.endsWith(".vcf")) outputIndex = new File(output.getAbsolutePath + ".idx") + } + + override def cmdLine = super.cmdLine + + repeat("-I", input) + + required("-o", output) + + required("-R", reference) +} + +object MergeAlleles { + def apply(root: Configurable, input: List[File], output: File): MergeAlleles = { + val mergeAlleles = new MergeAlleles(root) + mergeAlleles.input = input + mergeAlleles.output = output + mergeAlleles + } +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MergeTables.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MergeTables.scala new file mode 100644 index 0000000000000000000000000000000000000000..15ca55b6a524b283418ad67476d41b384541beb7 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MergeTables.scala @@ -0,0 +1,79 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. 
It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +import scala.collection.mutable.{ Set => MutSet } + +/** + * Biopet wrapper for the [[MergeTables]] command line tool. 
+ * + * @param root [[Configurable]] object + */ +class MergeTables(val root: Configurable) extends ToolCommandFuntion { + + def toolObject = nl.lumc.sasc.biopet.tools.MergeTables + + override def defaultCoreMemory = 6.0 + + /** List of input tabular files */ + @Input(doc = "Input table files", required = true) + var inputTables: List[File] = List.empty[File] + + /** Output file */ + @Output(doc = "Output merged table", required = true) + var output: File = null + + // TODO: should be List[Int] really + /** List of column indices to combine to make a unique identifier per row */ + var idColumnIndices: List[String] = config("id_column_indices", default = List("1")) + + /** Index of column from each tabular file containing the values to be put in the final merged table */ + var valueColumnIndex: Int = config("value_column_index", default = 2) + + /** Name of the identifier column in the output file */ + var idColumnName: Option[String] = config("id_column_name") + + /** Common file extension of all input files */ + var fileExtension: Option[String] = config("file_extension") + + /** Number of header lines from each input file to ignore */ + var numHeaderLines: Option[Int] = config("num_header_lines") + + /** String to use when a value is missing from an input file */ + var fallbackString: Option[String] = config("fallback_string") + + /** Column delimiter of each input file (used for splitting into columns) */ + var delimiter: Option[String] = config("delimiter") + + // executed command line + override def cmdLine = + super.cmdLine + + required("-i", idColumnIndices.mkString(",")) + + required("-a", valueColumnIndex) + + optional("-n", idColumnName) + + optional("-e", fileExtension) + + optional("-m", numHeaderLines) + + optional("-f", fallbackString) + + optional("-d", delimiter) + + required("-o", output) + + required("", repeat(inputTables), escape = false) +} diff --git
a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MpileupToVcf.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MpileupToVcf.scala new file mode 100644 index 0000000000000000000000000000000000000000..60797ce9ca7f491a9dff0cf3a2379570cc6d91f5 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/MpileupToVcf.scala @@ -0,0 +1,85 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import htsjdk.samtools.SamReaderFactory +import nl.lumc.sasc.biopet.core.{ Reference, ToolCommandFuntion } +import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsMpileup +import nl.lumc.sasc.biopet.utils.ConfigUtils +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +import scala.collection.JavaConversions._ + +class MpileupToVcf(val root: Configurable) extends ToolCommandFuntion with Reference { + def toolObject = nl.lumc.sasc.biopet.tools.MpileupToVcf + + @Input(doc = "Input mpileup file", shortName = "mpileup", required = false) + var inputMpileup: File = _ + + @Input(doc = "Input bam file", shortName = "bam", required = false) + var inputBam: File = _ + + @Output(doc = "Output tag library", shortName = "output", required = true) + var output: File = _ + + var minDP: Option[Int] = config("min_dp") + var minAP: Option[Int] = config("min_ap") + var homoFraction: Option[Double] = config("homoFraction") + var ploidy: Option[Int] = config("ploidy") + var sample: String = _ + var reference: String = _ + + override def defaultCoreMemory = 3.0 + + override def defaults = ConfigUtils.mergeMaps(Map("samtoolsmpileup" -> Map("disable_baq" -> true, "min_map_quality" -> 1)), + super.defaults) + + override def beforeGraph() { + super.beforeGraph() + reference = referenceFasta().getAbsolutePath + val samtoolsMpileup = new SamtoolsMpileup(this) + } + + override def beforeCmd(): Unit = { + if (sample == null && inputBam.exists() && inputBam.length() > 0) { + val inputSam = SamReaderFactory.makeDefault.open(inputBam) + val readGroups = inputSam.getFileHeader.getReadGroups + val samples = readGroups.map(readGroup => readGroup.getSample).distinct + sample = samples.head + inputSam.close() + } + } + + override def cmdLine = { + (if (inputMpileup == null) { + val samtoolsMpileup = new SamtoolsMpileup(this) + samtoolsMpileup.reference = 
referenceFasta() + samtoolsMpileup.input = List(inputBam) + samtoolsMpileup.cmdPipe + " | " + } else "") + + super.cmdLine + + required("-o", output) + + optional("--minDP", minDP) + + optional("--minAP", minAP) + + optional("--homoFraction", homoFraction) + + optional("--ploidy", ploidy) + + required("--sample", sample) + + (if (inputBam == null) required("-I", inputMpileup) else "") + } +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/PrefixFastq.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/PrefixFastq.scala new file mode 100644 index 0000000000000000000000000000000000000000..dca36baf399c3f90725f61843ead5ed4416a08ab --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/PrefixFastq.scala @@ -0,0 +1,68 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } + +/** + * Queue class for PrefixFastq tool + * + * Created by pjvan_thof on 1/13/15. 
+ */ +class PrefixFastq(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.PrefixFastq + + override def defaultCoreMemory = 1.0 + + @Input(doc = "Input fastq", shortName = "I", required = true) + var inputFastq: File = _ + + @Output(doc = "Output fastq", shortName = "o", required = true) + var outputFastq: File = _ + + @Argument(doc = "Prefix seq", required = true) + var prefixSeq: String = _ + + /** + * Creates command to execute extension + * @return + */ + override def cmdLine = super.cmdLine + + required("-i", inputFastq) + + required("-o", outputFastq) + + optional("-s", prefixSeq) +} + +object PrefixFastq { + /** + * Create a PrefixFastq class object with a suffix ".prefix.fastq" in the output folder + * + * @param root parent object + * @param input input file + * @param outputDir outputFolder + * @return PrefixFastq class object + */ + def apply(root: Configurable, input: File, outputDir: String): PrefixFastq = { + val prefixFastq = new PrefixFastq(root) + prefixFastq.inputFastq = input + prefixFastq.outputFastq = new File(outputDir, input.getName + ".prefix.fastq") + prefixFastq + } +} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/pipelines/PipelineTemplate.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCountFastq.scala similarity index 51% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/pipelines/PipelineTemplate.scala rename to public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCountFastq.scala index 2bfb9b10584683d798947da42ebd65fbe040ed3c..0e71324dba7091729f044ff6b0815c04d5c937dd 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/pipelines/PipelineTemplate.scala +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCountFastq.scala @@ -13,24 +13,26 @@ * license; For commercial users or users who do not want to follow
the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.pipelines +package nl.lumc.sasc.biopet.extensions.tools -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand } -import org.broadinstitute.gatk.queue.QScript +import java.io.File -/** Template for a pipeline */ -class PipelineTemplate(val root: Configurable) extends QScript with BiopetQScript { - def this() = this(null) +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } - /** This is executed before the script starts */ - def init() { - } +class SageCountFastq(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.SageCountFastq - /** Method where jobs must be added */ - def biopetScript() { - } -} + @Input(doc = "Input fasta", shortName = "input", required = true) + var input: File = _ + + @Output(doc = "Output tag library", shortName = "output", required = true) + var output: File = _ -/** Object to let to generate a main method */ -object PipelineTemplate extends PipelineCommand + override def defaultCoreMemory = 3.0 + + override def cmdLine = super.cmdLine + + required("-I", input) + + required("-o", output) +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCreateLibrary.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCreateLibrary.scala new file mode 100644 index 0000000000000000000000000000000000000000..a2d79430f008d0a5818d98ae6ded6258ab74be0a --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCreateLibrary.scala @@ -0,0 +1,54 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. 
It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +class SageCreateLibrary(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.SageCreateLibrary + + @Input(doc = "Input fasta", shortName = "input", required = true) + var input: File = _ + + @Output(doc = "Output tag library", shortName = "output", required = true) + var output: File = _ + + @Output(doc = "Output no tags", shortName = "noTagsOutput", required = false) + var noTagsOutput: File = _ + + @Output(doc = "Output no anti tags library", shortName = "noAntiTagsOutput", required = false) + var noAntiTagsOutput: File = _ + + @Output(doc = "Output file all genes", shortName = "allGenes", required = false) + var allGenesOutput: File = _ + + var tag: String = config("tag", default = "CATG") + var length: Option[Int] = config("length", default = 17) + + override def defaultCoreMemory = 3.0 + + override def cmdLine = super.cmdLine + + required("-I", input) + + optional("--tag", tag) + + optional("--length", length) + + optional("--noTagsOutput", noTagsOutput) + + optional("--noAntiTagsOutput", noAntiTagsOutput) + + required("-o", output) 
+} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCreateTagCounts.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCreateTagCounts.scala new file mode 100644 index 0000000000000000000000000000000000000000..30e7f524f65000789653085e4c6443a4e8a6d4a8 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SageCreateTagCounts.scala @@ -0,0 +1,54 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +class SageCreateTagCounts(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.SageCreateTagCounts + + @Input(doc = "Raw count file", shortName = "input", required = true) + var input: File = _ + + @Input(doc = "tag library", shortName = "taglib", required = true) + var tagLib: File = _ + + @Output(doc = "Sense count file", shortName = "sense", required = true) + var countSense: File = _ + + @Output(doc = "Sense all count file", shortName = "allsense", required = true) + var countAllSense: File = _ + + @Output(doc = "AntiSense count file", shortName = "antisense", required = true) + var countAntiSense: File = _ + + @Output(doc = "AntiSense all count file", shortName = "allantisense", required = true) + var countAllAntiSense: File = _ + + override def defaultCoreMemory = 3.0 + + override def cmdLine = super.cmdLine + + required("-I", input) + + required("--tagLib", tagLib) + + optional("--countSense", countSense) + + optional("--countAllSense", countAllSense) + + optional("--countAntiSense", countAntiSense) + + optional("--countAllAntiSense", countAllAntiSense) +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SeqStat.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SeqStat.scala new file mode 100644 index 0000000000000000000000000000000000000000..3f1cf95a447f3f13827a27b82073fbf98e628495 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SeqStat.scala @@ -0,0 +1,64 @@ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import 
nl.lumc.sasc.biopet.core.summary.Summarizable +import nl.lumc.sasc.biopet.utils.ConfigUtils +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Output, Input } + +/** + * Seqstat function class for usage in Biopet pipelines + * + * @param root Configuration object for the pipeline + */ +class SeqStat(val root: Configurable) extends ToolCommandFuntion with Summarizable { + def toolObject = nl.lumc.sasc.biopet.tools.SeqStat + + @Input(doc = "Input FASTQ", shortName = "input", required = true) + var input: File = null + + @Output(doc = "Output JSON", shortName = "output", required = true) + var output: File = null + + override def defaultCoreMemory = 2.5 + + override def cmdLine = super.cmdLine + required("-i", input) + required("-o", output) + + def summaryStats: Map[String, Any] = { + val map = ConfigUtils.fileToConfigMap(output) + + ConfigUtils.any2map(map.getOrElse("stats", Map())) + } + + def summaryFiles: Map[String, File] = Map() + + override def resolveSummaryConflict(v1: Any, v2: Any, key: String): Any = { + (v1, v2) match { + case (v1: Array[_], v2: Array[_]) => v1.zip(v2).map(v => resolveSummaryConflict(v._1, v._2, key)) + case (v1: List[_], v2: List[_]) => v1.zip(v2).map(v => resolveSummaryConflict(v._1, v._2, key)) + case (v1: Int, v2: Int) if key == "len_min" => if (v1 < v2) v1 else v2 + case (v1: Int, v2: Int) if key == "len_max" => if (v1 > v2) v1 else v2 + case (v1: Int, v2: Int) => v1 + v2 + case (v1: Long, v2: Long) => v1 + v2 + case _ => v1 + } + } +} + +object SeqStat { + def apply(root: Configurable, input: File, output: File): SeqStat = { + val seqstat = new SeqStat(root) + seqstat.input = input + seqstat.output = new File(output, input.getName.substring(0, input.getName.lastIndexOf(".")) + ".seqstats.json") + seqstat + } + + def apply(root: Configurable, fastqfile: File, outDir: String): SeqStat = { + val seqstat = new SeqStat(root) + seqstat.input = fastqfile + seqstat.output = new 
File(outDir, fastqfile.getName.substring(0, fastqfile.getName.lastIndexOf(".")) + ".seqstats.json") + seqstat + } +} \ No newline at end of file diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SquishBed.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SquishBed.scala new file mode 100644 index 0000000000000000000000000000000000000000..f42f635be866d88fd92c9991c93f93afb9e880cd --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/SquishBed.scala @@ -0,0 +1,27 @@ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +/** + * Created by pjvanthof on 22/08/15. + */ +class SquishBed(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.SquishBed + + @Input(doc = "Input Bed file", required = true) + var input: File = _ + + @Output(doc = "Output interval list", required = true) + var output: File = _ + + var strandSensitive: Boolean = config("strandSensitive", default = false) + + override def cmdLine = super.cmdLine + + required("-I", input) + + required("-o", output) + + conditional(strandSensitive, "-s") +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfFilter.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfFilter.scala new file mode 100644 index 0000000000000000000000000000000000000000..06af56f025692658ad19c4e8375e7b81fbe599fc --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfFilter.scala @@ -0,0 +1,49 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. 
It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +class VcfFilter(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.VcfFilter + + @Input(doc = "Input vcf", shortName = "I", required = true) + var inputVcf: File = _ + + @Output(doc = "Output vcf", shortName = "o", required = false) + var outputVcf: File = _ + + var minSampleDepth: Option[Int] = config("min_sample_depth") + var minTotalDepth: Option[Int] = config("min_total_depth") + var minAlternateDepth: Option[Int] = config("min_alternate_depth") + var minSamplesPass: Option[Int] = config("min_samples_pass") + var filterRefCalls: Boolean = config("filter_ref_calls", default = false) + + override def defaultCoreMemory = 3.0 + + override def cmdLine = super.cmdLine + + required("-I", inputVcf) + + required("-o", outputVcf) + + optional("--minSampleDepth", minSampleDepth) + + optional("--minTotalDepth", minTotalDepth) + + optional("--minAlternateDepth", minAlternateDepth) + + optional("--minSamplesPass", minSamplesPass) + + conditional(filterRefCalls, "--filterRefCalls") +} diff --git 
a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfStats.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfStats.scala new file mode 100644 index 0000000000000000000000000000000000000000..d0024681b727e6f72fdc31da7483bb7b1c98b4f6 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfStats.scala @@ -0,0 +1,113 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.summary.{ Summarizable, SummaryQScript } +import nl.lumc.sasc.biopet.core.{ Reference, ToolCommandFuntion } +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +import scala.io.Source + +/** + * This tool will generate statistics from a vcf file + * + * Created by pjvan_thof on 1/10/15. 
+ */ +class VcfStats(val root: Configurable) extends ToolCommandFuntion with Summarizable with Reference { + def toolObject = nl.lumc.sasc.biopet.tools.VcfStats + + @Input(doc = "Input fastq", shortName = "I", required = true) + var input: File = _ + + @Input + protected var index: File = null + + @Output + protected var generalStats: File = null + + @Output + protected var genotypeStats: File = null + + override def defaultCoreMemory = 3.0 + override def defaultThreads = 3 + + protected var outputDir: File = _ + + var infoTags: List[String] = Nil + var genotypeTags: List[String] = Nil + var allInfoTags = false + var allGenotypeTags = false + var reference: File = _ + + override def beforeGraph(): Unit = { + reference = referenceFasta() + index = new File(input.getAbsolutePath + ".tbi") + } + + /** Set output dir and a output file */ + def setOutputDir(dir: File): Unit = { + outputDir = dir + generalStats = new File(dir, "general.tsv") + genotypeStats = new File(dir, "genotype-general.tsv") + jobOutputFile = new File(dir, ".vcfstats.out") + } + + /** Creates command to execute extension */ + override def cmdLine = super.cmdLine + + required("-I", input) + + required("-o", outputDir) + + repeat("--infoTag", infoTags) + + repeat("--genotypeTag", genotypeTags) + + conditional(allInfoTags, "--allInfoTags") + + conditional(allGenotypeTags, "--allGenotypeTags") + + required("-R", reference) + + /** Returns general stats to the summary */ + def summaryStats: Map[String, Any] = { + Map("info" -> (for ( + line <- Source.fromFile(generalStats).getLines().toList.tail; + values = line.split("\t") if values.size >= 2 && !values(0).isEmpty + ) yield values(0) -> values(1).toInt + ).toMap) + } + + /** return only general files to summary */ + def summaryFiles: Map[String, File] = Map( + "general_stats" -> generalStats, + "genotype_stats" -> genotypeStats + ) + + override def addToQscriptSummary(qscript: SummaryQScript, name: String): Unit = { + val data = 
Source.fromFile(genotypeStats).getLines().map(_.split("\t")).toArray + + for (s <- 1 until data(0).size) { + val sample = data(0)(s) + val stats = Map("genotype" -> (for (f <- 1 until data.length) yield { + data(f)(0) -> data(f)(s) + }).toMap) + + val sum = new Summarizable { + override def summaryFiles: Map[String, File] = Map() + override def summaryStats: Map[String, Any] = stats + } + + qscript.addSummarizable(sum, name, Some(sample)) + } + } +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfWithVcf.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfWithVcf.scala new file mode 100644 index 0000000000000000000000000000000000000000..e956f5c93f5de3d9243de7bb5c9c1684de8f3fa4 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VcfWithVcf.scala @@ -0,0 +1,58 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +/** + * Biopet extension for tool VcfWithVcf + */ +class VcfWithVcf(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.VcfWithVcf + + @Input(doc = "Input vcf file", shortName = "input", required = true) + var input: File = _ + + @Input(doc = "Secondary vcf file", shortName = "secondary", required = true) + var secondaryVcf: File = _ + + @Output(doc = "Output vcf file", shortName = "output", required = true) + var output: File = _ + + @Output(doc = "Output vcf file index", shortName = "output", required = true) + private var outputIndex: File = _ + + var fields: List[(String, String, Option[String])] = List() + + override def defaultCoreMemory = 2.0 + + override def beforeGraph() { + super.beforeGraph() + if (output.getName.endsWith(".gz")) outputIndex = new File(output.getAbsolutePath + ".tbi") + if (output.getName.endsWith(".vcf")) outputIndex = new File(output.getAbsolutePath + ".idx") + if (fields.isEmpty) throw new IllegalArgumentException("No fields found for VcfWithVcf") + } + + override def cmdLine = super.cmdLine + + required("-I", input) + + required("-o", output) + + required("-s", secondaryVcf) + + repeat("-f", fields.map(x => x._1 + ":" + x._2 + ":" + x._3.getOrElse("none"))) +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VepNormalizer.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VepNormalizer.scala new file mode 100644 index 0000000000000000000000000000000000000000..229f39628c9d1f8f8e23400b96a4fea957028e71 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/VepNormalizer.scala @@ -0,0 +1,53 @@ +/** + * Biopet is 
built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +/** + * This tool parses a VEP annotated VCF into a standard VCF file. + * VEP puts all its annotations for each variant in a CSQ string, where annotations per transcript are comma-separated. + * Within each transcript, the annotation fields are pipe-separated. + * This tool has two modes: + * 1) explode - explodes all transcripts such that each is on a unique line + * 2) standard - parse as a standard VCF, where multiple transcripts occur in the same line + * Created by ahbbollen on 10/27/14. 
+ */ + +class VepNormalizer(val root: Configurable) extends ToolCommandFuntion { + def toolObject = nl.lumc.sasc.biopet.tools.VepNormalizer + + @Input(doc = "Input VCF, may be indexed", shortName = "InputFile", required = true) + var inputVCF: File = null + + @Output(doc = "Output VCF", shortName = "OutputFile", required = true) + var outputVcf: File = null + + var mode: String = config("mode", default = "standard") + var doNotRemove: Boolean = config("do_not_remove", default = false) + + override def defaultCoreMemory = 4.0 + + override def cmdLine = super.cmdLine + + required("-I", inputVCF) + + required("-O", outputVcf) + + required("-m", mode) + + conditional(doNotRemove, "--do-not-remove") +} diff --git a/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/WipeReads.scala b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/WipeReads.scala new file mode 100644 index 0000000000000000000000000000000000000000..1e468bd36ac52bed889ccc0871a092316def63f0 --- /dev/null +++ b/public/biopet-tools-extensions/src/main/scala/nl/lumc/sasc/biopet/extensions/tools/WipeReads.scala @@ -0,0 +1,45 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.extensions.tools + +import java.io.File + +import nl.lumc.sasc.biopet.core.ToolCommandFuntion +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Input, Output } + +// TODO: finish implementation for usage in pipelines +/** + * WipeReads function class for usage in Biopet pipelines + * + * @param root Configuration object for the pipeline + */ +class WipeReads(val root: Configurable) extends ToolCommandFuntion { + + def toolObject = nl.lumc.sasc.biopet.tools.WipeReads + + @Input(doc = "Input BAM file (must be indexed)", shortName = "I", required = true) + var inputBam: File = null + + @Input(doc = "Interval file", shortName = "r", required = true) + var intervalFile: File = null + + @Output(doc = "Output BAM", shortName = "o", required = true) + var outputBam: File = null + + @Output(doc = "BAM containing discarded reads", shortName = "f", required = false) + var discardedBam: File = null +} diff --git a/public/biopet-tools-package/pom.xml b/public/biopet-tools-package/pom.xml new file mode 100644 index 0000000000000000000000000000000000000000..8801d9280163613168001738d1a6c44c2fc22d2c --- /dev/null +++ b/public/biopet-tools-package/pom.xml @@ -0,0 +1,61 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>Biopet</artifactId> + <groupId>nl.lumc.sasc</groupId> + <version>0.5.0-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>BiopetToolsPackage</artifactId> + + <properties> + <sting.shade.phase>package</sting.shade.phase> + <app.main.class>nl.lumc.sasc.biopet.BiopetToolsExecutable</app.main.class> + </properties> + + <dependencies> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetTools</artifactId> + 
<version>${project.version}</version> + </dependency> + </dependencies> + + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-shade-plugin</artifactId> + <version>2.4.1</version> + <configuration> + <!--suppress MavenModelInspection --> + <finalName>BiopetTools-${project.version}-${git.commit.id.abbrev}</finalName> + <transformers> + <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> + <manifestEntries> + <Main-Class>${app.main.class}</Main-Class> + <!--suppress MavenModelInspection --> + <X-Compile-Source-JDK>${maven.compile.source}</X-Compile-Source-JDK> + <!--suppress MavenModelInspection --> + <X-Compile-Target-JDK>${maven.compile.target}</X-Compile-Target-JDK> + </manifestEntries> + </transformer> + </transformers> + <filters> + </filters> + </configuration> + <executions> + <execution> + <phase>package</phase> + <goals> + <goal>shade</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> \ No newline at end of file diff --git a/public/biopet-tools-package/src/main/scala/nl/lumc/sasc/biopet/BiopetToolsExecutable.scala b/public/biopet-tools-package/src/main/scala/nl/lumc/sasc/biopet/BiopetToolsExecutable.scala new file mode 100644 index 0000000000000000000000000000000000000000..f3eae932c03aee22aa3867c243d93eeda479dcc5 --- /dev/null +++ b/public/biopet-tools-package/src/main/scala/nl/lumc/sasc/biopet/BiopetToolsExecutable.scala @@ -0,0 +1,48 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. 
The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet + +import nl.lumc.sasc.biopet.utils.{ BiopetExecutable, MainCommand } + +object BiopetToolsExecutable extends BiopetExecutable { + + def pipelines: List[MainCommand] = Nil + + def tools: List[MainCommand] = List( + nl.lumc.sasc.biopet.tools.MergeTables, + nl.lumc.sasc.biopet.tools.WipeReads, + nl.lumc.sasc.biopet.tools.ExtractAlignedFastq, + nl.lumc.sasc.biopet.tools.FastqSync, + nl.lumc.sasc.biopet.tools.BiopetFlagstat, + nl.lumc.sasc.biopet.tools.CheckAllelesVcfInBam, + nl.lumc.sasc.biopet.tools.VcfToTsv, + nl.lumc.sasc.biopet.tools.VcfFilter, + nl.lumc.sasc.biopet.tools.VcfStats, + nl.lumc.sasc.biopet.tools.FindRepeatsPacBio, + nl.lumc.sasc.biopet.tools.MpileupToVcf, + nl.lumc.sasc.biopet.tools.FastqSplitter, + nl.lumc.sasc.biopet.tools.BedtoolsCoverageToCounts, + nl.lumc.sasc.biopet.tools.SageCountFastq, + nl.lumc.sasc.biopet.tools.SageCreateLibrary, + nl.lumc.sasc.biopet.tools.SageCreateTagCounts, + nl.lumc.sasc.biopet.tools.BastyGenerateFasta, + nl.lumc.sasc.biopet.tools.MergeAlleles, + nl.lumc.sasc.biopet.tools.SamplesTsvToJson, + nl.lumc.sasc.biopet.tools.SeqStat, + nl.lumc.sasc.biopet.tools.VepNormalizer, + nl.lumc.sasc.biopet.tools.AnnotateVcfWithBed, + nl.lumc.sasc.biopet.tools.VcfWithVcf) +} diff --git a/public/biopet-tools/pom.xml b/public/biopet-tools/pom.xml new file mode 100644 index 0000000000000000000000000000000000000000..40bd255bd5730293f1973be9a3db1e49fe909d08 --- /dev/null +++ b/public/biopet-tools/pom.xml @@ -0,0 +1,57 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>Biopet</artifactId> + <groupId>nl.lumc.sasc</groupId> + <version>0.5.0-SNAPSHOT</version> + <relativePath>../</relativePath> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>BiopetTools</artifactId> + + <repositories> + <repository> + <id>biojava-maven-repo</id> + <name>BioJava repository</name> + <url>http://www.biojava.org/download/maven/</url> + </repository> + </repositories> + <dependencies> + <dependency> + <groupId>org.testng</groupId> + <artifactId>testng</artifactId> + <version>6.8</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.mockito</groupId> + <artifactId>mockito-all</artifactId> + <version>1.9.5</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.scalatest</groupId> + <artifactId>scalatest_2.10</artifactId> + <version>2.2.1</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetUtils</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> + <version>18.0</version> + </dependency> + <dependency> + <groupId>org.biojava</groupId> + <artifactId>biojava3-sequencing</artifactId> + <version>3.1.0</version> + </dependency> + </dependencies> +</project> \ No newline at end of file diff --git a/public/biopet-tools/src/main/resources/nl/lumc/sasc/biopet/tools/plotHeatmap.R b/public/biopet-tools/src/main/resources/nl/lumc/sasc/biopet/tools/plotHeatmap.R new file mode 100644 index 0000000000000000000000000000000000000000..7f7237e90f6593e3d6cf110da005cd89c154d466 --- /dev/null +++ b/public/biopet-tools/src/main/resources/nl/lumc/sasc/biopet/tools/plotHeatmap.R @@ -0,0 +1,35 @@ +library('gplots') +library('RColorBrewer') + +args <- commandArgs(TRUE) +inputArg <- args[1] +outputArg <- args[2] +outputArgClustering <- args[3] +outputArgDendrogram <- args[4] + + 
+heat<-read.table(inputArg, header = 1, sep= '\t', stringsAsFactors = F) +#heat[heat==1] <- NA +rownames(heat) <- heat[,1] +heat<- heat[,-1] +heat<- as.matrix(heat) + +colNumber <- 50 +col <- rev(colorRampPalette(brewer.pal(11, "Spectral"))(colNumber)) +for (i in (colNumber+1):(colNumber+round((dist(range(heat)) - dist(range(heat[heat < 1]))) / dist(range(heat[heat < 1])) * colNumber))) { + col[i] <- col[colNumber] +} +col[length(col)] <- "#00FF00" + +png(file = outputArg, width = 1200, height = 1200) +heatmap.2(heat, trace = 'none', col = col, Colv=NA, Rowv=NA, dendrogram="none", margins = c(12, 12), na.color="#00FF00") +dev.off() + +hc <- hclust(d = dist(heat)) +png(file = outputArgDendrogram, width = 1200, height = 1200) +plot(as.dendrogram(hc), horiz=TRUE, asp=0.02) +dev.off() + +png(file = outputArgClustering, width = 1200, height = 1200) +heatmap.2(heat, trace = 'none', col = col, Colv="Rowv", dendrogram="row",margins = c(12, 12), na.color="#00FF00") +dev.off() diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBed.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBed.scala similarity index 68% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBed.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBed.scala index 6136c146b91e45fa8ddb85e49d2ad94c903a6033..5eedb3db7c718a6b5293fab25bff182c629e3909 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBed.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBed.scala @@ -20,15 +20,12 @@ import java.io.File import htsjdk.variant.variantcontext.VariantContextBuilder import htsjdk.variant.variantcontext.writer.{ AsyncVariantContextWriter, VariantContextWriterBuilder } import htsjdk.variant.vcf.{ VCFFileReader, VCFHeaderLineCount, VCFHeaderLineType, VCFInfoHeaderLine } -import 
nl.lumc.sasc.biopet.core.ToolCommand +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.intervals.{ BedRecord, BedRecordList } import scala.collection.JavaConversions._ -import scala.collection.mutable -import scala.io.Source -class AnnotateVcfWithBed { - // TODO: Queue wrapper -} +// TODO: Queue wrapper /** * This a tools to annotate a vcf file with values from a bed file @@ -83,19 +80,9 @@ object AnnotateVcfWithBed extends ToolCommand { logger.info("Start") val argsParser = new OptParser - val commandArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1) - - val bedRecords: mutable.Map[String, List[(Int, Int, String)]] = mutable.Map() - // Read bed file - /* - // function bedRecord.getName will not compile, not clear why - for (bedRecord <- asScalaIteratorConverter(getFeatureReader(commandArgs.bedFile.toPath.toString, new BEDCodec(), false).iterator()).asScala) { - logger.debug(bedRecord) - bedRecords(bedRecord.getChr) = (bedRecord.getStart, bedRecord.getEnd, bedRecord.getName) :: bedRecords.getOrElse(bedRecord.getChr, Nil) - } - */ + val cmdArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1) - val fieldType = commandArgs.fieldType match { + val fieldType = cmdArgs.fieldType match { case "Integer" => VCFHeaderLineType.Integer case "Flag" => VCFHeaderLineType.Flag case "Character" => VCFHeaderLineType.Character @@ -104,48 +91,32 @@ object AnnotateVcfWithBed extends ToolCommand { } logger.info("Reading bed file") - - for (line <- Source.fromFile(commandArgs.bedFile).getLines()) { - val values = line.split("\t") - if (values.size >= 4) - bedRecords(values(0)) = (values(1).toInt, values(2).toInt, values(3)) :: bedRecords.getOrElse(values(0), Nil) - else values.size >= 3 && fieldType == VCFHeaderLineType.Flag - bedRecords(values(0)) = (values(1).toInt, values(2).toInt, "") :: bedRecords.getOrElse(values(0), Nil) - } - - logger.info("Sorting bed records") - - // Sort records when needed - for ((chr, record) <- 
bedRecords) { - bedRecords(chr) = record.sortBy(x => (x._1, x._2)) - } + val bedRecords = BedRecordList.fromFile(cmdArgs.bedFile).sorted logger.info("Starting output file") - val reader = new VCFFileReader(commandArgs.inputFile, false) + val reader = new VCFFileReader(cmdArgs.inputFile, false) val header = reader.getFileHeader val writer = new AsyncVariantContextWriter(new VariantContextWriterBuilder(). - setOutputFile(commandArgs.outputFile). + setOutputFile(cmdArgs.outputFile). setReferenceDictionary(header.getSequenceDictionary). build) - header.addMetaDataLine(new VCFInfoHeaderLine(commandArgs.fieldName, - VCFHeaderLineCount.UNBOUNDED, fieldType, commandArgs.fieldDescription)) + header.addMetaDataLine(new VCFInfoHeaderLine(cmdArgs.fieldName, + VCFHeaderLineCount.UNBOUNDED, fieldType, cmdArgs.fieldDescription)) writer.writeHeader(header) logger.info("Start reading vcf records") for (record <- reader) { - val overlaps = bedRecords.getOrElse(record.getContig, Nil).filter(x => { - record.getStart <= x._2 && record.getEnd >= x._1 - }) + val overlaps = bedRecords.overlapWith(BedRecord(record.getContig, record.getStart, record.getEnd)) if (overlaps.isEmpty) { writer.add(record) } else { val builder = new VariantContextBuilder(record) - if (fieldType == VCFHeaderLineType.Flag) builder.attribute(commandArgs.fieldName, true) - else builder.attribute(commandArgs.fieldName, overlaps.map(_._3).mkString(",")) + if (fieldType == VCFHeaderLineType.Flag) builder.attribute(cmdArgs.fieldName, true) + else builder.attribute(cmdArgs.fieldName, overlaps.map(_.name).mkString(",")) writer.add(builder.make) } } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFasta.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFasta.scala similarity index 80% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFasta.scala rename to 
public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFasta.scala index 33c26353d93c44b80c9f8750de4695eb60e4592a..2ad247900d7a1537e93c3f20b039cf1962d1fa42 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFasta.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFasta.scala @@ -21,62 +21,12 @@ import htsjdk.samtools.SamReaderFactory import htsjdk.samtools.reference.IndexedFastaSequenceFile import htsjdk.variant.variantcontext.VariantContext import htsjdk.variant.vcf.VCFFileReader -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ Reference, ToolCommand, ToolCommandFuntion } +import nl.lumc.sasc.biopet.utils.ToolCommand import nl.lumc.sasc.biopet.utils.VcfUtils._ -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } import scala.collection.JavaConversions._ import scala.collection.mutable.ListBuffer -class BastyGenerateFasta(val root: Configurable) extends ToolCommandFuntion with Reference { - javaMainClass = getClass.getName - - @Input(doc = "Input vcf file", required = false) - var inputVcf: File = _ - - @Input(doc = "Bam File", required = false) - var bamFile: File = _ - - @Input(doc = "reference", required = false) - var reference: File = _ - - @Output(doc = "Output fasta, variants only", required = false) - var outputVariants: File = _ - - @Output(doc = "Output fasta, variants only", required = false) - var outputConsensus: File = _ - - @Output(doc = "Output fasta, variants only", required = false) - var outputConsensusVariants: File = _ - - var snpsOnly: Boolean = config("snps_only", default = false) - var sampleName: String = _ - var minAD: Int = config("min_ad", default = 8) - var minDepth: Int = config("min_depth", default = 8) - var outputName: String = _ - - override def defaultCoreMemory = 4.0 - - override def beforeGraph(): Unit = { - super.beforeGraph() - reference = referenceFasta() - } - - override def 
commandLine = super.commandLine + - optional("--inputVcf", inputVcf) + - optional("--bamFile", bamFile) + - optional("--outputVariants", outputVariants) + - optional("--outputConsensus", outputConsensus) + - optional("--outputConsensusVariants", outputConsensusVariants) + - conditional(snpsOnly, "--snpsOnly") + - optional("--sampleName", sampleName) + - required("--outputName", outputName) + - optional("--minAD", minAD) + - optional("--minDepth", minDepth) + - optional("--reference", reference) -} - object BastyGenerateFasta extends ToolCommand { case class Args(inputVcf: File = null, outputVariants: File = null, @@ -155,7 +105,7 @@ object BastyGenerateFasta extends ToolCommand { } } - protected var cmdArgs: Args = _ + protected implicit var cmdArgs: Args = _ private val chunkSize = 100000 /** @@ -165,11 +115,18 @@ object BastyGenerateFasta extends ToolCommand { val argsParser = new OptParser cmdArgs = argsParser.parse(args, Args()) getOrElse sys.exit(1) - if (cmdArgs.outputVariants != null) writeVariantsOnly() - if (cmdArgs.outputConsensus != null || cmdArgs.outputConsensusVariants != null) writeConsensus() + if (cmdArgs.outputVariants != null) { + writeVariantsOnly() + } + if (cmdArgs.outputConsensus != null || cmdArgs.outputConsensusVariants != null) { + writeConsensus() + } + + //FIXME: what to do if outputcConsensus is set, but not outputConsensusVariants (and vice versa)? 
} protected def writeConsensus() { + //FIXME: preferably split this up in functions, so that they can be unit tested val referenceFile = new IndexedFastaSequenceFile(cmdArgs.reference) val referenceDict = referenceFile.getSequenceDictionary @@ -253,7 +210,7 @@ object BastyGenerateFasta extends ToolCommand { } } - protected def writeVariantsOnly() { + protected[tools] def writeVariantsOnly() { val writer = new PrintWriter(cmdArgs.outputVariants) writer.println(">" + cmdArgs.outputName) val vcfReader = new VCFFileReader(cmdArgs.inputVcf, false) @@ -265,17 +222,34 @@ object BastyGenerateFasta extends ToolCommand { vcfReader.close() } - protected def getMaxAllele(vcfRecord: VariantContext): String = { + // TODO: what does this do? + // Seems to me it finds the allele in a sample with the highest AD value + // if this allele is shorter than the largest allele, it will append '-' to the string + protected[tools] def getMaxAllele(vcfRecord: VariantContext)(implicit cmdArgs: Args): String = { val maxSize = getLongestAllele(vcfRecord).getBases.length - if (cmdArgs.sampleName == null) return fillAllele(vcfRecord.getReference.getBaseString, maxSize) + if (cmdArgs.sampleName == null) { + return fillAllele(vcfRecord.getReference.getBaseString, maxSize) + } val genotype = vcfRecord.getGenotype(cmdArgs.sampleName) - if (genotype == null) return fillAllele("", maxSize) + + if (genotype == null) { + return fillAllele("", maxSize) + } + val AD = if (genotype.hasAD) genotype.getAD else Array.fill(vcfRecord.getAlleles.size())(cmdArgs.minAD) - if (AD == null) return fillAllele("", maxSize) + + if (AD == null) { + return fillAllele("", maxSize) + } + val maxADid = AD.zipWithIndex.maxBy(_._1)._2 - if (AD(maxADid) < cmdArgs.minAD) return fillAllele("", maxSize) + + if (AD(maxADid) < cmdArgs.minAD) { + return fillAllele("", maxSize) + } + fillAllele(vcfRecord.getAlleles()(maxADid).getBaseString, maxSize) } -} \ No newline at end of file +} diff --git 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BedToInterval.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BedToInterval.scala similarity index 71% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BedToInterval.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BedToInterval.scala index 37f6e5ddcbc570dda20925fc517df0f38ae30391..f9646c4c9ebe7473e5ccb39e833d8ae87176b84a 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BedToInterval.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BedToInterval.scala @@ -18,43 +18,15 @@ package nl.lumc.sasc.biopet.tools import java.io.{ File, PrintWriter } import htsjdk.samtools.{ SAMSequenceRecord, SamReaderFactory } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand +import scala.collection.JavaConversions._ import scala.io.Source -/** - * @deprecated Use picard.util.BedToIntervalList instead - */ -class BedToInterval(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input Bed file", required = true) - var input: File = _ - - @Input(doc = "Bam File", required = true) - var bamFile: File = _ - - @Output(doc = "Output interval list", required = true) - var output: File = _ - - override def defaultCoreMemory = 1.0 - - override def commandLine = super.commandLine + required("-I", input) + required("-b", bamFile) + required("-o", output) -} - /** * @deprecated Use picard.util.BedToIntervalList instead */ object BedToInterval extends ToolCommand { - def apply(root: Configurable, inputBed: File, inputBam: File, output: File): BedToInterval = { - val bedToInterval = new BedToInterval(root) - bedToInterval.input = inputBed - bedToInterval.bamFile = inputBam 
- bedToInterval.output = output - bedToInterval - } case class Args(inputFile: File = null, outputFile: File = null, bamFile: File = null) extends AbstractArgs @@ -80,8 +52,7 @@ object BedToInterval extends ToolCommand { val writer = new PrintWriter(commandArgs.outputFile) val inputSam = SamReaderFactory.makeDefault.open(commandArgs.bamFile) - val refs = for (SQ <- inputSam.getFileHeader.getSequenceDictionary.getSequences.toArray) yield { - val record = SQ.asInstanceOf[SAMSequenceRecord] + val refs = for (record <- inputSam.getFileHeader.getSequenceDictionary.getSequences) yield { writer.write("@SQ\tSN:" + record.getSequenceName + "\tLN:" + record.getSequenceLength + "\n") record.getSequenceName -> record.getSequenceLength } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BedtoolsCoverageToCounts.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BedtoolsCoverageToCounts.scala similarity index 75% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BedtoolsCoverageToCounts.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BedtoolsCoverageToCounts.scala index 2d3fb2b1cadc16a9f0d9db21c261654dacd9d0a9..b9ef718165ff3a7f2b67853c83c3e7d0fbbbe297 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BedtoolsCoverageToCounts.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BedtoolsCoverageToCounts.scala @@ -17,29 +17,11 @@ package nl.lumc.sasc.biopet.tools import java.io.{ File, PrintWriter } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand -import scala.collection.{ mutable, SortedMap } +import scala.collection.{ SortedMap, mutable } import scala.io.Source -class BedtoolsCoverageToCounts(val root: Configurable) extends ToolCommandFuntion { - 
javaMainClass = getClass.getName - - @Input(doc = "Input fasta", shortName = "input", required = true) - var input: File = _ - - @Output(doc = "Output tag library", shortName = "output", required = true) - var output: File = _ - - override def defaultCoreMemory = 3.0 - - override def commandLine = super.commandLine + - required("-I", input) + - required("-o", output) -} - object BedtoolsCoverageToCounts extends ToolCommand { case class Args(input: File = null, output: File = null) extends AbstractArgs diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstat.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstat.scala similarity index 86% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstat.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstat.scala index 963d73ebc3625292377bcbf7b3d725fbb47c9e3b..4d98bda24b5c4eb1072cd5d1739efbf242d4a1bf 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstat.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstat.scala @@ -18,55 +18,25 @@ package nl.lumc.sasc.biopet.tools import java.io.{ File, PrintWriter } import htsjdk.samtools.{ SAMRecord, SamReaderFactory } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.summary.Summarizable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import nl.lumc.sasc.biopet.utils.ConfigUtils -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.{ ToolCommand, ConfigUtils } import scala.collection.JavaConversions._ import scala.collection.mutable -class BiopetFlagstat(val root: Configurable) extends ToolCommandFuntion with Summarizable { - javaMainClass = getClass.getName - - @Input(doc = "Input bam", shortName = "input", required = true) - var input: File = _ - - @Output(doc = "Output 
flagstat", shortName = "output", required = true) - var output: File = _ - - @Output(doc = "summary output file", shortName = "output", required = false) - var summaryFile: File = _ - - override def defaultCoreMemory = 6.0 - - override def commandLine = super.commandLine + required("-I", input) + required("-s", summaryFile) + " > " + required(output) - - def summaryFiles: Map[String, File] = Map() - - def summaryStats: Map[String, Any] = { - ConfigUtils.fileToConfigMap(summaryFile) - } -} - object BiopetFlagstat extends ToolCommand { - import scala.collection.mutable.Map - - def apply(root: Configurable, input: File, outputDir: File): BiopetFlagstat = { - val flagstat = new BiopetFlagstat(root) - flagstat.input = input - flagstat.output = new File(outputDir, input.getName.stripSuffix(".bam") + ".biopetflagstat") - flagstat.summaryFile = new File(outputDir, input.getName.stripSuffix(".bam") + ".biopetflagstat.json") - flagstat - } - case class Args(inputFile: File = null, summaryFile: Option[File] = None, region: Option[String] = None) extends AbstractArgs + case class Args(inputFile: File = null, + outputFile: Option[File] = None, + summaryFile: Option[File] = None, + region: Option[String] = None) extends AbstractArgs class OptParser extends AbstractOptParser { opt[File]('I', "inputFile") required () valueName "<file>" action { (x, c) => c.copy(inputFile = x) } text "input bam file" + opt[File]('o', "outputFile") valueName "<file>" action { (x, c) => + c.copy(outputFile = Some(x)) + } text "output file" opt[File]('s', "summaryFile") valueName "<file>" action { (x, c) => c.copy(summaryFile = Some(x)) } text "summary output file" @@ -151,7 +121,14 @@ object BiopetFlagstat extends ToolCommand { writer.close() } - println(flagstatCollector.report) + commandArgs.outputFile match { + case Some(file) => { + val writer = new PrintWriter(file) + writer.println(flagstatCollector.report) + writer.close() + } + case _ => println(flagstatCollector.report) + } } class 
FlagstatCollector { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBam.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBam.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBam.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBam.scala index 2797ffc2de25d6045f7717947149cd3900437ced..141633c5358a21c931555d5efbd18c615b7c6f77 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBam.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBam.scala @@ -21,7 +21,7 @@ import htsjdk.samtools.{ QueryInterval, SAMRecord, SamReader, SamReaderFactory } import htsjdk.variant.variantcontext.{ VariantContext, VariantContextBuilder } import htsjdk.variant.variantcontext.writer.{ AsyncVariantContextWriter, VariantContextWriterBuilder } import htsjdk.variant.vcf.{ VCFFileReader, VCFHeaderLineCount, VCFHeaderLineType, VCFInfoHeaderLine } -import nl.lumc.sasc.biopet.core.ToolCommand +import nl.lumc.sasc.biopet.utils.ToolCommand import scala.collection.JavaConversions._ import scala.collection.mutable @@ -116,7 +116,7 @@ object CheckAllelesVcfInBam extends ToolCommand { } val counts = for (samRecord <- bamIter if !filterRead(samRecord)) { - checkAlles(samRecord, vcfRecord) match { + checkAlleles(samRecord, vcfRecord) match { case Some(a) => if (countReports(sample).aCounts.contains(a)) countReports(sample).aCounts(a) += 1 else countReports(sample).aCounts += (a -> 1) case _ => countReports(sample).notFound += 1 @@ -142,7 +142,7 @@ object CheckAllelesVcfInBam extends ToolCommand { writer.close() } - def checkAlles(samRecord: SAMRecord, vcfRecord: VariantContext): Option[String] = { + def checkAlleles(samRecord: SAMRecord, vcfRecord: VariantContext): Option[String] = { val readStartPos = List.range(0, 
samRecord.getReadBases.length) .find(x => samRecord.getReferencePositionAtReadPosition(x + 1) == vcfRecord.getStart) getOrElse { return None } val readBases = samRecord.getReadBases diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastq.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastq.scala similarity index 99% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastq.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastq.scala index 57885eedfa68c7ac2c31312f1579f3b5fcda850b..b02074875b2e27021e123139da7ef42d1bce8700 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastq.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastq.scala @@ -20,7 +20,7 @@ import java.io.File import htsjdk.samtools.{ QueryInterval, SamReaderFactory, ValidationStringency } import htsjdk.samtools.fastq.{ BasicFastqWriter, FastqReader, FastqRecord } import htsjdk.samtools.util.Interval -import nl.lumc.sasc.biopet.core.ToolCommand +import nl.lumc.sasc.biopet.utils.ToolCommand import scala.collection.JavaConverters._ import scala.collection.mutable.{ Set => MSet } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSplitter.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSplitter.scala similarity index 74% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSplitter.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSplitter.scala index 0639841bf0e7db21337fa268f446ee3c42117e97..46a4abd0cac604dedc239845565f5d1acdd85b72 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSplitter.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSplitter.scala @@ -18,30 +18,7 @@ package 
nl.lumc.sasc.biopet.tools import java.io.File import htsjdk.samtools.fastq.{ AsyncFastqWriter, BasicFastqWriter, FastqReader } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } - -/** - * Queue extension for the FastqSplitter - * @param root Parent object - */ -class FastqSplitter(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input fastq", shortName = "input", required = true) - var input: File = _ - - @Output(doc = "Output fastq files", shortName = "output", required = true) - var output: List[File] = Nil - - override def defaultCoreMemory = 4.0 - - /** * Generate command to execute */ - override def commandLine = super.commandLine + - required("-I", input) + - repeat("-o", output) -} +import nl.lumc.sasc.biopet.utils.ToolCommand object FastqSplitter extends ToolCommand { @@ -55,10 +32,10 @@ object FastqSplitter extends ToolCommand { class OptParser extends AbstractOptParser { opt[File]('I', "inputFile") required () valueName "<file>" action { (x, c) => c.copy(inputFile = x) - } text "out is a required file property" + } text "Path to input file" opt[File]('o', "output") required () unbounded () valueName "<file>" action { (x, c) => c.copy(outputFile = x :: c.outputFile) - } text "out is a required file property" + } text "Path to output file" } /** diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSync.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSync.scala similarity index 71% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSync.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSync.scala index 9c320e4143dfae64108ae0c0e90a1339681b85a6..d4e6996de89b9a62a3b35f9ea894907882b4484f 100644 --- 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSync.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FastqSync.scala @@ -18,88 +18,10 @@ package nl.lumc.sasc.biopet.tools import java.io.File import htsjdk.samtools.fastq.{ AsyncFastqWriter, BasicFastqWriter, FastqReader, FastqRecord } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.summary.Summarizable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand import scala.annotation.tailrec import scala.collection.JavaConverters._ -import scala.io.Source -import scala.util.matching.Regex - -/** - * FastqSync function class for usage in Biopet pipelines - * - * @param root Configuration object for the pipeline - */ -class FastqSync(val root: Configurable) extends ToolCommandFuntion with Summarizable { - - javaMainClass = getClass.getName - - @Input(doc = "Original FASTQ file (read 1 or 2)", shortName = "r", required = true) - var refFastq: File = null - - @Input(doc = "Input read 1 FASTQ file", shortName = "i", required = true) - var inputFastq1: File = null - - @Input(doc = "Input read 2 FASTQ file", shortName = "j", required = true) - var inputFastq2: File = null - - @Output(doc = "Output read 1 FASTQ file", shortName = "o", required = true) - var outputFastq1: File = null - - @Output(doc = "Output read 2 FASTQ file", shortName = "p", required = true) - var outputFastq2: File = null - - @Output(doc = "Sync statistics", required = true) - var outputStats: File = null - - override def defaultCoreMemory = 4.0 - - // executed command line - override def commandLine = - super.commandLine + - required("-r", refFastq) + - required("-i", inputFastq1) + - required("-j", inputFastq2) + - required("-o", outputFastq1) + - required("-p", outputFastq2) + " > " + - required(outputStats) - - def summaryFiles: Map[String, 
File] = Map() - - def summaryStats: Map[String, Any] = { - val regex = new Regex("""Filtered (\d*) reads from first read file. - |Filtered (\d*) reads from second read file. - |Synced read files contain (\d*) reads.""".stripMargin, - "R1", "R2", "RL") - - val (countFilteredR1, countFilteredR2, countRLeft) = - if (outputStats.exists) { - val text = Source - .fromFile(outputStats) - .getLines() - .mkString("\n") - regex.findFirstMatchIn(text) match { - case None => (0, 0, 0) - case Some(rmatch) => (rmatch.group("R1").toInt, rmatch.group("R2").toInt, rmatch.group("RL").toInt) - } - } else (0, 0, 0) - - Map("num_reads_discarded_R1" -> countFilteredR1, - "num_reads_discarded_R2" -> countFilteredR2, - "num_reads_kept" -> countRLeft - ) - } - - override def resolveSummaryConflict(v1: Any, v2: Any, key: String): Any = { - (v1, v2) match { - case (v1: Int, v2: Int) => v1 + v2 - case _ => v1 - } - } -} object FastqSync extends ToolCommand { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBio.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBio.scala similarity index 81% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBio.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBio.scala index dafa21f20d18ce0c0e364e17778b3547f0d2ed67..f752e3be6863928122bf233b9b911b2d50b8a432 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBio.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBio.scala @@ -15,24 +15,29 @@ */ package nl.lumc.sasc.biopet.tools -import java.io.File +import java.io.{ PrintWriter, File } import htsjdk.samtools.{ QueryInterval, SAMRecord, SamReaderFactory, ValidationStringency } -import nl.lumc.sasc.biopet.core.ToolCommand +import nl.lumc.sasc.biopet.utils.ToolCommand import scala.collection.JavaConversions._ import scala.io.Source 
object FindRepeatsPacBio extends ToolCommand { - case class Args(inputBam: File = null, inputBed: File = null) extends AbstractArgs + case class Args(inputBam: File = null, + outputFile: Option[File] = None, + inputBed: File = null) extends AbstractArgs class OptParser extends AbstractOptParser { opt[File]('I', "inputBam") required () maxOccurs 1 valueName "<file>" action { (x, c) => c.copy(inputBam = x) - } + } text "Path to input file" + opt[File]('o', "outputFile") maxOccurs 1 valueName "<file>" action { (x, c) => + c.copy(outputFile = Some(x)) + } text "Path to input file" opt[File]('b', "inputBed") required () maxOccurs 1 valueName "<file>" action { (x, c) => c.copy(inputBed = x) - } text "output file, default to stdout" + } text "Path to bed file" } /** @@ -50,7 +55,6 @@ object FindRepeatsPacBio extends ToolCommand { val header = List("chr", "startPos", "stopPos", "Repeat_seq", "repeatLength", "original_Repeat_readLength", "Calculated_repeat_readLength", "minLength", "maxLength", "inserts", "deletions", "notSpan") - println(header.mkString("\t")) for ( bedLine <- Source.fromFile(commandArgs.inputBed).getLines(); @@ -84,9 +88,21 @@ object FindRepeatsPacBio extends ToolCommand { if (length < minLength || minLength == -1) minLength = length } } - println(List(chr, startPos, stopPos, typeRepeat, repeatLength, oriRepeatLength, calcRepeatLength.mkString(","), minLength, - maxLength, inserts.mkString("/"), deletions.mkString("/"), notSpan).mkString("\t")) bamIter.close() + commandArgs.outputFile match { + case Some(file) => { + val writer = new PrintWriter(file) + writer.println(header.mkString("\t")) + writer.println(List(chr, startPos, stopPos, typeRepeat, repeatLength, oriRepeatLength, calcRepeatLength.mkString(","), minLength, + maxLength, inserts.mkString("/"), deletions.mkString("/"), notSpan).mkString("\t")) + writer.close() + } + case _ => { + println(header.mkString("\t")) + println(List(chr, startPos, stopPos, typeRepeat, repeatLength, oriRepeatLength, 
calcRepeatLength.mkString(","), minLength, + maxLength, inserts.mkString("/"), deletions.mkString("/"), notSpan).mkString("\t")) + } + } } } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MergeAlleles.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MergeAlleles.scala similarity index 75% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MergeAlleles.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MergeAlleles.scala index 8a5b05f6e00575090f687436b7bd348ba34cb6a2..02f4b4709398e244a40ba13139993d2230d54fd3 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MergeAlleles.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MergeAlleles.scala @@ -18,52 +18,16 @@ package nl.lumc.sasc.biopet.tools import java.io.File import htsjdk.samtools.reference.FastaSequenceFile -import htsjdk.variant.variantcontext.{ Allele, VariantContext, VariantContextBuilder } import htsjdk.variant.variantcontext.writer.{ AsyncVariantContextWriter, VariantContextWriterBuilder } +import htsjdk.variant.variantcontext.{ Allele, VariantContext, VariantContextBuilder } import htsjdk.variant.vcf.{ VCFFileReader, VCFHeader } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.config.Configurable import scala.collection.JavaConversions._ -import scala.collection.{ mutable, SortedMap } -import scala.collection.mutable.{ Map, Set } - -class MergeAlleles(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input vcf files", shortName = "input", required = true) - var input: List[File] = Nil - - @Output(doc = "Output vcf file", shortName = "output", required = true) - var output: File = _ - 
- @Output(doc = "Output vcf file index", shortName = "output", required = true) - private var outputIndex: File = _ - - var reference: File = config("reference") - - override def defaultCoreMemory = 1.0 - - override def beforeGraph() { - super.beforeGraph() - if (output.getName.endsWith(".gz")) outputIndex = new File(output.getAbsolutePath + ".tbi") - if (output.getName.endsWith(".vcf")) outputIndex = new File(output.getAbsolutePath + ".idx") - } - - override def commandLine = super.commandLine + - repeat("-I", input) + - required("-o", output) + - required("-R", reference) -} +import scala.collection.{ SortedMap, mutable } object MergeAlleles extends ToolCommand { - def apply(root: Configurable, input: List[File], output: File): MergeAlleles = { - val mergeAlleles = new MergeAlleles(root) - mergeAlleles.input = input - mergeAlleles.output = output - mergeAlleles - } case class Args(inputFiles: List[File] = Nil, outputFile: File = null, reference: File = null) extends AbstractArgs diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MergeTables.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MergeTables.scala similarity index 80% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MergeTables.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MergeTables.scala index e587df068766514e05bfa7ea17ca626a733f03bb..680cba33443e7efbcda4e2acb5b42809a12b3f5c 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MergeTables.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MergeTables.scala @@ -17,68 +17,11 @@ package nl.lumc.sasc.biopet.tools import java.io.{ BufferedWriter, File, FileWriter, OutputStreamWriter } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import 
nl.lumc.sasc.biopet.utils.ToolCommand import scala.collection.mutable.{ Set => MutSet } import scala.io.{ BufferedSource, Source } -/** - * Biopet wrapper for the [[MergeTables]] command line tool. - * - * @param root [[Configurable]] object - */ -class MergeTables(val root: Configurable) extends ToolCommandFuntion { - - javaMainClass = getClass.getName - - override def defaultCoreMemory = 6.0 - - /** List of input tabular files */ - @Input(doc = "Input table files", required = true) - var inputTables: List[File] = List.empty[File] - - /** Output file */ - @Output(doc = "Output merged table", required = true) - var output: File = null - - // TODO: should be List[Int] really - /** List of column indices to combine to make a unique identifier per row */ - var idColumnIndices: List[String] = config("id_column_indices", default = List("1")) - - /** Index of column from each tabular file containing the values to be put in the final merged table */ - var valueColumnIndex: Int = config("value_column_index", default = 2) - - /** Name of the identifier column in the output file */ - var idColumnName: Option[String] = config("id_column_name") - - /** Common file extension of all input files */ - var fileExtension: Option[String] = config("file_extension") - - /** Number of header lines from each input file to ignore */ - var numHeaderLines: Option[Int] = config("num_header_lines") - - /** String to use when a value is missing from an input file */ - var fallbackString: Option[String] = config("fallback_string") - - /** Column delimiter of each input file (used for splitting into columns */ - var delimiter: Option[String] = config("delimiter") - - // executed command line - override def commandLine = - super.commandLine + - required("-i", idColumnIndices.mkString(",")) + - required("-a", valueColumnIndex) + - optional("-n", idColumnName) + - optional("-e", fileExtension) + - optional("-m", numHeaderLines) + - optional("-f", fallbackString) + - optional("-d", delimiter) + - 
required("-o", output) + - required("", repeat(inputTables), escape = false) -} - object MergeTables extends ToolCommand { /** Type alias for sample name */ diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MpileupToVcf.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MpileupToVcf.scala similarity index 75% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MpileupToVcf.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MpileupToVcf.scala index bed6e3b6ec2149f8ac4cc0d80420e02829a170f1..c93eaed3993830d9e1860b827d772b2c925b6983 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/MpileupToVcf.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/MpileupToVcf.scala @@ -17,77 +17,13 @@ package nl.lumc.sasc.biopet.tools import java.io.{ File, PrintWriter } -import htsjdk.samtools.SamReaderFactory -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ Reference, ToolCommand, ToolCommandFuntion } -import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsMpileup -import nl.lumc.sasc.biopet.utils.ConfigUtils -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand -import scala.collection.JavaConversions._ import scala.collection.mutable import scala.collection.mutable.ArrayBuffer import scala.io.Source import scala.math.{ floor, round } -class MpileupToVcf(val root: Configurable) extends ToolCommandFuntion with Reference { - javaMainClass = getClass.getName - - @Input(doc = "Input mpileup file", shortName = "mpileup", required = false) - var inputMpileup: File = _ - - @Input(doc = "Input bam file", shortName = "bam", required = false) - var inputBam: File = _ - - @Output(doc = "Output tag library", shortName = "output", required = true) - var output: File = _ - - var minDP: Option[Int] = config("min_dp") - var minAP: Option[Int] = 
config("min_ap") - var homoFraction: Option[Double] = config("homoFraction") - var ploidy: Option[Int] = config("ploidy") - var sample: String = _ - var reference: String = _ - - override def defaultCoreMemory = 3.0 - - override def defaults = ConfigUtils.mergeMaps(Map("samtoolsmpileup" -> Map("disable_baq" -> true, "min_map_quality" -> 1)), - super.defaults) - - override def beforeGraph() { - super.beforeGraph() - reference = referenceFasta().getAbsolutePath - val samtoolsMpileup = new SamtoolsMpileup(this) - } - - override def beforeCmd(): Unit = { - if (sample == null && inputBam.exists()) { - val inputSam = SamReaderFactory.makeDefault.open(inputBam) - val readGroups = inputSam.getFileHeader.getReadGroups - val samples = readGroups.map(readGroup => readGroup.getSample).distinct - sample = samples.head - inputSam.close() - } - } - - override def commandLine = { - (if (inputMpileup == null) { - val samtoolsMpileup = new SamtoolsMpileup(this) - samtoolsMpileup.reference = referenceFasta() - samtoolsMpileup.input = List(inputBam) - samtoolsMpileup.cmdPipe + " | " - } else "") + - super.commandLine + - required("-o", output) + - optional("--minDP", minDP) + - optional("--minAP", minAP) + - optional("--homoFraction", homoFraction) + - optional("--ploidy", ploidy) + - required("--sample", sample) + - (if (inputBam == null) required("-I", inputMpileup) else "") - } -} - object MpileupToVcf extends ToolCommand { case class Args(input: File = null, output: File = null, sample: String = null, minDP: Int = 8, minAP: Int = 2, homoFraction: Double = 0.8, ploidy: Int = 2) extends AbstractArgs @@ -122,8 +58,6 @@ object MpileupToVcf extends ToolCommand { def main(args: Array[String]): Unit = { val argsParser = new OptParser val commandArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1) - - import scala.collection.mutable.Map if (commandArgs.input != null && !commandArgs.input.exists) throw new IllegalStateException("Input file does not exist") val writer = new 
PrintWriter(commandArgs.output) diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/PrefixFastq.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/PrefixFastq.scala similarity index 66% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/PrefixFastq.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/PrefixFastq.scala index cc4305d4b1708206c1d47d0dd60419cedd0ba7cf..a5040a24f735e88d0c0877b7e2e0923c1e5de2c8 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/PrefixFastq.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/PrefixFastq.scala @@ -18,54 +18,10 @@ package nl.lumc.sasc.biopet.tools import java.io.File import htsjdk.samtools.fastq.{ AsyncFastqWriter, BasicFastqWriter, FastqReader, FastqRecord } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } - -/** - * Queue class for PrefixFastq tool - * - * Created by pjvan_thof on 1/13/15. 
- */ -class PrefixFastq(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - override def defaultCoreMemory = 1.0 - - @Input(doc = "Input fastq", shortName = "I", required = true) - var inputFastq: File = _ - - @Output(doc = "Output fastq", shortName = "o", required = true) - var outputFastq: File = _ - - @Argument(doc = "Prefix seq", required = true) - var prefixSeq: String = _ - - /** - * Creates command to execute extension - * @return - */ - override def commandLine = super.commandLine + - required("-i", inputFastq) + - required("-o", outputFastq) + - optional("-s", prefixSeq) -} +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.config.Configurable object PrefixFastq extends ToolCommand { - /** - * Create a PrefixFastq class object with a sufix ".prefix.fastq" in the output folder - * - * @param root parent object - * @param input input file - * @param outputDir outputFolder - * @return PrefixFastq class object - */ - def apply(root: Configurable, input: File, outputDir: String): PrefixFastq = { - val prefixFastq = new PrefixFastq(root) - prefixFastq.inputFastq = input - prefixFastq.outputFastq = new File(outputDir, input.getName + ".prefix.fastq") - prefixFastq - } /** * Args for commandline program diff --git a/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/RegionAfCount.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/RegionAfCount.scala new file mode 100644 index 0000000000000000000000000000000000000000..bdfb0be7f8e51495fed176ac7ab386104a04948e --- /dev/null +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/RegionAfCount.scala @@ -0,0 +1,152 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. 
+ * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.tools + +import java.io.{ PrintWriter, InputStream, File } +import java.util + +import htsjdk.variant.vcf.VCFFileReader +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.rscript.ScatterPlot +import nl.lumc.sasc.biopet.utils.intervals.{ BedRecord, BedRecordList } + +import scala.collection.JavaConversions._ +import scala.collection.mutable + +object RegionAfCount extends ToolCommand { + case class Args(bedFile: File = null, + outputPrefix: String = null, + scatterpPlot: Boolean = false, + vcfFiles: List[File] = Nil) extends AbstractArgs + + class OptParser extends AbstractOptParser { + opt[File]('b', "bedFile") unbounded () required () maxOccurs 1 valueName "<file>" action { (x, c) => + c.copy(bedFile = x) + } + opt[String]('o', "outputPrefix") unbounded () required () maxOccurs 1 valueName "<file prefix>" action { (x, c) => + c.copy(outputPrefix = x) + } + opt[Unit]('s', "scatterPlot") unbounded () action { (x, c) => + c.copy(scatterpPlot = true) + } + opt[File]('V', "vcfFile") unbounded () minOccurs 1 action { (x, c) => + c.copy(vcfFiles = c.vcfFiles ::: x :: Nil) + } + } + + def main(args: Array[String]): Unit = { + val argsParser = new OptParser + val cmdArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1) + + logger.info("Start") + logger.info("Reading bed file") + + val bedRecords = BedRecordList.fromFile(cmdArgs.bedFile).sorted + + logger.info(s"Combine ${bedRecords.allRecords.size} bed records") + + val combinedBedRecords = bedRecords.combineOverlap + + 
logger.info(s"${combinedBedRecords.allRecords.size} left") + logger.info(s"${combinedBedRecords.allRecords.size * cmdArgs.vcfFiles.size} query's to do") + logger.info("Reading vcf files") + + case class AfCounts(var names: Double = 0, + var namesExons: Double = 0, + var namesIntrons: Double = 0, + var namesCoding: Double = 0, + var utr: Double = 0, + var utr5: Double = 0, + var utr3: Double = 0) + + var c = 0 + val afCounts = (for (vcfFile <- cmdArgs.vcfFiles.par) yield vcfFile -> { + val reader = new VCFFileReader(vcfFile, true) + val afCounts: mutable.Map[String, AfCounts] = mutable.Map() + for (region <- combinedBedRecords.allRecords) yield { + val originals = region.originals() + for (variant <- reader.query(region.chr, region.start, region.end)) { + val sum = (variant.getAttribute("AF", 0) match { + case a: util.ArrayList[_] => a.map(_.toString.toDouble).toArray + case s => Array(s.toString.toDouble) + }).sum + val interval = BedRecord(variant.getContig, variant.getStart, variant.getEnd) + originals.foreach { x => + val name = x.name.getOrElse(s"${x.chr}:${x.start}-${x.end}") + if (!afCounts.contains(name)) afCounts += name -> AfCounts() + afCounts(name).names += sum + val exons = x.exons.getOrElse(Seq()).filter(_.overlapWith(interval)) + val introns = x.introns.getOrElse(Seq()).filter(_.overlapWith(interval)) + val utr5 = x.utr5.map(_.overlapWith(interval)) + val utr3 = x.utr3.map(_.overlapWith(interval)) + if (exons.nonEmpty) { + afCounts(name).namesExons += sum + if (!utr5.getOrElse(false) && !utr3.getOrElse(false)) afCounts(name).namesCoding += sum + } + if (introns.nonEmpty) afCounts(name).namesIntrons += sum + if (utr5.getOrElse(false) || utr3.getOrElse(false)) afCounts(name).utr += sum + if (utr5.getOrElse(false)) afCounts(name).utr5 += sum + if (utr3.getOrElse(false)) afCounts(name).utr3 += sum + } + } + c += 1 + if (c % 100 == 0) logger.info(s"$c regions done") + } + afCounts.toMap + }).toMap + + logger.info(s"Done reading, ${c} regions") + + 
logger.info("Writing output files") + + def writeOutput(tsvFile: File, function: AfCounts => Double): Unit = { + val writer = new PrintWriter(tsvFile) + writer.println("\t" + cmdArgs.vcfFiles.map(_.getName).mkString("\t")) + for (r <- cmdArgs.vcfFiles.foldLeft(Set[String]())((a, b) => a ++ afCounts(b).keySet)) { + writer.print(r + "\t") + writer.println(cmdArgs.vcfFiles.map(x => function(afCounts(x).getOrElse(r, AfCounts()))).mkString("\t")) + } + writer.close() + + if (cmdArgs.scatterpPlot) generatePlot(tsvFile) + } + + def generatePlot(tsvFile: File): Unit = { + logger.info(s"Generate plot for $tsvFile") + + val scatterPlot = new ScatterPlot(null) + scatterPlot.input = tsvFile + scatterPlot.output = new File(tsvFile.getAbsolutePath.stripSuffix(".tsv") + ".png") + scatterPlot.ylabel = Some("Sum of AFs") + scatterPlot.width = Some(1200) + scatterPlot.height = Some(1000) + scatterPlot.runLocal() + } + for ( + arg <- List[(File, AfCounts => Double)]( + (new File(cmdArgs.outputPrefix + ".names.tsv"), _.names), + (new File(cmdArgs.outputPrefix + ".names.exons_only.tsv"), _.namesExons), + (new File(cmdArgs.outputPrefix + ".names.introns_only.tsv"), _.namesIntrons), + (new File(cmdArgs.outputPrefix + ".names.coding.tsv"), _.namesCoding), + (new File(cmdArgs.outputPrefix + ".names.utr.tsv"), _.utr), + (new File(cmdArgs.outputPrefix + ".names.utr5.tsv"), _.utr5), + (new File(cmdArgs.outputPrefix + ".names.utr3.tsv"), _.utr3) + ).par + ) writeOutput(arg._1, arg._2) + + logger.info("Done") + } +} \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCountFastq.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCountFastq.scala similarity index 73% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCountFastq.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCountFastq.scala index 
c24134181ab3cc6d3cac9c2bd5576f643b51f688..2049a337670655d56c0846e3e64eb6274d6f8d7a 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCountFastq.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCountFastq.scala @@ -17,29 +17,10 @@ package nl.lumc.sasc.biopet.tools import java.io.{ File, FileReader, PrintWriter } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.biojava3.sequencing.io.fastq.{ Fastq, SangerFastqReader, StreamListener } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand +import org.biojava3.sequencing.io.fastq.{ Fastq, StreamListener, SangerFastqReader } -import scala.collection.{ mutable, SortedMap } -import scala.collection.mutable.Map - -class SageCountFastq(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input fasta", shortName = "input", required = true) - var input: File = _ - - @Output(doc = "Output tag library", shortName = "output", required = true) - var output: File = _ - - override def defaultCoreMemory = 3.0 - - override def commandLine = super.commandLine + - required("-I", input) + - required("-o", output) -} +import scala.collection.{ SortedMap, mutable } object SageCountFastq extends ToolCommand { case class Args(input: File = null, output: File = null) extends AbstractArgs @@ -73,7 +54,7 @@ object SageCountFastq extends ToolCommand { if (counts.contains(seq)) counts(seq) += 1 else counts += (seq -> 1) count += 1 - if (count % 1000000 == 0) System.err.println(count + " sequences done") + if (count % 1000000 == 0) logger.info(count + " sequences done") } }) logger.info(count + " sequences done") diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateLibrary.scala 
b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateLibrary.scala similarity index 72% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateLibrary.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateLibrary.scala index 674717fca9e51d3b76a6e8991ad69f6de03ebb0d..f5ffa5e5a43e63c95d6610ca9e58c1170ae44369 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateLibrary.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateLibrary.scala @@ -17,48 +17,13 @@ package nl.lumc.sasc.biopet.tools import java.io.{ File, PrintWriter } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } +import nl.lumc.sasc.biopet.utils.ToolCommand import org.biojava3.core.sequence.DNASequence import org.biojava3.core.sequence.io.FastaReaderHelper -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } -import scala.collection.JavaConversions._ -import scala.collection.{ mutable, SortedMap } -import scala.collection.mutable.{ Map, Set } +import scala.collection.{ SortedMap, mutable } import scala.util.matching.Regex - -class SageCreateLibrary(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input fasta", shortName = "input", required = true) - var input: File = _ - - @Output(doc = "Output tag library", shortName = "output", required = true) - var output: File = _ - - @Output(doc = "Output no tags", shortName = "noTagsOutput", required = false) - var noTagsOutput: File = _ - - @Output(doc = "Output no anti tags library", shortName = "noAntiTagsOutput", required = false) - var noAntiTagsOutput: File = _ - - @Output(doc = "Output file all genes", shortName = "allGenes", required = false) - var allGenesOutput: File = _ - - var tag: String = config("tag", default = "CATG") - var length: Option[Int] = config("length", 
default = 17) - - override def defaultCoreMemory = 3.0 - - override def commandLine = super.commandLine + - required("-I", input) + - optional("--tag", tag) + - optional("--length", length) + - optional("--noTagsOutput", noTagsOutput) + - optional("--noAntiTagsOutput", noAntiTagsOutput) + - required("-o", output) -} +import scala.collection.JavaConversions._ object SageCreateLibrary extends ToolCommand { case class Args(input: File = null, tag: String = "CATG", length: Int = 17, output: File = null, noTagsOutput: File = null, @@ -77,10 +42,10 @@ object SageCreateLibrary extends ToolCommand { opt[Int]("length") required () unbounded () action { (x, c) => c.copy(length = x) } - opt[File]("noTagsOutput") required () unbounded () valueName "<file>" action { (x, c) => + opt[File]("noTagsOutput") unbounded () valueName "<file>" action { (x, c) => c.copy(noTagsOutput = x) } - opt[File]("noAntiTagsOutput") required () unbounded () valueName "<file>" action { (x, c) => + opt[File]("noAntiTagsOutput") unbounded () valueName "<file>" action { (x, c) => c.copy(noAntiTagsOutput = x) } opt[File]("allGenesOutput") unbounded () valueName "<file>" action { (x, c) => @@ -88,8 +53,7 @@ object SageCreateLibrary extends ToolCommand { } } - var tagRegex: Regex = null - var geneRegex = """ENSG[0-9]{11}""".r + val geneRegex = """ENSG[0-9]{11}""".r val tagGenesMap: mutable.Map[String, TagGenes] = mutable.Map() @@ -114,23 +78,24 @@ object SageCreateLibrary extends ToolCommand { if (!commandArgs.input.exists) throw new IllegalStateException("Input file not found, file: " + commandArgs.input) - tagRegex = (commandArgs.tag + "[CATG]{" + commandArgs.length + "}").r + val tagRegex = (commandArgs.tag + "[CATG]{" + commandArgs.length + "}").r var count = 0 - System.err.println("Reading fasta file") + logger.info("Reading fasta file") val reader = FastaReaderHelper.readFastaDNASequence(commandArgs.input) - System.err.println("Finding tags") + logger.info("Finding tags") for ((name, seq) <- reader) 
{ - getTags(name, seq) + val result = getTags(name, seq, tagRegex) + addTagresultToTaglib(name, result) count += 1 - if (count % 10000 == 0) System.err.println(count + " transcripts done") + if (count % 10000 == 0) logger.info(count + " transcripts done") } - System.err.println(count + " transcripts done") + logger.info(count + " transcripts done") - System.err.println("Start sorting tags") + logger.info("Start sorting tags") val tagGenesMapSorted: SortedMap[String, TagGenes] = SortedMap(tagGenesMap.toArray: _*) - System.err.println("Writting output files") + logger.info("Writting output files") val writer = new PrintWriter(commandArgs.output) writer.println("#tag\tfirstTag\tAllTags\tFirstAntiTag\tAllAntiTags") for ((tag, genes) <- tagGenesMapSorted) { @@ -167,7 +132,7 @@ object SageCreateLibrary extends ToolCommand { } } - def addTagresultToTaglib(name: String, tagResult: TagResult) { + private def addTagresultToTaglib(name: String, tagResult: TagResult) { val id = name.split(" ").head //.stripPrefix("hg19_ensGene_") val geneID = geneRegex.findFirstIn(name).getOrElse("unknown_gene") allGenes.add(geneID) @@ -195,15 +160,13 @@ object SageCreateLibrary extends ToolCommand { } } - def getTags(name: String, seq: DNASequence): TagResult = { + def getTags(name: String, seq: DNASequence, tagRegex: Regex): TagResult = { val allTags: List[String] = for (tag <- tagRegex.findAllMatchIn(seq.getSequenceAsString).toList) yield tag.toString() val firstTag = if (allTags.isEmpty) null else allTags.last val allAntiTags: List[String] = for (tag <- tagRegex.findAllMatchIn(seq.getReverseComplement.getSequenceAsString).toList) yield tag.toString() val firstAntiTag = if (allAntiTags.isEmpty) null else allAntiTags.head val result = new TagResult(firstTag, allTags, firstAntiTag, allAntiTags) - addTagresultToTaglib(name, result) - result } } \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCounts.scala 
b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCounts.scala similarity index 74% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCounts.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCounts.scala index 1392dcefefba1b5e28b0eb795c7d460c86a3abf2..9ff037eae73d81cd39345d6028e903af220ae35c 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCounts.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCounts.scala @@ -17,46 +17,11 @@ package nl.lumc.sasc.biopet.tools import java.io.{ File, PrintWriter } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand -import scala.collection.{ mutable, SortedMap } -import scala.collection.mutable.Map +import scala.collection.{ SortedMap, mutable } import scala.io.Source -class SageCreateTagCounts(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Raw count file", shortName = "input", required = true) - var input: File = _ - - @Input(doc = "tag library", shortName = "taglib", required = true) - var tagLib: File = _ - - @Output(doc = "Sense count file", shortName = "sense", required = true) - var countSense: File = _ - - @Output(doc = "Sense all coun filet", shortName = "allsense", required = true) - var countAllSense: File = _ - - @Output(doc = "AntiSense count file", shortName = "antisense", required = true) - var countAntiSense: File = _ - - @Output(doc = "AntiSense all count file", shortName = "allantisense", required = true) - var countAllAntiSense: File = _ - - override def defaultCoreMemory = 3.0 - - override def commandLine = super.commandLine + - required("-I", input) + - required("--tagLib", tagLib) + - 
optional("--countSense", countSense) + - optional("--countAllSense", countAllSense) + - optional("--countAntiSense", countAntiSense) + - optional("--countAllAntiSense", countAllAntiSense) -} - object SageCreateTagCounts extends ToolCommand { case class Args(input: File = null, tagLib: File = null, countSense: File = null, countAllSense: File = null, countAntiSense: File = null, countAllAntiSense: File = null) extends AbstractArgs @@ -148,9 +113,18 @@ object SageCreateTagCounts extends ToolCommand { writer.close() } } - writeFile(commandArgs.countSense, senseCounts) - writeFile(commandArgs.countAllSense, allSenseCounts) - writeFile(commandArgs.countAntiSense, antiSenseCounts) - writeFile(commandArgs.countAllAntiSense, allAntiSenseCounts) + + if (commandArgs.countSense != null) { + writeFile(commandArgs.countSense, senseCounts) + } + if (commandArgs.countAllAntiSense != null) { + writeFile(commandArgs.countAllAntiSense, allAntiSenseCounts) + } + if (commandArgs.countAllSense != null) { + writeFile(commandArgs.countAllSense, allSenseCounts) + } + if (commandArgs.countAntiSense != null) { + writeFile(commandArgs.countAntiSense, antiSenseCounts) + } } } \ No newline at end of file diff --git a/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJson.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJson.scala new file mode 100644 index 0000000000000000000000000000000000000000..ff31439d9f0622ef73e3386e207971caa715a607 --- /dev/null +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJson.scala @@ -0,0 +1,95 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. 
+ * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.tools + +import java.io.{ PrintWriter, File } + +import nl.lumc.sasc.biopet.utils.ConfigUtils._ +import nl.lumc.sasc.biopet.utils.ToolCommand +import scala.collection.mutable + +import scala.io.Source + +/** + * This tool can convert a tsv to a json file + */ +object SamplesTsvToJson extends ToolCommand { + case class Args(inputFiles: List[File] = Nil, outputFile: Option[File] = None) extends AbstractArgs + + class OptParser extends AbstractOptParser { + opt[File]('i', "inputFiles") required () unbounded () valueName "<file>" action { (x, c) => + c.copy(inputFiles = x :: c.inputFiles) + } text "Input must be a tsv file, first line is seen as header and must at least have a 'sample' column, 'library' column is optional, multiple files allowed" + opt[File]('o', "outputFile") unbounded () valueName "<file>" action { (x, c) => + c.copy(outputFile = Some(x)) + } + } + + /** Executes SamplesTsvToJson */ + def main(args: Array[String]): Unit = { + val argsParser = new OptParser + val commandArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1) + + val jsonString = stringFromInputs(commandArgs.inputFiles) + commandArgs.outputFile match { + case Some(file) => { + val writer = new PrintWriter(file) + writer.println(jsonString) + writer.close() + } + case _ => println(jsonString) + } + } + + def mapFromFile(inputFile: File): Map[String, Any] = { + val reader = Source.fromFile(inputFile) + val lines = reader.getLines().toList.filter(!_.isEmpty) + val header = lines.head.split("\t") + val sampleColumn = 
header.indexOf("sample") + val libraryColumn = header.indexOf("library") + if (sampleColumn == -1) throw new IllegalStateException("Sample column does not exist in: " + inputFile) + + val sampleLibCache: mutable.Set[(String, Option[String])] = mutable.Set() + + val librariesValues: List[Map[String, Any]] = for (tsvLine <- lines.tail) yield { + val values = tsvLine.split("\t") + require(header.length == values.length, "Number of columns is not the same as the header") + val sample = values(sampleColumn) + val library = if (libraryColumn != -1) Some(values(libraryColumn)) else None + + //FIXME: this is a workaround, should be removed after fixing #180 + if (sample.head.isDigit || library.forall(_.head.isDigit)) + throw new IllegalStateException("Sample or library may not start with a number") + + if (sampleLibCache.contains((sample, library))) + throw new IllegalStateException(s"Combination of $sample ${library.map("and " + _).getOrElse("")} is found multiple times") + else sampleLibCache.add((sample, library)) + val valuesMap = (for ( + t <- 0 until values.size if !values(t).isEmpty && t != sampleColumn && t != libraryColumn + ) yield header(t) -> values(t)).toMap + library match { + case Some(lib) => Map("samples" -> Map(sample -> Map("libraries" -> Map(lib -> valuesMap)))) + case _ => Map("samples" -> Map(sample -> valuesMap)) + } + } + librariesValues.foldLeft(Map[String, Any]())((acc, kv) => mergeMaps(acc, kv)) + } + + def stringFromInputs(inputs: List[File]): String = { + val map = inputs.map(f => mapFromFile(f)).foldLeft(Map[String, Any]())((acc, kv) => mergeMaps(acc, kv)) + mapToJson(map).spaces2 + } +} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SeqStat.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SeqStat.scala similarity index 77% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SeqStat.scala rename to 
public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SeqStat.scala index 1ec41952922c2058d02f680efb8a14a1757c85a2..74d2512b9e09f40430477ab42afd6410f5582514 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SeqStat.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SeqStat.scala @@ -15,14 +15,10 @@ */ package nl.lumc.sasc.biopet.tools -import java.io.File +import java.io.{ PrintWriter, File } import htsjdk.samtools.fastq.{ FastqReader, FastqRecord } -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.summary.Summarizable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import nl.lumc.sasc.biopet.utils.ConfigUtils -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.{ ToolCommand, ConfigUtils } import scala.collection.JavaConverters._ import scala.collection.immutable.Map @@ -30,65 +26,9 @@ import scala.collection.mutable import scala.language.postfixOps /** - * Seqstat function class for usage in Biopet pipelines - * - * @param root Configuration object for the pipeline + * Created by pjvanthof on 11/09/15. 
*/ -class SeqStat(val root: Configurable) extends ToolCommandFuntion with Summarizable { - javaMainClass = getClass.getName - - @Input(doc = "Input FASTQ", shortName = "input", required = true) - var input: File = null - - @Output(doc = "Output JSON", shortName = "output", required = true) - var output: File = null - - override def defaultCoreMemory = 2.5 - - override def commandLine = super.commandLine + required("-i", input) + " > " + required(output) - - def summaryStats: Map[String, Any] = { - val map = ConfigUtils.fileToConfigMap(output) - - ConfigUtils.any2map(map.getOrElse("stats", Map())) - } - - def summaryFiles: Map[String, File] = Map() - - override def resolveSummaryConflict(v1: Any, v2: Any, key: String): Any = { - (v1, v2) match { - case (v1: Array[_], v2: Array[_]) => v1.zip(v2).map(v => resolveSummaryConflict(v._1, v._2, key)) - case (v1: List[_], v2: List[_]) => v1.zip(v2).map(v => resolveSummaryConflict(v._1, v._2, key)) - case (v1: Int, v2: Int) if key == "len_min" => if (v1 < v2) v1 else v2 - case (v1: Int, v2: Int) if key == "len_max" => if (v1 > v2) v1 else v2 - case (v1: Int, v2: Int) => v1 + v2 - case (v1: Long, v2: Long) => v1 + v2 - case _ => v1 - } - } -} - -object FqEncoding extends Enumeration { - type FqEncoding = Value - val Sanger = Value(33, "Sanger") - val Solexa = Value(64, "Solexa") - val Unknown = Value(0, "Unknown") -} - object SeqStat extends ToolCommand { - def apply(root: Configurable, input: File, output: File): SeqStat = { - val seqstat = new SeqStat(root) - seqstat.input = input - seqstat.output = new File(output, input.getName.substring(0, input.getName.lastIndexOf(".")) + ".seqstats.json") - seqstat - } - - def apply(root: Configurable, fastqfile: File, outDir: String): SeqStat = { - val seqstat = new SeqStat(root) - seqstat.input = fastqfile - seqstat.output = new File(outDir, fastqfile.getName.substring(0, fastqfile.getName.lastIndexOf(".")) + ".seqstats.json") - seqstat - } import FqEncoding._ @@ -108,20 +48,23 @@ 
object SeqStat extends ToolCommand { private var baseQualHistoMap: mutable.Map[Int, Long] = mutable.Map(0 -> 0) private var readQualHistoMap: mutable.Map[Int, Long] = mutable.Map(0 -> 0) - case class Args(fastq: File = new File("")) extends AbstractArgs + case class Args(fastq: File = null, outputJson: Option[File] = None) extends AbstractArgs class OptParser extends AbstractOptParser { head( s""" - |$commandName - Summarize FastQ + |$commandName - Summarize FastQ """.stripMargin) - opt[File]('i', "fastq") required () valueName "<fastq>" action { (x, c) => + opt[File]('i', "fastq") required () unbounded () valueName "<fastq>" action { (x, c) => c.copy(fastq = x) } validate { x => if (x.exists) success else failure("FASTQ file not found") } text "FastQ file to generate stats from" + opt[File]('o', "output") unbounded () valueName "<json>" action { (x, c) => + c.copy(outputJson = Some(x)) + } text "File to write output to, if not supplied output go to stdout" } /** @@ -317,6 +260,21 @@ object SeqStat extends ToolCommand { )) ) - println(ConfigUtils.mapToJson(report)) + commandArgs.outputJson match { + case Some(file) => { + val writer = new PrintWriter(file) + writer.println(ConfigUtils.mapToJson(report)) + writer.close() + } + case _ => println(ConfigUtils.mapToJson(report)) + } } } + +object FqEncoding extends Enumeration { + type FqEncoding = Value + val Sanger = Value(33, "Sanger") + val Solexa = Value(64, "Solexa") + val Unknown = Value(0, "Unknown") +} + diff --git a/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SquishBed.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SquishBed.scala new file mode 100644 index 0000000000000000000000000000000000000000..74aad0081a547e9b0a25ab2e86c851dbafd4ba3b --- /dev/null +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SquishBed.scala @@ -0,0 +1,57 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File + +import nl.lumc.sasc.biopet.utils.ToolCommand +import 
nl.lumc.sasc.biopet.utils.intervals.BedRecordList + +/** + * Created by pjvanthof on 22/08/15. + */ +object SquishBed extends ToolCommand { + + case class Args(input: File = null, + output: File = null, + strandSensitive: Boolean = false) extends AbstractArgs + + class OptParser extends AbstractOptParser { + opt[File]('I', "input") required () valueName "<file>" action { (x, c) => + c.copy(input = x) + } + opt[File]('o', "output") required () unbounded () valueName "<file>" action { (x, c) => + c.copy(output = x) + } + opt[Unit]('s', "strandSensitive") unbounded () valueName "<file>" action { (x, c) => + c.copy(strandSensitive = true) + } + } + + /** + * @param args the command line arguments + */ + def main(args: Array[String]): Unit = { + val argsParser = new OptParser + val cmdArgs: Args = argsParser.parse(args, Args()) getOrElse sys.exit(1) + + if (!cmdArgs.input.exists) throw new IllegalStateException("Input file not found, file: " + cmdArgs.input) + + logger.info("Start") + + val records = BedRecordList.fromFile(cmdArgs.input) + val length = records.length + val refLength = records.combineOverlap.length + logger.info(s"Total bases: $length") + logger.info(s"Total bases on reference: $refLength") + logger.info("Start squishing") + val squishBed = records.squishBed(cmdArgs.strandSensitive).sorted + logger.info("Done squishing") + val squishLength = squishBed.length + val squishRefLength = squishBed.combineOverlap.length + logger.info(s"Total bases left: $squishLength") + logger.info(s"Total bases left on reference: $squishRefLength") + logger.info(s"Total bases removed from ref: ${refLength - squishRefLength}") + squishBed.writeToFile(cmdArgs.output) + + logger.info("Done") + } +} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SummaryToTsv.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SummaryToTsv.scala similarity index 53% rename from 
public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SummaryToTsv.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SummaryToTsv.scala index 398a9b73f733963960f7aa5ee55414a086d6127d..9c4629bd45d0111657f871e9587ed09faa6dd611 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/SummaryToTsv.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/SummaryToTsv.scala @@ -15,10 +15,10 @@ */ package nl.lumc.sasc.biopet.tools -import java.io.File +import java.io.{ PrintWriter, File } -import nl.lumc.sasc.biopet.core.ToolCommand -import nl.lumc.sasc.biopet.core.summary.Summary +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.summary.Summary /** * This is a tools to extract values from a summary to a tsv file @@ -35,15 +35,27 @@ object SummaryToTsv extends ToolCommand { opt[File]('s', "summary") required () unbounded () maxOccurs 1 valueName "<file>" action { (x, c) => c.copy(summary = x) } - opt[File]('o', "output") maxOccurs 1 unbounded () valueName "<file>" action { (x, c) => + opt[File]('o', "outputFile") unbounded () maxOccurs 1 valueName "<file>" action { (x, c) => c.copy(outputFile = Some(x)) } - opt[String]('p', "path") required () unbounded () valueName "<value>" action { (x, c) => + opt[String]('p', "path") required () unbounded () valueName "<string>" action { (x, c) => c.copy(values = c.values ::: x :: Nil) - } + } text + """ + |String that determines the values extracted from the summary. Should be of the format: + |<header_name>=<namespace>:<lower_namespace>:<even_lower_namespace>... + """.stripMargin opt[String]('m', "mode") maxOccurs 1 unbounded () valueName "<root|sample|lib>" action { (x, c) => c.copy(mode = x) - } + } validate { + x => if (Set("root", "sample", "lib").contains(x)) success else failure("Unsupported mode") + } text + """ + |Determines on what level to aggregate data. 
+ |root: at the root level + |sample: at the sample level + |lib: at the library level + """.stripMargin } @@ -56,14 +68,23 @@ object SummaryToTsv extends ToolCommand { val paths = cmdArgs.values.map(x => { val split = x.split("=", 2) split(0) -> split(1).split(":") - }) + }).toMap - val values = fetchValues(summary, paths.toMap, sample = cmdArgs.mode == "sample", lib = cmdArgs.mode == "lib") + val values = fetchValues(summary, paths, sample = cmdArgs.mode == "sample", lib = cmdArgs.mode == "lib") - println(paths.map(_._1).mkString("\t", "\t", "")) - - for (lineId <- values.head._2.keys) { - println(paths.map(x => values(x._1)(lineId).getOrElse("")).mkString(lineId + "\t", "\t", "")) + cmdArgs.outputFile match { + case Some(file) => { + val writer = new PrintWriter(file) + writer.println(createHeader(paths)) + for (lineId <- values.head._2.keys) + writer.println(createLine(paths, values, lineId)) + writer.close() + } + case _ => { + println(createHeader(paths)) + for (lineId <- values.head._2.keys) + println(createLine(paths, values, lineId)) + } } } @@ -71,9 +92,19 @@ object SummaryToTsv extends ToolCommand { sample: Boolean = false, lib: Boolean = false) = { for ((name, path) <- paths) yield name -> { - if (lib) summary.getLibraryValues(path: _*).map(a => (a._1._1 + "-" + a._1._2) -> a._2) - else if (sample) summary.getSampleValues(path: _*) + if (lib) { + summary.getLibraryValues(path: _*).map(a => (a._1._1 + "-" + a._1._2) -> a._2) + } else if (sample) summary.getSampleValues(path: _*) else Map("value" -> summary.getValue(path: _*)) } } + + def createHeader(paths: Map[String, Array[String]]): String = { + paths.map(_._1).mkString("\t", "\t", "") + } + + def createLine(paths: Map[String, Array[String]], + values: Map[String, Map[String, Option[Any]]], lineId: String): String = { + paths.map(x => values(x._1)(lineId).getOrElse("")).mkString(lineId + "\t", "\t", "") + } } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfFilter.scala 
b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfFilter.scala similarity index 91% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfFilter.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfFilter.scala index 981203de4652d72e16b5a0b0018d05679943b984..2799fa38b78b6d2acb123b80d5ae5a3bdde3bf3a 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfFilter.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfFilter.scala @@ -20,43 +20,15 @@ import java.io.File import htsjdk.variant.variantcontext.{ GenotypeType, VariantContext } import htsjdk.variant.variantcontext.writer.{ AsyncVariantContextWriter, VariantContextWriterBuilder } import htsjdk.variant.vcf.VCFFileReader -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.config.Configurable import scala.collection.JavaConversions._ import scala.io.Source -class VcfFilter(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input vcf", shortName = "I", required = true) - var inputVcf: File = _ - - @Output(doc = "Output vcf", shortName = "o", required = false) - var outputVcf: File = _ - - var minSampleDepth: Option[Int] = config("min_sample_depth") - var minTotalDepth: Option[Int] = config("min_total_depth") - var minAlternateDepth: Option[Int] = config("min_alternate_depth") - var minSamplesPass: Option[Int] = config("min_samples_pass") - var filterRefCalls: Boolean = config("filter_ref_calls", default = false) - - override def defaultCoreMemory = 3.0 - - override def commandLine = super.commandLine + - required("-I", inputVcf) + - required("-o", outputVcf) + - optional("--minSampleDepth", minSampleDepth) + - 
optional("--minTotalDepth", minTotalDepth) + - optional("--minAlternateDepth", minAlternateDepth) + - optional("--minSamplesPass", minSamplesPass) + - conditional(filterRefCalls, "--filterRefCalls") -} - object VcfFilter extends ToolCommand { /** Container class for a trio */ - protected case class Trio(child: String, father: String, mother: String) { + protected[tools] case class Trio(child: String, father: String, mother: String) { def this(arg: String) = { this(arg.split(":")(0), arg.split(":")(1), arg.split(":")(2)) } @@ -208,9 +180,9 @@ object VcfFilter extends ToolCommand { } else invertedWriter.foreach(_.add(record)) counterTotal += 1 - if (counterTotal % 100000 == 0) logger.info(counterTotal + " variants processed, " + counterLeft + " left") + if (counterTotal % 100000 == 0) logger.info(s"$counterTotal variants processed, $counterLeft passed filter") } - logger.info(counterTotal + " variants processed, " + counterLeft + " left") + logger.info(s"$counterTotal variants processed, $counterLeft passed filter") reader.close() writer.close() invertedWriter.foreach(_.close()) @@ -278,7 +250,7 @@ object VcfFilter extends ToolCommand { } /** - * Checks if AD genotype field have a minimal value + * Checks if non-ref AD genotype field have a minimal value * @param record VCF record * @param minAlternateDepth minimal depth * @param minSamplesPass Minimal number of samples to pass filter diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfStats.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfStats.scala similarity index 78% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfStats.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfStats.scala index 47af91bf3dd14014e67b0082b399f5ca8f24f8b5..62b04375f630ab59fea18d9c1d974bdf038cb767 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfStats.scala +++ 
b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfStats.scala @@ -21,10 +21,9 @@ import htsjdk.samtools.reference.FastaSequenceFile import htsjdk.samtools.util.Interval import htsjdk.variant.variantcontext.{ Allele, Genotype, VariantContext } import htsjdk.variant.vcf.VCFFileReader -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.summary.{ Summarizable, SummaryQScript } -import nl.lumc.sasc.biopet.core.{ Reference, ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.utils.intervals.BedRecordList import scala.collection.JavaConversions._ import scala.collection.mutable @@ -37,89 +36,6 @@ import scala.util.Random * * Created by pjvan_thof on 1/10/15. */ -class VcfStats(val root: Configurable) extends ToolCommandFuntion with Summarizable with Reference { - javaMainClass = getClass.getName - - @Input(doc = "Input fastq", shortName = "I", required = true) - var input: File = _ - - @Input - protected var index: File = null - - @Output - protected var generalStats: File = null - - @Output - protected var genotypeStats: File = null - - override def defaultCoreMemory = 3.0 - override def defaultThreads = 3 - - protected var outputDir: File = _ - - var infoTags: List[String] = Nil - var genotypeTags: List[String] = Nil - var allInfoTags = false - var allGenotypeTags = false - var reference: File = _ - - override def beforeGraph(): Unit = { - reference = referenceFasta() - index = new File(input.getAbsolutePath + ".tbi") - } - - /** Set output dir and a output file */ - def setOutputDir(dir: File): Unit = { - outputDir = dir - generalStats = new File(dir, "general.tsv") - genotypeStats = new File(dir, "genotype-general.tsv") - jobOutputFile = new File(dir, ".vcfstats.out") - } - - /** Creates command to execute extension */ - override def 
commandLine = super.commandLine + - required("-I", input) + - required("-o", outputDir) + - repeat("--infoTag", infoTags) + - repeat("--genotypeTag", genotypeTags) + - conditional(allInfoTags, "--allInfoTags") + - conditional(allGenotypeTags, "--allGenotypeTags") + - required("-R", reference) - - /** Returns general stats to the summary */ - def summaryStats: Map[String, Any] = { - Map("info" -> (for ( - line <- Source.fromFile(generalStats).getLines().toList.tail; - values = line.split("\t") if values.size >= 2 && !values(0).isEmpty - ) yield values(0) -> values(1).toInt - ).toMap) - } - - /** return only general files to summary */ - def summaryFiles: Map[String, File] = Map( - "general_stats" -> generalStats, - "genotype_stats" -> genotypeStats - ) - - override def addToQscriptSummary(qscript: SummaryQScript, name: String): Unit = { - val data = Source.fromFile(genotypeStats).getLines().map(_.split("\t")).toArray - - for (s <- 1 until data(0).size) { - val sample = data(0)(s) - val stats = Map("genotype" -> (for (f <- 1 until data.length) yield { - data(f)(0) -> data(f)(s) - }).toMap) - - val sum = new Summarizable { - override def summaryFiles: Map[String, File] = Map() - override def summaryStats: Map[String, Any] = stats - } - - qscript.addSummarizable(sum, name, Some(sample)) - } - } -} - object VcfStats extends ToolCommand { /** Commandline argument */ case class Args(inputFile: File = null, @@ -135,47 +51,64 @@ object VcfStats extends ToolCommand { generalWiggle: List[String] = Nil, genotypeWiggle: List[String] = Nil) extends AbstractArgs + private val generalWiggleOptions = List("Total", "Biallelic", "ComplexIndel", "Filtered", "FullyDecoded", "Indel", "Mixed", + "MNP", "MonomorphicInSamples", "NotFiltered", "PointEvent", "PolymorphicInSamples", + "SimpleDeletion", "SimpleInsertion", "SNP", "StructuralIndel", "Symbolic", + "SymbolicOrSV", "Variant") + + private val genotypeWiggleOptions = List("Total", "Het", "HetNonRef", "Hom", "HomRef", "HomVar", 
"Mixed", "NoCall", "NonInformative", + "Available", "Called", "Filtered", "Variant") + /** Parsing commandline arguments */ class OptParser extends AbstractOptParser { - opt[File]('I', "inputFile") required () unbounded () valueName "<file>" action { (x, c) => + opt[File]('I', "inputFile") required () unbounded () maxOccurs 1 valueName "<file>" action { (x, c) => c.copy(inputFile = x) - } - opt[File]('R', "referenceFile") required () unbounded () valueName "<file>" action { (x, c) => + } validate { + x => if (x.exists) success else failure("Input VCF required") + } text "Input VCF file (required)" + opt[File]('R', "referenceFile") required () unbounded () maxOccurs 1 valueName "<file>" action { (x, c) => c.copy(referenceFile = x) - } - opt[File]('o', "outputDir") required () unbounded () valueName "<file>" action { (x, c) => + } validate { + x => if (x.exists) success else failure("Reference file required") + } text "Fasta reference which was used to call input VCF (required)" + opt[File]('o', "outputDir") required () unbounded () maxOccurs 1 valueName "<file>" action { (x, c) => c.copy(outputDir = x) - } - //TODO: add interval argument - /* + } validate { + x => if (x == null) failure("Output directory required") else success + } text "Path to directory for output (required)" opt[File]('i', "intervals") unbounded () valueName ("<file>") action { (x, c) => c.copy(intervals = Some(x)) - } - */ + } text "Path to interval (BED) file (optional)" opt[String]("infoTag") unbounded () valueName "<tag>" action { (x, c) => c.copy(infoTags = x :: c.infoTags) - } + } text "Summarize these info tags. Default is all tags" opt[String]("genotypeTag") unbounded () valueName "<tag>" action { (x, c) => c.copy(genotypeTags = x :: c.genotypeTags) - } + } text "Summarize these genotype tags. Default is all tags" opt[Unit]("allInfoTags") unbounded () action { (x, c) => c.copy(allInfoTags = true) - } + } text "Summarize all info tags. 
Default false" opt[Unit]("allGenotypeTags") unbounded () action { (x, c) => c.copy(allGenotypeTags = true) - } + } text "Summarize all genotype tags. Default false" opt[Int]("binSize") unbounded () action { (x, c) => c.copy(binSize = x) - } + } text "Binsize in estimated base pairs" opt[Unit]("writeBinStats") unbounded () action { (x, c) => c.copy(writeBinStats = true) - } + } text "Write bin statistics. Default False" opt[String]("generalWiggle") unbounded () action { (x, c) => c.copy(generalWiggle = x :: c.generalWiggle, writeBinStats = true) - } + } validate { + x => if (generalWiggleOptions.contains(x)) success else failure(s"""Nonexistent field $x""") + } text s"""Create a wiggle track with bin size <binSize> for any of the following statistics: + |${generalWiggleOptions.mkString(", ")}""".stripMargin opt[String]("genotypeWiggle") unbounded () action { (x, c) => c.copy(genotypeWiggle = x :: c.genotypeWiggle, writeBinStats = true) - } + } validate { + x => if (genotypeWiggleOptions.contains(x)) success else failure(s"""Non-existent field $x""") + } text s"""Create a wiggle track with bin size <binSize> for any of the following genotype fields: + |${genotypeWiggleOptions.mkString(", ")}""".stripMargin } /** @@ -258,7 +191,7 @@ object VcfStats extends ToolCommand { } } - protected var commandArgs: Args = _ + protected var cmdArgs: Args = _ val defaultGenotypeFields = List("DP", "GQ", "AD", "AD-ref", "AD-alt", "AD-used", "AD-not_used", "general") @@ -273,9 +206,9 @@ object VcfStats extends ToolCommand { def main(args: Array[String]): Unit = { logger.info("Started") val argsParser = new OptParser - commandArgs = argsParser.parse(args, Args()) getOrElse sys.exit(1) + cmdArgs = argsParser.parse(args, Args()) getOrElse sys.exit(1) - val reader = new VCFFileReader(commandArgs.inputFile, true) + val reader = new VCFFileReader(cmdArgs.inputFile, true) val header = reader.getFileHeader val samples = header.getSampleNamesInOrder.toList @@ -283,44 +216,36 @@ object VcfStats 
extends ToolCommand { val adInfoTags = { (for ( - infoTag <- commandArgs.infoTags if !defaultInfoFields.contains(infoTag) + infoTag <- cmdArgs.infoTags if !defaultInfoFields.contains(infoTag) ) yield { require(header.getInfoHeaderLine(infoTag) != null, "Info tag '" + infoTag + "' not found in header of vcf file") infoTag }) ::: (for ( - line <- header.getInfoHeaderLines if commandArgs.allInfoTags if !defaultInfoFields.contains(line.getID) if !commandArgs.infoTags.contains(line.getID) + line <- header.getInfoHeaderLines if cmdArgs.allInfoTags if !defaultInfoFields.contains(line.getID) if !cmdArgs.infoTags.contains(line.getID) ) yield { line.getID }).toList ::: defaultInfoFields } val adGenotypeTags = (for ( - genotypeTag <- commandArgs.genotypeTags if !defaultGenotypeFields.contains(genotypeTag) + genotypeTag <- cmdArgs.genotypeTags if !defaultGenotypeFields.contains(genotypeTag) ) yield { require(header.getFormatHeaderLine(genotypeTag) != null, "Format tag '" + genotypeTag + "' not found in header of vcf file") genotypeTag }) ::: (for ( - line <- header.getFormatHeaderLines if commandArgs.allGenotypeTags if !defaultGenotypeFields.contains(line.getID) if !commandArgs.genotypeTags.contains(line.getID) if line.getID != "PL" + line <- header.getFormatHeaderLines if cmdArgs.allGenotypeTags if !defaultGenotypeFields.contains(line.getID) if !cmdArgs.genotypeTags.contains(line.getID) if line.getID != "PL" ) yield { line.getID }).toList ::: defaultGenotypeFields - val referenceFile = new FastaSequenceFile(commandArgs.referenceFile, true) + val bedRecords = (cmdArgs.intervals match { + case Some(intervals) => BedRecordList.fromFile(intervals).validateContigs(cmdArgs.referenceFile) + case _ => BedRecordList.fromReference(cmdArgs.referenceFile) + }).combineOverlap.scatter(cmdArgs.binSize) - val intervals: List[Interval] = ( - for ( - seq <- referenceFile.getSequenceDictionary.getSequences; - chunks = (seq.getSequenceLength / commandArgs.binSize) + (if (seq.getSequenceLength % 
commandArgs.binSize == 0) 0 else 1); - i <- 1 to chunks - ) yield { - val size = seq.getSequenceLength / chunks - val begin = size * (i - 1) + 1 - val end = if (i >= chunks) seq.getSequenceLength else size * i - new Interval(seq.getSequenceName, begin, end) - } - ).toList + val intervals: List[Interval] = bedRecords.toSamIntervals.toList - val totalBases = intervals.foldRight(0L)(_.length() + _) + val totalBases = bedRecords.length // Reading vcf records logger.info("Start reading vcf records") @@ -352,7 +277,7 @@ object VcfStats extends ToolCommand { val stats = (for (intervals <- Random.shuffle(intervals).grouped(intervals.size / (if (intervals.size > 10) 4 else 1)).toList.par) yield { val chunkStats = for (intervals <- intervals.grouped(25)) yield { val binStats = for (interval <- intervals.par) yield { - val reader = new VCFFileReader(commandArgs.inputFile, true) + val reader = new VCFFileReader(cmdArgs.inputFile, true) var chunkCounter = 0 val stats = createStats logger.info("Starting on: " + interval) @@ -375,8 +300,8 @@ object VcfStats extends ToolCommand { } reader.close() - if (commandArgs.writeBinStats) { - val binOutputDir = new File(commandArgs.outputDir, "bins" + File.separator + interval.getContig) + if (cmdArgs.writeBinStats) { + val binOutputDir = new File(cmdArgs.outputDir, "bins" + File.separator + interval.getContig) writeGenotypeField(stats, samples, "general", binOutputDir, prefix = "genotype-" + interval.getStart + "-" + interval.getEnd) writeField(stats, "general", binOutputDir, prefix = interval.getStart + "-" + interval.getEnd) @@ -393,51 +318,52 @@ object VcfStats extends ToolCommand { logger.info("Done reading vcf records") // Writing info fields to tsv files - val infoOutputDir = new File(commandArgs.outputDir, "infotags") - writeField(stats, "general", commandArgs.outputDir) + val infoOutputDir = new File(cmdArgs.outputDir, "infotags") + writeField(stats, "general", cmdArgs.outputDir) for (field <- adInfoTags.distinct.par) { 
writeField(stats, field, infoOutputDir) - for (line <- referenceFile.getSequenceDictionary.getSequences) { + for (line <- new FastaSequenceFile(cmdArgs.referenceFile, true).getSequenceDictionary.getSequences) { val chr = line.getSequenceName writeField(stats, field, new File(infoOutputDir, "chrs" + File.separator + chr), chr = chr) } } // Write genotype field to tsv files - val genotypeOutputDir = new File(commandArgs.outputDir, "genotypetags") - writeGenotypeField(stats, samples, "general", commandArgs.outputDir, prefix = "genotype") + val genotypeOutputDir = new File(cmdArgs.outputDir, "genotypetags") + writeGenotypeField(stats, samples, "general", cmdArgs.outputDir, prefix = "genotype") for (field <- adGenotypeTags.distinct.par) { writeGenotypeField(stats, samples, field, genotypeOutputDir) - for (line <- referenceFile.getSequenceDictionary.getSequences) { + for (line <- new FastaSequenceFile(cmdArgs.referenceFile, true).getSequenceDictionary.getSequences) { val chr = line.getSequenceName writeGenotypeField(stats, samples, field, new File(genotypeOutputDir, "chrs" + File.separator + chr), chr = chr) } } // Write sample distributions to tsv files - val sampleDistributionsOutputDir = new File(commandArgs.outputDir, "sample_distributions") + val sampleDistributionsOutputDir = new File(cmdArgs.outputDir, "sample_distributions") for (field <- sampleDistributions) { writeField(stats, "SampleDistribution-" + field, sampleDistributionsOutputDir) } // Write general wiggle tracks - for (field <- commandArgs.generalWiggle) { - val file = new File(commandArgs.outputDir, "wigs" + File.separator + "general-" + field + ".wig") + for (field <- cmdArgs.generalWiggle) { + val file = new File(cmdArgs.outputDir, "wigs" + File.separator + "general-" + field + ".wig") writeWiggle(intervals, field, "count", file, genotype = false) } // Write sample wiggle tracks - for (field <- commandArgs.genotypeWiggle; sample <- samples) { - val file = new File(commandArgs.outputDir, "wigs" + 
File.separator + "genotype-" + sample + "-" + field + ".wig") + for (field <- cmdArgs.genotypeWiggle; sample <- samples) { + val file = new File(cmdArgs.outputDir, "wigs" + File.separator + "genotype-" + sample + "-" + field + ".wig") writeWiggle(intervals, field, sample, file, genotype = true) } - writeOverlap(stats, _.genotypeOverlap, commandArgs.outputDir + "/sample_compare/genotype_overlap", samples) - writeOverlap(stats, _.alleleOverlap, commandArgs.outputDir + "/sample_compare/allele_overlap", samples) + writeOverlap(stats, _.genotypeOverlap, cmdArgs.outputDir + "/sample_compare/genotype_overlap", samples) + writeOverlap(stats, _.alleleOverlap, cmdArgs.outputDir + "/sample_compare/allele_overlap", samples) logger.info("Done") } + //FIXME: does only work correct for reference and not with a bed file protected def writeWiggle(intervals: List[Interval], row: String, column: String, outputFile: File, genotype: Boolean): Unit = { val groupedIntervals = intervals.groupBy(_.getContig).map { case (k, v) => k -> v.sortBy(_.getStart) } outputFile.getParentFile.mkdirs() @@ -448,8 +374,8 @@ object VcfStats extends ToolCommand { writer.println(s"fixedStep chrom=$chr start=1 step=$length span=$length") for (interval <- intervals) { val file = { - if (genotype) new File(commandArgs.outputDir, "bins" + File.separator + chr + File.separator + "genotype-" + interval.getStart + "-" + interval.getEnd + "-general.tsv") - else new File(commandArgs.outputDir, "bins" + File.separator + chr + File.separator + interval.getStart + "-" + interval.getEnd + "-general.tsv") + if (genotype) new File(cmdArgs.outputDir, "bins" + File.separator + chr + File.separator + "genotype-" + interval.getStart + "-" + interval.getEnd + "-general.tsv") + else new File(cmdArgs.outputDir, "bins" + File.separator + chr + File.separator + interval.getStart + "-" + interval.getEnd + "-general.tsv") } writer.println(valueFromTsv(file, row, column).getOrElse(0)) } @@ -490,7 +416,7 @@ object VcfStats extends 
ToolCommand { } /** Function to check all general stats, all info expect sample/genotype specific stats */ - protected def checkGeneral(record: VariantContext, additionalTags: List[String]): Map[String, Map[String, Map[Any, Int]]] = { + protected[tools] def checkGeneral(record: VariantContext, additionalTags: List[String]): Map[String, Map[String, Map[Any, Int]]] = { val buffer = mutable.Map[String, Map[Any, Int]]() def addToBuffer(key: String, value: Any, found: Boolean): Unit = { @@ -499,7 +425,7 @@ object VcfStats extends ToolCommand { else buffer += key -> (map + (value -> map.getOrElse(value, 0))) } - buffer += "QUAL" -> Map(record.getPhredScaledQual -> 1) + buffer += "QUAL" -> Map(Math.round(record.getPhredScaledQual) -> 1) addToBuffer("SampleDistribution-Het", record.getGenotypes.count(genotype => genotype.isHet), found = true) addToBuffer("SampleDistribution-HetNonRef", record.getGenotypes.count(genotype => genotype.isHetNonRef), found = true) @@ -546,7 +472,7 @@ object VcfStats extends ToolCommand { } /** Function to check sample/genotype specific stats */ - protected def checkGenotype(record: VariantContext, genotype: Genotype, additionalTags: List[String]): Map[String, Map[String, Map[Any, Int]]] = { + protected[tools] def checkGenotype(record: VariantContext, genotype: Genotype, additionalTags: List[String]): Map[String, Map[String, Map[Any, Int]]] = { val buffer = mutable.Map[String, Map[Any, Int]]() def addToBuffer(key: String, value: Any, found: Boolean): Unit = { @@ -699,6 +625,7 @@ object VcfStats extends ToolCommand { def executeRscript(resource: String, args: Array[String]): Unit = { val is = getClass.getResourceAsStream(resource) val file = File.createTempFile("script.", "." 
+ resource) + file.deleteOnExit() val os = new FileOutputStream(file) org.apache.commons.io.IOUtils.copy(is, os) os.close() diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfToTsv.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfToTsv.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfToTsv.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfToTsv.scala index 995efcdd2925b8e739646e52a9e6a3e21a90813a..20a15dacacba466b41235f32e77062179ad5f0a4 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfToTsv.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfToTsv.scala @@ -19,16 +19,13 @@ import java.io.{ File, PrintStream } import java.text.DecimalFormat import htsjdk.variant.vcf.VCFFileReader -import nl.lumc.sasc.biopet.core.ToolCommand +import nl.lumc.sasc.biopet.utils.ToolCommand import scala.collection.JavaConversions._ import scala.collection.mutable import scala.collection.mutable.{ ListBuffer, Map } -class VcfToTsv { - // TODO: Queue wrapper -} - +// TODO: Queue wrapper object VcfToTsv extends ToolCommand { case class Args(inputFile: File = null, outputFile: File = null, fields: List[String] = Nil, infoFields: List[String] = Nil, sampleFields: List[String] = Nil, disableDefaults: Boolean = false, diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfWithVcf.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfWithVcf.scala similarity index 56% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfWithVcf.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfWithVcf.scala index 665bd2d34546f87483cf03b5846388a0c38f7969..2b764ec9c2c7d75b12dc0248eddca338957e2ee7 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VcfWithVcf.scala +++ 
b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VcfWithVcf.scala @@ -16,52 +16,16 @@ package nl.lumc.sasc.biopet.tools import java.io.File +import java.util -import htsjdk.variant.variantcontext.VariantContextBuilder +import htsjdk.variant.variantcontext.{ VariantContext, VariantContextBuilder } import htsjdk.variant.variantcontext.writer.{ AsyncVariantContextWriter, VariantContextWriterBuilder } import htsjdk.variant.vcf._ -import nl.lumc.sasc.biopet.core.{ ToolCommandFuntion, ToolCommand } -import nl.lumc.sasc.biopet.core.config.Configurable -import org.broadinstitute.gatk.utils.commandline.{ Output, Input } +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.VcfUtils.scalaListToJavaObjectArrayList import scala.collection.JavaConversions._ -/** - * Biopet extension for tool VcfWithVcf - */ -class VcfWithVcf(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input vcf file", shortName = "input", required = true) - var input: File = _ - - @Input(doc = "Secondary vcf file", shortName = "secondary", required = true) - var secondaryVcf: File = _ - - @Output(doc = "Output vcf file", shortName = "output", required = true) - var output: File = _ - - @Output(doc = "Output vcf file index", shortName = "output", required = true) - private var outputIndex: File = _ - - var fields: List[(String, String, Option[String])] = List() - - override def defaultCoreMemory = 2.0 - - override def beforeGraph() { - super.beforeGraph() - if (output.getName.endsWith(".gz")) outputIndex = new File(output.getAbsolutePath + ".tbi") - if (output.getName.endsWith(".vcf")) outputIndex = new File(output.getAbsolutePath + ".idx") - if (fields.isEmpty) throw new IllegalArgumentException("No fields found for VcfWithVcf") - } - - override def commandLine = super.commandLine + - required("-I", input) + - required("-o", output) + - required("-s", secondaryVcf) + - repeat("-f", fields.map(x => x._1 + ":" + 
x._2 + ":" + x._3.getOrElse("none"))) -} - /** * This is a tool to annotate a vcf file with info value from a other vcf file * @@ -99,7 +63,7 @@ object VcfWithVcf extends ToolCommand { | By default we will return all values found for a given field. | With <method> the values will processed after getting it from the secondary VCF file, posible methods are: | - max : takes maximum of found value, only works for numeric (integer/float) fields - | - min : takes minemal of found value, only works for numeric (integer/float) fields + | - min : takes minimum of found value, only works for numeric (integer/float) fields | - unique: takes only unique values """.stripMargin opt[Boolean]("match") valueName "<Boolean>" maxOccurs 1 action { (x, c) => c.copy(matchAllele = x) @@ -124,7 +88,7 @@ object VcfWithVcf extends ToolCommand { for (x <- commandArgs.fields) { if (header.hasInfoLine(x.outputField)) - throw new IllegalArgumentException("Field '" + x.outputField + "' already exist in input vcf") + throw new IllegalArgumentException("Field '" + x.outputField + "' already exists in input vcf") if (!secondHeader.hasInfoLine(x.inputField)) throw new IllegalArgumentException("Field '" + x.inputField + "' does not exist in secondary vcf") @@ -140,44 +104,11 @@ object VcfWithVcf extends ToolCommand { var counter = 0 for (record <- reader) { - val secondaryRecords = if (commandArgs.matchAllele) { - secondaryReader.query(record.getContig, record.getStart, record.getEnd).toList. 
- filter(x => record.getAlternateAlleles.exists(x.hasAlternateAllele)) - } else { - secondaryReader.query(record.getContig, record.getStart, record.getEnd).toList - } + val secondaryRecords = getSecondaryRecords(secondaryReader, record, commandArgs.matchAllele) - val fieldMap = (for ( - f <- commandArgs.fields if secondaryRecords.exists(_.hasAttribute(f.inputField)) - ) yield { - f.outputField -> (for ( - secondRecord <- secondaryRecords if secondRecord.hasAttribute(f.inputField) - ) yield { - secondRecord.getAttribute(f.inputField) match { - case l: List[_] => l - case x => List(x) - } - }).fold(Nil)(_ ::: _) - }).toMap - - writer.add(fieldMap.foldLeft(new VariantContextBuilder(record))((builder, attribute) => { - builder.attribute(attribute._1, commandArgs.fields.filter(_.outputField == attribute._1).head.fieldMethod match { - case FieldMethod.max => - header.getInfoHeaderLine(attribute._1).getType match { - case VCFHeaderLineType.Integer => Array(attribute._2.map(_.toString.toInt).max) - case VCFHeaderLineType.Float => Array(attribute._2.map(_.toString.toFloat).max) - case _ => throw new IllegalArgumentException("Type of field " + attribute._1 + " is not numeric") - } - case FieldMethod.min => - header.getInfoHeaderLine(attribute._1).getType match { - case VCFHeaderLineType.Integer => Array(attribute._2.map(_.toString.toInt).min) - case VCFHeaderLineType.Float => Array(attribute._2.map(_.toString.toFloat).min) - case _ => throw new IllegalArgumentException("Type of field " + attribute._1 + " is not numeric") - } - case FieldMethod.unique => attribute._2.distinct.toArray - case _ => attribute._2.toArray - }) - }).make()) + val fieldMap = createFieldMap(commandArgs.fields, secondaryRecords) + + writer.add(createRecord(fieldMap, record, commandArgs.fields, header)) counter += 1 if (counter % 100000 == 0) { @@ -192,4 +123,69 @@ object VcfWithVcf extends ToolCommand { secondaryReader.close() logger.info("Done") } + + /** + * Create Map of field -> List of attributes 
in secondary records + * @param fields List of Field + * @param secondaryRecords List of VariantContext with secondary records + * @return Map of fields and their values in secondary records + */ + def createFieldMap(fields: List[Fields], secondaryRecords: List[VariantContext]): Map[String, List[Any]] = { + val fieldMap = (for ( + f <- fields if secondaryRecords.exists(_.hasAttribute(f.inputField)) + ) yield { + f.outputField -> (for ( + secondRecord <- secondaryRecords if secondRecord.hasAttribute(f.inputField) + ) yield { + secondRecord.getAttribute(f.inputField) match { + case l: List[_] => l + case y: util.ArrayList[_] => y.toList + case x => List(x) + } + }).fold(Nil)(_ ::: _) + }).toMap + fieldMap + } + + /** + * Get secondary records matching the query record + * @param secondaryReader reader for secondary records + * @param record query record + * @param matchAllele allele has to match query allele? + * @return List of VariantContext + */ + def getSecondaryRecords(secondaryReader: VCFFileReader, + record: VariantContext, matchAllele: Boolean): List[VariantContext] = { + if (matchAllele) { + secondaryReader.query(record.getContig, record.getStart, record.getEnd).toList. 
+ filter(x => record.getAlternateAlleles.exists(x.hasAlternateAllele)) + } else { + secondaryReader.query(record.getContig, record.getStart, record.getEnd).toList + } + } + + def createRecord(fieldMap: Map[String, List[Any]], record: VariantContext, + fields: List[Fields], header: VCFHeader): VariantContext = { + fieldMap.foldLeft(new VariantContextBuilder(record))((builder, attribute) => { + builder.attribute(attribute._1, fields.filter(_.outputField == attribute._1).head.fieldMethod match { + case FieldMethod.max => + header.getInfoHeaderLine(attribute._1).getType match { + case VCFHeaderLineType.Integer => scalaListToJavaObjectArrayList(List(attribute._2.map(_.toString.toInt).max)) + case VCFHeaderLineType.Float => scalaListToJavaObjectArrayList(List(attribute._2.map(_.toString.toFloat).max)) + case _ => throw new IllegalArgumentException("Type of field " + attribute._1 + " is not numeric") + } + case FieldMethod.min => + header.getInfoHeaderLine(attribute._1).getType match { + case VCFHeaderLineType.Integer => scalaListToJavaObjectArrayList(List(attribute._2.map(_.toString.toInt).min)) + case VCFHeaderLineType.Float => scalaListToJavaObjectArrayList(List(attribute._2.map(_.toString.toFloat).min)) + case _ => throw new IllegalArgumentException("Type of field " + attribute._1 + " is not numeric") + } + case FieldMethod.unique => scalaListToJavaObjectArrayList(attribute._2.distinct) + case _ => { + print(attribute._2.getClass.toString) + scalaListToJavaObjectArrayList(attribute._2) + } + }) + }).make() + } } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VepNormalizer.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VepNormalizer.scala similarity index 91% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VepNormalizer.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VepNormalizer.scala index 
8543b7706670ef40d9a877f27a0bf1b1ed3d77a9..f9f0fe472686589f47c52b04d2c3e97a181a026e 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/VepNormalizer.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/VepNormalizer.scala @@ -21,9 +21,7 @@ import htsjdk.tribble.TribbleException import htsjdk.variant.variantcontext.writer.{ AsyncVariantContextWriter, VariantContextWriterBuilder } import htsjdk.variant.variantcontext.{ VariantContext, VariantContextBuilder } import htsjdk.variant.vcf._ -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.ToolCommand import scala.collection.JavaConversions._ import scala.collection.mutable.{ Map => MMap } @@ -37,28 +35,6 @@ import scala.collection.mutable.{ Map => MMap } * 2) standard - parse as a standard VCF, where multiple transcripts occur in the same line * Created by ahbbollen on 10/27/14. 
*/ - -class VepNormalizer(val root: Configurable) extends ToolCommandFuntion { - javaMainClass = getClass.getName - - @Input(doc = "Input VCF, may be indexed", shortName = "InputFile", required = true) - var inputVCF: File = null - - @Output(doc = "Output VCF", shortName = "OutputFile", required = true) - var outputVcf: File = null - - var mode: String = config("mode", default = "explode") - var doNotRemove: Boolean = config("donotremove", default = false) - - override def defaultCoreMemory = 1.0 - - override def commandLine = super.commandLine + - required("-I", inputVCF) + - required("-O", outputVcf) + - required("-m", mode) + - conditional(doNotRemove, "--do-not-remove") -} - object VepNormalizer extends ToolCommand { def main(args: Array[String]): Unit = { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/WipeReads.scala b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/WipeReads.scala similarity index 92% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/WipeReads.scala rename to public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/WipeReads.scala index 96a9dc43631e832d95f0bc1f049ca7d5c0396675..082ad2626b1667d4438857b64f6579cfc725f4af 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/tools/WipeReads.scala +++ b/public/biopet-tools/src/main/scala/nl/lumc/sasc/biopet/tools/WipeReads.scala @@ -17,43 +17,17 @@ package nl.lumc.sasc.biopet.tools import java.io.File -import com.google.common.hash.{ BloomFilter, Funnel, PrimitiveSink } +import com.google.common.hash.{ PrimitiveSink, Funnel, BloomFilter } import htsjdk.samtools.{ QueryInterval, SAMFileWriter, SAMFileWriterFactory, SAMRecord, SamReader, SamReaderFactory, ValidationStringency } import htsjdk.samtools.util.{ Interval, IntervalTreeMap } -import htsjdk.tribble.AbstractFeatureReader.getFeatureReader -import htsjdk.tribble.bed.BEDCodec -import nl.lumc.sasc.biopet.core.config.Configurable -import 
nl.lumc.sasc.biopet.core.{ ToolCommand, ToolCommandFuntion } +import nl.lumc.sasc.biopet.utils.ToolCommand +import nl.lumc.sasc.biopet.utils.intervals.BedRecordList import org.apache.commons.io.FilenameUtils.getExtension -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } import scala.collection.JavaConverters._ import scala.io.Source import scala.math.{ max, min } -// TODO: finish implementation for usage in pipelines -/** - * WipeReads function class for usage in Biopet pipelines - * - * @param root Configuration object for the pipeline - */ -class WipeReads(val root: Configurable) extends ToolCommandFuntion { - - javaMainClass = getClass.getName - - @Input(doc = "Input BAM file (must be indexed)", shortName = "I", required = true) - var inputBam: File = null - - @Input(doc = "Interval file", shortName = "r", required = true) - var intervalFile: File = null - - @Output(doc = "Output BAM", shortName = "o", required = true) - var outputBam: File = null - - @Output(doc = "BAM containing discarded reads", shortName = "f", required = false) - var discardedBam: File = null -} - object WipeReads extends ToolCommand { /** Creates a SamReader object from an input BAM file, ensuring it is indexed */ @@ -84,10 +58,7 @@ object WipeReads extends ToolCommand { logger.info("Parsing interval file ...") /** Function to create iterator from BED file */ - def makeIntervalFromBed(inFile: File): Iterator[Interval] = - asScalaIteratorConverter(getFeatureReader(inFile.toPath.toString, new BEDCodec(), false).iterator) - .asScala - .map(x => new Interval(x.getContig, x.getStart, x.getEnd)) + def makeIntervalFromBed(inFile: File) = BedRecordList.fromFile(inFile).sorted.toSamIntervals.toIterator /** * Parses a refFlat file to yield Interval objects diff --git a/public/biopet-framework/src/test/resources/README.txt b/public/biopet-tools/src/test/resources/README.txt similarity index 100% rename from public/biopet-framework/src/test/resources/README.txt rename to 
public/biopet-tools/src/test/resources/README.txt diff --git a/public/biopet-framework/src/test/resources/VCFv3.vcf b/public/biopet-tools/src/test/resources/VCFv3.vcf similarity index 100% rename from public/biopet-framework/src/test/resources/VCFv3.vcf rename to public/biopet-tools/src/test/resources/VCFv3.vcf diff --git a/public/biopet-framework/src/test/resources/VEP_oneline.vcf b/public/biopet-tools/src/test/resources/VEP_oneline.vcf similarity index 100% rename from public/biopet-framework/src/test/resources/VEP_oneline.vcf rename to public/biopet-tools/src/test/resources/VEP_oneline.vcf diff --git a/public/biopet-framework/src/test/resources/VEP_oneline.vcf.gz b/public/biopet-tools/src/test/resources/VEP_oneline.vcf.gz similarity index 100% rename from public/biopet-framework/src/test/resources/VEP_oneline.vcf.gz rename to public/biopet-tools/src/test/resources/VEP_oneline.vcf.gz diff --git a/public/biopet-framework/src/test/resources/VEP_oneline.vcf.gz.tbi b/public/biopet-tools/src/test/resources/VEP_oneline.vcf.gz.tbi similarity index 100% rename from public/biopet-framework/src/test/resources/VEP_oneline.vcf.gz.tbi rename to public/biopet-tools/src/test/resources/VEP_oneline.vcf.gz.tbi diff --git a/public/biopet-framework/src/test/resources/chrQ.vcf b/public/biopet-tools/src/test/resources/chrQ.vcf similarity index 100% rename from public/biopet-framework/src/test/resources/chrQ.vcf rename to public/biopet-tools/src/test/resources/chrQ.vcf diff --git a/public/biopet-framework/src/test/resources/chrQ.vcf.gz b/public/biopet-tools/src/test/resources/chrQ.vcf.gz similarity index 100% rename from public/biopet-framework/src/test/resources/chrQ.vcf.gz rename to public/biopet-tools/src/test/resources/chrQ.vcf.gz diff --git a/public/biopet-framework/src/test/resources/chrQ.vcf.gz.tbi b/public/biopet-tools/src/test/resources/chrQ.vcf.gz.tbi similarity index 100% rename from public/biopet-framework/src/test/resources/chrQ.vcf.gz.tbi rename to 
public/biopet-tools/src/test/resources/chrQ.vcf.gz.tbi diff --git a/public/biopet-tools/src/test/resources/chrQ2.vcf b/public/biopet-tools/src/test/resources/chrQ2.vcf new file mode 100644 index 0000000000000000000000000000000000000000..e49f468d7a6d54de23ed5e3d118d45a663c1cb63 --- /dev/null +++ b/public/biopet-tools/src/test/resources/chrQ2.vcf @@ -0,0 +1,85 @@ +##fileformat=VCFv4.1 +##reference=file:///data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/ucsc.hg19_nohap.fasta +##UnifiedGenotyperCommandLine=<ID=ApplyRecalibration,Version=3.1-1-g07a4bf8,Date="Sat Jun 14 16:58:07 CEST 2014",Epoch=1402757887567,CommandLineOptions="analysis_type=ApplyRecalibration input_file=[] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/ucsc.hg19_nohap.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 
logging_level=INFO log_to_file=null help=false version=false input=[(RodBinding name=input source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/Child_7006504.ug.chrom_merged.vcf)] recal_file=(RodBinding name=recal_file source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_ug/Child_7006504.snp.recal) tranches_file=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_ug/Child_7006504.snp.tranches out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub ts_filter_level=99.0 lodCutoff=null ignore_filter=null excludeFiltered=false mode=SNP filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> +##UnifiedGenotyperCommandLine=<ID=ApplyRecalibration,Version=3.1-1-g07a4bf8,Date="Sat Jun 14 17:01:08 CEST 2014",Epoch=1402758068552,CommandLineOptions="analysis_type=ApplyRecalibration input_file=[] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/ucsc.hg19_nohap.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT 
remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false input=[(RodBinding name=input source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_ug/Child_7006504.snp.recalibrated.vcf)] recal_file=(RodBinding name=recal_file source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_ug/Child_7006504.indel.recal) tranches_file=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_ug/Child_7006504.indel.tranches out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub ts_filter_level=99.0 lodCutoff=null ignore_filter=null excludeFiltered=false mode=INDEL filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> +##UnifiedGenotyperCommandLine=<ID=UnifiedGenotyper,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/bams/Child_7006504.ready.bam, /data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/bams/Mother_7006508.ready.bam, /data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/bams/Father_7006506.ready.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA 
read_filter=[] intervals=[chrM] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/ucsc.hg19_nohap.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false genotype_likelihoods_model=BOTH pcr_error_rate=1.0E-4 computeSLOD=false annotateNDA=false pair_hmm_implementation=LOGLESS_CACHING min_base_quality_score=17 max_deletion_fraction=0.05 allSitePLs=false min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 indelDebug=false ignoreSNPAlleles=false allReadsSP=false ignoreLaneInfo=false reference_sample_calls=(RodBinding name= source=UNBOUND) reference_sample_name=null sample_ploidy=2 min_quality_score=1 max_quality_score=40 site_quality_prior=20 min_power_threshold_for_calling=0.95 min_reference_depth=100 exclude_filtered_reference_sites=false 
output_mode=EMIT_VARIANTS_ONLY heterozygosity=0.001 indel_heterozygosity=1.25E-4 genotyping_mode=DISCOVERY standard_min_confidence_threshold_for_calling=20.0 standard_min_confidence_threshold_for_emitting=20.0 alleles=(RodBinding name= source=UNBOUND) max_alternate_alleles=6 input_prior=[] contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT exactcallslog=null dbsnp=(RodBinding name=dbsnp source=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/dbsnp_137.hg19_nohap.vcf) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub onlyEmitSamples=[] debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false",Date="Sat Jun 14 15:20:24 CEST 2014",Epoch=1402752024377,Version=3.1-1-g07a4bf8> +##HaplotypeCallerCommandLine=<ID=ApplyRecalibration,Version=3.1-1-g07a4bf8,Date="Sat Jun 14 22:28:02 CEST 2014",Epoch=1402777682364,CommandLineOptions="analysis_type=ApplyRecalibration input_file=[] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/ucsc.hg19_nohap.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 
disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false input=[(RodBinding name=input source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/Child_7006504.hc.chrom_merged.vcf)] recal_file=(RodBinding name=recal_file source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_hc/Child_7006504.snp.recal) tranches_file=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_hc/Child_7006504.snp.tranches out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub ts_filter_level=99.0 lodCutoff=null ignore_filter=null excludeFiltered=false mode=SNP filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> +##HaplotypeCallerCommandLine=<ID=ApplyRecalibration,Version=3.1-1-g07a4bf8,Date="Sat Jun 14 22:31:13 CEST 2014",Epoch=1402777873043,CommandLineOptions="analysis_type=ApplyRecalibration input_file=[] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 
reference_sequence=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/ucsc.hg19_nohap.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false input=[(RodBinding name=input source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_hc/Child_7006504.snp.recalibrated.vcf)] recal_file=(RodBinding name=recal_file source=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_hc/Child_7006504.indel.recal) tranches_file=/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/recalibration_hc/Child_7006504.indel.tranches out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub ts_filter_level=99.0 lodCutoff=null ignore_filter=null excludeFiltered=false 
mode=INDEL filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> +##HaplotypeCallerCommandLine=<ID=HaplotypeCaller,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/bams/Child_7006504.ready.bam, /data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/bams/Mother_7006508.ready.bam, /data/DIV5/KG/kg_wes_mr/runs/trio_7006504_run_00/trio_7006504/phase2/bams/Father_7006506.ready.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=[chrM] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/ucsc.hg19_nohap.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub 
no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN graphOutput=null bamOutput=null bam_compression=null disable_bam_indexing=null generate_md5=null simplifyBAM=null bamWriterType=CALLED_HAPLOTYPES dbsnp=(RodBinding name=dbsnp source=/data/DIV5/KG/references/gatk_bundle_2.5/hg19_nohap/dbsnp_137.hg19_nohap.vcf) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[ClippingRankSumTest, DepthPerSampleHC] excludeAnnotation=[SpanningDeletions, TandemRepeatAnnotator] heterozygosity=0.001 indel_heterozygosity=1.25E-4 genotyping_mode=DISCOVERY standard_min_confidence_threshold_for_calling=20.0 standard_min_confidence_threshold_for_emitting=20.0 alleles=(RodBinding name= source=UNBOUND) max_alternate_alleles=6 input_prior=[] contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT exactcallslog=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false numPruningSamples=1 recoverDanglingHeads=false dontRecoverDanglingTails=false consensus=false emitRefConfidence=NONE GVCFGQBands=[5, 20, 60] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 minPruning=2 gcpHMM=10 includeUmappedReads=false useAllelesTrigger=false useFilteredReadsForAnnotations=false phredScaledGlobalReadMismappingRate=45 maxNumHaplotypesInPopulation=128 mergeVariantsViaLD=false pair_hmm_implementation=LOGLESS_CACHING keepRG=null justDetermineActiveRegions=false dontGenotype=false errorCorrectKmers=false debug=false debugGraphTransformations=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false allowCyclesInKmerGraphToGeneratePaths=false noFpga=false errorCorrectReads=false 
kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 pcr_indel_model=CONSERVATIVE activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false",Date="Sat Jun 14 15:26:18 CEST 2014",Epoch=1402752378803,Version=3.1-1-g07a4bf8> +##INFO=<ID=DN,Number=1,Type=Integer,Description="inDbSNP"> +##INFO=<ID=DT,Number=0,Type=Flag,Description="in1000Genomes"> +##INFO=<ID=DA,Number=1,Type=String,Description="allelesDBSNP"> +##INFO=<ID=FG,Number=.,Type=String,Description="functionGVS"> +##INFO=<ID=FD,Number=.,Type=String,Description="functionDBSNP"> +##INFO=<ID=GM,Number=.,Type=String,Description="accession"> +##INFO=<ID=GL,Number=.,Type=String,Description="geneList"> +##INFO=<ID=AAC,Number=.,Type=String,Description="aminoAcids"> +##INFO=<ID=PP,Number=.,Type=String,Description="proteinPosition"> +##INFO=<ID=CDP,Number=.,Type=String,Description="cDNAPosition"> +##INFO=<ID=PH,Number=.,Type=String,Description="polyPhen"> +##INFO=<ID=CP,Number=1,Type=String,Description="scorePhastCons"> +##INFO=<ID=CG,Number=1,Type=String,Description="consScoreGERP"> +##INFO=<ID=AA,Number=1,Type=String,Description="chimpAllele"> +##INFO=<ID=CN,Number=.,Type=String,Description="CNV"> +##INFO=<ID=HA,Number=1,Type=String,Description="AfricanHapMapFreq"> +##INFO=<ID=HE,Number=1,Type=String,Description="EuropeanHapMapFreq"> +##INFO=<ID=HC,Number=1,Type=String,Description="AsianHapMapFreq"> +##INFO=<ID=DG,Number=0,Type=Flag,Description="hasGenotypes"> +##INFO=<ID=DV,Number=.,Type=String,Description="dbSNPValidation"> +##INFO=<ID=RM,Number=.,Type=String,Description="repeatMasker"> +##INFO=<ID=RT,Number=.,Type=String,Description="tandemRepeat"> +##INFO=<ID=CA,Number=0,Type=Flag,Description="clinicalAssociation"> 
+##INFO=<ID=DSP,Number=1,Type=Integer,Description="distanceToSplice"> +##INFO=<ID=GS,Number=.,Type=String,Description="granthamScore"> +##INFO=<ID=MR,Number=.,Type=String,Description="microRNAs"> +##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed"> +##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed"> +##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes"> +##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities"> +##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership"> +##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered"> +##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?"> +##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions"> +##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval"> +##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias"> +##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes"> +##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation"> +##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed"> +##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed"> 
+##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality"> +##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads"> +##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities"> +##INFO=<ID=NEGATIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the negative training set of bad variants"> +##INFO=<ID=POSITIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the positive training set of good variants"> +##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth"> +##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)"> +##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)"> +##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias"> +##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat"> +##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="Log odds ratio of being a true variant versus being false under the trained gaussian mixture model"> +##INFO=<ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out"> +##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. 
Ref number of hard clipped bases"> +##INFO=<ID=GATKCaller,Number=.,Type=String,Description="GATK variant caller used to call the variant"> +##INFO=<ID=PartOfCompound,Number=.,Type=String,Description="Whether the record was originally part of a record containing compound variants"> +##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed"> +##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)"> +##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> +##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> +##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification"> +##FILTER=<ID=LowQual,Description="Low quality"> +##FILTER=<ID=VQSRTrancheINDEL99.00to99.90,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -1.4714 <= x < -0.3324"> +##FILTER=<ID=VQSRTrancheINDEL99.90to100.00+,Description="Truth sensitivity tranche level for INDEL model at VQS Lod < -6.093"> +##FILTER=<ID=VQSRTrancheINDEL99.90to100.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -6.093 <= x < -1.4714"> +##FILTER=<ID=VQSRTrancheSNP99.00to99.90,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -4.8126 <= x < 0.2264"> +##FILTER=<ID=VQSRTrancheSNP99.90to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -39474.9285"> +##FILTER=<ID=VQSRTrancheSNP99.90to100.00,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -39474.9285 <= x < -4.8126"> +##FILTER=<ID=TooHigh1000GAF,Description="Allele frequency in 1000G is more than 5%"> +##FILTER=<ID=TooHighGoNLAF,Description="Allele frequency in 1000G is more than 5%"> +##FILTER=<ID=IndexNotCalled,Description="Position in index sample is not called"> +##FILTER=<ID=IndexIsVariant,Description="Index call is a variant"> 
+##FILTER=<ID=InArtificialChrom,Description="Variant found in an artificial chromosome"> +##FILTER=<ID=IsIntergenic,Description="Variant found in intergenic region"> +##contig=<ID=chrQ,length=16571> +##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|AA_MAF|EA_MAF|ALLELE_NUM|DISTANCE|STRAND|CLIN_SIG|SYMBOL|SYMBOL_SOURCE|GMAF|HGVSc|HGVSp|AFR_MAF|AMR_MAF|ASN_MAF|EUR_MAF|PUBMED"> +#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Child_7006504 Father_7006506 Mother_7006508 +chrQ 50 rs199537431 T A 1541.12 PASS FG=intron;FD=unknown;GM=NM_152486.2;GL=SAMD11;CP=0.000;CG=-1.630;CN=2294,3274,30362,112930;DSP=107;AC=2;AF=0.333;AN=6;BaseQRankSum=4.068;DB;DP=124;FS=1.322;MLEAC=2;MLEAF=0.333;MQ=60.0;MQ0=0;MQRankSum=-0.197;QD=19.03;RPA=1,2;RU=A;ReadPosRankSum=-0.424;STR;VQSLOD=0.079;culprit=FS;GATKCaller=UG,HC;CSQ=A|ENSESTG00000013623|ENSESTT00000034081|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||||A:0.0078|ENSESTT00000034081.1:c.306-110_306-109insA||||||,A|CCDS2.2|CCDS2.2|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||||A:0.0078|CCDS2.2:c.306-110_306-109insA||||||,A|ENSESTG00000013623|ENSESTT00000034116|Transcript|upstream_gene_variant||||||rs199537431|||1|3610|1||||A:0.0078|||||||,A|ENSESTG00000013623|ENSESTT00000034091|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||||A:0.0078|ENSESTT00000034091.1:c.306-110_306-109insA||||||,A|ENSESTG00000013623|ENSESTT00000034102|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||||A:0.0078|ENSESTT00000034102.1:c.29-110_29-109insA||||||,A|148398|XM_005244723.1|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||SAMD11||A:0.0078|XM_005244723.1:c.306-110_306-109insA||||||,A|148398|XM_005244724.1|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||SAMD11||A
:0.0078|XM_005244724.1:c.306-110_306-109insA||||||,A|148398|XM_005244725.1|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||SAMD11||A:0.0078|XM_005244725.1:c.306-110_306-109insA||||||,A|148398|NM_152486.2|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||SAMD11||A:0.0078|NM_152486.2:c.306-110_306-109insA||||||,A|148398|XM_005244727.1|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||SAMD11||A:0.0078|XM_005244727.1:c.306-110_306-109insA||||||,A|148398|XM_005244726.1|Transcript|intron_variant&feature_elongation||||||rs199537431|||1||1||SAMD11||A:0.0078|XM_005244726.1:c.306-110_306-109insA|||||| GT:AD:DP:GQ:PL 0/1:24,21:45:99:838,0,889 0/1:17,19:36:99:744,0,603 0/0:42,0:43:99:0,126,1717 diff --git a/public/biopet-tools/src/test/resources/chrQ2.vcf.gz b/public/biopet-tools/src/test/resources/chrQ2.vcf.gz new file mode 100644 index 0000000000000000000000000000000000000000..22435b2c513dc40a2f9632f1970395188292aa67 Binary files /dev/null and b/public/biopet-tools/src/test/resources/chrQ2.vcf.gz differ diff --git a/public/biopet-tools/src/test/resources/chrQ2.vcf.gz.tbi b/public/biopet-tools/src/test/resources/chrQ2.vcf.gz.tbi new file mode 100644 index 0000000000000000000000000000000000000000..d376218edbf3aeb9bcbf9a16275c36a6005c57b2 Binary files /dev/null and b/public/biopet-tools/src/test/resources/chrQ2.vcf.gz.tbi differ diff --git a/public/biopet-tools/src/test/resources/chrQ_allN.fa b/public/biopet-tools/src/test/resources/chrQ_allN.fa new file mode 100644 index 0000000000000000000000000000000000000000..f2f89ba9c8b9bda54f666e0894e2234856aefc1b --- /dev/null +++ b/public/biopet-tools/src/test/resources/chrQ_allN.fa @@ -0,0 +1,2 @@ +>chrQ 
+NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN diff --git a/public/biopet-framework/src/test/resources/fake_chrQ.fa.fai b/public/biopet-tools/src/test/resources/chrQ_allN.fa.fai similarity index 100% rename from public/biopet-framework/src/test/resources/fake_chrQ.fa.fai rename to public/biopet-tools/src/test/resources/chrQ_allN.fa.fai diff --git a/public/biopet-framework/src/test/resources/fake_chrQ.dict b/public/biopet-tools/src/test/resources/fake_chrQ.dict similarity index 100% rename from public/biopet-framework/src/test/resources/fake_chrQ.dict rename to public/biopet-tools/src/test/resources/fake_chrQ.dict diff --git a/public/biopet-framework/src/test/resources/fake_chrQ.fa b/public/biopet-tools/src/test/resources/fake_chrQ.fa similarity index 100% rename from public/biopet-framework/src/test/resources/fake_chrQ.fa rename to public/biopet-tools/src/test/resources/fake_chrQ.fa diff --git a/public/biopet-tools/src/test/resources/fake_chrQ.fa.fai b/public/biopet-tools/src/test/resources/fake_chrQ.fa.fai new file mode 100644 index 0000000000000000000000000000000000000000..b7a558fdb3b3c0e85f6e3c634cc3ae80c601336d --- /dev/null +++ b/public/biopet-tools/src/test/resources/fake_chrQ.fa.fai @@ -0,0 +1 @@ +chrQ 16571 6 16571 16572 diff --git a/public/biopet-tools/src/test/resources/flagstat_crossreport.txt b/public/biopet-tools/src/test/resources/flagstat_crossreport.txt new file mode 100644 index 
0000000000000000000000000000000000000000..74eabb4125ad9351ff4691a7e5cf0fa68282249c --- /dev/null +++ b/public/biopet-tools/src/test/resources/flagstat_crossreport.txt @@ -0,0 +1,15 @@ + #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 +#1 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#2 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#4 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#8 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#9 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#10 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 diff --git a/public/biopet-tools/src/test/resources/flagstat_crosstrue.txt b/public/biopet-tools/src/test/resources/flagstat_crosstrue.txt new file mode 100644 index 0000000000000000000000000000000000000000..dd05aed41c50b81790742957b210b21840918826 --- /dev/null +++ b/public/biopet-tools/src/test/resources/flagstat_crosstrue.txt @@ -0,0 +1,15 @@ + #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 +#1 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#2 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#3 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#4 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#5 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#6 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#7 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#8 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#9 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 
0.0000% 0.0000% 0.0000% +#10 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#11 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#12 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#13 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#14 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% diff --git a/public/biopet-tools/src/test/resources/flagstat_report.txt b/public/biopet-tools/src/test/resources/flagstat_report.txt new file mode 100644 index 0000000000000000000000000000000000000000..acbe332a82a9d7c04c71e5c9ae74275b3c75b0cc --- /dev/null +++ b/public/biopet-tools/src/test/resources/flagstat_report.txt @@ -0,0 +1,48 @@ +Number Total Flags Fraction Name +#1 1 100.0000% All +#2 1 100.0000% Mapped +#3 0 0.0000% Duplicates +#4 1 100.0000% FirstOfPair +#5 0 0.0000% SecondOfPair +#6 0 0.0000% ReadNegativeStrand +#7 0 0.0000% NotPrimaryAlignment +#8 1 100.0000% ReadPaired +#9 1 100.0000% ProperPair +#10 1 100.0000% MateNegativeStrand +#11 0 0.0000% MateUnmapped +#12 0 0.0000% ReadFailsVendorQualityCheck +#13 0 0.0000% SupplementaryAlignment +#14 0 0.0000% SecondaryOrSupplementary + + #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 +#1 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#2 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#4 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#8 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#9 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#10 1 1 0 1 0 0 0 1 1 1 0 0 0 0 +#11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +#14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + + #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 +#1 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#2 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 
0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#3 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#4 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#5 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#6 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#7 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#8 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#9 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#10 100.0000% 100.0000% 0.0000% 100.0000% 0.0000% 0.0000% 0.0000% 100.0000% 100.0000% 100.0000% 0.0000% 0.0000% 0.0000% 0.0000% +#11 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#12 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#13 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% +#14 NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% NaN% + diff --git a/public/biopet-tools/src/test/resources/flagstat_summary.txt b/public/biopet-tools/src/test/resources/flagstat_summary.txt new file mode 100644 index 0000000000000000000000000000000000000000..e7984915cab425c8bd94dd0a22eae60e0407f518 --- /dev/null +++ b/public/biopet-tools/src/test/resources/flagstat_summary.txt @@ -0,0 +1,16 @@ +{ + "Duplicates" : 0, + "NotPrimaryAlignment" : 0, + "All" : 1, + "ReadNegativeStrand" : 0, + "ProperPair" : 1, + "MateUnmapped" : 0, + "ReadFailsVendorQualityCheck" : 0, + "Mapped" : 1, + "SupplementaryAlignment" : 0, + "MateNegativeStrand" : 1, + "FirstOfPair" : 1, + "ReadPaired" : 1, + "SecondaryOrSupplementary" : 0, + "SecondOfPair" : 0 +} \ No newline at end of file diff --git a/public/biopet-tools/src/test/resources/log4j.properties 
b/public/biopet-tools/src/test/resources/log4j.properties new file mode 100644 index 0000000000000000000000000000000000000000..501af67582a546db584c8538b28cb6f9e07f1692 --- /dev/null +++ b/public/biopet-tools/src/test/resources/log4j.properties @@ -0,0 +1,25 @@ +# +# Biopet is built on top of GATK Queue for building bioinformatic +# pipelines. It is mainly intended to support LUMC SHARK cluster which is running +# SGE. But other types of HPC that are supported by GATK Queue (such as PBS) +# should also be able to execute Biopet tools and pipelines. +# +# Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center +# +# Contact us at: sasc@lumc.nl +# +# A dual licensing mode is applied. The source code within this project that are +# not part of GATK Queue is freely available for non-commercial use under an AGPL +# license; For commercial users or users who do not want to follow the AGPL +# license, please contact us to obtain a separate license. +# + +# Set root logger level to ERROR and its only appender to A1. +log4j.rootLogger=ERROR, A1 + +# A1 is set to be a ConsoleAppender. +log4j.appender.A1=org.apache.log4j.ConsoleAppender + +# A1 uses PatternLayout. 
+log4j.appender.A1.layout=org.apache.log4j.PatternLayout +log4j.appender.A1.layout.ConversionPattern=%-5p [%d] [%C{1}] - %m%n \ No newline at end of file diff --git a/public/biopet-tools/src/test/resources/mini.transcriptome.fa b/public/biopet-tools/src/test/resources/mini.transcriptome.fa new file mode 100644 index 0000000000000000000000000000000000000000..d86c34faa29af176b6dd1a5d098c16d4e618039f --- /dev/null +++ b/public/biopet-tools/src/test/resources/mini.transcriptome.fa @@ -0,0 +1,17 @@ +>ENST00000529862 havana:known chromosome:GRCh38:11:105194440:105194946:-1 gene:ENSG00000254767 gene_biotype:unprocessed_pseudogene transcript_biotype:unprocessed_pseudogene +ATGAATAATAATGGGAAATATCAACATAAGTCTTGAAAATTACTTTATTCTACTGGGTCT +TTCTAATTGACCTCCTCTGGAAATAGTTATTTTTGTAGTTCTCTTGATATTCTGCTTCAT +GACACTGATAGGCAAGCTGTTCAGCATCATTCTGTCATACCTGGACTCCCATCCCCACAC +TCTCGGTACTTATTCTCTTTTCTGGATTTCTGCTACACCATCAGTTCCATCTTTTAATTA +CAGTACAATCTCTGGGGCCCACAGAAGAACATCTCTTATGCCAGTGGTATGATTCAAATT +TATTTTGTTCTCACACTGGGAACCATGGATTGCGCTCTACTGGTGGTGATGTCCAGGACT +GTGATGCAGCTGGACACAGACACTTGCCTTATACTGTTGTTATGGCTGTGGCTTTTTGGG +TAAGTAGCTTTACCAACTCAGCATTTGATTCCTTTTTTACCTTCTGGGTAACCCTGTGTG +GACATCACTATTATGCTTACATCTTTA +>ENST00000528941 havana:known chromosome:GRCh38:11:105246880:105247060:-1 gene:ENSG00000255336 gene_biotype:unprocessed_pseudogene transcript_biotype:unprocessed_pseudogene +TATTCATAATTAAAGTCATACTTCAGCAAGCTGGCTTTAAATATACAACATATAATTCTT +TTAAATCAGACTCTCTGAATCCATGACCGCCATGTCTTCATGAAGCTGTCCTTCCTCAAT +CCCCATCTGTTTTAAGGGTTCCACCCATGTTCTTCCTTAGCACCCTGAGTATTTACTCTA +T +>ENST99999999999 havana:known chromosome:GRCh38:11:105246880:105247060:-1 gene:ENSG99999999999 gene_biotype:unprocessed_pseudogene transcript_biotype:unprocessed_pseudogene +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA diff --git a/public/biopet-tools/src/test/resources/no_sample.tsv b/public/biopet-tools/src/test/resources/no_sample.tsv new file mode 100644 index 
0000000000000000000000000000000000000000..81ccc2453208183e7fa8a8a2b0748f2a4d8f6716 --- /dev/null +++ b/public/biopet-tools/src/test/resources/no_sample.tsv @@ -0,0 +1,3 @@ +library bam +Lib_ID_1 MyFirst.bam +Lib_ID_2 MySecond.bam diff --git a/public/biopet-tools/src/test/resources/number.tsv b/public/biopet-tools/src/test/resources/number.tsv new file mode 100644 index 0000000000000000000000000000000000000000..0a76d53e6db7be2f331f440f72f48003d63ef49f --- /dev/null +++ b/public/biopet-tools/src/test/resources/number.tsv @@ -0,0 +1,3 @@ +sample library bam +1 5 MyFirst.bam +2 6 MySecond.bam diff --git a/public/biopet-framework/src/test/resources/paired01.bam b/public/biopet-tools/src/test/resources/paired01.bam similarity index 100% rename from public/biopet-framework/src/test/resources/paired01.bam rename to public/biopet-tools/src/test/resources/paired01.bam diff --git a/public/biopet-framework/src/test/resources/paired01.bam.bai b/public/biopet-tools/src/test/resources/paired01.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/paired01.bam.bai rename to public/biopet-tools/src/test/resources/paired01.bam.bai diff --git a/public/biopet-tools/src/test/resources/paired01.pileup b/public/biopet-tools/src/test/resources/paired01.pileup new file mode 100644 index 0000000000000000000000000000000000000000..559c87b964c35b428bd913b32fd769873e24fd14 --- /dev/null +++ b/public/biopet-tools/src/test/resources/paired01.pileup @@ -0,0 +1,320 @@ +chrQ 50 N 1 ^]T E +chrQ 51 N 1 A E +chrQ 52 N 1 C F +chrQ 53 N 1 G F +chrQ 54 N 1 T G +chrQ 55 N 1 A G +chrQ 56 N 1 C H +chrQ 57 N 1 G H +chrQ 58 N 1 T I +chrQ 59 N 1 A$ I +chrQ 90 N 1 ^]a E +chrQ 91 N 1 t E +chrQ 92 N 1 g F +chrQ 93 N 1 c F +chrQ 94 N 1 a G +chrQ 95 N 1 t G +chrQ 96 N 1 g H +chrQ 97 N 1 c H +chrQ 98 N 1 a I +chrQ 99 N 1 t$ I +chrQ 150 N 1 ^]A G +chrQ 151 N 1 A G +chrQ 152 N 1 A G +chrQ 153 N 1 A G +chrQ 154 N 1 A G +chrQ 155 N 1 G G +chrQ 156 N 1 G G +chrQ 157 N 1 G G +chrQ 158 N 1 G G 
+chrQ 159 N 1 G$ G +chrQ 190 N 1 ^]g G +chrQ 191 N 1 g G +chrQ 192 N 1 g G +chrQ 193 N 1 g G +chrQ 194 N 1 g G +chrQ 195 N 1 a G +chrQ 196 N 1 a G +chrQ 197 N 1 a G +chrQ 198 N 1 a G +chrQ 199 N 1 a$ G +chrQ 250 N 1 ^]A G +chrQ 251 N 1 A G +chrQ 252 N 1 A G +chrQ 253 N 1 A G +chrQ 254 N 1 A G +chrQ 255 N 1 G G +chrQ 256 N 1 G G +chrQ 257 N 1 G G +chrQ 258 N 1 G G +chrQ 259 N 1 G$ G +chrQ 290 N 1 ^]g G +chrQ 291 N 1 g G +chrQ 292 N 1 g G +chrQ 293 N 1 g G +chrQ 294 N 1 g G +chrQ 295 N 1 a G +chrQ 296 N 1 a G +chrQ 297 N 1 a G +chrQ 298 N 1 a G +chrQ 299 N 1 a$ G +chrQ 450 N 1 ^]C E +chrQ 451 N 1 G E +chrQ 452 N 1 T F +chrQ 453 N 1 A F +chrQ 454 N 1 C G +chrQ 455 N 1 G G +chrQ 456 N 1 T H +chrQ 457 N 1 A H +chrQ 458 N 1 C I +chrQ 459 N 1 G$ I +chrQ 490 N 1 ^]g E +chrQ 491 N 1 c E +chrQ 492 N 1 a F +chrQ 493 N 1 t F +chrQ 494 N 1 g G +chrQ 495 N 1 c G +chrQ 496 N 1 a H +chrQ 497 N 1 t H +chrQ 498 N 1 g I +chrQ 499 N 1 c$ I +chrQ 650 N 1 ^]T H +chrQ 651 N 1 T H +chrQ 652 N 1 T H +chrQ 653 N 1 T H +chrQ 654 N 1 T H +chrQ 655 N 1 C H +chrQ 656 N 1 C H +chrQ 657 N 1 C H +chrQ 658 N 1 C H +chrQ 659 N 1 C$ H +chrQ 690 N 1 ^]c H +chrQ 691 N 1 c H +chrQ 692 N 1 c H +chrQ 693 N 1 c H +chrQ 694 N 1 c H +chrQ 695 N 1 t H +chrQ 696 N 1 t H +chrQ 697 N 1 t H +chrQ 698 N 1 t H +chrQ 699 N 1 t$ H +chrQ 890 N 1 ^]T E +chrQ 891 N 1 A E +chrQ 892 N 1 C F +chrQ 893 N 1 G F +chrQ 894 N 1 T G +chrQ 895 N 1 > G +chrQ 896 N 1 > G +chrQ 897 N 1 > G +chrQ 898 N 1 > G +chrQ 899 N 1 > G +chrQ 900 N 1 > G +chrQ 901 N 1 > G +chrQ 902 N 1 > G +chrQ 903 N 1 > G +chrQ 904 N 1 > G +chrQ 905 N 1 > G +chrQ 906 N 1 > G +chrQ 907 N 1 > G +chrQ 908 N 1 > G +chrQ 909 N 1 > G +chrQ 910 N 1 > G +chrQ 911 N 1 > G +chrQ 912 N 1 > G +chrQ 913 N 1 > G +chrQ 914 N 1 > G +chrQ 915 N 1 > G +chrQ 916 N 1 > G +chrQ 917 N 1 > G +chrQ 918 N 1 > G +chrQ 919 N 1 > G +chrQ 920 N 1 > G +chrQ 921 N 1 > G +chrQ 922 N 1 > G +chrQ 923 N 1 > G +chrQ 924 N 1 > G +chrQ 925 N 1 > G +chrQ 926 N 1 > G +chrQ 927 N 1 > G +chrQ 928 N 1 
> G +chrQ 929 N 1 > G +chrQ 930 N 1 > G +chrQ 931 N 1 > G +chrQ 932 N 1 > G +chrQ 933 N 1 > G +chrQ 934 N 1 > G +chrQ 935 N 1 > G +chrQ 936 N 1 > G +chrQ 937 N 1 > G +chrQ 938 N 1 > G +chrQ 939 N 1 > G +chrQ 940 N 1 > G +chrQ 941 N 1 > G +chrQ 942 N 1 > G +chrQ 943 N 1 > G +chrQ 944 N 1 > G +chrQ 945 N 1 > G +chrQ 946 N 1 > G +chrQ 947 N 1 > G +chrQ 948 N 1 > G +chrQ 949 N 1 > G +chrQ 950 N 1 > G +chrQ 951 N 1 > G +chrQ 952 N 1 > G +chrQ 953 N 1 > G +chrQ 954 N 1 > G +chrQ 955 N 1 > G +chrQ 956 N 1 > G +chrQ 957 N 1 > G +chrQ 958 N 1 > G +chrQ 959 N 1 > G +chrQ 960 N 1 > G +chrQ 961 N 1 > G +chrQ 962 N 1 > G +chrQ 963 N 1 > G +chrQ 964 N 1 > G +chrQ 965 N 1 > G +chrQ 966 N 1 > G +chrQ 967 N 1 > G +chrQ 968 N 1 > G +chrQ 969 N 1 > G +chrQ 970 N 1 > G +chrQ 971 N 1 > G +chrQ 972 N 1 > G +chrQ 973 N 1 > G +chrQ 974 N 1 > G +chrQ 975 N 1 > G +chrQ 976 N 1 > G +chrQ 977 N 1 > G +chrQ 978 N 1 > G +chrQ 979 N 1 > G +chrQ 980 N 1 > G +chrQ 981 N 1 > G +chrQ 982 N 1 > G +chrQ 983 N 1 > G +chrQ 984 N 1 > G +chrQ 985 N 1 > G +chrQ 986 N 1 > G +chrQ 987 N 1 > G +chrQ 988 N 1 > G +chrQ 989 N 1 > G +chrQ 990 N 1 > G +chrQ 991 N 1 > G +chrQ 992 N 1 > G +chrQ 993 N 1 > G +chrQ 994 N 1 > G +chrQ 995 N 1 > G +chrQ 996 N 1 > G +chrQ 997 N 1 > G +chrQ 998 N 1 > G +chrQ 999 N 1 > G +chrQ 1000 N 1 > G +chrQ 1001 N 1 > G +chrQ 1002 N 1 > G +chrQ 1003 N 1 > G +chrQ 1004 N 1 > G +chrQ 1005 N 1 > G +chrQ 1006 N 1 > G +chrQ 1007 N 1 > G +chrQ 1008 N 1 > G +chrQ 1009 N 1 > G +chrQ 1010 N 1 > G +chrQ 1011 N 1 > G +chrQ 1012 N 1 > G +chrQ 1013 N 1 > G +chrQ 1014 N 1 > G +chrQ 1015 N 1 > G +chrQ 1016 N 1 > G +chrQ 1017 N 1 > G +chrQ 1018 N 1 > G +chrQ 1019 N 1 > G +chrQ 1020 N 1 > G +chrQ 1021 N 1 > G +chrQ 1022 N 1 > G +chrQ 1023 N 1 > G +chrQ 1024 N 1 > G +chrQ 1025 N 1 > G +chrQ 1026 N 1 > G +chrQ 1027 N 1 > G +chrQ 1028 N 1 > G +chrQ 1029 N 1 > G +chrQ 1030 N 1 > G +chrQ 1031 N 1 > G +chrQ 1032 N 1 > G +chrQ 1033 N 1 > G +chrQ 1034 N 1 > G +chrQ 1035 N 1 > G +chrQ 1036 N 1 > G +chrQ 1037 N 1 
> G +chrQ 1038 N 1 > G +chrQ 1039 N 1 > G +chrQ 1040 N 1 > G +chrQ 1041 N 1 > G +chrQ 1042 N 1 > G +chrQ 1043 N 1 > G +chrQ 1044 N 1 > G +chrQ 1045 N 1 > G +chrQ 1046 N 1 > G +chrQ 1047 N 1 > G +chrQ 1048 N 1 > G +chrQ 1049 N 1 > G +chrQ 1050 N 1 > G +chrQ 1051 N 1 > G +chrQ 1052 N 1 > G +chrQ 1053 N 1 > G +chrQ 1054 N 1 > G +chrQ 1055 N 1 > G +chrQ 1056 N 1 > G +chrQ 1057 N 1 > G +chrQ 1058 N 1 > G +chrQ 1059 N 1 > G +chrQ 1060 N 1 > G +chrQ 1061 N 1 > G +chrQ 1062 N 1 > G +chrQ 1063 N 1 > G +chrQ 1064 N 1 > G +chrQ 1065 N 1 > G +chrQ 1066 N 1 > G +chrQ 1067 N 1 > G +chrQ 1068 N 1 > G +chrQ 1069 N 1 > G +chrQ 1070 N 1 > G +chrQ 1071 N 1 > G +chrQ 1072 N 1 > G +chrQ 1073 N 1 > G +chrQ 1074 N 1 > G +chrQ 1075 N 1 > G +chrQ 1076 N 1 > G +chrQ 1077 N 1 > G +chrQ 1078 N 1 > G +chrQ 1079 N 1 > G +chrQ 1080 N 1 > G +chrQ 1081 N 1 > G +chrQ 1082 N 1 > G +chrQ 1083 N 1 > G +chrQ 1084 N 1 > G +chrQ 1085 N 1 > G +chrQ 1086 N 1 > G +chrQ 1087 N 1 > G +chrQ 1088 N 1 > G +chrQ 1089 N 1 > G +chrQ 1090 N 1 > G +chrQ 1091 N 1 > G +chrQ 1092 N 1 > G +chrQ 1093 N 1 > G +chrQ 1094 N 1 > G +chrQ 1095 N 1 A G +chrQ 1096 N 1 C H +chrQ 1097 N 1 G H +chrQ 1098 N 1 T I +chrQ 1099 N 1 A$ I +chrQ 1140 N 1 ^]a E +chrQ 1141 N 1 t E +chrQ 1142 N 1 g F +chrQ 1143 N 1 c F +chrQ 1144 N 1 a G +chrQ 1145 N 1 t G +chrQ 1146 N 1 g H +chrQ 1147 N 1 c H +chrQ 1148 N 1 a I +chrQ 1149 N 1 t$ I diff --git a/public/biopet-framework/src/test/resources/paired01.sam b/public/biopet-tools/src/test/resources/paired01.sam similarity index 100% rename from public/biopet-framework/src/test/resources/paired01.sam rename to public/biopet-tools/src/test/resources/paired01.sam diff --git a/public/biopet-framework/src/test/resources/paired01a.fq b/public/biopet-tools/src/test/resources/paired01a.fq similarity index 100% rename from public/biopet-framework/src/test/resources/paired01a.fq rename to public/biopet-tools/src/test/resources/paired01a.fq diff --git a/public/biopet-framework/src/test/resources/paired01b.fq 
b/public/biopet-tools/src/test/resources/paired01b.fq similarity index 100% rename from public/biopet-framework/src/test/resources/paired01b.fq rename to public/biopet-tools/src/test/resources/paired01b.fq diff --git a/public/biopet-framework/src/test/resources/paired02.bam b/public/biopet-tools/src/test/resources/paired02.bam similarity index 100% rename from public/biopet-framework/src/test/resources/paired02.bam rename to public/biopet-tools/src/test/resources/paired02.bam diff --git a/public/biopet-framework/src/test/resources/paired02.bam.bai b/public/biopet-tools/src/test/resources/paired02.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/paired02.bam.bai rename to public/biopet-tools/src/test/resources/paired02.bam.bai diff --git a/public/biopet-framework/src/test/resources/paired02.sam b/public/biopet-tools/src/test/resources/paired02.sam similarity index 100% rename from public/biopet-framework/src/test/resources/paired02.sam rename to public/biopet-tools/src/test/resources/paired02.sam diff --git a/public/biopet-framework/src/test/resources/paired03.bam b/public/biopet-tools/src/test/resources/paired03.bam similarity index 100% rename from public/biopet-framework/src/test/resources/paired03.bam rename to public/biopet-tools/src/test/resources/paired03.bam diff --git a/public/biopet-framework/src/test/resources/paired03.bam.bai b/public/biopet-tools/src/test/resources/paired03.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/paired03.bam.bai rename to public/biopet-tools/src/test/resources/paired03.bam.bai diff --git a/public/biopet-framework/src/test/resources/paired03.sam b/public/biopet-tools/src/test/resources/paired03.sam similarity index 100% rename from public/biopet-framework/src/test/resources/paired03.sam rename to public/biopet-tools/src/test/resources/paired03.sam diff --git a/public/biopet-framework/src/test/resources/rrna01.bed 
b/public/biopet-tools/src/test/resources/rrna01.bed similarity index 100% rename from public/biopet-framework/src/test/resources/rrna01.bed rename to public/biopet-tools/src/test/resources/rrna01.bed diff --git a/public/biopet-framework/src/test/resources/rrna01.gtf b/public/biopet-tools/src/test/resources/rrna01.gtf similarity index 100% rename from public/biopet-framework/src/test/resources/rrna01.gtf rename to public/biopet-tools/src/test/resources/rrna01.gtf diff --git a/public/biopet-framework/src/test/resources/rrna01.refFlat b/public/biopet-tools/src/test/resources/rrna01.refFlat similarity index 100% rename from public/biopet-framework/src/test/resources/rrna01.refFlat rename to public/biopet-tools/src/test/resources/rrna01.refFlat diff --git a/public/biopet-framework/src/test/resources/rrna02.bed b/public/biopet-tools/src/test/resources/rrna02.bed similarity index 67% rename from public/biopet-framework/src/test/resources/rrna02.bed rename to public/biopet-tools/src/test/resources/rrna02.bed index 191138d56b2f30f42f1dfe3a26769777e51e04d3..7d59f4c301bcfeefadafc791c589d52f6244f049 100644 --- a/public/biopet-framework/src/test/resources/rrna02.bed +++ b/public/biopet-tools/src/test/resources/rrna02.bed @@ -2,5 +2,5 @@ chrQ 300 350 rRNA03 0 + chrQ 350 400 rRNA03 0 + chrQ 450 480 rRNA02 0 - chrQ 470 475 rRNA04 0 - -chrQ 1 200 rRNA01 0 . -chrQ 150 250 rRNA01 0 . 
+chrQ 1 200 rRNA01 0 +chrQ 150 250 rRNA01 0 diff --git a/public/biopet-tools/src/test/resources/sageAllGenesTest.tsv b/public/biopet-tools/src/test/resources/sageAllGenesTest.tsv new file mode 100644 index 0000000000000000000000000000000000000000..602518753b0b9a24a18c6561fbb0b6aabd99a2fe --- /dev/null +++ b/public/biopet-tools/src/test/resources/sageAllGenesTest.tsv @@ -0,0 +1,3 @@ +ENSG00000255336 +ENSG00000254767 +ENSG99999999999 diff --git a/public/biopet-tools/src/test/resources/sageNoAntiTest.tsv b/public/biopet-tools/src/test/resources/sageNoAntiTest.tsv new file mode 100644 index 0000000000000000000000000000000000000000..84f1b39db543978adcf0f29ae27c96aacfe823f7 --- /dev/null +++ b/public/biopet-tools/src/test/resources/sageNoAntiTest.tsv @@ -0,0 +1 @@ +ENSG99999999999 diff --git a/public/biopet-tools/src/test/resources/sageNoTagsTest.tsv b/public/biopet-tools/src/test/resources/sageNoTagsTest.tsv new file mode 100644 index 0000000000000000000000000000000000000000..84f1b39db543978adcf0f29ae27c96aacfe823f7 --- /dev/null +++ b/public/biopet-tools/src/test/resources/sageNoTagsTest.tsv @@ -0,0 +1 @@ +ENSG99999999999 diff --git a/public/biopet-tools/src/test/resources/sageTest.tsv b/public/biopet-tools/src/test/resources/sageTest.tsv new file mode 100644 index 0000000000000000000000000000000000000000..080395ff9049459f7a43dfc131beb05367d89b7c --- /dev/null +++ b/public/biopet-tools/src/test/resources/sageTest.tsv @@ -0,0 +1,9 @@ +#tag firstTag AllTags FirstAntiTag AllAntiTags +CATGAAGACATGGCGGTCATG ENSG00000255336 +CATGAAGCAGAATATCAAGAG ENSG00000254767 +CATGACACTGATAGGCAAGCT ENSG00000254767 +CATGACCGCCATGTCTTCATG ENSG00000255336 +CATGGATTGCGCTCTACTGGT ENSG00000254767 ENSG00000254767 +CATGGGTGGAACCCTTAAAAC ENSG00000255336 ENSG00000255336 +CATGGTTCCCAGTGTGAGAAC ENSG00000254767 ENSG00000254767 +CATGTTCTTCCTTAGCACCCT ENSG00000255336 ENSG00000255336 diff --git a/public/biopet-tools/src/test/resources/same.tsv b/public/biopet-tools/src/test/resources/same.tsv new file 
mode 100644 index 0000000000000000000000000000000000000000..e82fcbb3b50a1c8a613480ad0b0ef76673a0060c --- /dev/null +++ b/public/biopet-tools/src/test/resources/same.tsv @@ -0,0 +1,3 @@ +sample library bam +Sample_ID_1 Lib_ID_1 MyFirst.bam +Sample_ID_1 Lib_ID_1 MySecond.bam diff --git a/public/biopet-tools/src/test/resources/sample.tsv b/public/biopet-tools/src/test/resources/sample.tsv new file mode 100644 index 0000000000000000000000000000000000000000..3c67fc7c1cbc58ad00d7869ed1275b9a85e96cf3 --- /dev/null +++ b/public/biopet-tools/src/test/resources/sample.tsv @@ -0,0 +1,3 @@ +sample library bam +Sample_ID_1 Lib_ID_1 MyFirst.bam +Sample_ID_2 Lib_ID_2 MySecond.bam diff --git a/public/biopet-framework/src/test/resources/single01.bam b/public/biopet-tools/src/test/resources/single01.bam similarity index 100% rename from public/biopet-framework/src/test/resources/single01.bam rename to public/biopet-tools/src/test/resources/single01.bam diff --git a/public/biopet-framework/src/test/resources/single01.bam.bai b/public/biopet-tools/src/test/resources/single01.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/single01.bam.bai rename to public/biopet-tools/src/test/resources/single01.bam.bai diff --git a/public/biopet-framework/src/test/resources/single01.fq b/public/biopet-tools/src/test/resources/single01.fq similarity index 100% rename from public/biopet-framework/src/test/resources/single01.fq rename to public/biopet-tools/src/test/resources/single01.fq diff --git a/public/biopet-framework/src/test/resources/single01.sam b/public/biopet-tools/src/test/resources/single01.sam similarity index 100% rename from public/biopet-framework/src/test/resources/single01.sam rename to public/biopet-tools/src/test/resources/single01.sam diff --git a/public/biopet-framework/src/test/resources/single02.bam b/public/biopet-tools/src/test/resources/single02.bam similarity index 100% rename from public/biopet-framework/src/test/resources/single02.bam 
rename to public/biopet-tools/src/test/resources/single02.bam diff --git a/public/biopet-framework/src/test/resources/single02.bam.bai b/public/biopet-tools/src/test/resources/single02.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/single02.bam.bai rename to public/biopet-tools/src/test/resources/single02.bam.bai diff --git a/public/biopet-framework/src/test/resources/single02.sam b/public/biopet-tools/src/test/resources/single02.sam similarity index 100% rename from public/biopet-framework/src/test/resources/single02.sam rename to public/biopet-tools/src/test/resources/single02.sam diff --git a/public/biopet-framework/src/test/resources/single03.bam b/public/biopet-tools/src/test/resources/single03.bam similarity index 100% rename from public/biopet-framework/src/test/resources/single03.bam rename to public/biopet-tools/src/test/resources/single03.bam diff --git a/public/biopet-framework/src/test/resources/single03.bam.bai b/public/biopet-tools/src/test/resources/single03.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/single03.bam.bai rename to public/biopet-tools/src/test/resources/single03.bam.bai diff --git a/public/biopet-framework/src/test/resources/single03.sam b/public/biopet-tools/src/test/resources/single03.sam similarity index 100% rename from public/biopet-framework/src/test/resources/single03.sam rename to public/biopet-tools/src/test/resources/single03.sam diff --git a/public/biopet-framework/src/test/resources/single04.bam b/public/biopet-tools/src/test/resources/single04.bam similarity index 100% rename from public/biopet-framework/src/test/resources/single04.bam rename to public/biopet-tools/src/test/resources/single04.bam diff --git a/public/biopet-framework/src/test/resources/single04.bam.bai b/public/biopet-tools/src/test/resources/single04.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/single04.bam.bai rename to 
public/biopet-tools/src/test/resources/single04.bam.bai diff --git a/public/biopet-framework/src/test/resources/single04.sam b/public/biopet-tools/src/test/resources/single04.sam similarity index 100% rename from public/biopet-framework/src/test/resources/single04.sam rename to public/biopet-tools/src/test/resources/single04.sam diff --git a/public/biopet-framework/src/test/resources/single05.bam b/public/biopet-tools/src/test/resources/single05.bam similarity index 100% rename from public/biopet-framework/src/test/resources/single05.bam rename to public/biopet-tools/src/test/resources/single05.bam diff --git a/public/biopet-framework/src/test/resources/single05.bam.bai b/public/biopet-tools/src/test/resources/single05.bam.bai similarity index 100% rename from public/biopet-framework/src/test/resources/single05.bam.bai rename to public/biopet-tools/src/test/resources/single05.bam.bai diff --git a/public/biopet-framework/src/test/resources/single05.sam b/public/biopet-tools/src/test/resources/single05.sam similarity index 100% rename from public/biopet-framework/src/test/resources/single05.sam rename to public/biopet-tools/src/test/resources/single05.sam diff --git a/public/biopet-tools/src/test/resources/tagCount.tsv b/public/biopet-tools/src/test/resources/tagCount.tsv new file mode 100644 index 0000000000000000000000000000000000000000..64181d09a20cdab0c5bfc4792dc081fe5fd0f222 --- /dev/null +++ b/public/biopet-tools/src/test/resources/tagCount.tsv @@ -0,0 +1,8 @@ +CATGAAGACATGGCGGTCATG 20 +CATGAAGCAGAATATCAAGAG 25 +CATGACACTGATAGGCAAGCT 30 +CATGACCGCCATGTCTTCATG 35 +CATGGATTGCGCTCTACTGGT 40 +CATGGGTGGAACCCTTAAAAC 45 +CATGGTTCCCAGTGTGAGAAC 50 +CATGTTCTTCCTTAGCACCCT 55 diff --git a/public/biopet-tools/src/test/resources/test.summary.json b/public/biopet-tools/src/test/resources/test.summary.json new file mode 100644 index 0000000000000000000000000000000000000000..aff9e962e4662f9f76f4a53f2a8fe0557ecb92d4 --- /dev/null +++ 
b/public/biopet-tools/src/test/resources/test.summary.json @@ -0,0 +1,17 @@ +{ + "samples" : { + "016" : { + "libraries" : { + "L001" : { + "flexiprep" : { + "settings" : { + "skip_trim" : false, + "skip_clip" : false, + "paired" : true + } + } + } + } + } + } +} diff --git a/public/biopet-tools/src/test/resources/unvep_online.vcf.gz b/public/biopet-tools/src/test/resources/unvep_online.vcf.gz new file mode 100644 index 0000000000000000000000000000000000000000..f102295f99d0bc62e25de296b84dc6610930a683 Binary files /dev/null and b/public/biopet-tools/src/test/resources/unvep_online.vcf.gz differ diff --git a/public/biopet-tools/src/test/resources/unvep_online.vcf.gz.tbi b/public/biopet-tools/src/test/resources/unvep_online.vcf.gz.tbi new file mode 100644 index 0000000000000000000000000000000000000000..bb43ff545f591dd276973e7158919e6d14c78f23 Binary files /dev/null and b/public/biopet-tools/src/test/resources/unvep_online.vcf.gz.tbi differ diff --git a/public/biopet-framework/src/test/resources/unvepped.vcf b/public/biopet-tools/src/test/resources/unvepped.vcf similarity index 100% rename from public/biopet-framework/src/test/resources/unvepped.vcf rename to public/biopet-tools/src/test/resources/unvepped.vcf diff --git a/public/biopet-framework/src/test/resources/unvepped.vcf.gz b/public/biopet-tools/src/test/resources/unvepped.vcf.gz similarity index 100% rename from public/biopet-framework/src/test/resources/unvepped.vcf.gz rename to public/biopet-tools/src/test/resources/unvepped.vcf.gz diff --git a/public/biopet-framework/src/test/resources/unvepped.vcf.gz.tbi b/public/biopet-tools/src/test/resources/unvepped.vcf.gz.tbi similarity index 100% rename from public/biopet-framework/src/test/resources/unvepped.vcf.gz.tbi rename to public/biopet-tools/src/test/resources/unvepped.vcf.gz.tbi diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBedTest.scala 
b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBedTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBedTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/AnnotateVcfWithBedTest.scala diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFastaTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFastaTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..d9d8c1bee6bf96bcc98a372585b5a89454775e08 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/BastyGenerateFastaTest.scala @@ -0,0 +1,76 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import htsjdk.variant.vcf.VCFFileReader +import org.scalatest.Matchers +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test +import org.scalatest.mock.MockitoSugar +import org.mockito.Mockito._ + +/** + * Created by ahbbollen on 13-8-15. 
+ */ +class BastyGenerateFastaTest extends TestNGSuite with MockitoSugar with Matchers { + + import BastyGenerateFasta._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val vepped_path = resourcePath("/VEP_oneline.vcf") + val vepped = new File(vepped_path) + val bam_path = resourcePath("/paired01.bam") + val chrQ_path = resourcePath("/chrQ.vcf.gz") + val chrQRef_path = resourcePath("/fake_chrQ.fa") + val bam = new File(resourcePath("/paired01.bam")) + val chrQ = new File(resourcePath("/chrQ.vcf.gz")) + val chrQRef = new File(resourcePath("/fake_chrQ.fa")) + + @Test def testMainVcf = { + val tmp = File.createTempFile("basty_out", ".fa") + tmp.deleteOnExit() + val tmppath = tmp.getAbsolutePath + tmp.deleteOnExit() + + val arguments = Array("-V", chrQ_path, "--outputVariants", tmppath, "--sampleName", "Child_7006504", "--reference", chrQRef_path, "--outputName", "test") + main(arguments) + } + + @Test def testMainVcfAndBam = { + val tmp = File.createTempFile("basty_out", ".fa") + tmp.deleteOnExit() + val tmppath = tmp.getAbsolutePath + tmp.deleteOnExit() + + val arguments = Array("-V", chrQ_path, "--outputVariants", tmppath, "--bamFile", bam_path, "--sampleName", "Child_7006504", "--reference", chrQRef_path, "--outputName", "test") + main(arguments) + } + + @Test def testMainVcfAndBamMore = { + val tmp = File.createTempFile("basty_out", ".fa") + tmp.deleteOnExit() + val tmppath = tmp.getAbsolutePath + tmp.deleteOnExit() + + val arguments = Array("-V", chrQ_path, "--outputConsensus", tmppath, "--outputConsensusVariants", tmppath, "--bamFile", bam_path, "--sampleName", "Child_7006504", "--reference", chrQRef_path, "--outputName", "test") + main(arguments) + } + + @Test def testGetMaxAllele = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + val child = mock[Args] + when(child.sampleName) thenReturn "Child_7006504" + val father = mock[Args] + when(father.sampleName) 
thenReturn "Father_7006506" + + getMaxAllele(record)(child) shouldBe "C-" + getMaxAllele(record)(father) shouldBe "CA" + + } + +} diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstatTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstatTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..d919fe400154683f5fef7a061ce76c60aab5f5e7 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/BiopetFlagstatTest.scala @@ -0,0 +1,63 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import htsjdk.samtools.SamReaderFactory +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.io.Source + +/** + * Created by ahbbollen on 26-8-15. + */ +class BiopetFlagstatTest extends TestNGSuite with MockitoSugar with Matchers { + + import BiopetFlagstat._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val bam = new File(resourcePath("/paired01.bam")) + val report = new File(resourcePath("/flagstat_report.txt")) + val summary = new File(resourcePath("/flagstat_summary.txt")) + val crossReport = new File(resourcePath("/flagstat_crossreport.txt")) + val crossTrue = new File(resourcePath("/flagstat_crosstrue.txt")) + + val record = SamReaderFactory.makeDefault().open(bam).iterator().next() + val processor = new FlagstatCollector + processor.loadDefaultFunctions() + processor.loadRecord(record) + + @Test + def testReport() = { + processor.report shouldBe Source.fromFile(report).mkString + } + + @Test + def testSummary() = { + processor.summary shouldBe Source.fromFile(summary).mkString + } + + @Test + def testCrossReport() = { + processor.crossReport() shouldBe Source.fromFile(crossReport).mkString + } + + @Test + def testCrossReportTrue() = { + 
processor.crossReport(true) shouldBe Source.fromFile(crossTrue).mkString + } + + @Test + def testMain() = { + //TODO: Test output file + val output = File.createTempFile("testMain", ".biopetflagstat") + output.deleteOnExit() + main(Array("-I", bam.getAbsolutePath, "-o", output.toString)) + } + +} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBamTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBamTest.scala similarity index 74% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBamTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBamTest.scala index 6a99b92d3fd3b9ba16f9c09e67094cff1e40bedf..476e8e6230caa6bd1051a683459ca28566516699 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBamTest.scala +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/CheckAllelesVcfInBamTest.scala @@ -15,8 +15,11 @@ */ package nl.lumc.sasc.biopet.tools +import java.io.File import java.nio.file.Paths +import htsjdk.samtools.{ SamReaderFactory, SamReader } +import htsjdk.variant.vcf.VCFFileReader import org.scalatest.Matchers import org.scalatest.mock.MockitoSugar import org.scalatest.testng.TestNGSuite @@ -38,6 +41,7 @@ class CheckAllelesVcfInBamTest extends TestNGSuite with MockitoSugar with Matche val vcf = resourcePath("/chrQ.vcf") val bam = resourcePath("/single01.bam") + val vcf2 = new File(resourcePath("/chrQ2.vcf.gz")) val rand = new Random() @Test def testOutputTypeVcf() = { @@ -58,4 +62,19 @@ class CheckAllelesVcfInBamTest extends TestNGSuite with MockitoSugar with Matche main(arguments) } + @Test + def testCheckAllelesNone() = { + val variant = new File(vcf) + val samRecord = SamReaderFactory.makeDefault().open(new File(bam)).iterator().next() + val varRecord = new VCFFileReader(variant, false).iterator().next() + checkAlleles(samRecord, 
varRecord) shouldBe None + } + + @Test + def testCheckAlleles() = { + val samRecord = SamReaderFactory.makeDefault().open(new File(bam)).iterator().next() + val varRecord = new VCFFileReader(vcf2).iterator().next() + checkAlleles(samRecord, varRecord) shouldBe Some("T") + } + } diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastqTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastqTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastqTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/ExtractAlignedFastqTest.scala diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FastqSplitterTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FastqSplitterTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..dd66b204fab38a4fd26fb011ffe6739faf304009 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FastqSplitterTest.scala @@ -0,0 +1,40 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +/** + * Created by ahbbollen on 27-8-15. 
+ */ +class FastqSplitterTest extends TestNGSuite with MockitoSugar with Matchers { + + import FastqSplitter._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val fq = resourcePath("/paired01a.fq") + + @Test + def testMain() = { + val temp = File.createTempFile("out", ".fastq") + temp.deleteOnExit() + val args = Array("-I", fq, "-o", temp.getAbsolutePath) + main(args) + } + + @Test + def testManyOutMain() = { + val files = (0 until 10).map(_ => File.createTempFile("out", ".fastq")) + files.foreach(_.deleteOnExit()) + var args = Array("-I", fq) + files.foreach(x => args ++= Array("-o", x.getAbsolutePath)) + main(args) + } + +} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/FastqSyncTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FastqSyncTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/FastqSyncTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FastqSyncTest.scala diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBioTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBioTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..907270b8c31f3eb152766f03a612101f754ba6c9 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/FindRepeatsPacBioTest.scala @@ -0,0 +1,65 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import htsjdk.samtools.{ SamReaderFactory, QueryInterval } +import nl.lumc.sasc.biopet.tools.FastqSplitter._ +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.collection.immutable.Nil + +/** + * Created by ahbbollen on 27-8-15. 
+ */ +class FindRepeatsPacBioTest extends TestNGSuite with MockitoSugar with Matchers { + + import FindRepeatsPacBio._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val bed = resourcePath("/rrna01.bed") + val bam = resourcePath("/paired01.bam") + + @Test + def testMain() = { + + val outputFile = File.createTempFile("repeats", ".tsv") + outputFile.deleteOnExit() + val args = Array("-I", bam, "-b", bed, "-o", outputFile.toString) + main(args) + } + + @Test + def testResult() = { + val samReader = SamReaderFactory.makeDefault().open(new File(bam)) + val header = samReader.getFileHeader + val record = samReader.iterator().next() + val interval = new QueryInterval(header.getSequenceIndex("chrQ"), 50, 55) + val result = procesSamrecord(record, interval) + + result.isEmpty shouldBe false + + result.get.samRecord shouldEqual record + result.get.beginDel should be >= 0 + result.get.endDel should be >= 0 + } + + @Test + def testResultObject = { + val record = SamReaderFactory.makeDefault().open(new File(bam)).iterator().next() + val result = new Result + result.samRecord = record + + result.samRecord shouldEqual record + result.beginDel shouldBe 0 + result.endDel shouldBe 0 + result.dels shouldEqual Nil + result.ins shouldEqual Nil + } + +} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/MergeAllelesTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/MergeAllelesTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/MergeAllelesTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/MergeAllelesTest.scala diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/MergeTablesTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/MergeTablesTest.scala similarity index 100% rename from 
public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/MergeTablesTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/MergeTablesTest.scala diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/MpileupToVcfTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/MpileupToVcfTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..e708bc654b3c81699dcab85bfc07a7e21cd3ea94 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/MpileupToVcfTest.scala @@ -0,0 +1,81 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import htsjdk.samtools.reference.IndexedFastaSequenceFile +import htsjdk.variant.variantcontext.Allele +import htsjdk.variant.vcf.VCFFileReader +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.collection.JavaConversions._ + +/** + * Created by ahbbollen on 27-8-15. 
+ */ +class MpileupToVcfTest extends TestNGSuite with MockitoSugar with Matchers { + + import MpileupToVcf._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val pileup = resourcePath("/paired01.pileup") + + @Test + def testMain() = { + val tmp = File.createTempFile("mpileup", ".vcf") + tmp.deleteOnExit() + val args = Array("-I", pileup, "--sample", "test", "-o", tmp.getAbsolutePath) + + main(args) + } + + @Test + def validateOutVcf() = { + val tmp = File.createTempFile("mpileup", ".vcf") + tmp.deleteOnExit() + val args = Array("-I", pileup, "--sample", "test", "-o", tmp.getAbsolutePath, "--minDP", "1", "--minAP", "1") + main(args) + + val vcfReader = new VCFFileReader(tmp, false) + + // VariantContexts validate on creation + // therefore we just have to loop through them + + vcfReader.foreach(_ => 1) + + } + + @Test + def extraValidateOutVcf() = { + val tmp = File.createTempFile("mpileup", ".vcf") + tmp.deleteOnExit() + val args = Array("-I", pileup, "--sample", "test", "-o", tmp.getAbsolutePath, "--minDP", "1", "--minAP", "1") + main(args) + + val vcfReader = new VCFFileReader(tmp, false) + + val fasta = resourcePath("/chrQ_allN.fa") + + val sequenceFile = new IndexedFastaSequenceFile(new File(fasta)) + val sequenceDict = sequenceFile.getSequenceDictionary + + for (record <- vcfReader) { + val alleles = record.getAlleles.toSet + var ref_alleles = alleles -- record.getAlternateAlleles.toSet + + ref_alleles.size should be >= 1 + + val realRef = Allele.create(sequenceFile.getSubsequenceAt(record.getContig, + record.getStart, record.getEnd).getBases, true) + + for (ref <- ref_alleles) { + record.extraStrictValidation(ref, realRef, Set("")) + } + } + } +} diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/PrefixFastqTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/PrefixFastqTest.scala new file mode 100644 index 
0000000000000000000000000000000000000000..611557d836636aebc71f90e70707035033df6b97 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/PrefixFastqTest.scala @@ -0,0 +1,49 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import htsjdk.samtools.fastq.FastqReader +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.collection.JavaConversions._ + +/** + * Created by ahbbollen on 28-8-15. + */ +class PrefixFastqTest extends TestNGSuite with MockitoSugar with Matchers { + + import PrefixFastq._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val fq = resourcePath("/paired01a.fq") + + @Test + def testMain() = { + val temp = File.createTempFile("out", ".fastq") + temp.deleteOnExit() + + val args = Array("-i", fq, "-o", temp.getAbsolutePath, "-s", "AAA") + main(args) + } + + @Test + def testOutput() = { + val temp = File.createTempFile("out", ".fastq") + temp.deleteOnExit() + + val args = Array("-i", fq, "-o", temp.getAbsolutePath, "-s", "AAA") + main(args) + + val reader = new FastqReader(temp) + + for (read <- reader.iterator()) { + read.getReadString.startsWith("AAA") shouldBe true + } + } +} diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCountFastqTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCountFastqTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..5c7731c6dd2ec60b7c8cd01f84c6929026d7e007 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCountFastqTest.scala @@ -0,0 +1,31 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import 
org.testng.annotations.Test + +/** + * Created by ahbbollen on 28-8-15. + */ +class SageCountFastqTest extends TestNGSuite with MockitoSugar with Matchers { + import SageCountFastq._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val fq = resourcePath("/paired01a.fq") + + @Test + def testMain() = { + val temp = File.createTempFile("out", ".fastq") + temp.deleteOnExit() + + val args = Array("-I", fq, "-o", temp.getAbsolutePath) + main(args) + } + +} diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCreateLibaryTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCreateLibaryTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..c86c1ba5a9787c0cdcce00293309500d6f6e4b86 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCreateLibaryTest.scala @@ -0,0 +1,120 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import org.biojava3.core.sequence.DNASequence +import org.biojava3.core.sequence.io.FastaReaderHelper +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.collection.JavaConversions._ + +import scala.io.Source + +/** + * Created by ahbbollen on 7-9-15. 
+ */ +class SageCreateLibaryTest extends TestNGSuite with MockitoSugar with Matchers { + + import SageCreateLibrary._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + @Test + def testMain = { + + val input = resourcePath("/mini.transcriptome.fa") + val output = File.createTempFile("sageCreateLibrary", ".tsv") + output.deleteOnExit() + val noTagsOutput = File.createTempFile("sageCreateLibrary", ".tsv") + noTagsOutput.deleteOnExit() + val antiTagsOutput = File.createTempFile("sageCreateLibrary", ".tsv") + antiTagsOutput.deleteOnExit() + val allGenesOutput = File.createTempFile("sageCreateLibrary", ".tsv") + allGenesOutput.deleteOnExit() + + val args = Array("-I", input, "-o", output.getAbsolutePath, "--tag", "CATG", + "--length", "17", "--noTagsOutput", noTagsOutput.getAbsolutePath, "--noAntiTagsOutput", + antiTagsOutput.getAbsolutePath, "--allGenesOutput", allGenesOutput.getAbsolutePath) + + noException should be thrownBy main(args) + + val args2 = Array("-I", input, "-o", output.getAbsolutePath, "--tag", "CATG", + "--length", "17") + noException should be thrownBy main(args2) + val args3 = Array("-I", input, "-o", output.getAbsolutePath, "--tag", "CATG", + "--length", "17", "--noTagsOutput", noTagsOutput.getAbsolutePath) + noException should be thrownBy main(args3) + + } + + @Test + def testOutPut = { + val input = resourcePath("/mini.transcriptome.fa") + val output = File.createTempFile("sageCreateLibrary", ".tsv") + output.deleteOnExit() + val noTagsOutput = File.createTempFile("sageCreateLibrary", ".tsv") + noTagsOutput.deleteOnExit() + val antiTagsOutput = File.createTempFile("sageCreateLibrary", ".tsv") + antiTagsOutput.deleteOnExit() + val allGenesOutput = File.createTempFile("sageCreateLibrary", ".tsv") + allGenesOutput.deleteOnExit() + + val args = Array("-I", input, "-o", output.getAbsolutePath, "--tag", "CATG", + "--length", "17", "--noTagsOutput", noTagsOutput.getAbsolutePath, 
"--noAntiTagsOutput", + antiTagsOutput.getAbsolutePath, "--allGenesOutput", allGenesOutput.getAbsolutePath) + main(args) + + Source.fromFile(output).mkString should equal( + Source.fromFile(new File(resourcePath("/sageTest.tsv"))).mkString + ) + + Source.fromFile(noTagsOutput).mkString should equal( + Source.fromFile(new File(resourcePath("/sageNoTagsTest.tsv"))).mkString + ) + + Source.fromFile(antiTagsOutput).mkString should equal( + Source.fromFile(new File(resourcePath("/sageNoAntiTest.tsv"))).mkString + ) + + Source.fromFile(allGenesOutput).mkString should equal( + Source.fromFile(new File(resourcePath("/sageAllGenesTest.tsv"))).mkString + ) + } + + @Test + def testGetTags = { + val input = resourcePath("/mini.transcriptome.fa") + + val reader = FastaReaderHelper.readFastaDNASequence(new File(input)) + + val records = reader.iterator.toList + val tagRegex = ("CATG" + "[CATG]{" + 17 + "}").r + + val record1 = records(0) + val record2 = records(1) + val record3 = records(2) + + val result1 = getTags(record1._1, record1._2, tagRegex) + val result2 = getTags(record2._1, record2._2, tagRegex) + val result3 = getTags(record3._1, record3._2, tagRegex) + + result1.allTags.size shouldBe 2 + result1.allAntiTags.size shouldBe 2 + result1.firstTag shouldBe "CATGGATTGCGCTCTACTGGT" + result1.firstAntiTag shouldBe "CATGGTTCCCAGTGTGAGAAC" + + result2.allTags.size shouldBe 2 + result2.allAntiTags.size shouldBe 2 + result2.firstTag shouldBe "CATGTTCTTCCTTAGCACCCT" + result2.firstAntiTag shouldBe "CATGGGTGGAACCCTTAAAAC" + + result3.allTags.size shouldBe 0 + result3.allAntiTags.size shouldBe 0 + } + +} diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCountsTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCountsTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..ecd1dae87d5c43a88e722cbb12f7eb18ece3a9b8 --- /dev/null +++ 
b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SageCreateTagCountsTest.scala @@ -0,0 +1,75 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.io.Source + +/** + * Created by ahbbollen on 7-9-15. + */ +class SageCreateTagCountsTest extends TestNGSuite with MockitoSugar with Matchers { + + import SageCreateTagCounts._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + @Test + def testMain = { + val input = resourcePath("/tagCount.tsv") + val tagLib = resourcePath("/sageTest.tsv") + + val sense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + sense.deleteOnExit() + val allSense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + allSense.deleteOnExit() + val antiSense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + antiSense.deleteOnExit() + val allAntiSense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + allAntiSense.deleteOnExit() + + noException should be thrownBy main(Array("-I", input, "--tagLib", tagLib, + "--countSense", sense.getAbsolutePath, "--countAllSense", allSense.getAbsolutePath, + "--countAntiSense", antiSense.getAbsolutePath, "--countAllAntiSense", allAntiSense.getAbsolutePath)) + noException should be thrownBy main(Array("-I", input, "--tagLib", tagLib, + "--countSense", sense.getAbsolutePath, "--countAllSense", allSense.getAbsolutePath, + "--countAntiSense", antiSense.getAbsolutePath)) + noException should be thrownBy main(Array("-I", input, "--tagLib", tagLib, + "--countSense", sense.getAbsolutePath, "--countAllSense", allSense.getAbsolutePath)) + noException should be thrownBy main(Array("-I", input, "--tagLib", tagLib, + "--countSense", sense.getAbsolutePath)) + noException should be thrownBy main(Array("-I", input, "--tagLib", 
tagLib)) + + } + + @Test + def testOutput = { + val input = resourcePath("/tagCount.tsv") + val tagLib = resourcePath("/sageTest.tsv") + + val sense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + sense.deleteOnExit() + val allSense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + allSense.deleteOnExit() + val antiSense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + antiSense.deleteOnExit() + val allAntiSense = File.createTempFile("SageCreateTagCountsTEst", ".tsv") + allAntiSense.deleteOnExit() + + main(Array("-I", input, "--tagLib", tagLib, "--countSense", sense.getAbsolutePath, + "--countAllSense", allSense.getAbsolutePath, "--countAntiSense", antiSense.getAbsolutePath, + "--countAllAntiSense", allAntiSense.getAbsolutePath)) + + Source.fromFile(sense).mkString should equal("ENSG00000254767\t40\nENSG00000255336\t55\n") + Source.fromFile(allSense).mkString should equal("ENSG00000254767\t70\nENSG00000255336\t90\n") + Source.fromFile(antiSense).mkString should equal("ENSG00000254767\t50\nENSG00000255336\t45\n") + Source.fromFile(allAntiSense).mkString should equal("ENSG00000254767\t75\nENSG00000255336\t65\n") + } + +} diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJsonTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJsonTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..e8a09a9d7cb9af569f2084c95a1b6c2e7f1e1aad --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SamplesTsvToJsonTest.scala @@ -0,0 +1,83 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +/** + * Created by ahbbollen on 28-8-15. 
+ */ +class SamplesTsvToJsonTest extends TestNGSuite with MockitoSugar with Matchers { + import SamplesTsvToJson._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + @Test + def testCorrectSampleTsv = { + val tsv = resourcePath("/sample.tsv") + val output = File.createTempFile("testCorrectSampleTsv", ".json") + output.deleteOnExit() + + noException should be thrownBy main(Array("-i", tsv, "-o", output.toString)) + } + + @Test + def testNoSampleColumn() = { + val tsv = resourcePath("/no_sample.tsv") + val output = File.createTempFile("testNoSampleColumn", ".json") + output.deleteOnExit() + val thrown = the[IllegalStateException] thrownBy main(Array("-i", tsv, "-o", output.toString)) + thrown.getMessage should equal("Sample column does not exist in: " + tsv) + } + + @Test + def testNumberInLibs = { + val tsv = resourcePath("/number.tsv") + val output = File.createTempFile("testNumberInLibs", ".json") + output.deleteOnExit() + val thrown = the[IllegalStateException] thrownBy main(Array("-i", tsv, "-o", output.toString)) + thrown.getMessage should equal("Sample or library may not start with a number") + } + + @Test + def testSampleIDs = { + val tsv = resourcePath("/same.tsv") + val output = File.createTempFile("testSampleIDs", ".json") + output.deleteOnExit() + val thrown = the[IllegalStateException] thrownBy main(Array("-i", tsv, "-o", output.toString)) + thrown.getMessage should equal("Combination of Sample_ID_1 and Lib_ID_1 is found multiple times") + + } + + @Test + def testJson = { + val tsv = new File(resourcePath("/sample.tsv")) + val json = stringFromInputs(List(tsv)) + + json should equal( + """|{ + | "samples" : { + | "Sample_ID_1" : { + | "libraries" : { + | "Lib_ID_1" : { + | "bam" : "MyFirst.bam" + | } + | } + | }, + | "Sample_ID_2" : { + | "libraries" : { + | "Lib_ID_2" : { + | "bam" : "MySecond.bam" + | } + | } + | } + | } + |}""".stripMargin) + } + +} diff --git 
a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/SeqStatTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SeqStatTest.scala similarity index 97% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/SeqStatTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SeqStatTest.scala index d9180aec45025b68e9830227110ca9320571b683..c9dd5e290c3cd4da97c58198da2a596258ed0c64 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/SeqStatTest.scala +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SeqStatTest.scala @@ -101,4 +101,6 @@ class SeqStatTest extends TestNGSuite with MockitoSugar with Matchers { val parsed = parseArgs(args) parsed.fastq shouldBe resourceFile("/paired01a.fq") } + + // TODO: Shared state here. Calling main changes the state, which causes other tests to fail } \ No newline at end of file diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SummaryToTsvTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SummaryToTsvTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..4f921affbc090031b83c0ee5fea3c64b9c9bdf4c --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/SummaryToTsvTest.scala @@ -0,0 +1,74 @@ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import nl.lumc.sasc.biopet.tools.SamplesTsvToJson._ +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import nl.lumc.sasc.biopet.utils.summary.Summary + +/** + * Created by ahbbollen on 31-8-15. 
+ */ +class SummaryToTsvTest extends TestNGSuite with MockitoSugar with Matchers { + import SummaryToTsv._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + @Test + def testMain = { + val tsv = resourcePath("/test.summary.json") + val output = File.createTempFile("main", "tsv") + output.deleteOnExit() + + noException should be thrownBy main(Array("-s", tsv, "-p", "something=flexiprep:settings:skip_trim", + "-m", "root", "-o", output.toString)) + noException should be thrownBy main(Array("-s", tsv, "-p", "something=flexiprep:settings:skip_trim", + "-m", "sample", "-o", output.toString)) + noException should be thrownBy main(Array("-s", tsv, "-p", "something=flexiprep:settings:skip_trim", + "-m", "lib", "-o", output.toString)) + } + + @Test + def testHeader = { + val tsv = resourcePath("/test.summary.json") + val path = List("something=flexiprep:settings:skip_trim") + + val paths = path.map(x => { + val split = x.split("=", 2) + split(0) -> split(1).split(":") + }).toMap + + createHeader(paths) should equal("\tsomething") + } + + @Test + def testLine = { + val tsv = resourcePath("/test.summary.json") + val path = List("something=flexiprep:settings:skip_trim") + + val paths = path.map(x => { + val split = x.split("=", 2) + split(0) -> split(1).split(":") + }).toMap + + val summary = new Summary(new File(tsv)) + val values = fetchValues(summary, paths) + + val line = values.head._2.keys.map(x => createLine(paths, values, x)).head + line should equal("value\t") + val sample_values = fetchValues(summary, paths, true, false) + val sample_line = sample_values.head._2.keys.map(x => createLine(paths, sample_values, x)).head + sample_line should equal("016\t") + + val lib_values = fetchValues(summary, paths, false, true) + val lib_line = lib_values.head._2.keys.map(x => createLine(paths, lib_values, x)).head + lib_line should equal("016-L001\tfalse") + } + +} diff --git 
a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfFilterTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfFilterTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..80fe1980eccad9932e0472ae28242ae93e6b6420 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfFilterTest.scala @@ -0,0 +1,209 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths + +import htsjdk.variant.variantcontext.GenotypeType +import htsjdk.variant.vcf.VCFFileReader +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.util.Random + +/** + * Test class for [[VcfFilter]] + * + * Created by ahbbollen on 9-4-15. 
+ */ +class VcfFilterTest extends TestNGSuite with MockitoSugar with Matchers { + + import VcfFilter._ + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val vepped_path = resourcePath("/VEP_oneline.vcf") + val vepped = new File(vepped_path) + val rand = new Random() + + @Test def testOutputTypeVcf() = { + val tmp_path = "/tmp/VcfFilter_" + rand.nextString(10) + ".vcf" + val arguments: Array[String] = Array("-I", vepped_path, "-o", tmp_path) + main(arguments) + } + + @Test def testOutputTypeBcf() = { + val tmp_path = "/tmp/VcfFilter_" + rand.nextString(10) + ".bcf" + val arguments: Array[String] = Array("-I", vepped_path, "-o", tmp_path) + main(arguments) + } + + @Test def testOutputTypeVcfGz() = { + val tmp_path = "/tmp/VcfFilter_" + rand.nextString(10) + ".vcf.gz" + val arguments: Array[String] = Array("-I", vepped_path, "-o", tmp_path) + main(arguments) + } + + @Test def testHasGenotype() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + hasGenotype(record, List(("Child_7006504", GenotypeType.HET))) shouldBe true + hasGenotype(record, List(("Child_7006504", GenotypeType.HOM_VAR))) shouldBe false + hasGenotype(record, List(("Child_7006504", GenotypeType.HOM_REF))) shouldBe false + hasGenotype(record, List(("Child_7006504", GenotypeType.NO_CALL))) shouldBe false + hasGenotype(record, List(("Child_7006504", GenotypeType.MIXED))) shouldBe false + + hasGenotype(record, List(("Mother_7006508", GenotypeType.HET))) shouldBe false + hasGenotype(record, List(("Mother_7006508", GenotypeType.HOM_VAR))) shouldBe false + hasGenotype(record, List(("Mother_7006508", GenotypeType.HOM_REF))) shouldBe true + hasGenotype(record, List(("Mother_7006508", GenotypeType.NO_CALL))) shouldBe false + hasGenotype(record, List(("Mother_7006508", GenotypeType.MIXED))) shouldBe false + + hasGenotype(record, List(("Mother_7006508", GenotypeType.HOM_REF), ("Child_7006504", GenotypeType.HET))) 
shouldBe true + hasGenotype(record, List(("Mother_7006508", GenotypeType.HET), ("Child_7006504", GenotypeType.HOM_REF))) shouldBe false + } + + @Test def testMinQualScore() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + minQualscore(record, 2000) shouldBe false + minQualscore(record, 1000) shouldBe true + + } + + @Test def testHasNonRefCalls() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + hasNonRefCalls(record) shouldBe true + } + + @Test def testHasCalls() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + hasCalls(record) shouldBe true + } + + @Test def testHasMinDP() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + hasMinTotalDepth(record, 100) shouldBe true + hasMinTotalDepth(record, 200) shouldBe false + } + + @Test def testHasMinSampleDP() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + hasMinSampleDepth(record, 30, 1) shouldBe true + hasMinSampleDepth(record, 30, 2) shouldBe true + hasMinSampleDepth(record, 30, 3) shouldBe true + hasMinSampleDepth(record, 40, 1) shouldBe true + hasMinSampleDepth(record, 40, 2) shouldBe true + hasMinSampleDepth(record, 40, 3) shouldBe false + hasMinSampleDepth(record, 50, 1) shouldBe false + hasMinSampleDepth(record, 50, 2) shouldBe false + hasMinSampleDepth(record, 50, 3) shouldBe false + } + + @Test def testHasMinSampleAD() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + minAlternateDepth(record, 0, 3) shouldBe true + minAlternateDepth(record, 10, 2) shouldBe true + minAlternateDepth(record, 10, 3) shouldBe false + minAlternateDepth(record, 20, 1) shouldBe true + minAlternateDepth(record, 20, 2) shouldBe false + } + + @Test def testMustHaveVariant() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + 
+ mustHaveVariant(record, List("Child_7006504")) shouldBe true + mustHaveVariant(record, List("Child_7006504", "Father_7006506")) shouldBe true + mustHaveVariant(record, List("Child_7006504", "Father_7006506", "Mother_7006508")) shouldBe false + } + + @Test def testSameGenotype() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + notSameGenotype(record, "Child_7006504", "Father_7006506") shouldBe false + notSameGenotype(record, "Child_7006504", "Mother_7006508") shouldBe true + notSameGenotype(record, "Father_7006506", "Mother_7006508") shouldBe true + } + + @Test def testfilterHetVarToHomVar() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + filterHetVarToHomVar(record, "Child_7006504", "Father_7006506") shouldBe true + filterHetVarToHomVar(record, "Child_7006504", "Mother_7006508") shouldBe true + filterHetVarToHomVar(record, "Father_7006506", "Mother_7006508") shouldBe true + } + + @Test def testDeNovo() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + denovoInSample(record, "Child_7006504") shouldBe false + denovoInSample(record, "Father_7006506") shouldBe false + denovoInSample(record, "Mother_7006508") shouldBe false + } + + @Test def testResToDom() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + val trio = new Trio("Child_7006504", "Father_7006506", "Mother_7006508") + + resToDom(record, List(trio)) shouldBe false + } + + @Test def testTrioCompound = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + val trio = new Trio("Child_7006504", "Father_7006506", "Mother_7006508") + + trioCompound(record, List(trio)) + } + + @Test def testDeNovoTrio = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + val trio = new Trio("Child_7006504", "Father_7006506", "Mother_7006508") + + denovoTrio(record, 
List(trio)) + } + + @Test def testInIDSet() = { + val reader = new VCFFileReader(vepped, false) + val record = reader.iterator().next() + + inIdSet(record, Set("rs199537431")) shouldBe true + inIdSet(record, Set("dummy")) shouldBe false + } + +} diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfStatsTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfStatsTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..b7a30c52e615c18502646d23b325637e288e8e72 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfStatsTest.scala @@ -0,0 +1,414 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.{ Files, Paths } + +import htsjdk.variant.variantcontext.Allele +import htsjdk.variant.vcf.VCFFileReader +import nl.lumc.sasc.biopet.tools.VcfStats._ +import org.scalatest.Matchers +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.collection.mutable + +/** + * Test class for [[VcfStats]] + * + * Created by pjvan_thof on 2/5/15. 
+ */ +class VcfStatsTest extends TestNGSuite with Matchers { + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + @Test + def testSampleToSampleStats(): Unit = { + val s1 = SampleToSampleStats() + val s2 = SampleToSampleStats() + s1.alleleOverlap shouldBe 0 + s1.genotypeOverlap shouldBe 0 + s2.alleleOverlap shouldBe 0 + s2.genotypeOverlap shouldBe 0 + + s1 += s2 + s1.alleleOverlap shouldBe 0 + s1.genotypeOverlap shouldBe 0 + s2.alleleOverlap shouldBe 0 + s2.genotypeOverlap shouldBe 0 + + s2.alleleOverlap = 2 + s2.genotypeOverlap = 3 + + s1 += s2 + s1.alleleOverlap shouldBe 2 + s1.genotypeOverlap shouldBe 3 + s2.alleleOverlap shouldBe 2 + s2.genotypeOverlap shouldBe 3 + + s1 += s2 + s1.alleleOverlap shouldBe 4 + s1.genotypeOverlap shouldBe 6 + s2.alleleOverlap shouldBe 2 + s2.genotypeOverlap shouldBe 3 + } + + @Test + def testSampleStats(): Unit = { + val s1 = SampleStats() + val s2 = SampleStats() + + s1.sampleToSample += "s1" -> SampleToSampleStats() + s1.sampleToSample += "s2" -> SampleToSampleStats() + s2.sampleToSample += "s1" -> SampleToSampleStats() + s2.sampleToSample += "s2" -> SampleToSampleStats() + + s1.sampleToSample("s1").alleleOverlap = 1 + s2.sampleToSample("s2").alleleOverlap = 2 + + val bla1 = s1.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) += "1" -> mutable.Map(1 -> 1) + s1.genotypeStats += "chr" -> bla1 + val bla2 = s2.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) += "2" -> mutable.Map(2 -> 2) + s2.genotypeStats += "chr" -> bla2 + + val ss1 = SampleToSampleStats() + val ss2 = SampleToSampleStats() + + s1 += s2 + s1.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) shouldBe mutable.Map("1" -> mutable.Map(1 -> 1), "2" -> mutable.Map(2 -> 2)) + ss1.alleleOverlap = 1 + ss2.alleleOverlap = 2 + s1.sampleToSample shouldBe mutable.Map("s1" -> ss1, "s2" -> ss2) + + s1 += s2 + s1.genotypeStats.getOrElse("chr", 
mutable.Map[String, mutable.Map[Any, Int]]()) shouldBe mutable.Map("1" -> mutable.Map(1 -> 1), "2" -> mutable.Map(2 -> 4)) + + s1 += s1 + s1.genotypeStats.getOrElse("chr", mutable.Map[String, mutable.Map[Any, Int]]()) shouldBe mutable.Map("1" -> mutable.Map(1 -> 2), "2" -> mutable.Map(2 -> 8)) + } + + @Test + def testAlleleOverlap(): Unit = { + + val a1 = Allele.create("G") + val a2 = Allele.create("A") + + alleleOverlap(List(a1, a1), List(a1, a1)) shouldBe 2 + alleleOverlap(List(a2, a2), List(a2, a2)) shouldBe 2 + alleleOverlap(List(a1, a2), List(a1, a2)) shouldBe 2 + alleleOverlap(List(a1, a2), List(a2, a1)) shouldBe 2 + alleleOverlap(List(a2, a1), List(a1, a2)) shouldBe 2 + alleleOverlap(List(a2, a1), List(a2, a1)) shouldBe 2 + + alleleOverlap(List(a1, a2), List(a1, a1)) shouldBe 1 + alleleOverlap(List(a2, a1), List(a1, a1)) shouldBe 1 + alleleOverlap(List(a1, a1), List(a1, a2)) shouldBe 1 + alleleOverlap(List(a1, a1), List(a2, a1)) shouldBe 1 + + alleleOverlap(List(a1, a1), List(a2, a2)) shouldBe 0 + alleleOverlap(List(a2, a2), List(a1, a1)) shouldBe 0 + } + + @Test + def testMergeStatsMap = { + val m1: mutable.Map[Any, Int] = mutable.Map("a" -> 1) + val m2: mutable.Map[Any, Int] = mutable.Map("b" -> 2) + + mergeStatsMap(m1, m2) + + m1 should equal(mutable.Map("a" -> 1, "b" -> 2)) + + val m3: mutable.Map[Any, Int] = mutable.Map(1 -> 500) + val m4: mutable.Map[Any, Int] = mutable.Map(6 -> 125) + + mergeStatsMap(m3, m4) + + m3 should equal(mutable.Map(1 -> 500, 6 -> 125)) + + mergeStatsMap(m1, m3) + + m1 should equal(mutable.Map("a" -> 1, "b" -> 2, 1 -> 500, 6 -> 125)) + } + + @Test + def testMergeNestedStatsMap = { + val m1: mutable.Map[String, mutable.Map[String, mutable.Map[Any, Int]]] = mutable.Map("test" -> + mutable.Map("nested" -> mutable.Map("a" -> 1))) + val m2: Map[String, Map[String, Map[Any, Int]]] = Map("test" -> + Map("nested" -> Map("b" -> 2))) + + mergeNestedStatsMap(m1, m2) + + m1 should equal(mutable.Map("test" -> mutable.Map("nested" -> 
mutable.Map("a" -> 1, "b" -> 2)))) + + val m3: mutable.Map[String, mutable.Map[String, mutable.Map[Any, Int]]] = mutable.Map("test" -> + mutable.Map("nestedd" -> mutable.Map(1 -> 500))) + val m4: Map[String, Map[String, Map[Any, Int]]] = Map("test" -> + Map("nestedd" -> Map(6 -> 125))) + + mergeNestedStatsMap(m3, m4) + + m3 should equal(mutable.Map("test" -> mutable.Map("nestedd" -> mutable.Map(1 -> 500, 6 -> 125)))) + + val m5 = m3.toMap.map(x => x._1 -> x._2.toMap.map(y => y._1 -> y._2.toMap)) + + mergeNestedStatsMap(m1, m5) + + m1 should equal(mutable.Map("test" -> mutable.Map("nested" -> mutable.Map("a" -> 1, "b" -> 2), + "nestedd" -> mutable.Map(1 -> 500, 6 -> 125)))) + } + + @Test + def testValueOfTsv = { + val i = new File(resourcePath("/sample.tsv")) + + valueFromTsv(i, "Sample_ID_1", "library") should be(Some("Lib_ID_1")) + valueFromTsv(i, "Sample_ID_2", "library") should be(Some("Lib_ID_2")) + valueFromTsv(i, "Sample_ID_1", "bam") should be(Some("MyFirst.bam")) + valueFromTsv(i, "Sample_ID_2", "bam") should be(Some("MySecond.bam")) + valueFromTsv(i, "Sample_ID_3", "bam") should be(empty) + } + + @Test + def testMain = { + val tmp = Files.createTempDirectory("vcfStats") + val vcf = resourcePath("/chrQ.vcf.gz") + val ref = resourcePath("/fake_chrQ.fa") + + noException should be thrownBy main(Array("-I", vcf, "-R", ref, "-o", tmp.toAbsolutePath.toString)) + noException should be thrownBy main(Array("-I", vcf, "-R", ref, "-o", tmp.toAbsolutePath.toString, "--allInfoTags")) + noException should be thrownBy main(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--allInfoTags", "--allGenotypeTags")) + noException should be thrownBy main(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--binSize", "50", "--writeBinStats")) + noException should be thrownBy main(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--binSize", "50", "--writeBinStats", + "--generalWiggle", "Total")) + noException should be thrownBy 
main(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--binSize", "50", "--writeBinStats", + "--genotypeWiggle", "Total")) + + val genotypes = List("Het", "HetNonRef", "Hom", "HomRef", "HomVar", "Mixed", "NoCall", "NonInformative", + "Available", "Called", "Filtered", "Variant") + + genotypes.foreach( + x => noException should be thrownBy main(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--binSize", "50", "--writeBinStats", + "--genotypeWiggle", x)) + ) + + val general = List("Biallelic", "ComplexIndel", "Filtered", "FullyDecoded", "Indel", "Mixed", + "MNP", "MonomorphicInSamples", "NotFiltered", "PointEvent", "PolymorphicInSamples", + "SimpleDeletion", "SimpleInsertion", "SNP", "StructuralIndel", "Symbolic", + "SymbolicOrSV", "Variant") + + general.foreach( + x => noException should be thrownBy main(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--binSize", "50", "--writeBinStats", + "--generalWiggle", x)) + ) + + // returns None when validation fails + def validateArgs(array: Array[String]): Option[Args] = { + val argsParser = new OptParser + argsParser.parse(array, Args()) + } + + val stderr1 = new java.io.ByteArrayOutputStream + Console.withErr(stderr1) { + validateArgs(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--binSize", "50", "--writeBinStats", + "--genotypeWiggle", "NonexistentThing")) shouldBe empty + } + + val stderr2 = new java.io.ByteArrayOutputStream + Console.withErr(stderr2) { + validateArgs(Array("-I", vcf, "-R", ref, "-o", + tmp.toAbsolutePath.toString, "--binSize", "50", "--writeBinStats", + "--generalWiggle", "NonexistentThing")) shouldBe empty + } + + val stderr3 = new java.io.ByteArrayOutputStream + Console.withErr(stderr3) { + validateArgs(Array("-R", ref, "-o", + tmp.toAbsolutePath.toString)) shouldBe empty + } + } + + @Test + def testSortAnyAny = { + //stub + val one: Any = 1 + val two: Any = 2 + val text: Any = "hello" + val text2: Any = "goodbye" + + 
sortAnyAny(one, two) shouldBe true + sortAnyAny(two, one) shouldBe false + sortAnyAny(text, text2) shouldBe false + sortAnyAny(text2, text) shouldBe true + sortAnyAny(one, text) shouldBe true + sortAnyAny(text, one) shouldBe false + } + + @Test + def testCheckGeneral = { + val record = new VCFFileReader(new File(resourcePath("/chrQ.vcf.gz"))).iterator().next() + + val blah = checkGeneral(record, List()) + + blah.get("chrQ") should not be empty + blah.get("total") should not be empty + + val chrq = blah.get("chrQ").get + chrq.get("SampleDistribution-NonInformative") shouldEqual Some(Map(0 -> 1)) + chrq.get("SampleDistribution-Called") shouldEqual Some(Map(3 -> 1)) + chrq.get("SampleDistribution-Mixed") shouldEqual Some(Map(0 -> 1)) + chrq.get("SampleDistribution-Hom") shouldEqual Some(Map(1 -> 1)) + chrq.get("SampleDistribution-HomRef") shouldEqual Some(Map(1 -> 1)) + chrq.get("SampleDistribution-Available") shouldEqual Some(Map(3 -> 1)) + chrq.get("QUAL") shouldEqual Some(Map(1541 -> 1)) + chrq.get("SampleDistribution-HetNonRef") shouldEqual Some(Map(0 -> 1)) + chrq.get("SampleDistribution-Het") shouldEqual Some(Map(2 -> 1)) + chrq.get("SampleDistribution-NoCall") shouldEqual Some(Map(0 -> 1)) + chrq.get("SampleDistribution-Filtered") shouldEqual Some(Map(0 -> 1)) + chrq.get("SampleDistribution-HomVar") shouldEqual Some(Map(0 -> 1)) + chrq.get("SampleDistribution-Variant") shouldEqual Some(Map(2 -> 1)) + + chrq.get("general") should not be empty + val general = chrq.get("general").get + + general.get("PolymorphicInSamples") shouldEqual Some(1) + general.get("ComplexIndel") shouldEqual Some(0) + general.get("FullyDecoded") shouldEqual Some(0) + general.get("PointEvent") shouldEqual Some(0) + general.get("MNP") shouldEqual Some(0) + general.get("Indel") shouldEqual Some(1) + general.get("Biallelic") shouldEqual Some(1) + general.get("SimpleDeletion") shouldEqual Some(0) + general.get("Variant") shouldEqual Some(1) + general.get("SymbolicOrSV") shouldEqual Some(0) + 
general.get("MonomorphicInSamples") shouldEqual Some(0) + general.get("SNP") shouldEqual Some(0) + general.get("Filtered") shouldEqual Some(0) + general.get("StructuralIndel") shouldEqual Some(0) + general.get("Total") shouldEqual Some(1) + general.get("Mixed") shouldEqual Some(0) + general.get("NotFiltered") shouldEqual Some(1) + general.get("Symbolic") shouldEqual Some(0) + general.get("SimpleInsertion") shouldEqual Some(1) + + val total = blah.get("total").get + total.get("SampleDistribution-NonInformative") shouldEqual Some(Map(0 -> 1)) + total.get("SampleDistribution-Called") shouldEqual Some(Map(3 -> 1)) + total.get("SampleDistribution-Mixed") shouldEqual Some(Map(0 -> 1)) + total.get("SampleDistribution-Hom") shouldEqual Some(Map(1 -> 1)) + total.get("SampleDistribution-HomRef") shouldEqual Some(Map(1 -> 1)) + total.get("SampleDistribution-Available") shouldEqual Some(Map(3 -> 1)) + total.get("QUAL") shouldEqual Some(Map(1541 -> 1)) + total.get("SampleDistribution-HetNonRef") shouldEqual Some(Map(0 -> 1)) + total.get("SampleDistribution-Het") shouldEqual Some(Map(2 -> 1)) + total.get("SampleDistribution-NoCall") shouldEqual Some(Map(0 -> 1)) + total.get("SampleDistribution-Filtered") shouldEqual Some(Map(0 -> 1)) + total.get("SampleDistribution-HomVar") shouldEqual Some(Map(0 -> 1)) + total.get("SampleDistribution-Variant") shouldEqual Some(Map(2 -> 1)) + + total.get("general") should not be empty + val totGeneral = total.get("general").get + + totGeneral.get("PolymorphicInSamples") shouldEqual Some(1) + totGeneral.get("ComplexIndel") shouldEqual Some(0) + totGeneral.get("FullyDecoded") shouldEqual Some(0) + totGeneral.get("PointEvent") shouldEqual Some(0) + totGeneral.get("MNP") shouldEqual Some(0) + totGeneral.get("Indel") shouldEqual Some(1) + totGeneral.get("Biallelic") shouldEqual Some(1) + totGeneral.get("SimpleDeletion") shouldEqual Some(0) + totGeneral.get("Variant") shouldEqual Some(1) + totGeneral.get("SymbolicOrSV") shouldEqual Some(0) + 
totGeneral.get("MonomorphicInSamples") shouldEqual Some(0) + totGeneral.get("SNP") shouldEqual Some(0) + totGeneral.get("Filtered") shouldEqual Some(0) + totGeneral.get("StructuralIndel") shouldEqual Some(0) + totGeneral.get("Total") shouldEqual Some(1) + totGeneral.get("Mixed") shouldEqual Some(0) + totGeneral.get("NotFiltered") shouldEqual Some(1) + totGeneral.get("Symbolic") shouldEqual Some(0) + totGeneral.get("SimpleInsertion") shouldEqual Some(1) + } + + @Test + def testCheckGenotype = { + val record = new VCFFileReader(new File(resourcePath("/chrQ.vcf.gz"))).iterator().next() + + val genotype = record.getGenotype(0) + + val blah = checkGenotype(record, genotype, List()) + + blah.get("chrQ") should not be empty + blah.get("total") should not be empty + + val chrq = blah.get("chrQ").get + chrq.get("GQ") shouldEqual Some(Map(99 -> 1)) + chrq.get("AD") shouldEqual Some(Map(24 -> 1, 21 -> 1)) + chrq.get("AD-used") shouldEqual Some(Map(24 -> 1, 21 -> 1)) + chrq.get("DP") shouldEqual Some(Map(45 -> 1)) + chrq.get("AD-alt") shouldEqual Some(Map(21 -> 1)) + chrq.get("AD-ref") shouldEqual Some(Map(24 -> 1)) + chrq.get("general") should not be empty + + val general = chrq.get("general").get + general.get("Hom") shouldEqual Some(0) + general.get("NoCall") shouldEqual Some(0) + general.get("Variant") shouldEqual Some(1) + general.get("Filtered") shouldEqual Some(0) + general.get("NonInformative") shouldEqual Some(0) + general.get("Called") shouldEqual Some(1) + general.get("Total") shouldEqual Some(1) + general.get("HomVar") shouldEqual Some(0) + general.get("HomRef") shouldEqual Some(0) + general.get("Mixed") shouldEqual Some(0) + general.get("Available") shouldEqual Some(1) + general.get("Het") shouldEqual Some(1) + general.get("HetNonRef") shouldEqual Some(0) + + val total = blah.get("total").get + total.get("GQ") shouldEqual Some(Map(99 -> 1)) + total.get("AD") shouldEqual Some(Map(24 -> 1, 21 -> 1)) + total.get("AD-used") shouldEqual Some(Map(24 -> 1, 21 -> 1)) + 
total.get("DP") shouldEqual Some(Map(45 -> 1)) + total.get("AD-alt") shouldEqual Some(Map(21 -> 1)) + total.get("AD-ref") shouldEqual Some(Map(24 -> 1)) + total.get("general") should not be empty + + val totGeneral = total.get("general").get + totGeneral.get("Hom") shouldEqual Some(0) + totGeneral.get("NoCall") shouldEqual Some(0) + totGeneral.get("Variant") shouldEqual Some(1) + totGeneral.get("Filtered") shouldEqual Some(0) + totGeneral.get("NonInformative") shouldEqual Some(0) + totGeneral.get("Called") shouldEqual Some(1) + totGeneral.get("Total") shouldEqual Some(1) + totGeneral.get("HomVar") shouldEqual Some(0) + totGeneral.get("HomRef") shouldEqual Some(0) + totGeneral.get("Mixed") shouldEqual Some(0) + totGeneral.get("Available") shouldEqual Some(1) + totGeneral.get("Het") shouldEqual Some(1) + totGeneral.get("HetNonRef") shouldEqual Some(0) + } +} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfToTsvTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfToTsvTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VcfToTsvTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfToTsvTest.scala diff --git a/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfWithVcfTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfWithVcfTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..a6a70881012480a8e92b9aac1c689e4456f9f8c7 --- /dev/null +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VcfWithVcfTest.scala @@ -0,0 +1,188 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. 
+ * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.tools + +import java.io.File +import java.nio.file.Paths +import java.util + +import htsjdk.variant.vcf.VCFFileReader +import org.scalatest.Matchers +import org.scalatest.mock.MockitoSugar +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +import scala.util.Random +import scala.collection.JavaConversions._ + +import nl.lumc.sasc.biopet.utils.VcfUtils.identicalVariantContext + +/** + * Test class for [[VcfWithVcf]] + * + * Created by ahbbollen on 10-4-15. + */ +class VcfWithVcfTest extends TestNGSuite with MockitoSugar with Matchers { + import VcfWithVcf._ + + private def resourcePath(p: String): String = { + Paths.get(getClass.getResource(p).toURI).toString + } + + val veppedPath = resourcePath("/VEP_oneline.vcf.gz") + val unveppedPath = resourcePath("/unvep_online.vcf.gz") + val rand = new Random() + + @Test def testOutputTypeVcf() = { + val tmpFile = File.createTempFile("VcfWithVcf_", ".vcf") + tmpFile.deleteOnExit() + val arguments = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpFile.getAbsolutePath, "-f", "CSQ") + main(arguments) + } + + @Test def testOutputTypeVcfGz() = { + val tmpFile = File.createTempFile("VcfWithVcf_", ".vcf.gz") + tmpFile.deleteOnExit() + val arguments = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpFile.getAbsolutePath, "-f", "CSQ") + main(arguments) + } + + @Test def testOutputTypeBcf() = { + val tmpFile = File.createTempFile("VcfWithVcf_", ".bcf") + tmpFile.deleteOnExit() + val arguments = Array("-I", unveppedPath, "-s", 
veppedPath, "-o", tmpFile.getAbsolutePath, "-f", "CSQ") + main(arguments) + } + + @Test def testOutputFieldException = { + val tmpFile = File.createTempFile("VCFWithVCf", ".vcf") + tmpFile.deleteOnExit() + val args = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpFile.getAbsolutePath, "-f", "CSQ:AC") + an[IllegalArgumentException] should be thrownBy main(args) + val thrown = the[IllegalArgumentException] thrownBy main(args) + thrown.getMessage should equal("Field 'AC' already exists in input vcf") + } + + @Test def testInputFieldException = { + val tmpFile = File.createTempFile("VCFWithVCf", ".vcf") + tmpFile.deleteOnExit() + val args = Array("-I", unveppedPath, "-s", unveppedPath, "-o", tmpFile.getAbsolutePath, "-f", "CSQ:NEW_CSQ") + an[IllegalArgumentException] should be thrownBy main(args) + val thrown = the[IllegalArgumentException] thrownBy main(args) + thrown.getMessage should equal("Field 'CSQ' does not exist in secondary vcf") + } + + @Test def testMinMethodException = { + val tmpFile = File.createTempFile("VcfWithVcf_", ".vcf") + tmpFile.deleteOnExit() + val args = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpFile.getAbsolutePath, "-f", "CSQ:CSQ:min") + an[IllegalArgumentException] should be thrownBy main(args) + val thrown = the[IllegalArgumentException] thrownBy main(args) + thrown.getMessage should equal("Type of field CSQ is not numeric") + } + + @Test def testMaxMethodException = { + val tmpFile = File.createTempFile("VcfWithVcf_", ".vcf") + tmpFile.deleteOnExit() + val args = Array("-I", unveppedPath, "-s", veppedPath, "-o", tmpFile.getAbsolutePath, "-f", "CSQ:CSQ:max") + an[IllegalArgumentException] should be thrownBy main(args) + val thrown = the[IllegalArgumentException] thrownBy main(args) + thrown.getMessage should equal("Type of field CSQ is not numeric") + } + + @Test + def testFieldMap = { + val unvep_record = new VCFFileReader(new File(unveppedPath)).iterator().next() + + var fields = List(new Fields("FG", "FG")) + fields :::= 
List(new Fields("FD", "FD")) + fields :::= List(new Fields("GM", "GM")) + fields :::= List(new Fields("GL", "GL")) + fields :::= List(new Fields("CP", "CP")) + fields :::= List(new Fields("CG", "CG")) + fields :::= List(new Fields("CN", "CN")) + fields :::= List(new Fields("DSP", "DSP")) + fields :::= List(new Fields("AC", "AC")) + fields :::= List(new Fields("AF", "AF")) + fields :::= List(new Fields("AN", "AN")) + fields :::= List(new Fields("BaseQRankSum", "BaseQRankSum")) + fields :::= List(new Fields("DP", "DP")) + fields :::= List(new Fields("FS", "FS")) + fields :::= List(new Fields("MLEAC", "MLEAC")) + fields :::= List(new Fields("MLEAF", "MLEAF")) + fields :::= List(new Fields("MQ", "MQ")) + fields :::= List(new Fields("MQ0", "MQ0")) + fields :::= List(new Fields("MQRankSum", "MQRankSum")) + fields :::= List(new Fields("QD", "QD")) + fields :::= List(new Fields("RPA", "RPA")) + fields :::= List(new Fields("RU", "RU")) + fields :::= List(new Fields("ReadPosRankSum", "ReadPosRankSum")) + fields :::= List(new Fields("VQSLOD", "VQSLOD")) + fields :::= List(new Fields("culprit", "culprit")) + + val fieldMap = createFieldMap(fields, List(unvep_record)) + + fieldMap("FG") shouldBe List("intron") + fieldMap("FD") shouldBe List("unknown") + fieldMap("GM") shouldBe List("NM_152486.2") + fieldMap("GL") shouldBe List("SAMD11") + fieldMap("CP") shouldBe List("0.000") + fieldMap("CG") shouldBe List("-1.630") + fieldMap("CN") shouldBe List("2294", "3274", "30362", "112930") + fieldMap("DSP") shouldBe List("107") + fieldMap("AC") shouldBe List("2") + fieldMap("AF") shouldBe List("0.333") + fieldMap("AN") shouldBe List("6") + fieldMap("DP") shouldBe List("124") + fieldMap("FS") shouldBe List("1.322") + fieldMap("MLEAC") shouldBe List("2") + fieldMap("MLEAF") shouldBe List("0.333") + fieldMap("MQ") shouldBe List("60.0") + fieldMap("MQ0") shouldBe List("0") + fieldMap("MQRankSum") shouldBe List("-0.197") + fieldMap("QD") shouldBe List("19.03") + fieldMap("RPA") shouldBe 
List("1", "2") + fieldMap("RU") shouldBe List("A") + fieldMap("ReadPosRankSum") shouldBe List("-0.424") + fieldMap("VQSLOD") shouldBe List("0.079") + fieldMap("culprit") shouldBe List("FS") + + } + + @Test def testGetSecondaryRecords = { + val unvep_record = new VCFFileReader(new File(unveppedPath)).iterator().next() + val vep_reader = new VCFFileReader(new File(veppedPath)) + val vep_record = vep_reader.iterator().next() + + val secRec = getSecondaryRecords(vep_reader, unvep_record, false) + + secRec.foreach(x => identicalVariantContext(x, vep_record) shouldBe true) + } + + @Test def testCreateRecord = { + val unvep_record = new VCFFileReader(new File(unveppedPath)).iterator().next() + val vep_reader = new VCFFileReader(new File(veppedPath)) + val header = vep_reader.getFileHeader + val vep_record = vep_reader.iterator().next() + + val secRec = getSecondaryRecords(vep_reader, unvep_record, false) + + val fieldMap = createFieldMap(List(new Fields("CSQ", "CSQ")), secRec) + val created_record = createRecord(fieldMap, unvep_record, List(new Fields("CSQ", "CSQ")), header) + identicalVariantContext(created_record, vep_record) shouldBe true + } + +} diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VepNormalizerTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VepNormalizerTest.scala similarity index 84% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VepNormalizerTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VepNormalizerTest.scala index b2a063f9853c07ceb23f623b174e5d7980dc275f..53aeddfaf1d6ba87f9e89ac29b31edff8fc5e01b 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/VepNormalizerTest.scala +++ b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/VepNormalizerTest.scala @@ -47,38 +47,44 @@ class VepNormalizerTest extends TestNGSuite with MockitoSugar with Matchers { val rand = new Random() @Test def 
testGzOutputExplode(): Unit = { - val tmp_path = "/tmp/VepNorm_" + rand.nextString(10) + ".vcf.gz" - val arguments: Array[String] = Array("-I", vepped_path, "-O", tmp_path, "-m", "explode") + val tmpFile = File.createTempFile("VepNormalizer_", ".vcf.gz") + tmpFile.deleteOnExit() + val arguments: Array[String] = Array("-I", vepped_path, "-O", tmpFile.getAbsolutePath, "-m", "explode") main(arguments) } @Test def testVcfOutputExplode(): Unit = { - val tmp_path = "/tmp/VepNorm_" + rand.nextString(10) + ".vcf" - val arguments: Array[String] = Array("-I", vepped_path, "-O", tmp_path, "-m", "explode") + val tmpFile = File.createTempFile("VepNormalizer_", ".vcf") + tmpFile.deleteOnExit() + val arguments: Array[String] = Array("-I", vepped_path, "-O", tmpFile.getAbsolutePath, "-m", "explode") main(arguments) } @Test def testBcfOutputExplode(): Unit = { - val tmp_path = "/tmp/VepNorm_" + rand.nextString(10) + ".bcf" - val arguments: Array[String] = Array("-I", vepped_path, "-O", tmp_path, "-m", "explode") + val tmpFile = File.createTempFile("VepNormalizer_", ".bcf") + tmpFile.deleteOnExit() + val arguments: Array[String] = Array("-I", vepped_path, "-O", tmpFile.getAbsolutePath, "-m", "explode") main(arguments) } @Test def testGzOutputStandard(): Unit = { - val tmp_path = "/tmp/VepNorm_" + rand.nextString(10) + ".vcf.gz" - val arguments: Array[String] = Array("-I", vepped_path, "-O", tmp_path, "-m", "standard") + val tmpFile = File.createTempFile("VepNormalizer_", ".vcf.gz") + tmpFile.deleteOnExit() + val arguments: Array[String] = Array("-I", vepped_path, "-O", tmpFile.getAbsolutePath, "-m", "standard") main(arguments) } @Test def testVcfOutputStandard(): Unit = { - val tmp_path = "/tmp/VepNorm_" + rand.nextString(10) + ".vcf" - val arguments: Array[String] = Array("-I", vepped_path, "-O", tmp_path, "-m", "standard") + val tmpFile = File.createTempFile("VepNormalizer_", ".vcf") + tmpFile.deleteOnExit() + val arguments: Array[String] = Array("-I", vepped_path, "-O", 
tmpFile.getAbsolutePath, "-m", "standard") main(arguments) } @Test def testBcfOutputStandard(): Unit = { - val tmp_path = "/tmp/VepNorm_" + rand.nextString(10) + ".bcf" - val arguments: Array[String] = Array("-I", vepped_path, "-O", tmp_path, "-m", "standard") + val tmpFile = File.createTempFile("VepNormalizer_", ".bcf") + tmpFile.deleteOnExit() + val arguments: Array[String] = Array("-I", vepped_path, "-O", tmpFile.getAbsolutePath, "-m", "standard") main(arguments) } diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/WipeReadsTest.scala b/public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/WipeReadsTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/tools/WipeReadsTest.scala rename to public/biopet-tools/src/test/scala/nl/lumc/sasc/biopet/tools/WipeReadsTest.scala diff --git a/public/biopet-utils/pom.xml b/public/biopet-utils/pom.xml new file mode 100644 index 0000000000000000000000000000000000000000..bb9f2b2611c31ceed8a3e0f6505d4dca2b1198a0 --- /dev/null +++ b/public/biopet-utils/pom.xml @@ -0,0 +1,71 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>Biopet</artifactId> + <groupId>nl.lumc.sasc</groupId> + <version>0.5.0-SNAPSHOT</version> + <relativePath>../</relativePath> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>BiopetUtils</artifactId> + <packaging>jar</packaging> + + <dependencies> + <dependency> + <groupId>org.testng</groupId> + <artifactId>testng</artifactId> + <version>6.8</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.mockito</groupId> + <artifactId>mockito-all</artifactId> + <version>1.9.5</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.scalatest</groupId> + 
<artifactId>scalatest_2.10</artifactId> + <version>2.2.1</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>log4j</groupId> + <artifactId>log4j</artifactId> + <version>1.2.17</version> + </dependency> + <dependency> + <groupId>commons-io</groupId> + <artifactId>commons-io</artifactId> + <version>2.1</version> + </dependency> + <dependency> + <groupId>com.github.samtools</groupId> + <artifactId>htsjdk</artifactId> + <version>1.132</version> + </dependency> + <dependency> + <groupId>org.scala-lang</groupId> + <artifactId>scala-library</artifactId> + <version>2.10.4</version> + </dependency> + <dependency> + <groupId>org.yaml</groupId> + <artifactId>snakeyaml</artifactId> + <version>1.15</version> + </dependency> + <dependency> + <groupId>io.argonaut</groupId> + <artifactId>argonaut_2.10</artifactId> + <version>6.1-M4</version> + </dependency> + <dependency> + <groupId>com.github.scopt</groupId> + <artifactId>scopt_2.10</artifactId> + <version>3.3.0</version> + </dependency> + </dependencies> +</project> \ No newline at end of file diff --git a/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotScatter.R b/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotScatter.R new file mode 100644 index 0000000000000000000000000000000000000000..a1959a262cf868d9949b0320f57c9d54b7c50860 --- /dev/null +++ b/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotScatter.R @@ -0,0 +1,40 @@ +library(reshape2) +library(ggplot2) +library(argparse) + +parser <- ArgumentParser(description='Process some integers') +parser$add_argument('--input', dest='input', type='character', help='Input tsv file', required=TRUE) +parser$add_argument('--output', dest='output', type='character', help='Output png file', required=TRUE) +parser$add_argument('--width', dest='width', type='integer', default = 500) +parser$add_argument('--height', dest='height', type='integer', default = 500) 
+parser$add_argument('--xlabel', dest='xlabel', type='character') +parser$add_argument('--ylabel', dest='ylabel', type='character', required=TRUE) +parser$add_argument('--llabel', dest='llabel', type='character') +parser$add_argument('--title', dest='title', type='character') +parser$add_argument('--removeZero', dest='removeZero', type='character', default="false") + +arguments <- parser$parse_args() + +png(filename = arguments$output, width = arguments$width, height = arguments$height) + +DF <- read.table(arguments$input, header=TRUE) + +if (is.null(arguments$xlabel)) xlab <- colnames(DF)[1] else xlab <- arguments$xlabel + +colnames(DF)[1] <- "Rank" + +DF1 <- melt(DF, id.var="Rank") + +if (arguments$removeZero == "true") DF1 <- DF1[DF1$value > 0, ] +if (arguments$removeZero == "true") print("Removed 0 values") + +ggplot(DF1, aes(x = Rank, y = value, group = variable, color = variable)) + + xlab(xlab) + + ylab(arguments$ylabel) + + guides(fill=guide_legend(title=arguments$llabel)) + + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) + + ggtitle(arguments$title) + + theme_bw() + + geom_point() + +dev.off() diff --git a/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotXY.R b/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotXY.R new file mode 100644 index 0000000000000000000000000000000000000000..eee20c030b280c9246ce77428d3415173ea1b796 --- /dev/null +++ b/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/plotXY.R @@ -0,0 +1,40 @@ +library(reshape2) +library(ggplot2) +library(argparse) + +parser <- ArgumentParser(description='Process some integers') +parser$add_argument('--input', dest='input', type='character', help='Input tsv file', required=TRUE) +parser$add_argument('--output', dest='output', type='character', help='Output png file', required=TRUE) +parser$add_argument('--width', dest='width', type='integer', default = 500) +parser$add_argument('--height', 
dest='height', type='integer', default = 500) +parser$add_argument('--xlabel', dest='xlabel', type='character') +parser$add_argument('--ylabel', dest='ylabel', type='character', required=TRUE) +parser$add_argument('--llabel', dest='llabel', type='character') +parser$add_argument('--title', dest='title', type='character') +parser$add_argument('--removeZero', dest='removeZero', type='character', default="false") + +arguments <- parser$parse_args() + +png(filename = arguments$output, width = arguments$width, height = arguments$height) + +DF <- read.table(arguments$input, header=TRUE) + +if (is.null(arguments$xlabel)) xlab <- colnames(DF)[1] else xlab <- arguments$xlabel + +colnames(DF)[1] <- "Rank" + +DF1 <- melt(DF, id.var="Rank") + +if (arguments$removeZero == "true") DF1 <- DF1[DF1$value > 0, ] +if (arguments$removeZero == "true") print("Removed 0 values") + +ggplot(DF1, aes(x = Rank, y = value, group = variable, color = variable)) + + xlab(xlab) + + ylab(arguments$ylabel) + + guides(fill=guide_legend(title=arguments$llabel)) + + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) + + ggtitle(arguments$title) + + theme_bw() + + geom_line() + +dev.off() diff --git a/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/stackedBar.R b/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/stackedBar.R new file mode 100644 index 0000000000000000000000000000000000000000..0ae7f942cc69d1047ab2a642342381d5e8f2eade --- /dev/null +++ b/public/biopet-utils/src/main/resources/nl/lumc/sasc/biopet/utils/rscript/stackedBar.R @@ -0,0 +1,35 @@ +library(reshape2) +library(ggplot2) +library(argparse) + +parser <- ArgumentParser(description='Process some integers') +parser$add_argument('--input', dest='input', type='character', help='Input tsv file', required=TRUE) +parser$add_argument('--output', dest='output', type='character', help='Output png file', required=TRUE) +parser$add_argument('--width', dest='width', type='integer', 
default = 500) +parser$add_argument('--height', dest='height', type='integer', default = 500) +parser$add_argument('--xlabel', dest='xlabel', type='character') +parser$add_argument('--ylabel', dest='ylabel', type='character', required=TRUE) +parser$add_argument('--llabel', dest='llabel', type='character') +parser$add_argument('--title', dest='title', type='character') + +arguments <- parser$parse_args() + +png(filename = arguments$output, width = arguments$width, height = arguments$height) + +DF <- read.table(arguments$input, header=TRUE) + +if (is.null(arguments$xlabel)) xlab <- colnames(DF)[1] else xlab <- arguments$xlabel + +colnames(DF)[1] <- "Rank" + +DF1 <- melt(DF, id.var="Rank") + +ggplot(DF1, aes(x = Rank, y = value, fill = variable)) + + xlab(xlab) + + ylab(arguments$ylabel) + + guides(fill=guide_legend(title=arguments$llabel)) + + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) + + geom_bar(stat = "identity", width=1) + + ggtitle(arguments$title) + +dev.off() diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/package.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/package.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/package.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/package.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutable.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/BiopetExecutable.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutable.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/BiopetExecutable.scala index 749ac33a2176ac7b2d6e78e0ca2bb33c85c61877..43522e36dbe93abe174483953caa343b67ca572b 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/BiopetExecutable.scala +++ 
b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/BiopetExecutable.scala @@ -13,11 +13,10 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core +package nl.lumc.sasc.biopet.utils import java.io.{ PrintWriter, StringWriter } -import nl.lumc.sasc.biopet.core.BiopetExecutable._ import nl.lumc.sasc.biopet.{ FullVersion, LastCommitHash } import org.apache.log4j.Logger @@ -88,7 +87,7 @@ trait BiopetExecutable extends Logging { case Array("version") => println("version: " + FullVersion) case Array("license") => - println(getLicense) + println(BiopetExecutable.getLicense) case Array(module, name, passArgs @ _*) => try { getCommand(module, name).main(passArgs.toArray) diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/ConfigUtils.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/ConfigUtils.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/ConfigUtils.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/ConfigUtils.scala index 6c1795f48acaecac0dabb01c75238bc1f8ffca13..6a1a0889b68811ae40c3028f706eb3ceda9918fc 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/ConfigUtils.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/ConfigUtils.scala @@ -19,8 +19,7 @@ import java.io.File import argonaut.Argonaut._ import argonaut._ -import nl.lumc.sasc.biopet.core.{ BiopetQScript, Logging } -import nl.lumc.sasc.biopet.core.config.ConfigValue +import nl.lumc.sasc.biopet.utils.config.ConfigValue import org.yaml.snakeyaml.Yaml import scala.collection.JavaConversions._ @@ -316,7 +315,7 @@ object ConfigUtils extends Logging { private def requiredValue(value: ConfigValue): Boolean = { val exist = valueExists(value) if (!exist) - BiopetQScript.addError("Value does not exist but is required, 
key: " + value.requestIndex.key + + Logging.addError("Value does not exist but is required, key: " + value.requestIndex.key + " module: " + value.requestIndex.module, if (value.requestIndex.path != Nil) " path: " + value.requestIndex.path.mkString("->") else null) exist diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/IoUtils.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/IoUtils.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/IoUtils.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/IoUtils.scala diff --git a/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/Logging.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/Logging.scala new file mode 100644 index 0000000000000000000000000000000000000000..93a43e273f1a8aa0be87fc9fe3eeb44e9c0b9067 --- /dev/null +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/Logging.scala @@ -0,0 +1,61 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. 
+ */ +package nl.lumc.sasc.biopet.utils + +import org.apache.log4j.Logger + +import scala.collection.mutable.ListBuffer + +/** + * Trait to implement logger function on local class/object + */ +trait Logging { + /** + * + * @return Global biopet logger + */ + def logger = Logging.logger +} + +/** + * Logger object, has a global logger + */ +object Logging { + val logger = Logger.getRootLogger + + private val errors: ListBuffer[Exception] = ListBuffer() + + def addError(error: String, debug: String = null): Unit = { + val msg = error + (if (debug != null && logger.isDebugEnabled) "; " + debug else "") + errors.append(new Exception(msg)) + } + + def checkErrors(): Unit = { + if (errors.nonEmpty) { + logger.error("*************************") + logger.error("Biopet found some errors:") + if (logger.isDebugEnabled) { + for (e <- errors) { + logger.error(e.getMessage) + logger.debug(e.getStackTrace.mkString("Stack trace:\n", "\n", "\n")) + } + } else { + errors.map(_.getMessage).sorted.distinct.foreach(logger.error(_)) + } + throw new IllegalStateException("Biopet found errors") + } + } +} \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/MainCommand.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/MainCommand.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/MainCommand.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/MainCommand.scala index cdd226c89f6f2363a452e98116d60b691c1033c1..9a8550d9be1293c1dd8afa02bfe75c7b7d9793e2 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/MainCommand.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/MainCommand.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. 
*/ -package nl.lumc.sasc.biopet.core +package nl.lumc.sasc.biopet.utils /** * This trait is used in the biopet executable diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/ToolCommand.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/ToolCommand.scala similarity index 92% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/ToolCommand.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/ToolCommand.scala index 68451bc26139618b987552ba05766aa512497ec0..ce2f099b25077e0caecd3edc4e674ced64349069 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/ToolCommand.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/ToolCommand.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core +package nl.lumc.sasc.biopet.utils import nl.lumc.sasc.biopet.FullVersion @@ -54,6 +54,3 @@ trait ToolCommand extends MainCommand with Logging { protected type OptParser <: AbstractOptParser } -trait ToolCommandFuntion extends BiopetJavaCommandLineFunction { - override def getVersion = Some("Biopet " + FullVersion) -} \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/VcfUtils.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/VcfUtils.scala similarity index 54% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/VcfUtils.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/VcfUtils.scala index 9074ff1f3bcc71fda5ad5c8aa9ba5ae5867fc6c8..8e375f4e7e35cbb49c9cc90c688753b2b6ca42ea 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/VcfUtils.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/VcfUtils.scala @@ -15,6 +15,8 @@ */ package nl.lumc.sasc.biopet.utils 
+import java.util + import htsjdk.variant.variantcontext.VariantContext import scala.collection.JavaConversions._ @@ -43,4 +45,39 @@ object VcfUtils { def fillAllele(bases: String, newSize: Int, fillWith: Char = '-'): String = { bases + Array.fill[Char](newSize - bases.length)(fillWith).mkString } + + /** + * Stands for scalaListToJavaObjectArrayList + * Convert a scala List[Any] to a java ArrayList[Object]. This is necessary for BCF conversions + * As scala ints and floats cannot be directly cast to java objects (they aren't objects), + * we need to box them. + * For items not Int, Float or Object, we assume them to be strings (TODO: sane assumption?) + * @param array scala List[Any] + * @return converted java ArrayList[Object] + */ + def scalaListToJavaObjectArrayList(array: List[Any]): util.ArrayList[Object] = { + val out = new util.ArrayList[Object]() + + array.foreach { + case x: Long => out.add(Long.box(x)) + case x: Int => out.add(Int.box(x)) + case x: Char => out.add(Char.box(x)) + case x: Byte => out.add(Byte.box(x)) + case x: Double => out.add(Double.box(x)) + case x: Float => out.add(Float.box(x)) + case x: Boolean => out.add(Boolean.box(x)) + case x: String => out.add(x) + case x: Object => out.add(x) + case x => out.add(x.toString) + } + out + } + + //TODO: Add genotype comparing to this function + def identicalVariantContext(var1: VariantContext, var2: VariantContext): Boolean = { + var1.getContig == var2.getContig && + var1.getStart == var2.getStart && + var1.getEnd == var2.getEnd && + var1.getAttributes == var2.getAttributes + } } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/Config.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/Config.scala similarity index 71% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/Config.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/Config.scala index 
10099c57983b7f0f5af4cf1070b1ed7d6753ac69..4c69bf57c8be5332aeb3f1cc3d1f78b2ae204007 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/Config.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/Config.scala @@ -13,21 +13,19 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core.config +package nl.lumc.sasc.biopet.utils.config import java.io.{ File, PrintWriter } - -import nl.lumc.sasc.biopet.core.Logging -import nl.lumc.sasc.biopet.utils.ConfigUtils +import nl.lumc.sasc.biopet.utils.{ Logging, ConfigUtils } import nl.lumc.sasc.biopet.utils.ConfigUtils._ /** * This class can store nested config values - * @param map Map with value for new config + * @param _map Map with value for new config * @constructor Load config with existing map */ -class Config(var map: Map[String, Any], - protected[core] var defaults: Map[String, Any] = Map()) extends Logging { +class Config(protected var _map: Map[String, Any], + protected var _defaults: Map[String, Any] = Map()) extends Logging { logger.debug("Init phase of config") /** Default constructor */ @@ -36,6 +34,9 @@ class Config(var map: Map[String, Any], loadDefaultConfig() } + def map = _map + def defaults = _defaults + /** * Loading a environmental variable as location of config files to merge into the config * @param valueName Name of value @@ -67,13 +68,13 @@ class Config(var map: Map[String, Any], def loadConfigFile(configFile: File, default: Boolean = false) { val configMap = fileToConfigMap(configFile) if (default) { - if (defaults.isEmpty) defaults = configMap - else defaults = mergeMaps(configMap, defaults) - logger.debug("New defaults: " + defaults) + if (_defaults.isEmpty) _defaults = configMap + else _defaults = mergeMaps(configMap, _defaults) + logger.debug("New defaults: " + _defaults) } else { - if (map.isEmpty) map = configMap - else 
map = mergeMaps(configMap, map) - logger.debug("New config: " + map) + if (_map.isEmpty) _map = configMap + else _map = mergeMaps(configMap, _map) + logger.debug("New config: " + _map) } } @@ -86,11 +87,12 @@ class Config(var map: Map[String, Any], */ def addValue(key: String, value: Any, path: List[String] = Nil, default: Boolean = false): Unit = { val valueMap = path.foldRight(Map(key -> value))((a, b) => Map(a -> b)) - if (default) defaults = mergeMaps(valueMap, defaults) - else map = mergeMaps(valueMap, map) + if (default) _defaults = mergeMaps(valueMap, _defaults) + else _map = mergeMaps(valueMap, _map) } protected[config] var notFoundCache: List[ConfigValueIndex] = List() + protected[config] var fixedCache: Map[ConfigValueIndex, ConfigValue] = Map() protected[config] var foundCache: Map[ConfigValueIndex, ConfigValue] = Map() protected[config] var defaultCache: Map[ConfigValueIndex, ConfigValue] = Map() protected[config] def clearCache(): Unit = { @@ -105,24 +107,39 @@ class Config(var map: Map[String, Any], * @param s key * @return True if exist */ - def contains(s: String): Boolean = map.contains(s) + def contains(s: String): Boolean = _map.contains(s) /** * Checks if value exist in config * @param requestedIndex Index to value * @return True if exist */ - def contains(requestedIndex: ConfigValueIndex): Boolean = + def contains(requestedIndex: ConfigValueIndex): Boolean = contains(requestedIndex, Map()) + + /** + * Checks if value exist in config + * @param requestedIndex Index to value + * @param fixedValues Fixed values + * @return True if exist + */ + def contains(requestedIndex: ConfigValueIndex, fixedValues: Map[String, Any]): Boolean = if (notFoundCache.contains(requestedIndex)) false + else if (fixedCache.contains(requestedIndex)) true else if (foundCache.contains(requestedIndex)) true else { - val value = Config.getValueFromMap(map, requestedIndex) - if (value.isDefined && value.get.value != None) { - foundCache += (requestedIndex -> value.get) + val 
fixedValue = Config.getValueFromMap(fixedValues, requestedIndex) + if (fixedValue.isDefined) { + fixedCache += (requestedIndex -> fixedValue.get) true } else { - notFoundCache +:= requestedIndex - false + val value = Config.getValueFromMap(_map, requestedIndex) + if (value.isDefined && value.get.value != None) { + foundCache += (requestedIndex -> value.get) + true + } else { + notFoundCache +:= requestedIndex + false + } } } @@ -134,9 +151,12 @@ class Config(var map: Map[String, Any], * @param freeVar Default true, if set false value must exist in module * @return True if exist */ - def contains(module: String, path: List[String], key: String, freeVar: Boolean = true): Boolean = { + def contains(module: String, path: List[String], + key: String, + freeVar: Boolean = true, + fixedValues: Map[String, Any] = Map()): Boolean = { val requestedIndex = ConfigValueIndex(module, path, key, freeVar) - contains(requestedIndex) + contains(requestedIndex, fixedValues) } /** @@ -148,10 +168,23 @@ class Config(var map: Map[String, Any], * @param freeVar Default true, if set false value must exist in module * @return Config value */ - protected[config] def apply(module: String, path: List[String], key: String, default: Any = null, freeVar: Boolean = true): ConfigValue = { + protected[config] def apply(module: String, + path: List[String], + key: String, + default: Any = null, + freeVar: Boolean = true, + fixedValues: Map[String, Any] = Map()): ConfigValue = { val requestedIndex = ConfigValueIndex(module, path, key, freeVar) - if (contains(requestedIndex)) foundCache(requestedIndex) - else if (default != null) { + if (contains(requestedIndex, fixedValues)) { + val fixedValue = fixedCache.get(requestedIndex) + if (fixedValue.isDefined) { + val userValue = Config.getValueFromMap(_map, requestedIndex) + if (userValue.isDefined) + logger.warn(s"Ignoring user-supplied value ${requestedIndex.key} at path ${requestedIndex.path} because it is a fixed value.") + } + + 
fixedValue.getOrElse(foundCache(requestedIndex)) + } else if (default != null) { defaultCache += (requestedIndex -> ConfigValue(requestedIndex, null, default, freeVar)) defaultCache(requestedIndex) } else ConfigValue(requestedIndex, null, null, freeVar) @@ -181,9 +214,11 @@ class Config(var map: Map[String, Any], // Positions where values are found val found = convertIndexValuesToMap(foundCache.filter(!_._2.default).toList.map(x => (x._2.foundIndex, x._2.value))) + val fixed = convertIndexValuesToMap(fixedCache.filter(!_._2.default).toList.map(x => (x._2.foundIndex, x._2.value))) // Positions where to start searching val effectiveFound = convertIndexValuesToMap(foundCache.filter(!_._2.default).toList.map(x => (x._2.requestIndex, x._2.value)), Some(false)) + val effectiveFixed = convertIndexValuesToMap(fixedCache.filter(!_._2.default).toList.map(x => (x._2.requestIndex, x._2.value)), Some(false)) val effectiveDefaultFound = convertIndexValuesToMap(defaultCache.filter(_._2.default).toList.map(x => (x._2.requestIndex, x._2.value)), Some(false)) val notFound = convertIndexValuesToMap(notFoundCache.map((_, None)), Some(false)) @@ -191,16 +226,19 @@ class Config(var map: Map[String, Any], val fullEffective = ConfigUtils.mergeMaps(effectiveFound, effectiveDefaultFound) val fullEffectiveWithNotFound = ConfigUtils.mergeMaps(fullEffective, notFound) - writeMapToJsonFile(this.map, "input") + writeMapToJsonFile(_map, "input") + writeMapToJsonFile(_defaults, "defaults") writeMapToJsonFile(found, "found") + writeMapToJsonFile(fixed, "fixed") writeMapToJsonFile(effectiveFound, "effective.found") + writeMapToJsonFile(effectiveFixed, "effective.fixed") writeMapToJsonFile(effectiveDefaultFound, "effective.defaults") writeMapToJsonFile(notFound, "not.found") writeMapToJsonFile(fullEffective, "effective.full") writeMapToJsonFile(fullEffectiveWithNotFound, "effective.full.notfound") } - override def toString: String = map.toString() + override def toString: String = _map.toString() } 
object Config extends Logging { @@ -212,7 +250,7 @@ object Config extends Logging { * @param config2 Low prio map * @return Merged config */ - def mergeConfigs(config1: Config, config2: Config): Config = new Config(mergeMaps(config1.map, config2.map)) + def mergeConfigs(config1: Config, config2: Config): Config = new Config(mergeMaps(config1._map, config2._map)) /** * Search for value in index position in a map diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/ConfigValue.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/ConfigValue.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/ConfigValue.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/ConfigValue.scala index 1dc4b4702f894f426c56a6d0e7439d7140f1dc25..a4eea343a86dd8a097522825ecd6a8e9d7ecc852 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/ConfigValue.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/ConfigValue.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. 
*/ -package nl.lumc.sasc.biopet.core.config +package nl.lumc.sasc.biopet.utils.config import java.io.File diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/ConfigValueIndex.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/ConfigValueIndex.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/ConfigValueIndex.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/ConfigValueIndex.scala index b310e8b552d2172d9144923efec6fc37c17fe2d7..9bb4340345d0b4aa566d1d35158985d6e76214cf 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/ConfigValueIndex.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/ConfigValueIndex.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core.config +package nl.lumc.sasc.biopet.utils.config /** * General case class used as index config values. 
This stores the path to the value, the module, name of the value and if freeVar is allowed diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/Configurable.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/Configurable.scala similarity index 74% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/Configurable.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/Configurable.scala index 6b9c1f922cf398be1c05f2127e978fa315a65aa9..68fe36e303e6a49e96a7628a741fc120392a5d19 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/config/Configurable.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/config/Configurable.scala @@ -13,8 +13,9 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core.config +package nl.lumc.sasc.biopet.utils.config +import nl.lumc.sasc.biopet.utils.ConfigUtils import nl.lumc.sasc.biopet.utils.ConfigUtils.ImplicitConversions trait Configurable extends ImplicitConversions { @@ -29,15 +30,34 @@ trait Configurable extends ImplicitConversions { def configPath: List[String] = if (root != null) root.configFullPath else Nil /** Gets name of module for config */ - protected[core] def configName = getClass.getSimpleName.toLowerCase + def configName = getClass.getSimpleName.toLowerCase /** ull path with module in there */ - protected[core] def configFullPath: List[String] = configPath ::: configName :: Nil + def configFullPath: List[String] = configPath ::: configName :: Nil /** Map to store defaults for config */ - def defaults: Map[String, Any] = { - if (root != null) root.defaults - else globalConfig.defaults + def defaults: Map[String, Any] = Map() + + /** This method merges defaults from the root into its own */ + protected[config] def internalDefaults: Map[String, Any] = { + 
(root != null, defaults.isEmpty) match { + case (true, true) => root.internalDefaults + case (true, false) => ConfigUtils.mergeMaps(defaults, root.internalDefaults) + case (false, true) => globalConfig.defaults + case (false, false) => ConfigUtils.mergeMaps(defaults, globalConfig.defaults) + } + } + + /** All values found in this map will be skipped from the user config */ + def fixedValues: Map[String, Any] = Map() + + /** This method merges fixedValues from the root into its own */ + protected def internalFixedValues: Map[String, Any] = { + (root != null, fixedValues.isEmpty) match { + case (true, true) => root.internalFixedValues + case (true, false) => ConfigUtils.mergeMaps(fixedValues, root.internalFixedValues) + case _ => fixedValues + } } val config = new ConfigFunctions @@ -90,11 +110,11 @@ trait Configurable extends ImplicitConversions { val m = if (submodule != null) submodule else configName val p = if (path == null) getConfigPath(s, l, submodule) ::: subPath else path val d = { - val value = Config.getValueFromMap(defaults, ConfigValueIndex(m, p, key, freeVar)) + val value = Config.getValueFromMap(internalDefaults, ConfigValueIndex(m, p, key, freeVar)) if (value.isDefined) value.get.value else default } - if (d == null) globalConfig(m, p, key, freeVar = freeVar) - else globalConfig(m, p, key, d, freeVar) + if (d == null) globalConfig(m, p, key, freeVar = freeVar, fixedValues = internalFixedValues) + else globalConfig(m, p, key, d, freeVar, fixedValues = internalFixedValues) } /** @@ -117,7 +137,7 @@ trait Configurable extends ImplicitConversions { val m = if (submodule != null) submodule else configName val p = if (path == null) getConfigPath(s, l, submodule) ::: subPath else path - globalConfig.contains(m, p, key, freeVar) || Config.getValueFromMap(defaults, ConfigValueIndex(m, p, key, freeVar)).isDefined + globalConfig.contains(m, p, key, freeVar, internalFixedValues) || Config.getValueFromMap(internalDefaults, ConfigValueIndex(m, p, key, 
freeVar)).isDefined } } } diff --git a/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecord.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecord.scala new file mode 100644 index 0000000000000000000000000000000000000000..5b6b1931ab7ea604f8812d88c869893f17929d65 --- /dev/null +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecord.scala @@ -0,0 +1,148 @@ +package nl.lumc.sasc.biopet.utils.intervals + +import htsjdk.samtools.util.Interval + +import scala.collection.mutable.ListBuffer + +/** + * Created by pjvanthof on 20/08/15. + */ +case class BedRecord(chr: String, + start: Int, + end: Int, + name: Option[String] = None, + score: Option[Double] = None, + strand: Option[Boolean] = None, + thickStart: Option[Int] = None, + thickEnd: Option[Int] = None, + rgbColor: Option[(Int, Int, Int)] = None, + blockCount: Option[Int] = None, + blockSizes: IndexedSeq[Int] = IndexedSeq(), + blockStarts: IndexedSeq[Int] = IndexedSeq(), + protected[intervals] val _originals: List[BedRecord] = Nil) { + + def originals(nested: Boolean = true): List[BedRecord] = { + if (_originals.isEmpty) List(this) + else if (nested) _originals.flatMap(_.originals(true)) + else _originals + } + + def overlapWith(record: BedRecord): Boolean = { + if (chr != record.chr) false + else if (start < record.end && record.start < end) true + else false + } + + def length = end - start + + def scatter(binSize: Int) = { + val binNumber = length / binSize + if (binNumber <= 1) List(this) + else { + val size = length / binNumber + val buffer = ListBuffer[BedRecord]() + for (i <- 1 until binNumber) buffer += BedRecord(chr, start + ((i - 1) * size), start + (i * size)) + buffer += BedRecord(chr, start + ((binNumber - 1) * size), end) + buffer.toList + } + } + + lazy val exons = if (blockCount.isDefined && blockSizes.length > 0 && blockStarts.length > 0) { + Some(for (i <- 0 until blockCount.get) yield { + val exonNumber = 
strand match { + case Some(false) => blockCount.get - i + case _ => i + 1 + } + BedRecord(chr, start + blockStarts(i), start + blockStarts(i) + blockSizes(i), + Some(s"exon-$exonNumber"), _originals = List(this)) + }) + } else None + + lazy val introns = if (blockCount.isDefined && blockSizes.length > 0 && blockStarts.length > 0) { + Some(for (i <- 0 until (blockCount.get - 1)) yield { + val intronNumber = strand match { + case Some(false) => blockCount.get - i + case _ => i + 1 + } + BedRecord(chr, start + blockStarts(i) + blockSizes(i), start + blockStarts(i + 1), + Some(s"intron-$intronNumber"), _originals = List(this)) + }) + } else None + + lazy val utr5 = (strand, thickStart, thickEnd) match { + case (Some(true), Some(tStart), Some(tEnd)) if (tStart > start && tEnd < end) => + Some(BedRecord(chr, start, tStart, name.map(_ + "_utr5"))) + case (Some(false), Some(tStart), Some(tEnd)) if (tStart > start && tEnd < end) => + Some(BedRecord(chr, tEnd, end, name.map(_ + "_utr5"))) + case _ => None + } + + lazy val utr3 = (strand, thickStart, thickEnd) match { + case (Some(false), Some(tStart), Some(tEnd)) if (tStart > start && tEnd < end) => + Some(BedRecord(chr, start, tStart, name.map(_ + "_utr3"))) + case (Some(true), Some(tStart), Some(tEnd)) if (tStart > start && tEnd < end) => + Some(BedRecord(chr, tEnd, end, name.map(_ + "_utr3"))) + case _ => None + } + + override def toString = { + def arrayToOption[T](array: IndexedSeq[T]): Option[IndexedSeq[T]] = { + if (array.isEmpty) None + else Some(array) + } + List(Some(chr), Some(start), Some(end), + name, score, strand.map(if (_) "+" else "-"), + thickStart, thickEnd, rgbColor.map(x => s"${x._1},${x._2},${x._3}"), + blockCount, arrayToOption(blockSizes).map(_.mkString(",")), arrayToOption(blockStarts).map(_.mkString(","))) + .takeWhile(_.isDefined) + .flatten + .mkString("\t") + } + + def validate = { + require(start < end, "Start is greater then end") + (thickStart, thickEnd) match { + case (Some(s), Some(e)) => 
require(s <= e, "Thick start is greater then end") + case _ => + } + blockCount match { + case Some(count) => { + require(count == blockSizes.length, "Number of sizes is not the same as blockCount") + require(count == blockStarts.length, "Number of starts is not the same as blockCount") + } + case _ => + } + this + } + + def toSamInterval = (name, strand) match { + case (Some(name), Some(strand)) => new Interval(chr, start + 1, end, !strand, name) + case (Some(name), _) => new Interval(chr, start + 1, end, false, name) + case _ => new Interval(chr, start + 1, end) + } +} + +object BedRecord { + def fromLine(line: String): BedRecord = { + val values = line.split("\t") + require(values.length >= 3, "Not enough columns count for a bed file") + BedRecord( + values(0), + values(1).toInt, + values(2).toInt, + values.lift(3), + values.lift(4).map(_.toDouble), + values.lift(5).map { + case "-" => false + case "+" => true + case _ => throw new IllegalStateException("Strand (column 6) must be '+' or '-'") + }, + values.lift(6).map(_.toInt), + values.lift(7) map (_.toInt), + values.lift(8).map(_.split(",", 3).map(_.toInt)).map(x => (x.headOption.getOrElse(0), x.lift(1).getOrElse(0), x.lift(2).getOrElse(0))), + values.lift(9).map(_.toInt), + values.lift(10).map(_.split(",").map(_.toInt).toIndexedSeq).getOrElse(IndexedSeq()), + values.lift(11).map(_.split(",").map(_.toInt).toIndexedSeq).getOrElse(IndexedSeq()) + ) + } +} \ No newline at end of file diff --git a/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordList.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordList.scala new file mode 100644 index 0000000000000000000000000000000000000000..56b2f303b0a7161879ea9ae01eaf3ba1cc0b86f7 --- /dev/null +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordList.scala @@ -0,0 +1,146 @@ +package nl.lumc.sasc.biopet.utils.intervals + +import java.io.{ PrintWriter, File } + +import 
htsjdk.samtools.reference.FastaSequenceFile + +import scala.collection.JavaConversions._ + +import scala.collection.mutable +import scala.collection.mutable.ListBuffer +import scala.io.Source +import nl.lumc.sasc.biopet.utils.Logging + +/** + * Created by pjvan_thof on 8/20/15. + */ +case class BedRecordList(val chrRecords: Map[String, List[BedRecord]], val header: List[String] = Nil) { + def allRecords = for (chr <- chrRecords; record <- chr._2) yield record + + def toSamIntervals = allRecords.map(_.toSamInterval) + + lazy val sorted = { + val sorted = new BedRecordList(chrRecords.map(x => x._1 -> x._2.sortWith((a, b) => a.start < b.start))) + if (sorted.chrRecords.forall(x => x._2 == chrRecords(x._1))) this else sorted + } + + lazy val isSorted = sorted.hashCode() == this.hashCode() || sorted.chrRecords.forall(x => x._2 == chrRecords(x._1)) + + def overlapWith(record: BedRecord) = sorted.chrRecords + .getOrElse(record.chr, Nil) + .dropWhile(_.end <= record.start) + .takeWhile(_.start < record.end) + + def length = allRecords.foldLeft(0L)((a, b) => a + b.length) + + def squishBed(strandSensitive: Boolean = true, nameSensitive: Boolean = true) = BedRecordList.fromList { + (for ((chr, records) <- sorted.chrRecords; record <- records) yield { + val overlaps = overlapWith(record) + .filterNot(_ == record) + .filterNot(strandSensitive && _.strand != record.strand) + .filterNot(nameSensitive && _.name == record.name) + if (overlaps.isEmpty) { + List(record) + } else { + overlaps + .foldLeft(List(record))((result, overlap) => { + (for (r <- result) yield { + (overlap.start <= r.start, overlap.end >= r.end) match { + case (true, true) => + Nil + case (true, false) => + List(r.copy(start = overlap.end, _originals = List(r))) + case (false, true) => + List(r.copy(end = overlap.start, _originals = List(r))) + case (false, false) => + List(r.copy(end = overlap.start, _originals = List(r)), r.copy(start = overlap.end, _originals = List(r))) + } + }).flatten + }) + } + 
}).flatten + } + + def combineOverlap: BedRecordList = { + new BedRecordList(for ((chr, records) <- sorted.chrRecords) yield chr -> { + def combineOverlap(records: List[BedRecord], + newRecords: ListBuffer[BedRecord] = ListBuffer()): List[BedRecord] = { + if (records.nonEmpty) { + val chr = records.head.chr + val start = records.head.start + val overlapRecords = records.takeWhile(_.start <= records.head.end) + val end = overlapRecords.map(_.end).max + + newRecords += BedRecord(chr, start, end, _originals = overlapRecords) + combineOverlap(records.drop(overlapRecords.length), newRecords) + } else newRecords.toList + } + combineOverlap(records) + }) + } + + def scatter(binSize: Int) = BedRecordList( + chrRecords.map(x => x._1 -> x._2.flatMap(_.scatter(binSize))) + ) + + def validateContigs(reference: File) = { + val referenceFile = new FastaSequenceFile(reference, true) + val dict = referenceFile.getSequenceDictionary + val notExisting = chrRecords.keys.filter(dict.getSequence(_) == null).toList + require(notExisting.isEmpty, s"Contigs found in bed records but are not existing in reference: ${notExisting.mkString(",")}") + this + } + + def writeToFile(file: File): Unit = { + val writer = new PrintWriter(file) + header.foreach(writer.println) + allRecords.foreach(writer.println) + writer.close() + } +} + +object BedRecordList { + def fromListWithHeader(records: Traversable[BedRecord], + header: List[String]): BedRecordList = fromListWithHeader(records.toIterator, header) + + def fromListWithHeader(records: TraversableOnce[BedRecord], header: List[String]): BedRecordList = { + val map = mutable.Map[String, ListBuffer[BedRecord]]() + for (record <- records) { + if (!map.contains(record.chr)) map += record.chr -> ListBuffer() + map(record.chr) += record + } + new BedRecordList(map.toMap.map(m => m._1 -> m._2.toList), header) + } + + def fromList(records: Traversable[BedRecord]): BedRecordList = fromListWithHeader(records.toIterator, Nil) + + def fromList(records: 
TraversableOnce[BedRecord]): BedRecordList = fromListWithHeader(records, Nil) + + def fromFile(bedFile: File) = { + val reader = Source.fromFile(bedFile) + val all = reader.getLines().toList + val header = all.takeWhile(x => x.startsWith("browser") || x.startsWith("track")) + var lineCount = header.length + val content = all.drop(lineCount) + try { + fromListWithHeader(content.map(line => { + lineCount += 1 + BedRecord.fromLine(line).validate + }), header) + } catch { + case e: Exception => + Logging.logger.warn(s"Parsing line number $lineCount failed on file: ${bedFile.getAbsolutePath}") + throw e + } finally { + reader.close() + } + } + + def fromReference(file: File) = { + val referenceFile = new FastaSequenceFile(file, true) + + fromList(for (contig <- referenceFile.getSequenceDictionary.getSequences) yield { + BedRecord(contig.getSequenceName, 0, contig.getSequenceLength) + }) + } +} \ No newline at end of file diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/package.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/package.scala similarity index 100% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/utils/package.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/package.scala diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/rscript/LinePlot.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/LinePlot.scala similarity index 65% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/rscript/LinePlot.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/LinePlot.scala index 5affda2f871ad8f2e2f20e91a1649a76fa9867f9..1954609198b33dcaa380eaafa3534f463ef43d4c 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/rscript/LinePlot.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/LinePlot.scala @@ 
-13,26 +13,22 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.extensions.rscript +package nl.lumc.sasc.biopet.utils.rscript import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.extensions.RscriptCommandLineFunction -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.config.Configurable /** * Extension for en general line plot with R * * Created by pjvan_thof on 4/29/15. */ -class LinePlot(val root: Configurable) extends RscriptCommandLineFunction { +class LinePlot(val root: Configurable) extends Rscript { protected var script: File = config("script", default = "plotXY.R") - @Input var input: File = _ - @Output var output: File = _ var width: Option[Int] = config("width") @@ -43,14 +39,14 @@ class LinePlot(val root: Configurable) extends RscriptCommandLineFunction { var title: Option[String] = config("title") var removeZero: Boolean = config("removeZero", default = false) - override def cmdLine: String = super.cmdLine + - required("--input", input) + - required("--output", output) + - optional("--width", width) + - optional("--height", height) + - optional("--xlabel", xlabel) + - required("--ylabel", ylabel) + - optional("--llabel", llabel) + - optional("--title", title) + - optional("--removeZero", removeZero) + override def cmd = super.cmd ++ + Seq("--input", input.getAbsolutePath) ++ + Seq("--output", output.getAbsolutePath) ++ + width.map(x => Seq("--width", x.toString)).getOrElse(Seq()) ++ + height.map(x => Seq("--height", x.toString)).getOrElse(Seq()) ++ + xlabel.map(Seq("--xlabel", _)).getOrElse(Seq()) ++ + ylabel.map(Seq("--ylabel", _)).getOrElse(Seq()) ++ + llabel.map(Seq("--llabel", _)).getOrElse(Seq()) ++ + title.map(Seq("--title", _)).getOrElse(Seq()) ++ + (if (removeZero) Seq("--removeZero", "true") else Seq()) } diff --git 
a/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/Rscript.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/Rscript.scala new file mode 100644 index 0000000000000000000000000000000000000000..3dfac894eeb591a84a31b3fd5bcfa88c34619192 --- /dev/null +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/Rscript.scala @@ -0,0 +1,69 @@ +package nl.lumc.sasc.biopet.utils.rscript + +import java.io.{ File, FileOutputStream } + +import nl.lumc.sasc.biopet.utils.Logging +import nl.lumc.sasc.biopet.utils.config.Configurable + +import scala.sys.process.{ Process, ProcessLogger } + +/** + * Created by pjvanthof on 13/09/15. + */ +trait Rscript extends Configurable { + protected var script: File + + def rscriptExecutable: String = config("exe", default = "Rscript", submodule = "Rscript") + + /** This is the default implementation; override this to add arguments */ + def cmd: Seq[String] = Seq(rscriptExecutable, script.getAbsolutePath) + + /** + * If the script does not exist on the file system, try to copy it from the jar + * @param dir Directory to store the temp script; if None or not given, File.createTempFile is called + */ + protected def checkScript(dir: Option[File] = None): Unit = { + if (script.exists()) { + script = script.getAbsoluteFile + } else { + val rScript: File = dir match { + case Some(dir) => new File(dir, script.getName) + case _ => { + val file = File.createTempFile(script.getName, ".R") + file.deleteOnExit() + file + } + } + if (!rScript.getParentFile.exists) rScript.getParentFile.mkdirs + + val is = getClass.getResourceAsStream(script.getPath) + val os = new FileOutputStream(rScript) + + org.apache.commons.io.IOUtils.copy(is, os) + os.close() + + script = rScript + } + } + + /** + * Execute Rscript on the local system + * @param logger How to handle stdout and stderr + */ + def runLocal(logger: ProcessLogger): Unit = { + checkScript() + + Logging.logger.info("Running: " + cmd.mkString(" ")) + + val process = 
Process(cmd).run(logger) + Logging.logger.info(process.exitValue()) + } + + /** + * Execute Rscript on the local system + * Stdout and stderr will go to the biopet logger + */ + def runLocal(): Unit = { + runLocal(ProcessLogger(Logging.logger.info(_))) + } +} diff --git a/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/ScatterPlot.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/ScatterPlot.scala new file mode 100644 index 0000000000000000000000000000000000000000..9bcbaffcc7d159cc7e82e012c218d0a2f26fc745 --- /dev/null +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/ScatterPlot.scala @@ -0,0 +1,52 @@ +/** + * Biopet is built on top of GATK Queue for building bioinformatic + * pipelines. It is mainly intended to support LUMC SHARK cluster which is running + * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) + * should also be able to execute Biopet tools and pipelines. + * + * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center + * + * Contact us at: sasc@lumc.nl + * + * A dual licensing mode is applied. The source code within this project that are + * not part of GATK Queue is freely available for non-commercial use under an AGPL + * license; For commercial users or users who do not want to follow the AGPL + * license, please contact us to obtain a separate license. + */ +package nl.lumc.sasc.biopet.utils.rscript + +import java.io.File + +import nl.lumc.sasc.biopet.utils.config.Configurable + +/** + * Extension for a general scatter plot with R + * + * Created by pjvan_thof on 4/29/15. 
+ */ +class ScatterPlot(val root: Configurable) extends Rscript { + protected var script: File = config("script", default = "plotScatter.R") + + var input: File = _ + + var output: File = _ + + var width: Option[Int] = config("width") + var height: Option[Int] = config("height") + var xlabel: Option[String] = config("xlabel") + var ylabel: Option[String] = config("ylabel") + var llabel: Option[String] = config("llabel") + var title: Option[String] = config("title") + var removeZero: Boolean = config("removeZero", default = false) + + override def cmd = super.cmd ++ + Seq("--input", input.getAbsolutePath) ++ + Seq("--output", output.getAbsolutePath) ++ + width.map(x => Seq("--width", x.toString)).getOrElse(Seq()) ++ + height.map(x => Seq("--height", x.toString)).getOrElse(Seq()) ++ + xlabel.map(Seq("--xlabel", _)).getOrElse(Seq()) ++ + ylabel.map(Seq("--ylabel", _)).getOrElse(Seq()) ++ + llabel.map(Seq("--llabel", _)).getOrElse(Seq()) ++ + title.map(Seq("--title", _)).getOrElse(Seq()) ++ + (if (removeZero) Seq("--removeZero") else Seq()) +} diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/rscript/StackedBarPlot.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/StackedBarPlot.scala similarity index 65% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/rscript/StackedBarPlot.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/StackedBarPlot.scala index 4f90a4dbcb1baf293b592ecc4deb445419540587..1965e0a54c810a758db3147de2d24ba8162051fe 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/rscript/StackedBarPlot.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/rscript/StackedBarPlot.scala @@ -13,26 +13,22 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. 
*/ -package nl.lumc.sasc.biopet.extensions.rscript +package nl.lumc.sasc.biopet.utils.rscript import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.extensions.RscriptCommandLineFunction -import org.broadinstitute.gatk.utils.commandline.{ Input, Output } +import nl.lumc.sasc.biopet.utils.config.Configurable /** * Extension for en general stackedbar plot with R * * Created by pjvan_thof on 4/29/15. */ -class StackedBarPlot(val root: Configurable) extends RscriptCommandLineFunction { +class StackedBarPlot(val root: Configurable) extends Rscript { protected var script: File = config("script", default = "stackedBar.R") - @Input var input: File = _ - @Output var output: File = _ var width: Option[Int] = config("width") @@ -42,13 +38,13 @@ class StackedBarPlot(val root: Configurable) extends RscriptCommandLineFunction var llabel: Option[String] = config("llabel") var title: Option[String] = config("title") - override def cmdLine: String = super.cmdLine + - required("--input", input) + - required("--output", output) + - optional("--width", width) + - optional("--height", height) + - optional("--xlabel", xlabel) + - required("--ylabel", ylabel) + - optional("--llabel", llabel) + - optional("--title", title) + override def cmd = super.cmd ++ + Seq("--input", input.getAbsolutePath) ++ + Seq("--output", output.getAbsolutePath) ++ + width.map(x => Seq("--width", x.toString)).getOrElse(Seq()) ++ + height.map(x => Seq("--height", x.toString)).getOrElse(Seq()) ++ + xlabel.map(Seq("--xlabel", _)).getOrElse(Seq()) ++ + ylabel.map(Seq("--ylabel", _)).getOrElse(Seq()) ++ + llabel.map(Seq("--llabel", _)).getOrElse(Seq()) ++ + title.map(Seq("--title", _)).getOrElse(Seq()) } diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/Summary.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/summary/Summary.scala similarity index 98% rename from 
public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/Summary.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/summary/Summary.scala index 916d51eea6c7efb3d66b14b7b2960e8669118812..6b863f81d16d4f26754ea87ddf5c703f35fd4588 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/Summary.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/summary/Summary.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core.summary +package nl.lumc.sasc.biopet.utils.summary import java.io.File diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryValue.scala b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/summary/SummaryValue.scala similarity index 78% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryValue.scala rename to public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/summary/SummaryValue.scala index 154fd50cff2741501b46c1601db65b38621f196a..371abc9a701e9ed9d224a60bd40df3bcfb8a06cf 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/core/summary/SummaryValue.scala +++ b/public/biopet-utils/src/main/scala/nl/lumc/sasc/biopet/utils/summary/SummaryValue.scala @@ -1,19 +1,4 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. 
The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. - */ -package nl.lumc.sasc.biopet.core.summary +package nl.lumc.sasc.biopet.utils.summary /** * This case class is used for easy access and calculations on those values @@ -79,4 +64,3 @@ case class SummaryValue(value: Option[Any]) { } } } - diff --git a/public/biopet-utils/src/test/resources/log4j.properties b/public/biopet-utils/src/test/resources/log4j.properties new file mode 100644 index 0000000000000000000000000000000000000000..501af67582a546db584c8538b28cb6f9e07f1692 --- /dev/null +++ b/public/biopet-utils/src/test/resources/log4j.properties @@ -0,0 +1,25 @@ +# +# Biopet is built on top of GATK Queue for building bioinformatic +# pipelines. It is mainly intended to support LUMC SHARK cluster which is running +# SGE. But other types of HPC that are supported by GATK Queue (such as PBS) +# should also be able to execute Biopet tools and pipelines. +# +# Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center +# +# Contact us at: sasc@lumc.nl +# +# A dual licensing mode is applied. The source code within this project that are +# not part of GATK Queue is freely available for non-commercial use under an AGPL +# license; For commercial users or users who do not want to follow the AGPL +# license, please contact us to obtain a separate license. +# + +# Set root logger level to ERROR and its only appender to A1. +log4j.rootLogger=ERROR, A1 + +# A1 is set to be a ConsoleAppender. +log4j.appender.A1=org.apache.log4j.ConsoleAppender + +# A1 uses PatternLayout. 
+log4j.appender.A1.layout=org.apache.log4j.PatternLayout +log4j.appender.A1.layout.ConversionPattern=%-5p [%d] [%C{1}] - %m%n \ No newline at end of file diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/utils/ConfigUtilsTest.scala b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/ConfigUtilsTest.scala similarity index 99% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/utils/ConfigUtilsTest.scala rename to public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/ConfigUtilsTest.scala index 571e9d62f0d4eb2ec505c2905e95910472dfda64..6bac74cdf9f3bf6148a78b05271593242d9b2d07 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/utils/ConfigUtilsTest.scala +++ b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/ConfigUtilsTest.scala @@ -19,7 +19,7 @@ import java.io.{ File, PrintWriter } import argonaut.Argonaut._ import argonaut.Json -import nl.lumc.sasc.biopet.core.config.{ ConfigValue, ConfigValueIndex } +import nl.lumc.sasc.biopet.utils.config.{ ConfigValue, ConfigValueIndex } import org.scalatest.Matchers import org.scalatest.testng.TestNGSuite import org.testng.annotations.Test @@ -228,6 +228,7 @@ class ConfigUtilsTest extends TestNGSuite with Matchers { object ConfigUtilsTest { def writeTemp(text: String, extension: String): File = { val file = File.createTempFile("TestConfigUtils.", extension) + file.deleteOnExit() val w = new PrintWriter(file) w.write(text) w.close() diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/utils/PackageTest.scala b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/PackageTest.scala similarity index 100% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/utils/PackageTest.scala rename to public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/PackageTest.scala diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigTest.scala 
b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigTest.scala similarity index 99% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigTest.scala rename to public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigTest.scala index 023b6db5f17e9d763c2fbd90fee063a5293b27ad..6c92d45d8ff9b594d65af3ba33911ede1718e231 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigTest.scala +++ b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigTest.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core.config +package nl.lumc.sasc.biopet.utils.config import nl.lumc.sasc.biopet.utils.{ ConfigUtils, ConfigUtilsTest } import org.scalatest.Matchers diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigValueTest.scala b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigValueTest.scala similarity index 97% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigValueTest.scala rename to public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigValueTest.scala index a09b074bc5792852d75f1623efaf183c8729484e..d0fce8573dd28b45259d3920d3f72ded84508fc2 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigValueTest.scala +++ b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigValueTest.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. 
*/ -package nl.lumc.sasc.biopet.core.config +package nl.lumc.sasc.biopet.utils.config import java.io.File diff --git a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigurableTest.scala b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigurableTest.scala similarity index 67% rename from public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigurableTest.scala rename to public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigurableTest.scala index b889f08273eb2617a1af73ba4c2d9f5e4be3b4e7..37e851cf1572848a648bcb45ff5b7a51edeb0a68 100644 --- a/public/biopet-framework/src/test/scala/nl/lumc/sasc/biopet/core/config/ConfigurableTest.scala +++ b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/config/ConfigurableTest.scala @@ -13,7 +13,7 @@ * license; For commercial users or users who do not want to follow the AGPL * license, please contact us to obtain a separate license. */ -package nl.lumc.sasc.biopet.core.config +package nl.lumc.sasc.biopet.utils.config import org.scalatest.Matchers import org.scalatest.testng.TestNGSuite @@ -25,10 +25,36 @@ import org.testng.annotations.Test * Created by pjvan_thof on 1/8/15. */ class ConfigurableTest extends TestNGSuite with Matchers { + + abstract class Cfg extends Configurable { + def get(key: String, + default: String = null, + submodule: String = null, + freeVar: Boolean = true, + sample: String = null, + library: String = null) = { + config(key, default, submodule, freeVar = freeVar, sample = sample, library = library) + } + } + + class ClassA(val root: Configurable) extends Cfg + + class ClassB(val root: Configurable) extends Cfg { + lazy val classA = new ClassA(this) + // Why this needs to be lazy? + } + + class ClassC(val root: Configurable) extends Cfg { + def this() = this(null) + lazy val classB = new ClassB(this) + // Why this needs to be lazy? 
+ } + @Test def testConfigurable(): Unit = { val classC = new ClassC { override def configName = "classc" override val globalConfig = new Config(ConfigurableTest.map) + override val fixedValues = Map("fixed" -> "fixed") } classC.configPath shouldBe Nil @@ -51,46 +77,33 @@ class ConfigurableTest extends TestNGSuite with Matchers { classC.get("bla", sample = "sample1", library = "library1").asString shouldBe "bla" classC.get("test", sample = "sample1", library = "library1").asString shouldBe "test" classC.get("test", sample = "sample1").asString shouldBe "test" - } -} -abstract class Cfg extends Configurable { - def get(key: String, - default: String = null, - submodule: String = null, - freeVar: Boolean = true, - sample: String = null, - library: String = null) = { - config(key, default, submodule, freeVar = freeVar, sample = sample, library = library) + // Fixed values + classC.get("fixed").asString shouldBe "fixed" + classC.classB.get("fixed").asString shouldBe "fixed" + classC.classB.classA.get("fixed").asString shouldBe "fixed" } } -class ClassA(val root: Configurable) extends Cfg - -class ClassB(val root: Configurable) extends Cfg { - lazy val classA = new ClassA(this) - // Why this needs to be lazy? -} - -class ClassC(val root: Configurable) extends Cfg { - def this() = this(null) - lazy val classB = new ClassB(this) - // Why this needs to be lazy? 
-} - object ConfigurableTest { val map = Map( + "fixed" -> "nonfixed", "classa" -> Map( - "k1" -> "a1" + "k1" -> "a1", + "fixed" -> "nonfixed" ), "classb" -> Map( - "k1" -> "b1" + "k1" -> "b1", + "fixed" -> "nonfixed" ), "classc" -> Map( - "k1" -> "c1" + "k1" -> "c1", + "fixed" -> "nonfixed" ), "samples" -> Map( "sample1" -> Map( + "fixed" -> "nonfixed", "test" -> "test", "libraries" -> Map( "library1" -> Map( + "fixed" -> "nonfixed", "bla" -> "bla" ) ) diff --git a/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordListTest.scala b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordListTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..0490b28db7dbf199a9b01c1b48c6783f11720f7d --- /dev/null +++ b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordListTest.scala @@ -0,0 +1,152 @@ +package nl.lumc.sasc.biopet.utils.intervals + +import java.io.{ PrintWriter, File } + +import htsjdk.samtools.util.Interval +import org.scalatest.Matchers +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.{ Test, AfterClass, BeforeClass } + +import scala.io.Source + +/** + * Created by pjvan_thof on 8/25/15. 
+ */ +class BedRecordListTest extends TestNGSuite with Matchers { + @BeforeClass + def start: Unit = { + { + val writer = new PrintWriter(BedRecordListTest.bedFile) + writer.print(BedRecordListTest.bedContent) + writer.close() + } + { + val writer = new PrintWriter(BedRecordListTest.corruptBedFile) + writer.print(BedRecordListTest.corruptBedContent) + writer.close() + } + { + val writer = new PrintWriter(BedRecordListTest.bedFileUcscHeader) + writer.print(BedRecordListTest.ucscHeader) + writer.print(BedRecordListTest.bedContent) + writer.close() + } + } + + @Test def testReadBedFile { + val records = BedRecordList.fromFile(BedRecordListTest.bedFile) + records.allRecords.size shouldBe 2 + records.header shouldBe Nil + + val tempFile = File.createTempFile("region", ".bed") + tempFile.deleteOnExit() + records.writeToFile(tempFile) + BedRecordList.fromFile(tempFile) shouldBe records + tempFile.delete() + } + + @Test def testReadBedFileUcscHeader { + val records = BedRecordList.fromFile(BedRecordListTest.bedFileUcscHeader) + records.allRecords.size shouldBe 2 + records.header shouldBe BedRecordListTest.ucscHeader.split("\n").toList + + val tempFile = File.createTempFile("region", ".bed") + tempFile.deleteOnExit() + records.writeToFile(tempFile) + BedRecordList.fromFile(tempFile) shouldBe records + tempFile.delete() + } + + @Test def testSorted: Unit = { + val unsorted = BedRecordList.fromList(List(BedRecord("chrQ", 10, 20), BedRecord("chrQ", 0, 10))) + unsorted.isSorted shouldBe false + unsorted.sorted.isSorted shouldBe true + val sorted = BedRecordList.fromList(List(BedRecord("chrQ", 0, 10), BedRecord("chrQ", 10, 20))) + sorted.isSorted shouldBe true + sorted.sorted.isSorted shouldBe true + sorted.hashCode() shouldBe sorted.sorted.hashCode() + } + + @Test def testOverlap: Unit = { + val list = BedRecordList.fromList(List(BedRecord("chrQ", 0, 10), BedRecord("chrQ", 10, 20))) + list.overlapWith(BedRecord("chrQ", 5, 15)).size shouldBe 2 + 
list.overlapWith(BedRecord("chrQ", 0, 10)).size shouldBe 1 + list.overlapWith(BedRecord("chrQ", 10, 20)).size shouldBe 1 + list.overlapWith(BedRecord("chrQ", 19, 25)).size shouldBe 1 + list.overlapWith(BedRecord("chrQ", 20, 25)).size shouldBe 0 + } + + @Test def testLength: Unit = { + val list = BedRecordList.fromList(List(BedRecord("chrQ", 0, 10), BedRecord("chrQ", 10, 20))) + list.length shouldBe 20 + } + + @Test def testCombineOverlap: Unit = { + val noOverlapList = BedRecordList.fromList(List(BedRecord("chrQ", 0, 10), BedRecord("chrQ", 10, 20))) + noOverlapList.length shouldBe 20 + noOverlapList.combineOverlap.length shouldBe 20 + + val overlapList = BedRecordList.fromList(List(BedRecord("chrQ", 0, 10), BedRecord("chrQ", 5, 15), BedRecord("chrQ", 10, 20))) + overlapList.length shouldBe 30 + overlapList.combineOverlap.length shouldBe 20 + } + + @Test def testSquishBed: Unit = { + val noOverlapList = BedRecordList.fromList(List(BedRecord("chrQ", 0, 10), BedRecord("chrQ", 10, 20))) + noOverlapList.length shouldBe 20 + noOverlapList.squishBed().length shouldBe 20 + + val overlapList = BedRecordList.fromList(List( + BedRecord("chrQ", 0, 10), + BedRecord("chrQ", 5, 15), + BedRecord("chrQ", 10, 20), + BedRecord("chrQ", 25, 35), + BedRecord("chrQ", 50, 80), + BedRecord("chrQ", 60, 70) + )) + overlapList.length shouldBe 80 + val squishedList = overlapList.squishBed(strandSensitive = false, nameSensitive = false) + squishedList.allRecords.size shouldBe 5 + squishedList.length shouldBe 40 + } + + @Test def testSamInterval: Unit = { + val list = BedRecordList.fromList(List(BedRecord("chrQ", 0, 10), BedRecord("chrQ", 5, 15))) + list.toSamIntervals.toList shouldBe List(new Interval("chrQ", 1, 10), new Interval("chrQ", 6, 15)) + } + + @Test def testTraversable: Unit = { + val list = List(BedRecord("chrQ", 0, 10)) + BedRecordList.fromList(list) shouldBe BedRecordList.fromList(list.toIterator) + } + + @Test def testErrors: Unit = { + intercept[IllegalArgumentException] { + val 
records = BedRecordList.fromFile(BedRecordListTest.corruptBedFile) + } + } + + @Test def testScatter: Unit = { + val list = BedRecordList.fromList(List(BedRecord("chrQ", 0, 1000), BedRecord("chrQ", 3000, 3500))) + list.scatter(100).allRecords.size shouldBe 15 + list.scatter(100).length shouldBe 1500 + } +} + +object BedRecordListTest { + val ucscHeader = """browser position chr7:127471196-127495720 + |browser hide all + |track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On" + |""".stripMargin + val bedContent = """chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488 0,3512 + |chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399 0,3601""".stripMargin + val corruptBedContent = """chr22 5000 1000 cloneA 960 + 1000 5000 0 2 567,488 0,3512 + |chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399 0,3601""".stripMargin + + val bedFile = File.createTempFile("regions", ".bed") + bedFile.deleteOnExit() + val corruptBedFile = File.createTempFile("regions", ".bed") + corruptBedFile.deleteOnExit() + val bedFileUcscHeader = File.createTempFile("regions", ".bed") + bedFileUcscHeader.deleteOnExit() +} \ No newline at end of file diff --git a/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordTest.scala b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordTest.scala new file mode 100644 index 0000000000000000000000000000000000000000..a4ae25293748214d8d0d3b27494b01c34f71a684 --- /dev/null +++ b/public/biopet-utils/src/test/scala/nl/lumc/sasc/biopet/utils/intervals/BedRecordTest.scala @@ -0,0 +1,163 @@ +package nl.lumc.sasc.biopet.utils.intervals + +import htsjdk.samtools.util.Interval +import org.scalatest.Matchers +import org.scalatest.testng.TestNGSuite +import org.testng.annotations.Test + +/** + * Created by pjvanthof on 24/08/15. 
+ */ +class BedRecordTest extends TestNGSuite with Matchers { + @Test def testLineParse: Unit = { + BedRecord("chrQ", 0, 4) shouldBe BedRecord("chrQ", 0, 4) + BedRecord.fromLine("chrQ\t0\t4") shouldBe BedRecord("chrQ", 0, 4) + BedRecord.fromLine("chrQ\t0\t4\tname\t3\t+") shouldBe BedRecord("chrQ", 0, 4, Some("name"), Some(3.0), Some(true)) + BedRecord.fromLine("chrQ\t0\t4\tname\t3\t+\t1\t3") shouldBe + BedRecord("chrQ", 0, 4, Some("name"), Some(3.0), Some(true), Some(1), Some(3)) + BedRecord.fromLine("chrQ\t0\t4\tname\t3\t+\t1\t3\t255,0,0") shouldBe + BedRecord("chrQ", 0, 4, Some("name"), Some(3.0), Some(true), Some(1), Some(3), Some((255, 0, 0))) + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0\t2\t10,20\t20,50") shouldBe + BedRecord("chrQ", 0, 100, Some("name"), Some(3.0), Some(true), Some(1), Some(3), Some((255, 0, 0)), + Some(2), IndexedSeq(10, 20), IndexedSeq(20, 50)) + } + + @Test def testLineOutput: Unit = { + BedRecord("chrQ", 0, 4).toString shouldBe "chrQ\t0\t4" + BedRecord.fromLine("chrQ\t0\t4").toString shouldBe "chrQ\t0\t4" + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0\t2\t10,20\t20,50").toString shouldBe "chrQ\t0\t100\tname\t3.0\t+\t1\t3\t255,0,0\t2\t10,20\t20,50" + } + + @Test def testOverlap: Unit = { + BedRecord("chrQ", 0, 4).overlapWith(BedRecord("chrQ", 0, 4)) shouldBe true + BedRecord("chrQ", 0, 4).overlapWith(BedRecord("chrX", 0, 4)) shouldBe false + BedRecord("chrQ", 0, 4).overlapWith(BedRecord("chrQ", 4, 8)) shouldBe false + BedRecord("chrQ", 0, 4).overlapWith(BedRecord("chrQ", 3, 8)) shouldBe true + BedRecord("chrQ", 4, 8).overlapWith(BedRecord("chrQ", 0, 4)) shouldBe false + BedRecord("chrQ", 3, 4).overlapWith(BedRecord("chrQ", 0, 4)) shouldBe true + BedRecord("chrQ", 3, 4).overlapWith(BedRecord("chrQ", 4, 5)) shouldBe false + } + + @Test def testLength: Unit = { + BedRecord("chrQ", 0, 4).length shouldBe 4 + BedRecord("chrQ", 0, 1).length shouldBe 1 + BedRecord("chrQ", 3, 4).length shouldBe 1 + } + + @Test def 
testToSamInterval: Unit = { + BedRecord("chrQ", 0, 4).toSamInterval shouldBe new Interval("chrQ", 1, 4) + BedRecord("chrQ", 0, 4, Some("name"), Some(0.0), Some(true)).toSamInterval shouldBe new Interval("chrQ", 1, 4, false, "name") + } + + @Test def testExons: Unit = { + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0").exons shouldBe None + + val record = BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0\t2\t10,20\t0,80") + val exons = record.exons + exons should not be None + exons.get(0).originals()(0) shouldBe record + exons.get(0).originals().size shouldBe 1 + exons.get(1).originals()(0) shouldBe record + exons.get(1).originals().size shouldBe 1 + exons.get(0).start shouldBe 0 + exons.get(0).end shouldBe 10 + exons.get(1).start shouldBe 80 + exons.get(1).end shouldBe 100 + exons.get.foldLeft(0)(_ + _.length) shouldBe 30 + } + + @Test def testIntrons: Unit = { + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0").introns shouldBe None + + val record = BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0\t2\t10,20\t0,80") + val introns = record.introns + introns should not be None + introns.get(0).originals()(0) shouldBe record + introns.get(0).originals().size shouldBe 1 + introns.get(0).start shouldBe 10 + introns.get(0).end shouldBe 80 + introns.get.foldLeft(0)(_ + _.length) shouldBe 70 + } + + @Test def testExonIntronOverlap: Unit = { + val record = BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0\t2\t10,20\t0,80") + val exons = record.exons + val introns = record.introns + for (exon <- exons.get; intron <- introns.get) { + exon.overlapWith(intron) shouldBe false + } + } + + @Test def testUtrsPositive: Unit = { + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+").utr3 shouldBe None + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+").utr5 shouldBe None + + val record = BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t3\t93\t255,0,0\t2\t10,20\t0,80") + val utr5 = record.utr5 + val utr3 = record.utr3 + utr5 should not be 
None + utr3 should not be None + utr5.get.length shouldBe 3 + utr3.get.length shouldBe 7 + + } + + @Test def testUtrsNegative: Unit = { + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t-").utr3 shouldBe None + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t-").utr5 shouldBe None + + val record = BedRecord.fromLine("chrQ\t0\t100\tname\t3\t-\t3\t93\t255,0,0\t2\t10,20\t0,80") + val utr5 = record.utr5 + val utr3 = record.utr3 + utr5 should not be None + utr3 should not be None + utr5.get.length shouldBe 7 + utr3.get.length shouldBe 3 + } + + @Test def testOriginals: Unit = { + val original = BedRecord("chrQ", 1, 2) + val level1 = BedRecord("chrQ", 1, 2, _originals = List(original)) + val level2 = BedRecord("chrQ", 2, 3, _originals = List(level1)) + original.originals() shouldBe List(original) + original.originals(nested = false) shouldBe List(original) + level1.originals() shouldBe List(original) + level1.originals(nested = false) shouldBe List(original) + level2.originals() shouldBe List(original) + level2.originals(nested = false) shouldBe List(level1) + } + + @Test def testScatter: Unit = { + val list = BedRecord("chrQ", 0, 1000).scatter(10) + list.size shouldBe 100 + BedRecordList.fromList(list).length shouldBe 1000 + for (l1 <- list; l2 <- list if l1 != l2) l1.overlapWith(l2) shouldBe false + + val list2 = BedRecord("chrQ", 0, 999).scatter(10) + list2.size shouldBe 99 + BedRecordList.fromList(list2).length shouldBe 999 + for (l1 <- list2; l2 <- list2 if l1 != l2) l1.overlapWith(l2) shouldBe false + + val list3 = BedRecord("chrQ", 0, 999).scatter(9) + list3.size shouldBe 111 + BedRecordList.fromList(list3).length shouldBe 999 + for (l1 <- list3; l2 <- list3 if l1 != l2) l1.overlapWith(l2) shouldBe false + } + + @Test def testErrors: Unit = { + BedRecord("chrQ", 0, 3).validate + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0\t2\t10,20\t20,50").validate + intercept[IllegalArgumentException] { + BedRecord("chrQ", 0, 0).validate + } + 
intercept[IllegalArgumentException] { + BedRecord("chrQ", 4, 3).validate + } + intercept[IllegalArgumentException] { + BedRecord.fromLine("chrQ\t0\t100\tname\t3\t+\t1\t3\t255,0,0\t2\t10\t50").validate + } + intercept[IllegalStateException] { + BedRecord.fromLine("chrQ\t0\t100\tname\t3\tx\t1\t3\t255,0,0\t2\t10,20\t20,50").validate + } + } +} diff --git a/public/carp/pom.xml b/public/carp/pom.xml index e4782ea6af3061476e3d8285d82a834d278d6e4e..6436a278cc101d9914dfbfdc3b8ef74185e85f23 100644 --- a/public/carp/pom.xml +++ b/public/carp/pom.xml @@ -25,7 +25,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -35,7 +35,7 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/public/carp/src/main/resources/nl/lumc/sasc/biopet/pipelines/carp/carpFront.ssp b/public/carp/src/main/resources/nl/lumc/sasc/biopet/pipelines/carp/carpFront.ssp index d6052e8c30de00be49bb0a2e31afeaae99d1513b..d27d719622c39812a9787a4935808c5da55cdd97 100644 --- a/public/carp/src/main/resources/nl/lumc/sasc/biopet/pipelines/carp/carpFront.ssp +++ b/public/carp/src/main/resources/nl/lumc/sasc/biopet/pipelines/carp/carpFront.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) <%@ var summary: Summary %> <table class="table"> <tbody> diff --git a/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/Carp.scala b/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/Carp.scala index aada3bc2f7c82ec87a14c7acffa4c7cb71c8203b..de37b352ecf8a9091ee67d1b455e85087156f642 100644 --- a/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/Carp.scala +++ 
b/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/Carp.scala @@ -18,11 +18,12 @@ package nl.lumc.sasc.biopet.pipelines.carp import java.io.File import nl.lumc.sasc.biopet.core._ -import nl.lumc.sasc.biopet.core.config._ +import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsView +import nl.lumc.sasc.biopet.utils.config._ import nl.lumc.sasc.biopet.core.summary.SummaryQScript import nl.lumc.sasc.biopet.extensions.Ln import nl.lumc.sasc.biopet.extensions.macs2.Macs2CallPeak -import nl.lumc.sasc.biopet.extensions.picard.MergeSamFiles +import nl.lumc.sasc.biopet.extensions.picard.{ BuildBamIndex, MergeSamFiles } import nl.lumc.sasc.biopet.pipelines.bammetrics.BamMetrics import nl.lumc.sasc.biopet.pipelines.bamtobigwig.Bam2Wig import nl.lumc.sasc.biopet.pipelines.mapping.Mapping @@ -38,12 +39,20 @@ class Carp(val root: Configurable) extends QScript with MultiSampleQScript with qscript => def this() = this(null) - override def defaults = ConfigUtils.mergeMaps(Map( + override def defaults = Map( "mapping" -> Map( - "skip_markduplicates" -> true, + "skip_markduplicates" -> false, "aligner" -> "bwa-mem" + ), + "samtoolsview" -> Map("q" -> 10) + ) + + override def fixedValues = Map( + "samtoolsview" -> Map( + "h" -> true, + "b" -> true ) - ), super.defaults) + ) def summaryFile = new File(outputDir, "Carp.summary.json") @@ -78,6 +87,10 @@ class Carp(val root: Configurable) extends QScript with MultiSampleQScript with if (config.contains("R1")) { mapping.input_R1 = config("R1") if (config.contains("R2")) mapping.input_R2 = config("R2") + + inputFiles :+= new InputFile(mapping.input_R1, config("R1_md5")) + mapping.input_R2.foreach(inputFiles :+= new InputFile(_, config("R2_md5"))) + mapping.init() mapping.biopetScript() addAll(mapping.functions) @@ -89,6 +102,7 @@ class Carp(val root: Configurable) extends QScript with MultiSampleQScript with } val bamFile = createFile(".bam") + val bamFileFilter = createFile(".filter.bam") val controls: List[String] = 
config("control", default = Nil) def addJobs(): Unit = { @@ -107,13 +121,32 @@ class Carp(val root: Configurable) extends QScript with MultiSampleQScript with add(merge) } - val bamMetrics = BamMetrics(qscript, bamFile, new File(sampleDir, "metrics")) + val bamMetrics = BamMetrics(qscript, bamFile, new File(sampleDir, "metrics"), sampleId = Some(sampleId)) addAll(bamMetrics.functions) addSummaryQScript(bamMetrics) + + val bamMetricsFilter = BamMetrics(qscript, bamFileFilter, new File(sampleDir, "metrics-filter"), sampleId = Some(sampleId)) + addAll(bamMetricsFilter.functions) + bamMetricsFilter.summaryName = "bammetrics-filter" + addSummaryQScript(bamMetricsFilter) + addAll(Bam2Wig(qscript, bamFile).functions) + addAll(Bam2Wig(qscript, bamFileFilter).functions) + + val samtoolsView = new SamtoolsView(qscript) + samtoolsView.input = bamFile + samtoolsView.output = bamFileFilter + samtoolsView.b = true + samtoolsView.h = true + add(samtoolsView) + + val buildBamIndex = new BuildBamIndex(qscript) + buildBamIndex.input = bamFileFilter + buildBamIndex.output = swapExt(bamFileFilter.getParent, bamFileFilter, ".bam", ".bai") + add(buildBamIndex) val macs2 = new Macs2CallPeak(qscript) - macs2.treatment = bamFile + macs2.treatment = bamFileFilter macs2.name = Some(sampleId) macs2.outputdir = sampleDir + File.separator + "macs2" + File.separator + sampleId + File.separator add(macs2) @@ -151,8 +184,8 @@ class Carp(val root: Configurable) extends QScript with MultiSampleQScript with if (!samples.contains(controlId)) throw new IllegalStateException("For sample: " + sampleId + " this control: " + controlId + " does not exist") val macs2 = new Macs2CallPeak(this) - macs2.treatment = sample.bamFile - macs2.control = samples(controlId).bamFile + macs2.treatment = sample.bamFileFilter + macs2.control = samples(controlId).bamFileFilter macs2.name = Some(sampleId + "_VS_" + controlId) macs2.outputdir = sample.sampleDir + File.separator + "macs2" + File.separator + macs2.name.get + 
File.separator add(macs2) diff --git a/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpReport.scala b/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpReport.scala index 0a25aaa1c65dca1a6304e217eb6d254b7d7d0e65..59a07dca040ff07f81438b59697953563b97d972 100644 --- a/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpReport.scala +++ b/public/carp/src/main/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpReport.scala @@ -15,7 +15,7 @@ */ package nl.lumc.sasc.biopet.pipelines.carp -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.report.{ ReportBuilderExtension, ReportSection, ReportPage, MultisampleReportBuilder } import nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport import nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport @@ -76,7 +76,7 @@ object CarpReport extends MultisampleReportBuilder { ), List( "Alignment" -> ReportSection("/nl/lumc/sasc/biopet/pipelines/bammetrics/alignmentSummary.ssp", if (summary.libraries(sampleId).size > 1) Map("showPlot" -> true) else Map()), - "Preprocessing" -> ReportSection("/nl/lumc/sasc/biopet/pipelines/bammetrics/alignmentSummary.ssp", Map("sampleLevel" -> true)), + "Merged" -> ReportSection("/nl/lumc/sasc/biopet/pipelines/bammetrics/alignmentSummary.ssp", Map("sampleLevel" -> true)), "QC reads" -> ReportSection("/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepReadSummary.ssp"), "QC bases" -> ReportSection("/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepBaseSummary.ssp") ), args) diff --git a/public/carp/src/test/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpTest.scala b/public/carp/src/test/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpTest.scala index 661aa78a459f5debd38b917f92fd6da349c42dde..da5d79939fd3729fb8838ab26bbe1a2d2d2bfb00 100644 --- a/public/carp/src/test/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpTest.scala +++ 
b/public/carp/src/test/scala/nl/lumc/sasc/biopet/pipelines/carp/CarpTest.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.pipelines.carp import java.io.{ File, FileOutputStream } import com.google.common.io.Files -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.extensions.bwa.BwaMem import nl.lumc.sasc.biopet.extensions.macs2.Macs2CallPeak import nl.lumc.sasc.biopet.extensions.picard.{ MergeSamFiles, SortSam } @@ -81,8 +81,8 @@ class CarpTest extends TestNGSuite with Matchers { val numberSamples = (if (sample1) 1 else 0) + (if (sample2) 1 else 0) + (if (sample3) 1 else 0) + (if (threatment) 1 else 0) + (if (control) 1 else 0) - carp.functions.count(_.isInstanceOf[BwaMem]) shouldBe numberLibs - carp.functions.count(_.isInstanceOf[SortSam]) shouldBe numberLibs + //carp.functions.count(_.isInstanceOf[BwaMem]) shouldBe numberLibs + //carp.functions.count(_.isInstanceOf[SortSam]) shouldBe numberLibs carp.functions.count(_.isInstanceOf[MergeSamFiles]) shouldBe (if (sample3) 1 else 0) carp.functions.count(_.isInstanceOf[Macs2CallPeak]) shouldBe (numberSamples + (if (threatment) 1 else 0)) @@ -97,6 +97,12 @@ class CarpTest extends TestNGSuite with Matchers { object CarpTest { val outputDir = Files.createTempDir() + new File(outputDir, "input").mkdirs() + def inputTouch(name: String): String = { + val file = new File(outputDir, "input" + File.separator + name) + Files.touch(file) + file.getAbsolutePath + } private def copyFile(name: String): Unit = { val is = getClass.getResourceAsStream("/" + name) @@ -110,7 +116,6 @@ object CarpTest { copyFile("ref.fa.fai") val executables = Map( - "reference" -> (outputDir + File.separator + "ref.fa"), "reference_fasta" -> (outputDir + File.separator + "ref.fa"), "fastqc" -> Map("exe" -> "test"), "seqtk" -> Map("exe" -> "test"), @@ -127,8 +132,8 @@ object CarpTest { val sample1 = Map( "samples" -> Map("sample1" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> 
"1_1_R1.fq", - "R2" -> "1_1_R2.fq" + "R1" -> inputTouch("1_1_R1.fq"), + "R2" -> inputTouch("1_1_R2.fq") ) ) ))) @@ -136,8 +141,8 @@ object CarpTest { val sample2 = Map( "samples" -> Map("sample2" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> "2_1_R1.fq", - "R2" -> "2_1_R2.fq" + "R1" -> inputTouch("2_1_R1.fq"), + "R2" -> inputTouch("2_1_R2.fq") ) ) ))) @@ -145,12 +150,12 @@ object CarpTest { val sample3 = Map( "samples" -> Map("sample3" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> "3_1_R1.fq", - "R2" -> "3_1_R2.fq" + "R1" -> inputTouch("3_1_R1.fq"), + "R2" -> inputTouch("3_1_R2.fq") ), "lib2" -> Map( - "R1" -> "3_2_R1.fq", - "R2" -> "3_2_R2.fq" + "R1" -> inputTouch("3_2_R1.fq"), + "R2" -> inputTouch("3_2_R2.fq") ) ) ))) @@ -158,8 +163,8 @@ object CarpTest { val threatment1 = Map( "samples" -> Map("threatment" -> Map("control" -> "control1", "libraries" -> Map( "lib1" -> Map( - "R1" -> "threatment_1_R1.fq", - "R2" -> "threatment_1_R2.fq" + "R1" -> inputTouch("threatment_1_R1.fq"), + "R2" -> inputTouch("threatment_1_R2.fq") ) ) ))) @@ -167,8 +172,8 @@ object CarpTest { val control1 = Map( "samples" -> Map("control1" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> "control_1_R1.fq", - "R2" -> "control_1_R2.fq" + "R1" -> inputTouch("control_1_R1.fq"), + "R2" -> inputTouch("control_1_R2.fq") ) ) ))) diff --git a/public/flexiprep/pom.xml b/public/flexiprep/pom.xml index f96546c331691ae192486c251b31460fc0e73464..60077ccf28c406b7358d247edffb101f2c092c9c 100644 --- a/public/flexiprep/pom.xml +++ b/public/flexiprep/pom.xml @@ -25,7 +25,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -35,7 +35,12 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + 
<groupId>nl.lumc.sasc</groupId> + <artifactId>BiopetToolsExtensions</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepBaseSummary.ssp b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepBaseSummary.ssp index 23592e9d2fa8fa72f8a28c2daa6a5c8cb318549e..b2f0057b3dc437c496c4a161af515c11f9a3d80e 100644 --- a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepBaseSummary.ssp +++ b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepBaseSummary.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) #import(nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport) #import(java.io.File) @@ -129,7 +129,7 @@ #if (read == "R2") </tr><tr> #end #{ val beforeTotal = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "seqstat_" + read, "bases", "num_total").getOrElse(0).toString.toLong - val afterTotal = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "seqstat_" + read + "_after", "bases", "num_total").getOrElse(0).toString.toLong + val afterTotal = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "seqstat_" + read + "_qc", "bases", "num_total").getOrElse(0).toString.toLong }# <td>${read}</td> <td>${beforeTotal}</td> diff --git a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFastaqcPlot.ssp b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFastaqcPlot.ssp index b28ae9049e4920ca113e0cae2f73cb08f070c94a..0ad776bd7cd861d02b3e8be11b9fa9dad6234008 100644 --- a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFastaqcPlot.ssp +++ 
b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFastaqcPlot.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) #import(org.apache.commons.io.FileUtils) #import(java.io.File) diff --git a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFront.ssp b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFront.ssp index 1eadda6d44c2f6befd49767896d9c260a3ec7658..006388f6bae5570e4f7ec4bcabca5a5439f9e95b 100644 --- a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFront.ssp +++ b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepFront.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) <%@ var summary: Summary %> <%@ var rootPath: String %> diff --git a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepInputfiles.ssp b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepInputfiles.ssp index 0ecc838cc733dd80d04455060071e6295246175e..dc0ee78a3334c19eb30733bb76e0d97b73c789d8 100644 --- a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepInputfiles.ssp +++ b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepInputfiles.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) #import(nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport) #import(java.io.File) diff --git a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepOutputfiles.ssp 
b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepOutputfiles.ssp index 3807e6c8790c21948927f3e25323d59bc620eee0..f91ba1ea26cadc287ec469a6d4205d7aa6ad5c5e 100644 --- a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepOutputfiles.ssp +++ b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepOutputfiles.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) #import(nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport) #import(java.io.File) diff --git a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepReadSummary.ssp b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepReadSummary.ssp index 1ec4e7c61d35fbc33fddbfdbd759d71ae1dc38a2..4a0a60659f5276f7230fef7fabd4cc12f23ae4a4 100644 --- a/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepReadSummary.ssp +++ b/public/flexiprep/src/main/resources/nl/lumc/sasc/biopet/pipelines/flexiprep/flexiprepReadSummary.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) #import(nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport) #import(java.io.File) @@ -133,7 +133,7 @@ #if (read == "R2") </tr><tr> #end #{ val beforeTotal = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "seqstat_" + read, "reads", "num_total") - val afterTotal = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "seqstat_" + read + "_after", "reads", "num_total") + val afterTotal = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "seqstat_" + read + "_qc", "reads", "num_total") val clippingDiscardedToShort = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "clipping_" + 
read, "num_reads_discarded_too_short").getOrElse(0).toString.toLong val clippingDiscardedToLong = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "clipping_" + read, "num_reads_discarded_too_long").getOrElse(0).toString.toLong val trimmingDiscarded = summary.getLibraryValue(sample, libId, "flexiprep", "stats", "trimming", "num_reads_discarded_" + read).getOrElse(0).toString.toLong diff --git a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Cutadapt.scala b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Cutadapt.scala deleted file mode 100644 index 5e8936c8a3b7eeaf927113db93eebd63dbecb708..0000000000000000000000000000000000000000 --- a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Cutadapt.scala +++ /dev/null @@ -1,87 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. 
- */ -package nl.lumc.sasc.biopet.pipelines.flexiprep - -import java.io.File - -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.extensions.Ln - -import scala.collection.mutable -import scala.io.Source - -class Cutadapt(root: Configurable) extends nl.lumc.sasc.biopet.extensions.Cutadapt(root) { - var fastqc: Fastqc = _ - - override def beforeCmd() { - super.beforeCmd() - - val foundAdapters = fastqc.foundAdapters.map(_.seq) - if (default_clip_mode == "3") opt_adapter ++= foundAdapters - else if (default_clip_mode == "5") opt_front ++= foundAdapters - else if (default_clip_mode == "both") opt_anywhere ++= foundAdapters - } - - override def summaryStats: Map[String, Any] = { - val trimR = """.*Trimmed reads: *(\d*) .*""".r - val tooShortR = """.*Too short reads: *(\d*) .*""".r - val tooLongR = """.*Too long reads: *(\d*) .*""".r - val adapterR = """Adapter '([C|T|A|G]*)'.*trimmed (\d*) times.""".r - - val stats: mutable.Map[String, Int] = mutable.Map("trimmed" -> 0, "tooshort" -> 0, "toolong" -> 0) - val adapter_stats: mutable.Map[String, List[Any]] = mutable.Map() - - if (stats_output.exists) for (line <- Source.fromFile(stats_output).getLines()) { - line match { - case trimR(m) => stats += ("trimmed" -> m.toInt) - case tooShortR(m) => stats += ("tooshort" -> m.toInt) - case tooLongR(m) => stats += ("toolong" -> m.toInt) - case adapterR(adapter, count) => - val adapterName = fastqc.foundAdapters.find(_.seq == adapter) match { - case None => "unknown" - case Some(a) => a.name - } - adapter_stats += (adapterName -> List(adapter, count.toInt)) - case _ => - } - } - - Map("num_reads_affected" -> stats("trimmed"), - "num_reads_discarded_too_short" -> stats("tooshort"), - "num_reads_discarded_too_long" -> stats("toolong"), - "adapters" -> adapter_stats.toMap - ) - } - override def cmdLine = { - if (opt_adapter.nonEmpty || opt_anywhere.nonEmpty || opt_front.nonEmpty) { - analysisName = getClass.getSimpleName - super.cmdLine - } else { - 
analysisName = getClass.getSimpleName + "-ln" - Ln(this, fastq_input, fastq_output, relative = true).cmd - } - } -} - -object Cutadapt { - def apply(root: Configurable, input: File, output: File): Cutadapt = { - val cutadapt = new Cutadapt(root) - cutadapt.fastq_input = input - cutadapt.fastq_output = output - cutadapt.stats_output = new File(output.getAbsolutePath.substring(0, output.getAbsolutePath.lastIndexOf(".")) + ".stats") - cutadapt - } -} diff --git a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Fastqc.scala b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Fastqc.scala index 0075eb7c5e4350ab93a46f06cfd5faf95246efd0..8de84c5081db4b9801222c6853e2b185a54741e6 100644 --- a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Fastqc.scala +++ b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Fastqc.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.pipelines.flexiprep import java.io.{ File, FileNotFoundException } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.summary.Summarizable import org.broadinstitute.gatk.utils.commandline.Output @@ -181,9 +181,6 @@ class Fastqc(root: Configurable) extends nl.lumc.sasc.biopet.extensions.Fastqc(r } else Set() } - @Output - var outputFiles: List[File] = Nil - def summaryFiles: Map[String, File] = { val outputFiles = Map("plot_duplication_levels" -> ("Images" + File.separator + "duplication_levels.png"), "plot_kmer_profiles" -> ("Images" + File.separator + "kmer_profiles.png"), @@ -204,7 +201,8 @@ class Fastqc(root: Configurable) extends nl.lumc.sasc.biopet.extensions.Fastqc(r def summaryStats: Map[String, Any] = Map( "per_base_sequence_quality" -> perBaseSequenceQuality, - "per_base_sequence_content" -> perBaseSequenceContent) + "per_base_sequence_content" -> perBaseSequenceContent, + "adapters" -> foundAdapters.map(x => x.name -> 
x.seq).toMap) } object Fastqc { diff --git a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Flexiprep.scala b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Flexiprep.scala index d6eae4e6214dec29146faccdec6ffd3fa13d2182..e53c55cc53bd4cddab07de2d1fe7a1f5bc25feb9 100644 --- a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Flexiprep.scala +++ b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/Flexiprep.scala @@ -15,11 +15,11 @@ */ package nl.lumc.sasc.biopet.pipelines.flexiprep -import nl.lumc.sasc.biopet.core.config.Configurable import nl.lumc.sasc.biopet.core.summary.SummaryQScript -import nl.lumc.sasc.biopet.core.{ PipelineCommand, SampleLibraryTag } -import nl.lumc.sasc.biopet.extensions._ -import nl.lumc.sasc.biopet.tools.{ FastqSync, SeqStat } +import nl.lumc.sasc.biopet.core.{ BiopetFifoPipe, PipelineCommand, SampleLibraryTag } +import nl.lumc.sasc.biopet.extensions.{ Zcat, Gzip } +import nl.lumc.sasc.biopet.utils.config.Configurable +import nl.lumc.sasc.biopet.extensions.tools.{ SeqStat, FastqSync } import org.broadinstitute.gatk.queue.QScript class Flexiprep(val root: Configurable) extends QScript with SummaryQScript with SampleLibraryTag { @@ -37,13 +37,16 @@ class Flexiprep(val root: Configurable) extends QScript with SummaryQScript with /** Skip Clip fastq files */ var skipClip: Boolean = config("skip_clip", default = false) + /** Make a final fastq files, by default only when flexiprep is the main pipeline */ + var keepQcFastqFiles: Boolean = config("keepQcFastqFiles", default = root == null) + /** Location of summary file */ def summaryFile = new File(outputDir, sampleId.getOrElse("x") + "-" + libId.getOrElse("x") + ".qc.summary.json") /** Returns files to store in summary */ def summaryFiles: Map[String, File] = { - Map("input_R1" -> input_R1, "output_R1" -> outputFiles("output_R1_gzip")) ++ - (if (paired) Map("input_R2" -> input_R2.get, "output_R2" -> 
outputFiles("output_R2_gzip")) else Map()) + Map("input_R1" -> input_R1, "output_R1" -> fastqR1Qc) ++ + (if (paired) Map("input_R2" -> input_R2.get, "output_R2" -> fastqR2Qc.get) else Map()) } /** returns settings to store in summary */ @@ -79,6 +82,9 @@ class Flexiprep(val root: Configurable) extends QScript with SummaryQScript with paired = input_R2.isDefined + inputFiles :+= new InputFile(input_R1) + input_R2.foreach(inputFiles :+= new InputFile(_)) + if (input_R1.endsWith(".gz")) R1_name = input_R1.getName.substring(0, input_R1.getName.lastIndexOf(".gz")) else if (input_R1.endsWith(".gzip")) R1_name = input_R1.getName.substring(0, input_R1.getName.lastIndexOf(".gzip")) else R1_name = input_R1.getName @@ -101,8 +107,8 @@ class Flexiprep(val root: Configurable) extends QScript with SummaryQScript with def biopetScript() { runInitialJobs() - val out = if (paired) runTrimClip(outputFiles("fastq_input_R1"), Some(outputFiles("fastq_input_R2")), outputDir) - else runTrimClip(outputFiles("fastq_input_R1"), outputDir) + if (paired) runTrimClip(input_R1, input_R2, outputDir) + else runTrimClip(input_R1, outputDir) val R1_files = for ((k, v) <- outputFiles if k.endsWith("output_R1")) yield v val R2_files = for ((k, v) <- outputFiles if k.endsWith("output_R2")) yield v @@ -111,8 +117,8 @@ class Flexiprep(val root: Configurable) extends QScript with SummaryQScript with /** Add init non chunkable jobs */ def runInitialJobs() { - outputFiles += ("fastq_input_R1" -> extractIfNeeded(input_R1, outputDir)) - if (paired) outputFiles += ("fastq_input_R2" -> extractIfNeeded(input_R2.get, outputDir)) + outputFiles += ("fastq_input_R1" -> input_R1) + if (paired) outputFiles += ("fastq_input_R2" -> input_R2.get) fastqc_R1 = Fastqc(this, input_R1, new File(outputDir, R1_name + ".fastqc/")) add(fastqc_R1) @@ -125,184 +131,153 @@ class Flexiprep(val root: Configurable) extends QScript with SummaryQScript with addSummarizable(fastqc_R2, "fastqc_R2") outputFiles += ("fastqc_R2" -> 
fastqc_R2.output) } + + val seqstat_R1 = SeqStat(this, input_R1, outputDir) + seqstat_R1.isIntermediate = true + add(seqstat_R1) + addSummarizable(seqstat_R1, "seqstat_R1") + + if (paired) { + val seqstat_R2 = SeqStat(this, input_R2.get, outputDir) + seqstat_R2.isIntermediate = true + add(seqstat_R2) + addSummarizable(seqstat_R2, "seqstat_R2") + } } - //TODO: Refactor need to combine all this functions + def fastqR1Qc = if (paired) + new File(outputDir, s"${sampleId.getOrElse("x")}-${libId.getOrElse("x")}.R1.qc.sync.fq.gz") + else new File(outputDir, s"${sampleId.getOrElse("x")}-${libId.getOrElse("x")}.R1.qc.fq.gz") + def fastqR2Qc = if (paired) + Some(new File(outputDir, s"${sampleId.getOrElse("x")}-${libId.getOrElse("x")}.R2.qc.sync.fq.gz")) + else None /** Adds all chunkable jobs of flexiprep */ - def runTrimClip(R1_in: File, outDir: File, chunk: String): (File, Option[File], List[File]) = + def runTrimClip(R1_in: File, outDir: File, chunk: String): (File, Option[File]) = runTrimClip(R1_in, None, outDir, chunk) /** Adds all chunkable jobs of flexiprep */ - def runTrimClip(R1_in: File, outDir: File): (File, Option[File], List[File]) = + def runTrimClip(R1_in: File, outDir: File): (File, Option[File]) = runTrimClip(R1_in, None, outDir, "") /** Adds all chunkable jobs of flexiprep */ - def runTrimClip(R1_in: File, R2_in: Option[File], outDir: File): (File, Option[File], List[File]) = + def runTrimClip(R1_in: File, R2_in: Option[File], outDir: File): (File, Option[File]) = runTrimClip(R1_in, R2_in, outDir, "") /** Adds all chunkable jobs of flexiprep */ - def runTrimClip(R1_in: File, R2_in: Option[File], outDir: File, chunkarg: String): (File, Option[File], List[File]) = { + def runTrimClip(R1_in: File, + R2_in: Option[File], + outDir: File, + chunkarg: String): (File, Option[File]) = { val chunk = if (chunkarg.isEmpty || chunkarg.endsWith("_")) chunkarg else chunkarg + "_" - var results: Map[String, File] = Map() var R1 = R1_in var R2 = R2_in - var deps_R1 = R1 :: 
Nil - var deps_R2 = if (paired) R2.get :: Nil else Nil - def deps: List[File] = deps_R1 ::: deps_R2 - - val seqtkSeq_R1 = SeqtkSeq(this, R1, swapExt(outDir, R1, R1_ext, ".sanger" + R1_ext), fastqc_R1) - seqtkSeq_R1.isIntermediate = true - add(seqtkSeq_R1) - addSummarizable(seqtkSeq_R1, "seqtkSeq_R1") - R1 = seqtkSeq_R1.output - deps_R1 ::= R1 - if (paired) { - val seqtkSeq_R2 = SeqtkSeq(this, R2.get, swapExt(outDir, R2.get, R2_ext, ".sanger" + R2_ext), fastqc_R2) - seqtkSeq_R2.isIntermediate = true - add(seqtkSeq_R2) - addSummarizable(seqtkSeq_R2, "seqtkSeq_R2") - R2 = Some(seqtkSeq_R2.output) - deps_R2 ::= R2.get - } - - val seqstat_R1 = SeqStat(this, R1, outDir) - seqstat_R1.isIntermediate = true - seqstat_R1.deps = deps_R1 - add(seqstat_R1) - addSummarizable(seqstat_R1, "seqstat_R1") + val qcCmdR1 = new QcCommand(this, fastqc_R1) + qcCmdR1.input = R1_in + qcCmdR1.read = "R1" + qcCmdR1.output = if (paired) new File(outDir, fastqR1Qc.getName.stripSuffix(".gz")) + else fastqR1Qc + qcCmdR1.deps :+= fastqc_R1.output + qcCmdR1.isIntermediate = paired || !keepQcFastqFiles + addSummarizable(qcCmdR1, "qc_command_R1") if (paired) { - val seqstat_R2 = SeqStat(this, R2.get, outDir) - seqstat_R2.isIntermediate = true - seqstat_R2.deps = deps_R2 - add(seqstat_R2) - addSummarizable(seqstat_R2, "seqstat_R2") - } - - if (!skipClip) { // Adapter clipping - - val cutadapt_R1 = Cutadapt(this, R1, swapExt(outDir, R1, R1_ext, ".clip" + R1_ext)) - cutadapt_R1.fastqc = fastqc_R1 - cutadapt_R1.isIntermediate = true - cutadapt_R1.deps = deps_R1 - add(cutadapt_R1) - addSummarizable(cutadapt_R1, "clipping_R1") - R1 = cutadapt_R1.fastq_output - deps_R1 ::= R1 - outputFiles += ("cutadapt_R1_stats" -> cutadapt_R1.stats_output) - - if (paired) { - val cutadapt_R2 = Cutadapt(this, R2.get, swapExt(outDir, R2.get, R2_ext, ".clip" + R2_ext)) - outputFiles += ("cutadapt_R2_stats" -> cutadapt_R2.stats_output) - cutadapt_R2.fastqc = fastqc_R2 - cutadapt_R2.isIntermediate = true - cutadapt_R2.deps = 
deps_R2 - add(cutadapt_R2) - addSummarizable(cutadapt_R2, "clipping_R2") - R2 = Some(cutadapt_R2.fastq_output) - deps_R2 ::= R2.get - - val fqSync = new FastqSync(this) - fqSync.refFastq = cutadapt_R1.fastq_input - fqSync.inputFastq1 = cutadapt_R1.fastq_output - fqSync.inputFastq2 = cutadapt_R2.fastq_output - fqSync.outputFastq1 = swapExt(outDir, R1, R1_ext, ".sync" + R1_ext) - fqSync.outputFastq2 = swapExt(outDir, R2.get, R2_ext, ".sync" + R2_ext) - fqSync.outputStats = swapExt(outDir, R1, R1_ext, ".sync.stats") - fqSync.deps :::= deps - add(fqSync) - addSummarizable(fqSync, "fastq_sync") - outputFiles += ("syncStats" -> fqSync.outputStats) - R1 = fqSync.outputFastq1 - R2 = Some(fqSync.outputFastq2) - deps_R1 ::= R1 - deps_R2 ::= R2.get + val qcCmdR2 = new QcCommand(this, fastqc_R2) + qcCmdR2.input = R2_in.get + qcCmdR2.output = new File(outDir, fastqR2Qc.get.getName.stripSuffix(".gz")) + qcCmdR2.read = "R2" + addSummarizable(qcCmdR2, "qc_command_R2") + + qcCmdR1.compress = false + qcCmdR2.compress = false + + val fqSync = new FastqSync(this) + fqSync.refFastq = R1_in + fqSync.inputFastq1 = qcCmdR1.output + fqSync.inputFastq2 = qcCmdR2.output + fqSync.outputFastq1 = new File(outDir, fastqR1Qc.getName) + fqSync.outputFastq2 = new File(outDir, fastqR2Qc.get.getName) + fqSync.outputStats = new File(outDir, s"${sampleId.getOrElse("x")}-${libId.getOrElse("x")}.sync.stats") + + val pipe = new BiopetFifoPipe(this, fqSync :: Nil) { + override def configName = "qc-cmd" + + override def beforeGraph(): Unit = { + fqSync.beforeGraph() + super.beforeGraph() + } + + override def beforeCmd(): Unit = { + qcCmdR1.beforeCmd() + qcCmdR2.beforeCmd() + fqSync.beforeCmd() + commands = qcCmdR1.jobs ::: qcCmdR2.jobs ::: fqSync :: Nil + super.beforeCmd() + } } - } - if (!skipTrim) { // Quality trimming - val sickle = new Sickle(this) - sickle.input_R1 = R1 - sickle.output_R1 = swapExt(outDir, R1, R1_ext, ".trim" + R1_ext) - if (paired) { - sickle.input_R2 = R2.get - sickle.output_R2 = 
swapExt(outDir, R2.get, R2_ext, ".trim" + R2_ext) - sickle.output_singles = swapExt(outDir, R2.get, R2_ext, ".trim.singles" + R1_ext) - } - sickle.output_stats = swapExt(outDir, R1, R1_ext, ".trim.stats") - sickle.deps = deps - sickle.isIntermediate = true - add(sickle) - addSummarizable(sickle, "trimming") - R1 = sickle.output_R1 - if (paired) R2 = Some(sickle.output_R2) + pipe.deps ::= fastqc_R1.output + pipe.deps ::= fastqc_R2.output + pipe.isIntermediate = !keepQcFastqFiles + add(pipe) + + addSummarizable(fqSync, "fastq_sync") + outputFiles += ("syncStats" -> fqSync.outputStats) + R1 = fqSync.outputFastq1 + R2 = Some(fqSync.outputFastq2) + } else { + add(qcCmdR1) + R1 = qcCmdR1.output } val seqstat_R1_after = SeqStat(this, R1, outDir) - seqstat_R1_after.deps = deps_R1 add(seqstat_R1_after) addSummarizable(seqstat_R1_after, "seqstat_R1_qc") if (paired) { val seqstat_R2_after = SeqStat(this, R2.get, outDir) - seqstat_R2_after.deps = deps_R2 add(seqstat_R2_after) addSummarizable(seqstat_R2_after, "seqstat_R2_qc") } outputFiles += (chunk + "output_R1" -> R1) if (paired) outputFiles += (chunk + "output_R2" -> R2.get) - (R1, R2, deps) + (R1, R2) } /** Adds last non chunkable jobs */ def runFinalize(fastq_R1: List[File], fastq_R2: List[File]) { - if (fastq_R1.length != fastq_R2.length && paired) throw new IllegalStateException("R1 and R2 file number is not the same") - val R1 = new File(outputDir, R1_name + ".qc" + R1_ext + ".gz") - val R2 = new File(outputDir, R2_name + ".qc" + R2_ext + ".gz") + if (fastq_R1.length != fastq_R2.length && paired) + throw new IllegalStateException("R1 and R2 file number is not the same") - add(Gzip(this, fastq_R1, R1)) - if (paired) add(Gzip(this, fastq_R2, R2)) + if (fastq_R1.length > 1) { + val zcat = new Zcat(this) + zcat.input = fastq_R1 + add(zcat | new Gzip(this) > fastqR1Qc) + if (paired) { + val zcat = new Zcat(this) + zcat.input = fastq_R2 + add(zcat | new Gzip(this) > fastqR2Qc.get) + } + } - outputFiles += ("output_R1_gzip" 
-> R1) - if (paired) outputFiles += ("output_R2_gzip" -> R2) + outputFiles += ("output_R1_gzip" -> fastqR1Qc) + if (paired) outputFiles += ("output_R2_gzip" -> fastqR2Qc.get) - if (!skipTrim || !skipClip) { - fastqc_R1_after = Fastqc(this, R1, new File(outputDir, R1_name + ".qc.fastqc/")) - add(fastqc_R1_after) - addSummarizable(fastqc_R1_after, "fastqc_R1_qc") + fastqc_R1_after = Fastqc(this, fastqR1Qc, new File(outputDir, R1_name + ".qc.fastqc/")) + add(fastqc_R1_after) + addSummarizable(fastqc_R1_after, "fastqc_R1_qc") - if (paired) { - fastqc_R2_after = Fastqc(this, R2, new File(outputDir, R2_name + ".qc.fastqc/")) - add(fastqc_R2_after) - addSummarizable(fastqc_R2_after, "fastqc_R2_qc") - } + if (paired) { + fastqc_R2_after = Fastqc(this, fastqR2Qc.get, new File(outputDir, R2_name + ".qc.fastqc/")) + add(fastqc_R2_after) + addSummarizable(fastqc_R2_after, "fastqc_R2_qc") } addSummaryJobs() } - - /** Extracts file if file is compressed */ - def extractIfNeeded(file: File, runDir: File): File = { - if (file == null) file - else if (file.getName.endsWith(".gz") || file.getName.endsWith(".gzip")) { - var newFile: File = swapExt(runDir, file, ".gz", "") - if (file.getName.endsWith(".gzip")) newFile = swapExt(runDir, file, ".gzip", "") - val zcatCommand = Zcat(this, file, newFile) - zcatCommand.isIntermediate = true - add(zcatCommand) - newFile - } else if (file.getName.endsWith(".bz2")) { - val newFile = swapExt(runDir, file, ".bz2", "") - val pbzip2 = Pbzip2(this, file, newFile) - pbzip2.isIntermediate = true - add(pbzip2) - newFile - } else file - } } object Flexiprep extends PipelineCommand diff --git a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepReport.scala b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepReport.scala index ab8846bc1ea091a1abded962181e63d49af3adb2..075077872a314c2a510478ff8a151a4ed7f5ba9c 100644 --- 
a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepReport.scala +++ b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepReport.scala @@ -17,10 +17,10 @@ package nl.lumc.sasc.biopet.pipelines.flexiprep import java.io.{ File, PrintWriter } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.report.{ ReportBuilderExtension, ReportBuilder, ReportPage, ReportSection } -import nl.lumc.sasc.biopet.core.summary.{ Summary, SummaryValue } -import nl.lumc.sasc.biopet.extensions.rscript.StackedBarPlot +import nl.lumc.sasc.biopet.utils.rscript.StackedBarPlot +import nl.lumc.sasc.biopet.utils.summary.{ Summary, SummaryValue } class FlexiprepReport(val root: Configurable) extends ReportBuilderExtension { val builder = FlexiprepReport @@ -95,7 +95,7 @@ object FlexiprepReport extends ReportBuilder { def getLine(summary: Summary, sample: String, lib: String): String = { val beforeTotal = new SummaryValue(List("flexiprep", "stats", "seqstat_" + read, "reads", "num_total"), summary, Some(sample), Some(lib)).value.getOrElse(0).toString.toLong - val afterTotal = new SummaryValue(List("flexiprep", "stats", "seqstat_" + read + "_after", "reads", "num_total"), + val afterTotal = new SummaryValue(List("flexiprep", "stats", "seqstat_" + read + "_qc", "reads", "num_total"), summary, Some(sample), Some(lib)).value.getOrElse(0).toString.toLong val clippingDiscardedToShort = new SummaryValue(List("flexiprep", "stats", "clipping_" + read, "num_reads_discarded_too_short"), summary, Some(sample), Some(lib)).value.getOrElse(0).toString.toLong @@ -152,7 +152,7 @@ object FlexiprepReport extends ReportBuilder { def getLine(summary: Summary, sample: String, lib: String): String = { val beforeTotal = new SummaryValue(List("flexiprep", "stats", "seqstat_" + read, "bases", "num_total"), summary, Some(sample), Some(lib)).value.getOrElse(0).toString.toLong - 
val afterTotal = new SummaryValue(List("flexiprep", "stats", "seqstat_" + read + "_after", "bases", "num_total"), + val afterTotal = new SummaryValue(List("flexiprep", "stats", "seqstat_" + read + "_qc", "bases", "num_total"), summary, Some(sample), Some(lib)).value.getOrElse(0).toString.toLong val sb = new StringBuffer() diff --git a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/QcCommand.scala b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/QcCommand.scala new file mode 100644 index 0000000000000000000000000000000000000000..3e516b638398f008435c5c5a982e09efcc335a86 --- /dev/null +++ b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/QcCommand.scala @@ -0,0 +1,144 @@ +package nl.lumc.sasc.biopet.pipelines.flexiprep + +import java.io.File + +import nl.lumc.sasc.biopet.core.summary.{ SummaryQScript, Summarizable } +import nl.lumc.sasc.biopet.core.{ BiopetFifoPipe, BiopetCommandLineFunction, BiopetPipe } +import nl.lumc.sasc.biopet.extensions.{ Cat, Gzip, Sickle, Cutadapt } +import nl.lumc.sasc.biopet.extensions.seqtk.SeqtkSeq +import nl.lumc.sasc.biopet.utils.config.Configurable +import org.broadinstitute.gatk.utils.commandline.{ Output, Input } + +/** + * Created by pjvan_thof on 9/22/15. 
+ */ +class QcCommand(val root: Configurable, val fastqc: Fastqc) extends BiopetCommandLineFunction with Summarizable { + + val flexiprep = root match { + case f: Flexiprep => f + case _ => throw new IllegalArgumentException("This class may only be used inside Flexiprep") + } + + @Input(required = true) + var input: File = _ + + @Output(required = true) + var output: File = _ + + var compress = true + + var read: String = _ + + override def defaultCoreMemory = 2.0 + override def defaultThreads = 3 + + val seqtk = new SeqtkSeq(root) + var clip: Option[Cutadapt] = None + var trim: Option[Sickle] = None + var outputCommand: BiopetCommandLineFunction = null + + def jobs = (Some(seqtk) :: clip :: trim :: Some(outputCommand) :: Nil).flatten + + def summaryFiles = Map() + + def summaryStats = Map() + + override def addToQscriptSummary(qscript: SummaryQScript, name: String): Unit = { + clip match { + case Some(job) => qscript.addSummarizable(job, s"clipping_$read") + case _ => + } + trim match { + case Some(job) => qscript.addSummarizable(job, s"trimming_$read") + case _ => + } + } + + override def beforeGraph(): Unit = { + super.beforeGraph() + require(read != null) + deps ::= input + outputFiles :+= output + } + + override def beforeCmd(): Unit = { + seqtk.input = input + seqtk.output = new File(output.getParentFile, input.getName + ".seqtk.fq") + seqtk.Q = fastqc.encoding match { + case null => None + case enc if enc.contains("Sanger / Illumina 1.9") => None + case enc if enc.contains("Illumina <1.3") => Option(64) + case enc if enc.contains("Illumina 1.3") => Option(64) + case enc if enc.contains("Illumina 1.5") => Option(64) + case _ => None + } + if (seqtk.Q.isDefined) seqtk.V = true + + clip = if (!flexiprep.skipClip) { + val foundAdapters = fastqc.foundAdapters.map(_.seq) + if (foundAdapters.nonEmpty) { + val cutadept = new Cutadapt(root) + cutadept.fastq_input = seqtk.output + cutadept.fastq_output = new File(output.getParentFile, input.getName + ".cutadept.fq") + 
cutadept.stats_output = new File(flexiprep.outputDir, s"${flexiprep.sampleId.getOrElse("x")}-${flexiprep.libId.getOrElse("x")}.$read.clip.stats") + if (cutadept.default_clip_mode == "3") cutadept.opt_adapter ++= foundAdapters + else if (cutadept.default_clip_mode == "5") cutadept.opt_front ++= foundAdapters + else if (cutadept.default_clip_mode == "both") cutadept.opt_anywhere ++= foundAdapters + Some(cutadept) + } else None + } else None + + trim = if (!flexiprep.skipTrim) { + val sickle = new Sickle(root) + sickle.output_stats = new File(flexiprep.outputDir, s"${flexiprep.sampleId.getOrElse("x")}-${flexiprep.libId.getOrElse("x")}.$read.trim.stats") + sickle.input_R1 = clip match { + case Some(clip) => clip.fastq_output + case _ => seqtk.output + } + sickle.output_R1 = new File(output.getParentFile, input.getName + ".sickle.fq") + Some(sickle) + } else None + + val outputFile = (clip, trim) match { + case (_, Some(trim)) => trim.output_R1 + case (Some(clip), _) => clip.fastq_output + case _ => seqtk.output + } + + if (compress) outputCommand = { + val gzip = new Gzip(root) + gzip.output = output + outputFile :<: gzip + } + else outputCommand = { + val cat = new Cat(root) + cat.input = outputFile :: Nil + cat.output = output + cat + } + + seqtk.beforeGraph() + clip.foreach(_.beforeGraph()) + trim.foreach(_.beforeGraph()) + outputCommand.beforeGraph() + + seqtk.beforeCmd() + clip.foreach(_.beforeCmd()) + trim.foreach(_.beforeCmd()) + outputCommand.beforeCmd() + } + + def cmdLine = { + + val cmd = (clip, trim) match { + case (Some(clip), Some(trim)) => new BiopetFifoPipe(root, seqtk :: clip :: trim :: outputCommand :: Nil) + case (Some(clip), _) => new BiopetFifoPipe(root, seqtk :: clip :: outputCommand :: Nil) + case (_, Some(trim)) => new BiopetFifoPipe(root, seqtk :: trim :: outputCommand :: Nil) + case _ => new BiopetFifoPipe(root, seqtk :: outputCommand :: Nil) + } + + //val cmds = (Some(seqtk) :: clip :: trim :: Some(new Gzip(root)) :: Nil).flatten + 
cmd.beforeGraph() + cmd.commandLine + } +} diff --git a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/SeqtkSeq.scala b/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/SeqtkSeq.scala deleted file mode 100644 index a6aeac90951fbf0e52b1a1fe283a3a5a625c8306..0000000000000000000000000000000000000000 --- a/public/flexiprep/src/main/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/SeqtkSeq.scala +++ /dev/null @@ -1,65 +0,0 @@ -/** - * Biopet is built on top of GATK Queue for building bioinformatic - * pipelines. It is mainly intended to support LUMC SHARK cluster which is running - * SGE. But other types of HPC that are supported by GATK Queue (such as PBS) - * should also be able to execute Biopet tools and pipelines. - * - * Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center - * - * Contact us at: sasc@lumc.nl - * - * A dual licensing mode is applied. The source code within this project that are - * not part of GATK Queue is freely available for non-commercial use under an AGPL - * license; For commercial users or users who do not want to follow the AGPL - * license, please contact us to obtain a separate license. 
- */ -package nl.lumc.sasc.biopet.pipelines.flexiprep - -import java.io.File - -import nl.lumc.sasc.biopet.core.config.Configurable -import nl.lumc.sasc.biopet.extensions.Ln - -class SeqtkSeq(root: Configurable) extends nl.lumc.sasc.biopet.extensions.seqtk.SeqtkSeq(root) { - var fastqc: Fastqc = _ - - override def beforeCmd() { - super.beforeCmd() - if (fastqc != null && Q.isEmpty) { - val encoding = fastqc.encoding - Q = encoding match { - case null => None - case enc if enc.contains("Sanger / Illumina 1.9") => None - case enc if enc.contains("Illumina <1.3") => Option(64) - case enc if enc.contains("Illumina 1.3") => Option(64) - case enc if enc.contains("Illumina 1.5") => Option(64) - case _ => None - } - if (Q.isDefined) V = true - } - } - - override def beforeGraph() { - if (fastqc != null) deps ::= fastqc.output - } - - override def cmdLine = { - if (Q.isDefined) { - analysisName = getClass.getSimpleName - super.cmdLine - } else { - analysisName = getClass.getSimpleName + "-ln" - Ln(this, input, output).cmd - } - } -} - -object SeqtkSeq { - def apply(root: Configurable, input: File, output: File, fastqc: Fastqc = null): SeqtkSeq = { - val seqtkSeq = new SeqtkSeq(root) - seqtkSeq.input = input - seqtkSeq.output = output - seqtkSeq.fastqc = fastqc - seqtkSeq - } -} \ No newline at end of file diff --git a/public/flexiprep/src/test/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepTest.scala b/public/flexiprep/src/test/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepTest.scala index 88a9ad1bcc1ce57207b31c5c8a587723d932dba6..cce952717b29e94556d3535cab4c5830d33a1cfe 100644 --- a/public/flexiprep/src/test/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepTest.scala +++ b/public/flexiprep/src/test/scala/nl/lumc/sasc/biopet/pipelines/flexiprep/FlexiprepTest.scala @@ -18,9 +18,9 @@ package nl.lumc.sasc.biopet.pipelines.flexiprep import java.io.File import com.google.common.io.Files -import nl.lumc.sasc.biopet.core.config.Config +import 
nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.extensions.{ Gzip, Sickle, Zcat } -import nl.lumc.sasc.biopet.tools.{ FastqSync, SeqStat } +import nl.lumc.sasc.biopet.extensions.tools.{ FastqSync, SeqStat } import nl.lumc.sasc.biopet.utils.ConfigUtils import org.apache.commons.io.FileUtils import org.broadinstitute.gatk.queue.QSettings @@ -67,24 +67,14 @@ class FlexiprepTest extends TestNGSuite with Matchers { ), Map(FlexiprepTest.executables.toSeq: _*)) val flexiprep: Flexiprep = initPipeline(map) - flexiprep.input_R1 = new File(flexiprep.outputDir, "bla_R1.fq" + (if (zipped) ".gz" else "")) - if (paired) flexiprep.input_R2 = Some(new File(flexiprep.outputDir, "bla_R2.fq" + (if (zipped) ".gz" else ""))) + flexiprep.input_R1 = (if (zipped) FlexiprepTest.r1Zipped else FlexiprepTest.r1) + if (paired) flexiprep.input_R2 = Some((if (zipped) FlexiprepTest.r2Zipped else FlexiprepTest.r2)) flexiprep.sampleId = Some("1") flexiprep.libId = Some("1") flexiprep.script() - flexiprep.functions.count(_.isInstanceOf[Fastqc]) shouldBe ( - if (paired && (skipClip && skipTrim)) 2 - else if (!paired && (skipClip && skipTrim)) 1 - else if (paired && !(skipClip && skipTrim)) 4 - else if (!paired && !(skipClip && skipTrim)) 2) + flexiprep.functions.count(_.isInstanceOf[Fastqc]) shouldBe (if (paired) 4 else 2) flexiprep.functions.count(_.isInstanceOf[SeqStat]) shouldBe (if (paired) 4 else 2) - flexiprep.functions.count(_.isInstanceOf[Zcat]) shouldBe (if (zipped) if (paired) 2 else 1 else 0) - flexiprep.functions.count(_.isInstanceOf[SeqtkSeq]) shouldBe (if (paired) 2 else 1) - flexiprep.functions.count(_.isInstanceOf[Cutadapt]) shouldBe (if (skipClip) 0 else if (paired) 2 else 1) - flexiprep.functions.count(_.isInstanceOf[FastqSync]) shouldBe (if (skipClip) 0 else if (paired) 1 else 0) - flexiprep.functions.count(_.isInstanceOf[Sickle]) shouldBe (if (skipTrim) 0 else 1) - flexiprep.functions.count(_.isInstanceOf[Gzip]) shouldBe (if (paired) 2 else 1) } // remove 
temporary run directory all tests in the class have been run @@ -95,6 +85,16 @@ class FlexiprepTest extends TestNGSuite with Matchers { object FlexiprepTest { val outputDir = Files.createTempDir() + new File(outputDir, "input").mkdirs() + + val r1 = new File(outputDir, "input" + File.separator + "R1.fq") + Files.touch(r1) + val r2 = new File(outputDir, "input" + File.separator + "R2.fq") + Files.touch(r2) + val r1Zipped = new File(outputDir, "input" + File.separator + "R1.fq.gz") + Files.touch(r1Zipped) + val r2Zipped = new File(outputDir, "input" + File.separator + "R2.fq.gz") + Files.touch(r2Zipped) val executables = Map( "seqstat" -> Map("exe" -> "test"), diff --git a/public/gears/pom.xml b/public/gears/pom.xml index 78efac5b37dc935d21304ce8a18388d284eb2b95..8d09f66d1528a295e18ef5467f2ebd3fa99d8657 100644 --- a/public/gears/pom.xml +++ b/public/gears/pom.xml @@ -22,7 +22,7 @@ <parent> <artifactId>Biopet</artifactId> <groupId>nl.lumc.sasc</groupId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> </parent> <modelVersion>4.0.0</modelVersion> @@ -32,7 +32,7 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/public/gears/src/main/scala/nl/lumc/sasc/biopet/pipelines/gears/Gears.scala b/public/gears/src/main/scala/nl/lumc/sasc/biopet/pipelines/gears/Gears.scala index 0e14359fd649a6d1d56ec5c3aea23fc53f1cbec7..c630b0b3e2d13e4c31466da08d984e6269e18d65 100644 --- a/public/gears/src/main/scala/nl/lumc/sasc/biopet/pipelines/gears/Gears.scala +++ b/public/gears/src/main/scala/nl/lumc/sasc/biopet/pipelines/gears/Gears.scala @@ -16,16 +16,15 @@ package nl.lumc.sasc.biopet.pipelines.gears import htsjdk.samtools.SamReaderFactory -import nl.lumc.sasc.biopet.FullVersion import nl.lumc.sasc.biopet.core.{ PipelineCommand, MultiSampleQScript } -import nl.lumc.sasc.biopet.core.config.Configurable 
+import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.extensions.Ln import nl.lumc.sasc.biopet.extensions.kraken.{ Kraken, KrakenReport } import nl.lumc.sasc.biopet.extensions.picard.{ AddOrReplaceReadGroups, MarkDuplicates, MergeSamFiles, SamToFastq } import nl.lumc.sasc.biopet.extensions.sambamba.SambambaView import nl.lumc.sasc.biopet.pipelines.bammetrics.BamMetrics import nl.lumc.sasc.biopet.pipelines.mapping.Mapping -import nl.lumc.sasc.biopet.tools.FastqSync +import nl.lumc.sasc.biopet.extensions.tools.FastqSync import org.broadinstitute.gatk.queue.QScript import scala.collection.JavaConversions._ diff --git a/public/gentrap/pom.xml b/public/gentrap/pom.xml index 1066c2de6e9ec0677c2ddf9a41f8047e818fc1b0..44da56eaf57596f6999276e183f1e1808f9d26b6 100644 --- a/public/gentrap/pom.xml +++ b/public/gentrap/pom.xml @@ -25,7 +25,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -33,11 +33,6 @@ <name>Gentrap</name> <dependencies> - <dependency> - <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> - <version>${project.version}</version> - </dependency> <dependency> <groupId>nl.lumc.sasc</groupId> <artifactId>Mapping</artifactId> diff --git a/public/gentrap/src/main/resources/nl/lumc/sasc/biopet/pipelines/gentrap/gentrapFront.ssp b/public/gentrap/src/main/resources/nl/lumc/sasc/biopet/pipelines/gentrap/gentrapFront.ssp index 89d46c7d307ba6839d3eaa40e462764dc12af67a..c1ceb9da2a57464f9b2efe7ecc0ff21b6961c813 100644 --- a/public/gentrap/src/main/resources/nl/lumc/sasc/biopet/pipelines/gentrap/gentrapFront.ssp +++ b/public/gentrap/src/main/resources/nl/lumc/sasc/biopet/pipelines/gentrap/gentrapFront.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) <%@ var summary: Summary %> <table class="table"> <tbody> diff 
--git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/Gentrap.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/Gentrap.scala index 27d5f9f0844d3435053f5e473c462e830ba76b6c..724470df1091228bbdaf928b49ae7024f2bb70bd 100644 --- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/Gentrap.scala +++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/Gentrap.scala @@ -19,7 +19,7 @@ import java.io.File import nl.lumc.sasc.biopet.FullVersion import nl.lumc.sasc.biopet.core._ -import nl.lumc.sasc.biopet.core.config._ +import nl.lumc.sasc.biopet.utils.config._ import nl.lumc.sasc.biopet.core.summary._ import nl.lumc.sasc.biopet.extensions.picard.{ MergeSamFiles, SortSam } import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsView @@ -29,7 +29,7 @@ import nl.lumc.sasc.biopet.pipelines.bamtobigwig.Bam2Wig import nl.lumc.sasc.biopet.pipelines.gentrap.extensions.{ CustomVarScan, Pdflatex, RawBaseCounter } import nl.lumc.sasc.biopet.pipelines.gentrap.scripts.{ AggrBaseCount, PdfReportTemplateWriter, PlotHeatmap } import nl.lumc.sasc.biopet.pipelines.mapping.Mapping -import nl.lumc.sasc.biopet.tools.{ MergeTables, WipeReads } +import nl.lumc.sasc.biopet.extensions.tools.{ MergeTables, WipeReads } import nl.lumc.sasc.biopet.utils.ConfigUtils import org.broadinstitute.gatk.queue.QScript import org.broadinstitute.gatk.queue.function.QFunction @@ -100,24 +100,23 @@ class Gentrap(val root: Configurable) extends QScript }) /** Default pipeline config */ - override def defaults = ConfigUtils.mergeMaps( - Map( - "gsnap" -> Map( - "novelsplicing" -> 1, - "batch" -> 4, - "format" -> "sam" - ), - "cutadapt" -> Map("minimum_length" -> 20), - // avoid conflicts when merging since the MarkDuplicate tags often cause merges to fail - "picard" -> Map( - "programrecordid" -> "null" - ), - // disable markduplicates since it may not play well with all aligners (this can still be overriden via config) - "mapping" -> 
Map( - "skip_markduplicates" -> true, - "skip_metrics" -> true - ) - ), super.defaults) + override def defaults = Map( + "gsnap" -> Map( + "novelsplicing" -> 1, + "batch" -> 4, + "format" -> "sam" + ), + "cutadapt" -> Map("minimum_length" -> 20), + // avoid conflicts when merging since the MarkDuplicate tags often cause merges to fail + "picard" -> Map( + "programrecordid" -> "null" + ), + // disable markduplicates since it may not play well with all aligners (this can still be overriden via config) + "mapping" -> Map( + "skip_markduplicates" -> true, + "skip_metrics" -> true + ) + ) /** Adds output merge jobs for the given expression mode */ // TODO: can we combine the enum with the file extension (to reduce duplication and potential errors) @@ -552,7 +551,7 @@ class Gentrap(val root: Configurable) extends QScript job.input = alnFile job.b = true job.h = true - job.f = List("0x40") + job.f = List("0x80") job.F = List("0x10") job.output = createFile(".r2.bam") job.isIntermediate = true @@ -594,7 +593,7 @@ class Gentrap(val root: Configurable) extends QScript job.input = alnFile job.b = true job.h = true - job.f = List("0x80") + job.f = List("0x40") job.F = List("0x10") job.output = createFile(".r1.bam") job.isIntermediate = true @@ -844,6 +843,8 @@ class Gentrap(val root: Configurable) extends QScript def addJobs(): Unit = { // create per-library alignment file addAll(mappingJob.functions) + // Input file checking + inputFiles :::= mappingJob.inputFiles // add bigwig track addAll(bam2wigModule.functions) qscript.addSummaryQScript(mappingJob) diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapReport.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapReport.scala index 369b7c6f1ab3d64cbc721df53f6ecad03db20f96..6bf57ea655f67b5ecbc919d4d5f4bd44153dccd7 100644 --- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapReport.scala +++ 
b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapReport.scala @@ -15,7 +15,7 @@ */ package nl.lumc.sasc.biopet.pipelines.gentrap -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.report.{ ReportBuilderExtension, ReportSection, ReportPage, MultisampleReportBuilder } import nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport import nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/CustomVarScan.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/CustomVarScan.scala index 1314e5f14537fd1ebcafe5fbc46405e4768d810e..8871f481f3075d7eef52784583ccb5aadaa5b715 100644 --- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/CustomVarScan.scala +++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/CustomVarScan.scala @@ -17,26 +17,24 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.extensions import java.io.File -import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.core.{ Reference, BiopetCommandLineFunction } +import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsMpileup import nl.lumc.sasc.biopet.extensions.varscan.Mpileup2cns -import nl.lumc.sasc.biopet.extensions.{ Bgzip, PythonCommandLineFunction, Tabix } +import nl.lumc.sasc.biopet.extensions.{ Bgzip, Tabix } import org.broadinstitute.gatk.utils.commandline.{ Input, Output } /** Ad-hoc extension for VarScan variant calling that involves 6-command pipe */ // FIXME: generalize piping instead of building something by hand like this! 
// Better to do everything quick and dirty here rather than something half-implemented with the objects -class CustomVarScan(val root: Configurable) extends BiopetCommandLineFunction { wrapper => +class CustomVarScan(val root: Configurable) extends BiopetCommandLineFunction with Reference { wrapper => override def configName = "customvarscan" @Input(doc = "Input BAM file", required = true) var input: File = null - @Input(doc = "Reference FASTA file", required = true) - var reference: File = config("reference") - @Output(doc = "Output VCF file", required = true) var output: File = null @@ -48,7 +46,6 @@ class CustomVarScan(val root: Configurable) extends BiopetCommandLineFunction { this.input = List(wrapper.input) override def configName = wrapper.configName disableBaq = true - reference = config("reference") depth = Option(1000000) outputMappingQuality = true } @@ -91,7 +88,9 @@ class CustomVarScan(val root: Configurable) extends BiopetCommandLineFunction { } override def beforeGraph(): Unit = { + super.beforeGraph() require(output.toString.endsWith(".gz"), "Output must have a .gz file extension") + deps :+= referenceFasta() } def cmdLine: String = { diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/Pdflatex.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/Pdflatex.scala index be9c5b8bbe5a946ccf7ac6f7e47a3f12de8993e6..4747856d6f58943b33f0407b63c37311ab1f67fb 100644 --- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/Pdflatex.scala +++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/Pdflatex.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.extensions import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, 
Output }
 /**
diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/RawBaseCounter.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/RawBaseCounter.scala
index 8bc425dc30d54ffb5aa72ac74d0870275cd88c68..0dfbacd1aaebf7ef74b87a0125407ca29128c9bb 100644
--- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/RawBaseCounter.scala
+++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/extensions/RawBaseCounter.scala
@@ -18,8 +18,8 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.extensions
 import java.io.File
 import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction
-import nl.lumc.sasc.biopet.core.config.Configurable
-import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction
+import nl.lumc.sasc.biopet.utils.config.Configurable
+import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 import scala.language.reflectiveCalls
diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/AggrBaseCount.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/AggrBaseCount.scala
index c0287fd31b4a45b6b0b68ab6b60e5d0dd781ad13..84941f6beeb080c1ffa2b7681eb8bb9e404d6449 100644
--- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/AggrBaseCount.scala
+++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/AggrBaseCount.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.scripts
 import java.io.File
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.pipelines.gentrap.extensions.RScriptCommandLineFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/Hist2Count.scala
b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/Hist2Count.scala
index 077e75a96730175830373bb5d524ff2d882536a5..4a8002c9defcf843bfeb6f69b062d68dc6d3b7b6 100644
--- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/Hist2Count.scala
+++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/Hist2Count.scala
@@ -17,8 +17,8 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.scripts
 import java.io.File
-import nl.lumc.sasc.biopet.core.config.Configurable
-import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction
+import nl.lumc.sasc.biopet.utils.config.Configurable
+import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 /**
diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PdfReportTemplateWriter.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PdfReportTemplateWriter.scala
index 6cf247b6ed8b73579fc659d09334ceba7fe81297..eeab93c18279cf3e0510e810ef1199dbb27ee58a 100644
--- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PdfReportTemplateWriter.scala
+++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PdfReportTemplateWriter.scala
@@ -17,8 +17,8 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.scripts
 import java.io.{ File, FileOutputStream }
-import nl.lumc.sasc.biopet.core.config.Configurable
-import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction
+import nl.lumc.sasc.biopet.utils.config.Configurable
+import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 /**
diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotHeatmap.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotHeatmap.scala
index
189fccc05745f37de980bb49ac6f52cefcded4bd..e93049732b32bfe4451fdb6d18a122d070f98c13 100644
--- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotHeatmap.scala
+++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotHeatmap.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.scripts
 import java.io.File
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.pipelines.gentrap.extensions.RScriptCommandLineFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
diff --git a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotPca.scala b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotPca.scala
index cee3d7f392f5a622b7aa47bd4d3f8ec1b4d93b25..dd13421069fd331c819653a9fcf2cb4cf3b7c851 100644
--- a/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotPca.scala
+++ b/public/gentrap/src/main/scala/nl/lumc/sasc/biopet/pipelines/gentrap/scripts/PlotPca.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.pipelines.gentrap.scripts
 import java.io.File
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.pipelines.gentrap.extensions.RScriptCommandLineFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
diff --git a/public/gentrap/src/test/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapTest.scala b/public/gentrap/src/test/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapTest.scala
index 48edf42eeb8ca055fbcc36a01661b2e68e80860d..575b20b3e9fe58ab5c1caca50aeb1308b6066bdf 100644
--- a/public/gentrap/src/test/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapTest.scala
+++ b/public/gentrap/src/test/scala/nl/lumc/sasc/biopet/pipelines/gentrap/GentrapTest.scala
@@ -18,7 +18,7 @@ package
nl.lumc.sasc.biopet.pipelines.gentrap
 import java.io.{ File, FileOutputStream }
 import com.google.common.io.Files
-import nl.lumc.sasc.biopet.core.config.Config
+import nl.lumc.sasc.biopet.utils.config.Config
 import nl.lumc.sasc.biopet.extensions._
 import nl.lumc.sasc.biopet.pipelines.gentrap.scripts.AggrBaseCount
 import nl.lumc.sasc.biopet.utils.ConfigUtils
@@ -43,8 +43,8 @@ class GentrapTest extends TestNGSuite with Matchers {
   /** Convenience method for making library config */
   private def makeLibConfig(idx: Int, paired: Boolean = true) = {
-    val files = Map("R1" -> "test_R1.fq")
-    if (paired) (s"lib_$idx", files ++ Map("R2" -> "test_R2.fq"))
+    val files = Map("R1" -> GentrapTest.inputTouch("test_R1.fq"))
+    if (paired) (s"lib_$idx", files ++ Map("R2" -> GentrapTest.inputTouch("test_R2.fq")))
     else (s"lib_$idx", files)
   }
@@ -118,8 +118,6 @@ class GentrapTest extends TestNGSuite with Matchers {
     val functions = gentrap.functions.groupBy(_.getClass)
     val numSamples = sampleConfig("samples").size
-    functions(classOf[Gsnap]).size should be >= 1
-
     if (expMeasures.contains("fragments_per_gene")) {
       gentrap.functions
         .collect { case x: HtseqCount => x.output.toString.endsWith(".fragments_per_gene") }.size shouldBe numSamples
@@ -179,6 +177,12 @@ class GentrapTest extends TestNGSuite with Matchers {
 object GentrapTest {
   val outputDir = Files.createTempDir()
+  new File(outputDir, "input").mkdirs()
+  def inputTouch(name: String): String = {
+    val file = new File(outputDir, "input" + File.separator + name)
+    Files.touch(file)
+    file.getAbsolutePath
+  }
   private def copyFile(name: String): Unit = {
     val is = getClass.getResourceAsStream("/" + name)
@@ -192,7 +196,6 @@ object GentrapTest {
   copyFile("ref.fa.fai")
   val executables = Map(
-    "reference" -> (outputDir + File.separator + "ref.fa"),
     "reference_fasta" -> (outputDir + File.separator + "ref.fa"),
     "refFlat" -> "test",
     "annotation_gtf" -> "test",
diff --git a/public/kopisu/pom.xml b/public/kopisu/pom.xml
index
456a64c3c89490a726fd87d06e1deb92d7952bfd..21f4f0c60dab41cf8d7d7800e8e8ac7654ed6d81 100644
--- a/public/kopisu/pom.xml
+++ b/public/kopisu/pom.xml
@@ -25,7 +25,7 @@
     <parent>
         <groupId>nl.lumc.sasc</groupId>
         <artifactId>Biopet</artifactId>
-        <version>0.5.0-DEV</version>
+        <version>0.5.0-SNAPSHOT</version>
         <relativePath>../</relativePath>
     </parent>
@@ -35,7 +35,12 @@
     <dependencies>
         <dependency>
             <groupId>nl.lumc.sasc</groupId>
-            <artifactId>BiopetFramework</artifactId>
+            <artifactId>BiopetCore</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>nl.lumc.sasc</groupId>
+            <artifactId>BiopetExtensions</artifactId>
             <version>${project.version}</version>
         </dependency>
     </dependencies>
diff --git a/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferPipeline.scala b/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferPipeline.scala
index 6c8195968f1c05e2197935a9c279efa80f34289f..7265bee9a1ddeab64cfec191caa9958aa40c2076 100644
--- a/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferPipeline.scala
+++ b/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferPipeline.scala
@@ -17,7 +17,7 @@ package nl.lumc.sasc.biopet.pipelines.kopisu
 import java.io.File
-import nl.lumc.sasc.biopet.core.config._
+import nl.lumc.sasc.biopet.utils.config._
 import nl.lumc.sasc.biopet.core.{ PipelineCommand, _ }
 import nl.lumc.sasc.biopet.extensions.Ln
 import nl.lumc.sasc.biopet.extensions.conifer.{ ConiferAnalyze, ConiferCall, ConiferRPKM }
diff --git a/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferSummary.scala b/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferSummary.scala
index 385b211f117dc9bec6d0b756d330422c5e090a4e..6eddbad30c61e741a63687aa1a77df3604017f1a 100644
--- a/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferSummary.scala
+++
b/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/ConiferSummary.scala
@@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.pipelines.kopisu
 import java.io.{ BufferedWriter, File, FileWriter }
 import argonaut._
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.queue.function.InProcessFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
diff --git a/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/Kopisu.scala b/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/Kopisu.scala
index 5d434c510d5eac56d549d514cd71444a599bea99..9a6f002710a35203f9bff63dff9d776f2c95e246 100644
--- a/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/Kopisu.scala
+++ b/public/kopisu/src/main/scala/nl/lumc/sasc/biopet/pipelines/kopisu/Kopisu.scala
@@ -15,7 +15,7 @@
  */
 package nl.lumc.sasc.biopet.pipelines.kopisu
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.{ MultiSampleQScript, PipelineCommand }
 import org.broadinstitute.gatk.queue.QScript
diff --git a/public/mapping/pom.xml b/public/mapping/pom.xml
index ae1cf1433a4e0e2fd138cc50f8af378ea8860671..b5b45bb49c185ff3953a6a1623d154d88d8e7bf6 100644
--- a/public/mapping/pom.xml
+++ b/public/mapping/pom.xml
@@ -25,7 +25,7 @@
     <parent>
         <groupId>nl.lumc.sasc</groupId>
         <artifactId>Biopet</artifactId>
-        <version>0.5.0-DEV</version>
+        <version>0.5.0-SNAPSHOT</version>
         <relativePath>../</relativePath>
     </parent>
@@ -35,7 +35,7 @@
     <dependencies>
         <dependency>
             <groupId>nl.lumc.sasc</groupId>
-            <artifactId>BiopetFramework</artifactId>
+            <artifactId>BiopetCore</artifactId>
             <version>${project.version}</version>
         </dependency>
         <dependency>
diff --git a/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/mappingFront.ssp
b/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/mappingFront.ssp
index a8ac1542daca37a7904efa51be3f3521a79fa6ff..84ca1370774a54402d6ae3c01208420df2a64617 100644
--- a/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/mappingFront.ssp
+++ b/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/mappingFront.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 <%@ var summary: Summary %>
 <%@ var rootPath: String %>
diff --git a/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/outputBamfiles.ssp b/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/outputBamfiles.ssp
index d35962f5d53969c6d04700d9c1a76f747bc3b239..41d8249e75c416bebc27a3acff58c1e1498e17ae 100644
--- a/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/outputBamfiles.ssp
+++ b/public/mapping/src/main/resources/nl/lumc/sasc/biopet/pipelines/mapping/outputBamfiles.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport)
 #import(java.io.File)
diff --git a/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/Mapping.scala b/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/Mapping.scala
index c1956221bb9034189fd808723896a14993cc3b12..4d051300a0ad5aed1946aa8a367c26c1fba15165 100644
--- a/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/Mapping.scala
+++ b/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/Mapping.scala
@@ -19,7 +19,7 @@ import java.io.File
 import java.util.Date
 import nl.lumc.sasc.biopet.core._
-import nl.lumc.sasc.biopet.core.config.Configurable
+import
nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.SummaryQScript
 import nl.lumc.sasc.biopet.extensions.bwa.{ BwaAln, BwaMem, BwaSampe, BwaSamse }
 import nl.lumc.sasc.biopet.extensions.picard.{ AddOrReplaceReadGroups, MarkDuplicates, MergeSamFiles, ReorderSam, SortSam }
@@ -28,7 +28,7 @@ import nl.lumc.sasc.biopet.pipelines.bammetrics.BamMetrics
 import nl.lumc.sasc.biopet.pipelines.bamtobigwig.Bam2Wig
 import nl.lumc.sasc.biopet.pipelines.flexiprep.Flexiprep
 import nl.lumc.sasc.biopet.pipelines.mapping.scripts.TophatRecondition
-import nl.lumc.sasc.biopet.tools.FastqSplitter
+import nl.lumc.sasc.biopet.extensions.tools.FastqSplitter
 import nl.lumc.sasc.biopet.utils.ConfigUtils
 import org.broadinstitute.gatk.queue.QScript
@@ -97,13 +97,12 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
   /** location of summary file */
   def summaryFile = new File(outputDir, sampleId.getOrElse("x") + "-" + libId.getOrElse("x") + ".summary.json")
-  override def defaults = ConfigUtils.mergeMaps(
-    Map(
-      "gsnap" -> Map(
-        "batch" -> 4,
-        "format" -> "sam"
-      )
-    ), super.defaults)
+  override def defaults = Map("gsnap" -> Map("batch" -> 4))
+
+  override def fixedValues = Map(
+    "gsnap" -> Map("format" -> "sam"),
+    "bowtie" -> Map("sam" -> true)
+  )
   /** File to add to the summary */
   def summaryFiles: Map[String, File] = Map("output_bamfile" -> finalBamFile, "input_R1" -> input_R1,
@@ -117,7 +116,7 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     "skip_markduplicates" -> skipMarkduplicates,
     "aligner" -> aligner,
     "chunking" -> chunking,
-    "numberChunks" -> numberChunks.getOrElse(1)
+    "numberChunks" -> (if (chunking) numberChunks.getOrElse(1) else None)
   ) ++ (if (root == null) Map("reference" -> referenceSummary) else Map())
   override def reportClass = {
@@ -137,6 +136,9 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     require(sampleId.isDefined, "Missing
sample ID on mapping module")
     require(libId.isDefined, "Missing library ID on mapping module")
+    inputFiles :+= new InputFile(input_R1)
+    input_R2.foreach(inputFiles :+= new InputFile(_))
+
     paired = input_R2.isDefined
     if (readgroupId == null) readgroupId = sampleId.get + "-" + libId.get
@@ -173,26 +175,14 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     var fastq_R1_output: List[File] = Nil
     var fastq_R2_output: List[File] = Nil
-    def removeGz(file: File): File = {
-      val absPath = file.getAbsolutePath
-      if (absPath.endsWith(".gz")) new File(absPath.substring(0, absPath.lastIndexOf(".gz")))
-      else if (absPath.endsWith(".gzip")) new File(absPath.substring(0, absPath.lastIndexOf(".gzip")))
-      else file
-    }
-
     val chunks: Map[File, (File, Option[File])] = {
-      if (chunking) {
-        (for (t <- 1 to numberChunks.getOrElse(1)) yield {
-          val chunkDir = new File(outputDir, "chunks" + File.separator + t)
-          chunkDir -> (removeGz(new File(chunkDir, input_R1.getName)),
-            if (paired) Some(removeGz(new File(chunkDir, input_R2.get.getName))) else None)
-        }).toMap
-      } else if (skipFlexiprep) {
-        Map(outputDir -> (
-          extractIfNeeded(input_R1, flexiprep.outputDir),
-          if (paired) Some(extractIfNeeded(input_R2.get, outputDir)) else None)
-        )
-      } else Map(outputDir -> (flexiprep.outputFiles("fastq_input_R1"), flexiprep.outputFiles.get("fastq_input_R2")))
+      if (chunking) (for (t <- 1 to numberChunks.getOrElse(1)) yield {
+        val chunkDir = new File(outputDir, "chunks" + File.separator + t)
+        chunkDir -> (new File(chunkDir, input_R1.getName),
+          if (paired) Some(new File(chunkDir, input_R2.get.getName)) else None)
+      }).toMap
+      else if (skipFlexiprep) Map(outputDir -> (input_R1, if (paired) input_R2 else None))
+      else Map(outputDir -> (flexiprep.input_R1, flexiprep.input_R2))
     }
     if (chunking) {
@@ -214,13 +204,11 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     for ((chunkDir, fastqfile) <- chunks) {
       var R1 = fastqfile._1
       var R2 =
fastqfile._2
-      var deps: List[File] = Nil
       if (!skipFlexiprep) {
         val flexiout = flexiprep.runTrimClip(R1, R2, new File(chunkDir, "flexiprep"), chunkDir)
         logger.debug(chunkDir + " - " + flexiout)
         R1 = flexiout._1
         if (paired) R2 = flexiout._2
-        deps = flexiout._3
         fastq_R1_output :+= R1
         R2.foreach(R2 => fastq_R2_output :+= R2)
       }
@@ -228,19 +216,19 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
       val outputBam = new File(chunkDir, outputName + ".bam")
       bamFiles :+= outputBam
       aligner match {
-        case "bwa-mem" => addBwaMem(R1, R2, outputBam, deps)
-        case "bwa-aln" => addBwaAln(R1, R2, outputBam, deps)
-        case "bowtie" => addBowtie(R1, R2, outputBam, deps)
-        case "gsnap" => addGsnap(R1, R2, outputBam, deps)
+        case "bwa-mem" => addBwaMem(R1, R2, outputBam)
+        case "bwa-aln" => addBwaAln(R1, R2, outputBam)
+        case "bowtie" => addBowtie(R1, R2, outputBam)
+        case "gsnap" => addGsnap(R1, R2, outputBam)
         // TODO: make TopHat here accept multiple input files
-        case "tophat" => addTophat(R1, R2, outputBam, deps)
-        case "stampy" => addStampy(R1, R2, outputBam, deps)
-        case "star" => addStar(R1, R2, outputBam, deps)
-        case "star-2pass" => addStar2pass(R1, R2, outputBam, deps)
+        case "tophat" => addTophat(R1, R2, outputBam)
+        case "stampy" => addStampy(R1, R2, outputBam)
+        case "star" => addStar(R1, R2, outputBam)
+        case "star-2pass" => addStar2pass(R1, R2, outputBam)
         case _ => throw new IllegalStateException("Option aligner: '" + aligner + "' is not valid")
       }
-      if (config("chunk_metrics", default = false))
-        addAll(BamMetrics(this, outputBam, new File(chunkDir, "metrics")).functions)
+      if (chunking && numberChunks.getOrElse(1) > 1 && config("chunk_metrics", default = false))
+        addAll(BamMetrics(this, outputBam, new File(chunkDir, "metrics"), sampleId, libId).functions)
     }
     if (!skipFlexiprep) {
       flexiprep.runFinalize(fastq_R1_output, fastq_R2_output)
@@ -261,7 +249,7 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     }
     if
(!skipMetrics) {
-      val bamMetrics = BamMetrics(this, bamFile, new File(outputDir, "metrics"))
+      val bamMetrics = BamMetrics(this, bamFile, new File(outputDir, "metrics"), sampleId, libId)
       addAll(bamMetrics.functions)
       addSummaryQScript(bamMetrics)
     }
@@ -277,10 +265,9 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
   }
   /** Add bwa aln jobs */
-  def addBwaAln(R1: File, R2: Option[File], output: File, deps: List[File]): File = {
+  def addBwaAln(R1: File, R2: Option[File], output: File): File = {
     val bwaAlnR1 = new BwaAln(this)
     bwaAlnR1.fastq = R1
-    bwaAlnR1.deps = deps
     bwaAlnR1.output = swapExt(output.getParent, output, ".bam", ".R1.sai")
     bwaAlnR1.isIntermediate = true
     add(bwaAlnR1)
@@ -288,7 +275,6 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     val samFile: File = if (paired) {
       val bwaAlnR2 = new BwaAln(this)
       bwaAlnR2.fastq = R2.get
-      bwaAlnR2.deps = deps
       bwaAlnR2.output = swapExt(output.getParent, output, ".bam", ".R2.sai")
       bwaAlnR2.isIntermediate = true
       add(bwaAlnR2)
@@ -323,44 +309,47 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
   }
   /** Adds bwa mem jobs */
-  def addBwaMem(R1: File, R2: Option[File], output: File, deps: List[File]): File = {
+  def addBwaMem(R1: File, R2: Option[File], output: File): File = {
     val bwaCommand = new BwaMem(this)
     bwaCommand.R1 = R1
     if (paired) bwaCommand.R2 = R2.get
-    bwaCommand.deps = deps
     bwaCommand.R = Some(getReadGroupBwa)
-    bwaCommand.output = swapExt(output.getParent, output, ".bam", ".sam")
-    bwaCommand.isIntermediate = true
-    add(bwaCommand)
-    val sortSam = SortSam(this, bwaCommand.output, output)
-    if (chunking || !skipMarkduplicates) sortSam.isIntermediate = true
-    add(sortSam)
-    sortSam.output
+    val sortSam = new SortSam(this)
+    sortSam.output = output
+    val pipe = bwaCommand | sortSam
+    pipe.isIntermediate = chunking || !skipMarkduplicates
+    pipe.threadsCorrection = -1
+    add(pipe)
+    output
   }
-  def addGsnap(R1: File,
R2: Option[File], output: File, deps: List[File]): File = {
+  def addGsnap(R1: File, R2: Option[File], output: File): File = {
+    val zcatR1 = extractIfNeeded(R1, output.getParentFile)
+    val zcatR2 = if (paired) Some(extractIfNeeded(R2.get, output.getParentFile)) else None
     val gsnapCommand = new Gsnap(this)
-    gsnapCommand.input = if (paired) List(R1, R2.get) else List(R1)
-    gsnapCommand.deps = deps
-    gsnapCommand.output = swapExt(output.getParent, output, ".bam", ".sam")
-    gsnapCommand.isIntermediate = true
-    add(gsnapCommand)
+    gsnapCommand.input = if (paired) List(zcatR1._2, zcatR2.get._2) else List(zcatR1._2)
+    gsnapCommand.output = swapExt(output.getParentFile, output, ".bam", ".sam")
     val reorderSam = new ReorderSam(this)
     reorderSam.input = gsnapCommand.output
-    reorderSam.output = swapExt(output.getParent, output, ".sorted.bam", ".reordered.bam")
-    add(reorderSam)
-
-    addAddOrReplaceReadGroups(reorderSam.output, output)
+    reorderSam.output = swapExt(output.getParentFile, output, ".sorted.bam", ".reordered.bam")
+
+    val ar = addAddOrReplaceReadGroups(reorderSam.output, output)
+    val pipe = new BiopetFifoPipe(this, (zcatR1._1 :: (if (paired) zcatR2.get._1 else None) ::
+      Some(gsnapCommand) :: Some(ar._1) :: Some(reorderSam) :: Nil).flatten)
+    pipe.threadsCorrection = -1
+    zcatR1._1.foreach(x => pipe.threadsCorrection -= 1)
+    zcatR2.foreach(_._1.foreach(x => pipe.threadsCorrection -= 1))
+    add(pipe)
+    ar._2
   }
-  def addTophat(R1: File, R2: Option[File], output: File, deps: List[File]): File = {
+  def addTophat(R1: File, R2: Option[File], output: File): File = {
     // TODO: merge mapped and unmapped BAM ~ also dealing with validation errors in the unmapped BAM
     val tophat = new Tophat(this)
     tophat.R1 = tophat.R1 :+ R1
     if (paired) tophat.R2 = tophat.R2 :+ R2.get
     tophat.output_dir = new File(outputDir, "tophat_out")
-    tophat.deps = deps
     // always output BAM
     tophat.no_convert_bam = false
     // and always keep input ordering
@@ -394,10 +383,12 @@ class Mapping(val root:
Configurable) extends QScript with SummaryQScript with S
     reorderSam.output = swapExt(output.getParent, output, ".merge.bam", ".reordered.bam")
     add(reorderSam)
-    addAddOrReplaceReadGroups(reorderSam.output, output)
+    val ar = addAddOrReplaceReadGroups(reorderSam.output, output)
+    add(ar._1)
+    ar._2
   }
   /** Adds stampy jobs */
-  def addStampy(R1: File, R2: Option[File], output: File, deps: List[File]): File = {
+  def addStampy(R1: File, R2: Option[File], output: File): File = {
     var RG: String = "ID:" + readgroupId + ","
     RG += "SM:" + sampleId.get + ","
@@ -412,10 +403,9 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     val stampyCmd = new Stampy(this)
     stampyCmd.R1 = R1
     if (paired) stampyCmd.R2 = R2.get
-    stampyCmd.deps = deps
     stampyCmd.readgroup = RG
     stampyCmd.sanger = true
-    stampyCmd.output = this.swapExt(output.getParent, output, ".bam", ".sam")
+    stampyCmd.output = this.swapExt(output.getParentFile, output, ".bam", ".sam")
     stampyCmd.isIntermediate = true
     add(stampyCmd)
     val sortSam = SortSam(this, stampyCmd.output, output)
@@ -425,33 +415,54 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
   }
   /** Adds bowtie jobs */
-  def addBowtie(R1: File, R2: Option[File], output: File, deps: List[File]): File = {
+  def addBowtie(R1: File, R2: Option[File], output: File): File = {
+    val zcatR1 = extractIfNeeded(R1, output.getParentFile)
+    val zcatR2 = if (paired) Some(extractIfNeeded(R2.get, output.getParentFile)) else None
+    zcatR1._1.foreach(add(_))
+    zcatR2.foreach(_._1.foreach(add(_)))
     val bowtie = new Bowtie(this)
-    bowtie.R1 = R1
-    if (paired) bowtie.R2 = R2
-    bowtie.deps = deps
-    bowtie.output = this.swapExt(output.getParent, output, ".bam", ".sam")
+    bowtie.R1 = zcatR1._2
+    if (paired) bowtie.R2 = Some(zcatR2.get._2)
+    bowtie.output = this.swapExt(output.getParentFile, output, ".bam", ".sam")
     bowtie.isIntermediate = true
-    add(bowtie)
-    addAddOrReplaceReadGroups(bowtie.output, output)
+    val ar =
addAddOrReplaceReadGroups(bowtie.output, output)
+    val pipe = new BiopetFifoPipe(this, (Some(bowtie) :: Some(ar._1) :: Nil).flatten)
+    pipe.threadsCorrection = -1
+    add(pipe)
+    ar._2
   }
   /** Adds Star jobs */
-  def addStar(R1: File, R2: Option[File], output: File, deps: List[File]): File = {
-    val starCommand = Star(this, R1, R2, outputDir, isIntermediate = true, deps = deps)
-    add(starCommand)
-    addAddOrReplaceReadGroups(starCommand.outputSam, output)
+  def addStar(R1: File, R2: Option[File], output: File): File = {
+    val zcatR1 = extractIfNeeded(R1, output.getParentFile)
+    val zcatR2 = if (paired) Some(extractIfNeeded(R2.get, output.getParentFile)) else None
+    val starCommand = Star(this, zcatR1._2, zcatR2.map(_._2), outputDir, isIntermediate = true)
+    val ar = addAddOrReplaceReadGroups(starCommand.outputSam, output)
+    val pipe = new BiopetFifoPipe(this, (zcatR1._1 :: (if (paired) zcatR2.get._1 else None) ::
+      Some(starCommand) :: Some(ar._1) :: Nil).flatten)
+    pipe.threadsCorrection = -1
+    zcatR1._1.foreach(x => pipe.threadsCorrection -= 1)
+    zcatR2.foreach(_._1.foreach(x => pipe.threadsCorrection -= 1))
+    add(pipe)
+    ar._2
   }
   /** Adds Start 2 pass jobs */
-  def addStar2pass(R1: File, R2: Option[File], output: File, deps: List[File]): File = {
-    val starCommand = Star._2pass(this, R1, R2, outputDir, isIntermediate = true, deps = deps)
+  def addStar2pass(R1: File, R2: Option[File], output: File): File = {
+    val zcatR1 = extractIfNeeded(R1, output.getParentFile)
+    val zcatR2 = if (paired) Some(extractIfNeeded(R2.get, output.getParentFile)) else None
+    zcatR1._1.foreach(add(_))
+    zcatR2.foreach(_._1.foreach(add(_)))
+
+    val starCommand = Star._2pass(this, zcatR1._2, zcatR2.map(_._2), outputDir, isIntermediate = true)
     addAll(starCommand._2)
-    addAddOrReplaceReadGroups(starCommand._1, output)
+    val ar = addAddOrReplaceReadGroups(starCommand._1, output)
+    add(ar._1)
+    ar._2
   }
   /** Adds AddOrReplaceReadGroups */
-  def addAddOrReplaceReadGroups(input: File, output: File):
File = {
+  def addAddOrReplaceReadGroups(input: File, output: File): (AddOrReplaceReadGroups, File) = {
     val addOrReplaceReadGroups = AddOrReplaceReadGroups(this, input, output)
     addOrReplaceReadGroups.createIndex = true
@@ -463,9 +474,8 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
     if (readgroupSequencingCenter.isDefined) addOrReplaceReadGroups.RGCN = readgroupSequencingCenter.get
     if (readgroupDescription.isDefined) addOrReplaceReadGroups.RGDS = readgroupDescription.get
     if (!skipMarkduplicates) addOrReplaceReadGroups.isIntermediate = true
-    add(addOrReplaceReadGroups)
-    addOrReplaceReadGroups.output
+    (addOrReplaceReadGroups, addOrReplaceReadGroups.output)
   }
   /** Returns readgroup for bwa */
@@ -490,22 +500,18 @@ class Mapping(val root: Configurable) extends QScript with SummaryQScript with S
    * @param runDir directory to extract when needed
    * @return returns extracted file
    */
-  def extractIfNeeded(file: File, runDir: File): File = {
-    if (file == null) file
-    else if (file.getName.endsWith(".gz") || file.getName.endsWith(".gzip")) {
+  def extractIfNeeded(file: File, runDir: File): (Option[BiopetCommandLineFunction], File) = {
+    require(file != null)
+    if (file.getName.endsWith(".gz") || file.getName.endsWith(".gzip")) {
       var newFile: File = swapExt(runDir, file, ".gz", "")
       if (file.getName.endsWith(".gzip")) newFile = swapExt(runDir, file, ".gzip", "")
       val zcatCommand = Zcat(this, file, newFile)
-      zcatCommand.isIntermediate = true
-      add(zcatCommand)
-      newFile
+      (Some(zcatCommand), newFile)
     } else if (file.getName.endsWith(".bz2")) {
       val newFile = swapExt(runDir, file, ".bz2", "")
       val pbzip2 = Pbzip2(this, file, newFile)
-      pbzip2.isIntermediate = true
-      add(pbzip2)
-      newFile
-    } else file
+      (Some(pbzip2), newFile)
+    } else (None, file)
   }
 }
diff --git a/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingReport.scala b/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingReport.scala
index ab703e5b5e44a7cc0762e8cff51b1f3b23c66ab9..b2f1b7a846da3483eeed879e493e651f25a83759 100644
--- a/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingReport.scala
+++ b/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingReport.scala
@@ -15,7 +15,7 @@
  */
 package nl.lumc.sasc.biopet.pipelines.mapping
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.report.{ ReportBuilderExtension, ReportSection, ReportPage, ReportBuilder }
 import nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport
 import nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport
diff --git a/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/scripts/TophatRecondition.scala b/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/scripts/TophatRecondition.scala
index e9b11906fe59e7ed768a71beab5ce94a3ccaef41..5ca8be2834904e9ebbc68722591a273f36d012ef 100644
--- a/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/scripts/TophatRecondition.scala
+++ b/public/mapping/src/main/scala/nl/lumc/sasc/biopet/pipelines/mapping/scripts/TophatRecondition.scala
@@ -17,8 +17,8 @@ package nl.lumc.sasc.biopet.pipelines.mapping.scripts
 import java.io.File
-import nl.lumc.sasc.biopet.core.config.Configurable
-import nl.lumc.sasc.biopet.extensions.PythonCommandLineFunction
+import nl.lumc.sasc.biopet.utils.config.Configurable
+import nl.lumc.sasc.biopet.core.extensions.PythonCommandLineFunction
 import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
 /**
diff --git a/public/mapping/src/test/resources/ref.1.bt2 b/public/mapping/src/test/resources/ref.1.bt2
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/public/mapping/src/test/resources/ref.1.ebwt b/public/mapping/src/test/resources/ref.1.ebwt
new file mode 100644
index
0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/public/mapping/src/test/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingTest.scala b/public/mapping/src/test/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingTest.scala
index 78b9b3dd400fa9daf5725c3407e60f32f9f38eae..9b6c87a20e12be17d15adb529cf3c5f85b02c88f 100644
--- a/public/mapping/src/test/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingTest.scala
+++ b/public/mapping/src/test/scala/nl/lumc/sasc/biopet/pipelines/mapping/MappingTest.scala
@@ -18,12 +18,12 @@ package nl.lumc.sasc.biopet.pipelines.mapping
 import java.io.{ File, FileOutputStream }
 import com.google.common.io.Files
-import nl.lumc.sasc.biopet.core.config.Config
+import nl.lumc.sasc.biopet.utils.config.Config
 import nl.lumc.sasc.biopet.extensions._
 import nl.lumc.sasc.biopet.extensions.bwa.{ BwaAln, BwaMem, BwaSampe, BwaSamse }
 import nl.lumc.sasc.biopet.extensions.picard.{ AddOrReplaceReadGroups, MarkDuplicates, MergeSamFiles, SortSam }
-import nl.lumc.sasc.biopet.pipelines.flexiprep.{ Cutadapt, Fastqc, SeqtkSeq }
-import nl.lumc.sasc.biopet.tools.{ FastqSync, SeqStat }
+import nl.lumc.sasc.biopet.pipelines.flexiprep.Fastqc
+import nl.lumc.sasc.biopet.extensions.tools.{ FastqSync, SeqStat }
 import nl.lumc.sasc.biopet.utils.ConfigUtils
 import org.apache.commons.io.FileUtils
 import org.broadinstitute.gatk.queue.QSettings
@@ -79,11 +79,11 @@ class MappingTest extends TestNGSuite with Matchers {
     val mapping: Mapping = initPipeline(map)
     if (zipped) {
-      mapping.input_R1 = new File(mapping.outputDir, "bla_R1.fq.gz")
-      if (paired) mapping.input_R2 = Some(new File(mapping.outputDir, "bla_R2.fq.gz"))
+      mapping.input_R1 = MappingTest.r1Zipped
+      if (paired) mapping.input_R2 = Some(MappingTest.r2Zipped)
     } else {
-      mapping.input_R1 = new File(mapping.outputDir, "bla_R1.fq")
-      if (paired) mapping.input_R2 = Some(new File(mapping.outputDir, "bla_R2.fq"))
+      mapping.input_R1 = MappingTest.r1
+      if (paired)
mapping.input_R2 = Some(MappingTest.r2) } mapping.sampleId = Some("1") mapping.libId = Some("1") @@ -91,36 +91,6 @@ class MappingTest extends TestNGSuite with Matchers { //Flexiprep mapping.functions.count(_.isInstanceOf[Fastqc]) shouldBe (if (skipFlexiprep) 0 else if (paired) 4 else 2) - mapping.functions.count(_.isInstanceOf[Zcat]) shouldBe (if (!zipped || (chunks > 1 && skipFlexiprep)) 0 else if (paired) 2 else 1) - mapping.functions.count(_.isInstanceOf[SeqStat]) shouldBe ((if (skipFlexiprep) 0 else if (paired) 4 else 2) * chunks) - mapping.functions.count(_.isInstanceOf[SeqtkSeq]) shouldBe ((if (skipFlexiprep) 0 else if (paired) 2 else 1) * chunks) - mapping.functions.count(_.isInstanceOf[Cutadapt]) shouldBe ((if (skipFlexiprep) 0 else if (paired) 2 else 1) * chunks) - mapping.functions.count(_.isInstanceOf[FastqSync]) shouldBe ((if (skipFlexiprep) 0 else if (paired && !skipFlexiprep) 1 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[Sickle]) shouldBe ((if (skipFlexiprep) 0 else 1) * chunks) - mapping.functions.count(_.isInstanceOf[Gzip]) shouldBe (if (skipFlexiprep) 0 else if (paired) 2 else 1) - - //aligners - mapping.functions.count(_.isInstanceOf[BwaMem]) shouldBe ((if (aligner == "bwa-mem") 1 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[BwaAln]) shouldBe ((if (aligner == "bwa-aln") if (paired) 2 else 1 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[BwaSampe]) shouldBe ((if (aligner == "bwa-aln") if (paired) 1 else 0 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[BwaSamse]) shouldBe ((if (aligner == "bwa-aln") if (paired) 0 else 1 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[Star]) shouldBe ((if (aligner == "star") 1 else if (aligner == "star-2pass") 3 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[Bowtie]) shouldBe ((if (aligner == "bowtie") 1 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[Stampy]) shouldBe ((if (aligner == "stampy") 1 else 0) * chunks) - - // Sort sam 
or replace readgroup - val sort = aligner match { - case "bwa-mem" | "bwa-aln" | "stampy" => "sortsam" - case "star" | "star-2pass" | "bowtie" | "gsnap" | "tophat" => "replacereadgroups" - case _ => throw new IllegalArgumentException("aligner: " + aligner + " does not exist") - } - - if (aligner != "tophat") { // FIXME - mapping.functions.count(_.isInstanceOf[SortSam]) shouldBe ((if (sort == "sortsam") 1 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[AddOrReplaceReadGroups]) shouldBe ((if (sort == "replacereadgroups") 1 else 0) * chunks) - mapping.functions.count(_.isInstanceOf[MergeSamFiles]) shouldBe (if (skipMarkDuplicate && chunks > 1) 1 else 0) - mapping.functions.count(_.isInstanceOf[MarkDuplicates]) shouldBe (if (skipMarkDuplicate) 0 else 1) - } } // remove temporary run directory all tests in the class have been run @@ -131,6 +101,16 @@ class MappingTest extends TestNGSuite with Matchers { object MappingTest { val outputDir = Files.createTempDir() + new File(outputDir, "input").mkdirs() + + val r1 = new File(outputDir, "input" + File.separator + "R1.fq") + Files.touch(r1) + val r2 = new File(outputDir, "input" + File.separator + "R2.fq") + Files.touch(r2) + val r1Zipped = new File(outputDir, "input" + File.separator + "R1.fq.gz") + Files.touch(r1Zipped) + val r2Zipped = new File(outputDir, "input" + File.separator + "R2.fq.gz") + Files.touch(r2Zipped) private def copyFile(name: String): Unit = { val is = getClass.getResourceAsStream("/" + name) @@ -142,11 +122,13 @@ object MappingTest { copyFile("ref.fa") copyFile("ref.dict") copyFile("ref.fa.fai") + copyFile("ref.1.bt2") + copyFile("ref.1.ebwt") val executables = Map( "reference_fasta" -> (outputDir + File.separator + "ref.fa"), "db" -> "test", - "bowtie_index" -> "test", + "bowtie_index" -> (outputDir + File.separator + "ref"), "fastqc" -> Map("exe" -> "test"), "seqtk" -> Map("exe" -> "test"), "gsnap" -> Map("exe" -> "test"), diff --git a/public/pom.xml b/public/pom.xml index 
93acfae5e8cdc28359579b70eac5cddc2643f161..d1ac6e032625dff580e5ada2d4fd3bc04c04e40f 100644 --- a/public/pom.xml +++ b/public/pom.xml @@ -22,10 +22,10 @@ <groupId>nl.lumc.sasc</groupId> <name>Biopet</name> <packaging>pom</packaging> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <modules> - <module>biopet-framework</module> + <!--<module>biopet-framework</module>--> <module>biopet-public-package</module> <module>bammetrics</module> <module>flexiprep</module> @@ -40,11 +40,18 @@ <module>toucan</module> <module>shiva</module> <module>basty</module> + <module>biopet-core</module> + <module>biopet-utils</module> + <module>biopet-tools</module> + <module>biopet-tools-extensions</module> + <module>biopet-extensions</module> + <module>biopet-tools-package</module> </modules> <properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> - <scoverage.plugin.version>1.0.4</scoverage.plugin.version> + <scoverage.plugin.version>1.1.1</scoverage.plugin.version> + <scalaVersion>2.10.4</scalaVersion> </properties> <build> @@ -72,6 +79,7 @@ <artifactId>maven-surefire-plugin</artifactId> <version>2.18.1</version> <configuration> + <forkCount>1C</forkCount> <workingDirectory>${project.build.directory}</workingDirectory> </configuration> </plugin> @@ -153,6 +161,7 @@ <goal>format</goal> </goals> <configuration> + <baseDir>${basedir}/src</baseDir> <rewriteArrowSymbols>false</rewriteArrowSymbols> <alignParameters>true</alignParameters> <alignSingleLineCaseStatements_maxArrowIndent>40 @@ -235,10 +244,12 @@ <artifactId>scoverage-maven-plugin</artifactId> <version>${scoverage.plugin.version}</version> <configuration> - <scalaVersion>2.10.2</scalaVersion> + <scalaVersion>${scalaVersion}</scalaVersion> + <aggregate>true</aggregate> + <highlighting>true</highlighting> <!-- other parameters --> </configuration> - </plugin> + </plugin> </plugins> </build> <reporting> @@ -247,6 +258,17 @@ <groupId>org.scoverage</groupId> 
<artifactId>scoverage-maven-plugin</artifactId> <version>${scoverage.plugin.version}</version> + <configuration> + <aggregate>true</aggregate> <!-- for aggregated report --> + <highlighting>true</highlighting> + </configuration> + <reportSets> + <reportSet> + <reports> + <report>report</report> <!-- select only one report from: report, integration-report and report-only reporters --> + </reports> + </reportSet> + </reportSets> </plugin> </plugins> </reporting> diff --git a/public/sage/pom.xml b/public/sage/pom.xml index 395857229b51c70955485a9d06428c34308ea424..b88e699a28879de1fd3b92bf60a126006687f440 100644 --- a/public/sage/pom.xml +++ b/public/sage/pom.xml @@ -25,7 +25,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> @@ -35,7 +35,7 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/public/sage/src/main/scala/nl/lumc/sasc/biopet/pipelines/sage/Sage.scala b/public/sage/src/main/scala/nl/lumc/sasc/biopet/pipelines/sage/Sage.scala index b65019b97b8e83ff09af9f0d3f4a0a8d08d1e341..c8b81c0c4d31d6424db74390390c758096478838 100644 --- a/public/sage/src/main/scala/nl/lumc/sasc/biopet/pipelines/sage/Sage.scala +++ b/public/sage/src/main/scala/nl/lumc/sasc/biopet/pipelines/sage/Sage.scala @@ -15,15 +15,15 @@ */ package nl.lumc.sasc.biopet.pipelines.sage -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.{ MultiSampleQScript, PipelineCommand } import nl.lumc.sasc.biopet.extensions.Cat import nl.lumc.sasc.biopet.extensions.bedtools.BedtoolsCoverage import nl.lumc.sasc.biopet.extensions.picard.MergeSamFiles import nl.lumc.sasc.biopet.pipelines.flexiprep.Flexiprep import 
nl.lumc.sasc.biopet.pipelines.mapping.Mapping -import nl.lumc.sasc.biopet.scripts.SquishBed -import nl.lumc.sasc.biopet.tools.{ BedtoolsCoverageToCounts, PrefixFastq, SageCountFastq, SageCreateLibrary, SageCreateTagCounts } +import nl.lumc.sasc.biopet.extensions.tools.SquishBed +import nl.lumc.sasc.biopet.extensions.tools.{ BedtoolsCoverageToCounts, PrefixFastq, SageCountFastq, SageCreateLibrary, SageCreateTagCounts } import nl.lumc.sasc.biopet.utils.ConfigUtils import org.broadinstitute.gatk.queue.QScript @@ -36,21 +36,22 @@ class Sage(val root: Configurable) extends QScript with MultiSampleQScript { var transcriptome: Option[File] = config("transcriptome") var tagsLibrary: Option[File] = config("tags_library") - override def defaults = ConfigUtils.mergeMaps(Map("bowtie" -> Map( - "m" -> 1, - "k" -> 1, - "best" -> true, - "strata" -> true, - "seedmms" -> 1 - ), "mapping" -> Map( - "aligner" -> "bowtie", - "skip_flexiprep" -> true, - "skip_markduplicates" -> true - ), "flexiprep" -> Map( - "skip_clip" -> true, - "skip_trim" -> true + override def defaults = Map( + "bowtie" -> Map( + "m" -> 1, + "k" -> 1, + "best" -> true, + "strata" -> true, + "seedmms" -> 1 + ), "mapping" -> Map( + "aligner" -> "bowtie", + "skip_flexiprep" -> true, + "skip_markduplicates" -> true + ), "flexiprep" -> Map( + "skip_clip" -> true, + "skip_trim" -> true + ), "strandSensitive" -> true ) - ), super.defaults) def summaryFile: File = new File(outputDir, "Sage.summary.json") @@ -88,6 +89,8 @@ class Sage(val root: Configurable) extends QScript with MultiSampleQScript { mapping.sampleId = Some(sampleId) protected def addJobs(): Unit = { + inputFiles :+= new InputFile(inputFastq, config("R1_md5")) + flexiprep.outputDir = new File(libDir, "flexiprep/") flexiprep.input_R1 = inputFastq flexiprep.init() @@ -146,7 +149,9 @@ class Sage(val root: Configurable) extends QScript with MultiSampleQScript { } def biopetScript() { - val squishBed = SquishBed(this, countBed.get, outputDir) + val squishBed = 
new SquishBed(this) + squishBed.input = countBed.get + squishBed.output = new File(outputDir, countBed.get.getName.stripSuffix(".bed") + ".squish.bed") add(squishBed) squishedCountBed = squishBed.output diff --git a/public/shiva/pom.xml b/public/shiva/pom.xml index 36c227e8496da9a602bcdf4c60afe07880dc8af4..560818c30c444e703432b9160360c97337931025 100644 --- a/public/shiva/pom.xml +++ b/public/shiva/pom.xml @@ -22,7 +22,7 @@ <parent> <artifactId>Biopet</artifactId> <groupId>nl.lumc.sasc</groupId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> </parent> <modelVersion>4.0.0</modelVersion> @@ -32,7 +32,7 @@ <dependencies> <dependency> <groupId>nl.lumc.sasc</groupId> - <artifactId>BiopetFramework</artifactId> + <artifactId>BiopetCore</artifactId> <version>${project.version}</version> </dependency> <dependency> diff --git a/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/outputVcfFiles.ssp b/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/outputVcfFiles.ssp index bd00a8618bee788821e683ac9c32f6837d3c6464..192cfd2700e8622a24e8c56ac746a7c3f2db4756 100644 --- a/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/outputVcfFiles.ssp +++ b/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/outputVcfFiles.ssp @@ -1,4 +1,4 @@ -#import(nl.lumc.sasc.biopet.core.summary.Summary) +#import(nl.lumc.sasc.biopet.utils.summary.Summary) #import(nl.lumc.sasc.biopet.core.report.ReportPage) #import(nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport) #import(java.io.File) diff --git a/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/sampleVariants.ssp b/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/sampleVariants.ssp index 91247a368c7194dfbe0ab9cae4073c933e2d21ae..8bc095e84ac3d566d3b0f453ecb0eeccd93f553e 100644 --- a/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/sampleVariants.ssp +++ 
b/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/sampleVariants.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 #import(nl.lumc.sasc.biopet.pipelines.shiva.ShivaReport)
 #import(java.io.File)
diff --git a/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/shivaFront.ssp b/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/shivaFront.ssp
index 4ed090b7cc93d1919acf38924bfbc7c31ce585fc..5721d22515ced92c9102565df158c20be80a2807 100644
--- a/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/shivaFront.ssp
+++ b/public/shiva/src/main/resources/nl/lumc/sasc/biopet/pipelines/shiva/shivaFront.ssp
@@ -1,4 +1,4 @@
-#import(nl.lumc.sasc.biopet.core.summary.Summary)
+#import(nl.lumc.sasc.biopet.utils.summary.Summary)
 #import(nl.lumc.sasc.biopet.core.report.ReportPage)
 <%@ var summary: Summary %>
 <%@ var rootPath: String %>
diff --git a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/Shiva.scala b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/Shiva.scala
index 668a01f226a43e8cf087cc575d3dab2ae71f0505..a7d04155b164df95ffad2753ff6a8395f57520c5 100644
--- a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/Shiva.scala
+++ b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/Shiva.scala
@@ -16,7 +16,7 @@
 package nl.lumc.sasc.biopet.pipelines.shiva

 import nl.lumc.sasc.biopet.core.PipelineCommand
-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import org.broadinstitute.gatk.queue.QScript

 /**
diff --git a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaReport.scala b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaReport.scala
index bb5a7f765f69fd9dc7e63c2508f8c476a4b1d076..7e32d72fea38d2e2a2e243604a42438703817946 100644
---
a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaReport.scala
+++ b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaReport.scala
@@ -17,10 +17,10 @@ package nl.lumc.sasc.biopet.pipelines.shiva

 import java.io.{ File, PrintWriter }

-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.report._
-import nl.lumc.sasc.biopet.core.summary.{ Summary, SummaryValue }
-import nl.lumc.sasc.biopet.extensions.rscript.StackedBarPlot
+import nl.lumc.sasc.biopet.utils.summary.{ Summary, SummaryValue }
+import nl.lumc.sasc.biopet.utils.rscript.StackedBarPlot
 import nl.lumc.sasc.biopet.pipelines.bammetrics.BammetricsReport
 import nl.lumc.sasc.biopet.pipelines.flexiprep.FlexiprepReport

diff --git a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaSvCalling.scala b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaSvCalling.scala
index e370e3b11034df12c5ebce0e90ce4eadae8b1006..79a5219751a35691885b4f61365f4d71b0333105 100644
--- a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaSvCalling.scala
+++ b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaSvCalling.scala
@@ -15,18 +15,16 @@
  */
 package nl.lumc.sasc.biopet.pipelines.shiva

-import java.io.File
-
 import htsjdk.samtools.SamReaderFactory
-import nl.lumc.sasc.biopet.core.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.SummaryQScript
-import nl.lumc.sasc.biopet.core.{ PipelineCommand, BiopetQScript, Reference, SampleLibraryTag }
+import nl.lumc.sasc.biopet.core.{ PipelineCommand, Reference, SampleLibraryTag }
 import nl.lumc.sasc.biopet.extensions.breakdancer.Breakdancer
 import nl.lumc.sasc.biopet.extensions.clever.CleverCaller
 import nl.lumc.sasc.biopet.extensions.delly.Delly
-import nl.lumc.sasc.biopet.tools.VcfStats
+import nl.lumc.sasc.biopet.utils.Logging
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import
org.broadinstitute.gatk.queue.QScript -import org.broadinstitute.gatk.utils.commandline.Input + import scala.collection.JavaConversions._ /** @@ -70,7 +68,7 @@ class ShivaSvCalling(val root: Configurable) extends QScript with SummaryQScript def biopetScript(): Unit = { for (cal <- configCallers) { if (!callersList.exists(_.name == cal)) - BiopetQScript.addError("variantcaller '" + cal + "' does not exist, possible to use: " + callersList.map(_.name).mkString(", ")) + Logging.addError("variantcaller '" + cal + "' does not exist, possible to use: " + callersList.map(_.name).mkString(", ")) } val callers = callersList.filter(x => configCallers.contains(x.name)) diff --git a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTrait.scala b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTrait.scala index 3f65bf2a20864e2017f512981daa0d37aca52772..dafb2e1ccc5a72807aa37cc6858711afbfe78c23 100644 --- a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTrait.scala +++ b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTrait.scala @@ -136,51 +136,56 @@ trait ShivaTrait extends MultiSampleQScript with SummaryQScript with Reference { case (true, _) => mapping.foreach(mapping => { mapping.input_R1 = config("R1") mapping.input_R2 = config("R2") + inputFiles :+= new InputFile(mapping.input_R1, config("R1_md5")) + mapping.input_R2.foreach(inputFiles :+= new InputFile(_, config("R2_md5"))) }) - case (false, true) => config("bam_to_fastq", default = false).asBoolean match { - case true => - val samToFastq = SamToFastq(qscript, config("bam"), - new File(libDir, sampleId + "-" + libId + ".R1.fastq"), - new File(libDir, sampleId + "-" + libId + ".R2.fastq")) - samToFastq.isIntermediate = true - qscript.add(samToFastq) - mapping.foreach(mapping => { - mapping.input_R1 = samToFastq.fastqR1 - mapping.input_R2 = Some(samToFastq.fastqR2) - }) - case false => - val inputSam = 
SamReaderFactory.makeDefault.open(config("bam")) - val readGroups = inputSam.getFileHeader.getReadGroups - - val readGroupOke = readGroups.forall(readGroup => { - if (readGroup.getSample != sampleId) logger.warn("Sample ID readgroup in bam file is not the same") - if (readGroup.getLibrary != libId) logger.warn("Library ID readgroup in bam file is not the same") - readGroup.getSample == sampleId && readGroup.getLibrary == libId - }) - inputSam.close() - - if (!readGroupOke) { - if (config("correct_readgroups", default = false).asBoolean) { - logger.info("Correcting readgroups, file:" + config("bam")) - val aorrg = AddOrReplaceReadGroups(qscript, config("bam"), bamFile.get) - aorrg.RGID = sampleId + "-" + libId - aorrg.RGLB = libId - aorrg.RGSM = sampleId - aorrg.isIntermediate = true - qscript.add(aorrg) - } else throw new IllegalStateException("Sample readgroup and/or library of input bamfile is not correct, file: " + bamFile + - "\nPlease note that it is possible to set 'correct_readgroups' to true in the config to automatic fix this") - } else { - val oldBamFile: File = config("bam") - val oldIndex: File = new File(oldBamFile.getAbsolutePath.stripSuffix(".bam") + ".bai") - val newIndex: File = new File(libDir, oldBamFile.getName.stripSuffix(".bam") + ".bai") - val baiLn = Ln(qscript, oldIndex, newIndex) - add(baiLn) - - val bamLn = Ln(qscript, oldBamFile, bamFile.get) - bamLn.deps :+= baiLn.output - add(bamLn) - } + case (false, true) => { + inputFiles :+= new InputFile(config("bam"), config("bam_md5")) + config("bam_to_fastq", default = false).asBoolean match { + case true => + val samToFastq = SamToFastq(qscript, config("bam"), + new File(libDir, sampleId + "-" + libId + ".R1.fastq"), + new File(libDir, sampleId + "-" + libId + ".R2.fastq")) + samToFastq.isIntermediate = true + qscript.add(samToFastq) + mapping.foreach(mapping => { + mapping.input_R1 = samToFastq.fastqR1 + mapping.input_R2 = Some(samToFastq.fastqR2) + }) + case false => + val inputSam = 
SamReaderFactory.makeDefault.open(config("bam")) + val readGroups = inputSam.getFileHeader.getReadGroups + + val readGroupOke = readGroups.forall(readGroup => { + if (readGroup.getSample != sampleId) logger.warn("Sample ID readgroup in bam file is not the same") + if (readGroup.getLibrary != libId) logger.warn("Library ID readgroup in bam file is not the same") + readGroup.getSample == sampleId && readGroup.getLibrary == libId + }) + inputSam.close() + + if (!readGroupOke) { + if (config("correct_readgroups", default = false).asBoolean) { + logger.info("Correcting readgroups, file:" + config("bam")) + val aorrg = AddOrReplaceReadGroups(qscript, config("bam"), bamFile.get) + aorrg.RGID = sampleId + "-" + libId + aorrg.RGLB = libId + aorrg.RGSM = sampleId + aorrg.isIntermediate = true + qscript.add(aorrg) + } else throw new IllegalStateException("Sample readgroup and/or library of input bamfile is not correct, file: " + bamFile + + "\nPlease note that it is possible to set 'correct_readgroups' to true in the config to automatic fix this") + } else { + val oldBamFile: File = config("bam") + val oldIndex: File = new File(oldBamFile.getAbsolutePath.stripSuffix(".bam") + ".bai") + val newIndex: File = new File(libDir, oldBamFile.getName.stripSuffix(".bam") + ".bai") + val baiLn = Ln(qscript, oldIndex, newIndex) + add(baiLn) + + val bamLn = Ln(qscript, oldBamFile, bamFile.get) + bamLn.deps :+= baiLn.output + add(bamLn) + } + } } case _ => logger.warn("Sample: " + sampleId + " Library: " + libId + ", no reads found") } @@ -294,7 +299,7 @@ trait ShivaTrait extends MultiSampleQScript with SummaryQScript with Reference { addAll(vc.functions) addSummaryQScript(vc) - if (config("annotation", default = true).asBoolean) { + if (config("annotation", default = false).asBoolean) { val toucan = new Toucan(this) toucan.outputDir = new File(outputDir, "annotation") toucan.inputVCF = vc.finalFile diff --git 
a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcalling.scala b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcalling.scala index 90b98462ff337c5140f274f64268bd93dd155e7d..d075619c1c39264adf9fba823b043110000b1668 100644 --- a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcalling.scala +++ b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcalling.scala @@ -16,7 +16,7 @@ package nl.lumc.sasc.biopet.pipelines.shiva import nl.lumc.sasc.biopet.core.PipelineCommand -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.queue.QScript /** diff --git a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTrait.scala b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTrait.scala index b55597d3f96994b3f6ec3a9fcf79d5f041443f1d..26302af07fb2abac8007faaf9dbc55e16fca23e1 100644 --- a/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTrait.scala +++ b/public/shiva/src/main/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTrait.scala @@ -18,13 +18,13 @@ package nl.lumc.sasc.biopet.pipelines.shiva import java.io.File import nl.lumc.sasc.biopet.core.summary.SummaryQScript -import nl.lumc.sasc.biopet.core.{ BiopetQScript, Reference, SampleLibraryTag } +import nl.lumc.sasc.biopet.core.{ Reference, SampleLibraryTag } import nl.lumc.sasc.biopet.extensions.bcftools.BcftoolsCall import nl.lumc.sasc.biopet.extensions.gatk.CombineVariants import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsMpileup +import nl.lumc.sasc.biopet.extensions.tools.{ MpileupToVcf, VcfFilter, VcfStats } import nl.lumc.sasc.biopet.extensions.{ Bgzip, Tabix } -import nl.lumc.sasc.biopet.tools.{ MpileupToVcf, VcfFilter, VcfStats } -import nl.lumc.sasc.biopet.utils.ConfigUtils +import nl.lumc.sasc.biopet.utils.{ 
ConfigUtils, Logging } import org.broadinstitute.gatk.queue.function.CommandLineFunction import org.broadinstitute.gatk.utils.commandline.{ Input, Output } @@ -62,7 +62,7 @@ trait ShivaVariantcallingTrait extends SummaryQScript with SampleLibraryTag with def biopetScript(): Unit = { for (cal <- configCallers) { if (!callersList.exists(_.name == cal)) - BiopetQScript.addError("variantcaller '" + cal + "' does not exist, possible to use: " + callersList.map(_.name).mkString(", ")) + Logging.addError("variantcaller '" + cal + "' does not exist, possible to use: " + callersList.map(_.name).mkString(", ")) } val callers = callersList.filter(x => configCallers.contains(x.name)).sortBy(_.prio) @@ -199,11 +199,11 @@ trait ShivaVariantcallingTrait extends SummaryQScript with SampleLibraryTag with val vcfFilter = new VcfFilter(qscript) { override def configName = "vcffilter" - override def defaults = ConfigUtils.mergeMaps(Map("min_sample_depth" -> 8, + override def defaults = Map("min_sample_depth" -> 8, "min_alternate_depth" -> 2, "min_samples_pass" -> 1, "filter_ref_calls" -> true - ), super.defaults) + ) } vcfFilter.inputVcf = m2v.output vcfFilter.outputVcf = new File(outputDir, bamFile.getName.stripSuffix(".bam") + ".raw.filter.vcf.gz") diff --git a/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTest.scala b/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTest.scala index e37756ef76504268f746dfb7ceda62cf29ed2d75..31777668d11df48a3ad99a663d60aad6e7666966 100644 --- a/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTest.scala +++ b/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaTest.scala @@ -18,10 +18,10 @@ package nl.lumc.sasc.biopet.pipelines.shiva import java.io.{ File, FileOutputStream } import com.google.common.io.Files -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.extensions.bwa.BwaMem import 
nl.lumc.sasc.biopet.extensions.picard.{ MarkDuplicates, SortSam } -import nl.lumc.sasc.biopet.tools.VcfStats +import nl.lumc.sasc.biopet.extensions.tools.VcfStats import nl.lumc.sasc.biopet.utils.ConfigUtils import org.broadinstitute.gatk.queue.QSettings import org.scalatest.Matchers @@ -62,7 +62,6 @@ class ShivaTest extends TestNGSuite with Matchers { ConfigUtils.mergeMaps(Map("multisample_variantcalling" -> multi, "single_sample_variantcalling" -> single, "library_variantcalling" -> library), m) - } if (!sample1 && !sample2 && !sample3) { // When no samples @@ -76,8 +75,6 @@ class ShivaTest extends TestNGSuite with Matchers { val numberLibs = (if (sample1) 1 else 0) + (if (sample2) 1 else 0) + (if (sample3) 2 else 0) val numberSamples = (if (sample1) 1 else 0) + (if (sample2) 1 else 0) + (if (sample3) 1 else 0) - pipeline.functions.count(_.isInstanceOf[BwaMem]) shouldBe numberLibs - pipeline.functions.count(_.isInstanceOf[SortSam]) shouldBe numberLibs pipeline.functions.count(_.isInstanceOf[MarkDuplicates]) shouldBe (numberLibs + (if (sample3) 1 else 0)) pipeline.functions.count(_.isInstanceOf[VcfStats]) shouldBe (if (multi) 2 else 0) + @@ -88,6 +85,12 @@ class ShivaTest extends TestNGSuite with Matchers { object ShivaTest { val outputDir = Files.createTempDir() + new File(outputDir, "input").mkdirs() + def inputTouch(name: String): String = { + val file = new File(outputDir, "input" + File.separator + name) + Files.touch(file) + file.getAbsolutePath + } private def copyFile(name: String): Unit = { val is = getClass.getResourceAsStream("/" + name) @@ -106,7 +109,6 @@ object ShivaTest { "cache" -> true, "dir" -> "test", "vep_script" -> "test", - "reference" -> (outputDir + File.separator + "ref.fa"), "reference_fasta" -> (outputDir + File.separator + "ref.fa"), "gatk_jar" -> "test", "samtools" -> Map("exe" -> "test"), @@ -131,8 +133,8 @@ object ShivaTest { val sample1 = Map( "samples" -> Map("sample1" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> 
"1_1_R1.fq", - "R2" -> "1_1_R2.fq" + "R1" -> inputTouch("1_1_R1.fq"), + "R2" -> inputTouch("1_1_R2.fq") ) ) ))) @@ -140,8 +142,8 @@ object ShivaTest { val sample2 = Map( "samples" -> Map("sample2" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> "2_1_R1.fq", - "R2" -> "2_1_R2.fq" + "R1" -> inputTouch("2_1_R1.fq"), + "R2" -> inputTouch("2_1_R2.fq") ) ) ))) @@ -149,12 +151,12 @@ object ShivaTest { val sample3 = Map( "samples" -> Map("sample3" -> Map("libraries" -> Map( "lib1" -> Map( - "R1" -> "3_1_R1.fq", - "R2" -> "3_1_R2.fq" + "R1" -> inputTouch("3_1_R1.fq"), + "R2" -> inputTouch("3_1_R2.fq") ), "lib2" -> Map( - "R1" -> "3_2_R1.fq", - "R2" -> "3_2_R2.fq" + "R1" -> inputTouch("3_2_R1.fq"), + "R2" -> inputTouch("3_2_R2.fq") ) ) ))) diff --git a/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTest.scala b/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTest.scala index 82e21f195aefde3199f19f5a5edf8088ea835201..dae0b974329950501369990e3a53bae077f8f071 100644 --- a/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTest.scala +++ b/public/shiva/src/test/scala/nl/lumc/sasc/biopet/pipelines/shiva/ShivaVariantcallingTest.scala @@ -18,10 +18,10 @@ package nl.lumc.sasc.biopet.pipelines.shiva import java.io.{ File, FileOutputStream } import com.google.common.io.Files -import nl.lumc.sasc.biopet.core.config.Config +import nl.lumc.sasc.biopet.utils.config.Config import nl.lumc.sasc.biopet.extensions.Freebayes import nl.lumc.sasc.biopet.extensions.gatk.CombineVariants -import nl.lumc.sasc.biopet.tools.{ MpileupToVcf, VcfFilter } +import nl.lumc.sasc.biopet.extensions.tools.{ MpileupToVcf, VcfFilter } import nl.lumc.sasc.biopet.utils.ConfigUtils import org.apache.commons.io.FileUtils import org.broadinstitute.gatk.queue.QSettings @@ -61,7 +61,7 @@ class ShivaVariantcallingTest extends TestNGSuite with Matchers { val map = Map("variantcallers" -> callers.toList) val pipeline = 
initPipeline(map) - pipeline.inputBams = (for (n <- 1 to bams) yield new File("bam_" + n + ".bam")).toList + pipeline.inputBams = (for (n <- 1 to bams) yield ShivaVariantcallingTest.inputTouch("bam_" + n + ".bam")).toList val illegalArgumentException = pipeline.inputBams.isEmpty || (!raw && !bcftools && !freebayes) @@ -88,6 +88,12 @@ class ShivaVariantcallingTest extends TestNGSuite with Matchers { object ShivaVariantcallingTest { val outputDir = Files.createTempDir() + new File(outputDir, "input").mkdirs() + def inputTouch(name: String): File = { + val file = new File(outputDir, "input" + File.separator + name).getAbsoluteFile + Files.touch(file) + file + } private def copyFile(name: String): Unit = { val is = getClass.getResourceAsStream("/" + name) @@ -106,7 +112,6 @@ object ShivaVariantcallingTest { "cache" -> true, "dir" -> "test", "vep_script" -> "test", - "reference" -> (outputDir + File.separator + "ref.fa"), "reference_fasta" -> (outputDir + File.separator + "ref.fa"), "gatk_jar" -> "test", "samtools" -> Map("exe" -> "test"), diff --git a/public/src/src/test/resources/log4j.properties b/public/src/src/test/resources/log4j.properties new file mode 100644 index 0000000000000000000000000000000000000000..52fb824b0a8088346ed39f9de816309d0569ecf6 --- /dev/null +++ b/public/src/src/test/resources/log4j.properties @@ -0,0 +1,15 @@ +# +# Due to the license issue with GATK, this part of Biopet can only be used inside the +# LUMC. Please refer to https://git.lumc.nl/biopet/biopet/wikis/home for instructions +# on how to use this protected part of biopet or contact us at sasc@lumc.nl +# + +# Set root logger level to DEBUG and its only appender to A1. +log4j.rootLogger=ERROR, A1 + +# A1 is set to be a ConsoleAppender. +log4j.appender.A1=org.apache.log4j.ConsoleAppender + +# A1 uses PatternLayout. 
+log4j.appender.A1.layout=org.apache.log4j.PatternLayout
+log4j.appender.A1.layout.ConversionPattern=%-5p [%d] [%C{1}] - %m%n
\ No newline at end of file
diff --git a/public/toucan/pom.xml b/public/toucan/pom.xml
index a427544909cb784d53e34504eea1c77bd7c3e5d5..ff6f74eba21281d4010925053466265e017f95ee 100644
--- a/public/toucan/pom.xml
+++ b/public/toucan/pom.xml
@@ -25,7 +25,7 @@
     <parent>
         <groupId>nl.lumc.sasc</groupId>
         <artifactId>Biopet</artifactId>
-        <version>0.5.0-DEV</version>
+        <version>0.5.0-SNAPSHOT</version>
         <relativePath>../</relativePath>
     </parent>

@@ -35,7 +35,12 @@
     <dependencies>
         <dependency>
             <groupId>nl.lumc.sasc</groupId>
-            <artifactId>BiopetFramework</artifactId>
+            <artifactId>BiopetCore</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>nl.lumc.sasc</groupId>
+            <artifactId>BiopetToolsExtensions</artifactId>
             <version>${project.version}</version>
         </dependency>
     </dependencies>
diff --git a/public/toucan/src/main/scala/nl/lumc/sasc/biopet/pipelines/toucan/Toucan.scala b/public/toucan/src/main/scala/nl/lumc/sasc/biopet/pipelines/toucan/Toucan.scala
index ccd47d551363be2bf65d0fd0176f02f8092060a1..6ee0776713a5391296719ac6edc83819777fcf31 100644
--- a/public/toucan/src/main/scala/nl/lumc/sasc/biopet/pipelines/toucan/Toucan.scala
+++ b/public/toucan/src/main/scala/nl/lumc/sasc/biopet/pipelines/toucan/Toucan.scala
@@ -15,11 +15,11 @@
  */
 package nl.lumc.sasc.biopet.pipelines.toucan

-import nl.lumc.sasc.biopet.core.config.Configurable
+import nl.lumc.sasc.biopet.utils.config.Configurable
 import nl.lumc.sasc.biopet.core.summary.SummaryQScript
 import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand, Reference }
 import nl.lumc.sasc.biopet.extensions.VariantEffectPredictor
-import nl.lumc.sasc.biopet.tools.{ VcfWithVcf, VepNormalizer }
+import nl.lumc.sasc.biopet.extensions.tools.{ VcfWithVcf, VepNormalizer }
 import nl.lumc.sasc.biopet.utils.ConfigUtils
 import org.broadinstitute.gatk.queue.QScript

@@ -35,11
+35,12 @@ class Toucan(val root: Configurable) extends QScript with BiopetQScript with Sum var inputVCF: File = _ def init(): Unit = { + inputFiles :+= new InputFile(inputVCF) } - override def defaults = ConfigUtils.mergeMaps(Map( + override def defaults = Map( "varianteffectpredictor" -> Map("everything" -> true) - ), super.defaults) + ) //defaults ++= Map("varianteffectpredictor" -> Map("everything" -> true)) diff --git a/public/yamsvp/pom.xml b/public/yamsvp/pom.xml index 4cf7ab912c09a8e9e07d8449a2cc206cf0b45905..48742ab6b693c9d2be72cbc2b3b61682ed3ffe14 100644 --- a/public/yamsvp/pom.xml +++ b/public/yamsvp/pom.xml @@ -25,7 +25,7 @@ <parent> <groupId>nl.lumc.sasc</groupId> <artifactId>Biopet</artifactId> - <version>0.5.0-DEV</version> + <version>0.5.0-SNAPSHOT</version> <relativePath>../</relativePath> </parent> diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/Pindel.scala b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/Pindel.scala similarity index 98% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/Pindel.scala rename to public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/Pindel.scala index 1939445765451d40f3c8596f04370d21e4c889cd..10795533490ad7f0b696de22a64ca86d54a3492d 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/Pindel.scala +++ b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/Pindel.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.pindel import java.io.File import nl.lumc.sasc.biopet.core.{ BiopetQScript, PipelineCommand } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.queue.QScript /// Pindel is actually a mini pipeline executing binaries from the pindel package diff --git 
a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelCaller.scala b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelCaller.scala similarity index 97% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelCaller.scala rename to public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelCaller.scala index e2d5ce2db18367d259af3c2ee21b1d3c0870acd4..cbe957e79fe1c3e013fb399fe30859e144ae2ed5 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelCaller.scala +++ b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelCaller.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.pindel import java.io.File import nl.lumc.sasc.biopet.core.BiopetCommandLineFunction -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } class PindelCaller(val root: Configurable) extends BiopetCommandLineFunction { diff --git a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelConfig.scala b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelConfig.scala similarity index 96% rename from public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelConfig.scala rename to public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelConfig.scala index 4ca3f9e2ca5ea2ea2f791dab2cdd6b577559304a..497fe21e342ed754e21f6be253b1ae5fdd0813fb 100644 --- a/public/biopet-framework/src/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelConfig.scala +++ b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/extensions/pindel/PindelConfig.scala @@ -18,7 +18,7 @@ package nl.lumc.sasc.biopet.extensions.pindel import java.io.File import nl.lumc.sasc.biopet.core.{ BiopetJavaCommandLineFunction, 
ToolCommand } -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import org.broadinstitute.gatk.utils.commandline.{ Argument, Input, Output } class PindelConfig(val root: Configurable) extends BiopetJavaCommandLineFunction { @@ -32,7 +32,7 @@ class PindelConfig(val root: Configurable) extends BiopetJavaCommandLineFunction @Argument(doc = "Insertsize") var insertsize: Option[Int] = _ - override def commandLine = super.commandLine + + override def cmdLine = super.cmdLine + "-i" + required(input) + "-s" + required(insertsize) + "-o" + required(output) diff --git a/public/yamsvp/src/main/scala/nl/lumc/sasc/biopet/pipelines/yamsvp/Yamsvp.scala b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/pipelines/yamsvp/Yamsvp.scala similarity index 99% rename from public/yamsvp/src/main/scala/nl/lumc/sasc/biopet/pipelines/yamsvp/Yamsvp.scala rename to public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/pipelines/yamsvp/Yamsvp.scala index e6def691af3d699c5087be58d94366bd85ae80d5..a0ade5706206c60d9a1a470002c783d0b98a3590 100644 --- a/public/yamsvp/src/main/scala/nl/lumc/sasc/biopet/pipelines/yamsvp/Yamsvp.scala +++ b/public/yamsvp/src_old/main/scala/nl/lumc/sasc/biopet/pipelines/yamsvp/Yamsvp.scala @@ -21,7 +21,7 @@ package nl.lumc.sasc.biopet.pipelines.yamsvp import java.io.File -import nl.lumc.sasc.biopet.core.config.Configurable +import nl.lumc.sasc.biopet.utils.config.Configurable import nl.lumc.sasc.biopet.core.{ MultiSampleQScript, PipelineCommand } import nl.lumc.sasc.biopet.extensions.Ln import nl.lumc.sasc.biopet.extensions.breakdancer.Breakdancer diff --git a/public/yamsvp/src_old/test/resources/log4j.properties b/public/yamsvp/src_old/test/resources/log4j.properties new file mode 100644 index 0000000000000000000000000000000000000000..501af67582a546db584c8538b28cb6f9e07f1692 --- /dev/null +++ b/public/yamsvp/src_old/test/resources/log4j.properties @@ -0,0 +1,25 @@ +# +# Biopet is built on top of GATK Queue for 
building bioinformatic +# pipelines. It is mainly intended to support LUMC SHARK cluster which is running +# SGE. But other types of HPC that are supported by GATK Queue (such as PBS) +# should also be able to execute Biopet tools and pipelines. +# +# Copyright 2014 Sequencing Analysis Support Core - Leiden University Medical Center +# +# Contact us at: sasc@lumc.nl +# +# A dual licensing mode is applied. The source code within this project that are +# not part of GATK Queue is freely available for non-commercial use under an AGPL +# license; For commercial users or users who do not want to follow the AGPL +# license, please contact us to obtain a separate license. +# + +# Set root logger level to ERROR and its only appender to A1. +log4j.rootLogger=ERROR, A1 + +# A1 is set to be a ConsoleAppender. +log4j.appender.A1=org.apache.log4j.ConsoleAppender + +# A1 uses PatternLayout. +log4j.appender.A1.layout=org.apache.log4j.PatternLayout +log4j.appender.A1.layout.ConversionPattern=%-5p [%d] [%C{1}] - %m%n \ No newline at end of file