Commit acbbc173 authored by Wai Yi Leung

Merge branch 'feature-qiime' into 'develop'

Feature qiime

Mainly for project 122

See merge request !287
parents 52776123 6e62a69e
......@@ -4,49 +4,52 @@
Gears is a metagenomics pipeline. (``GE``nome ``A``nnotation of ``R``esidual ``S``equences). One can use this pipeline to identify contamination in sequencing runs on either raw FastQ files or BAM files.
When a BAM file is given as input, it extracts the unaligned reads (or read pairs) for analysis.
The analysis result is reported in a sunburst graph, which is visible and navigable in a web browser.
The analysis result is reported in a krona graph, which is visible and navigable in a web browser.
Pipeline analysis components include:
- Kraken, DerrickWood [GitHub](https://github.com/DerrickWood/kraken)
- [Kraken, DerrickWood](https://github.com/DerrickWood/kraken)
- [Qiime closed reference](http://qiime.org)
- [Qiime rtax](http://qiime.org) (**Experimental**)
- SeqCount (**Experimental**)
## Gears
## Example
This pipeline is used to analyse a group of samples. This pipeline only accepts fastq files. The fastq files first get trimmed and clipped with [Flexiprep](Flexiprep). This can be disabled with the config flags of [Flexiprep](Flexiprep). The samples can be specified with a sample config file, see [Config](../general/Config)
To get the help menu:
### Config
``` bash
biopet pipeline Gears -h
... default config ...
Arguments for Gears:
-R1,--fastqr1 <fastqr1> R1 reads in FastQ format
-R2,--fastqr2 <fastqr2> R2 reads in FastQ format
-bam,--bamfile <bamfile> All unmapped reads will be extracted from this bam for analysis
--outputname <outputname> Undocumented option
-sample,--sampleid <sampleid> Sample ID
-library,--libid <libid> Library ID
-config,--config_file <config_file> JSON / YAML config file(s)
-cv,--config_value <config_value> Config values, value should be formatted like 'key=value' or
'path:path:key=value'
-DSC,--disablescatter Disable all scatters
| Key | Type | default | Function |
| --- | ---- | ------- | -------- |
| gears_use_kraken | Boolean | true | Run fastq file with kraken |
| gears_use_qiime_closed | Boolean | false | Run fastq files with qiime with the closed reference module |
| gears_use_qiime_rtax | Boolean | false | Run fastq files with qiime with the rtax module |
| gears_use_seq_count | Boolean | false | Produces raw count files |
### Example
To start the pipeline (remove `-run` for a dry run):
``` bash
biopet pipeline Gears -run \
-config mySettings.json -config samples.json
```
Note that the pipeline also works on unpaired reads where one should only provide R1.
## GearsSingle
This pipeline can be used to analyse a single sample, given either as fastq files or as a bam file. When a bam file is given, only the unmapped reads are extracted.
### Example
To start the pipeline (remove `-run` for a dry run):
``` bash
biopet pipeline Gears -run \
biopet pipeline GearsSingle -run \
-R1 myFirstReadPair -R2 mySecondReadPair -sample mySampleName \
-library myLibname -config mySettings.json
```
## Configuration and flags
### Commandline flags
For technical reasons, single-sample pipelines such as this one do **not** take a sample config.
Input files are instead given on the command line as flags.
......@@ -58,17 +61,22 @@ Command line flags for Gears are:
| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file. |
| -bam | --bamfile | Path (optional) | Path to bam file. |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (**required**) | Name of library |
| -library | --libid | String (optional) | Name of library |
If `-R2` is given, the pipeline will assume a paired-end setup. `-bam` is mutually exclusive with the `-R1` and `-R2` flags. Either specify `-bam` or `-R1` and/or `-R2`.
### Config
| Key | Type | default | Function |
| --- | ---- | ------- | -------- |
| gears_use_kraken | Boolean | true | Run fastq file with kraken |
| gears_use_qiime_closed | Boolean | false | Run fastq files with qiime with the closed reference module |
| gears_use_qiime_rtax | Boolean | false | Run fastq files with qiime with the rtax module |
| gears_use_seq_count | Boolean | false | Produces raw count files |
### Result files
## Result files
The results of `Gears` are stored in the following files:
The results of `GearsSingle` are stored in the following files:
| File suffix | Application | Content | Description |
| ----------- | ----------- | ------- | ----------- |
......
#import(java.io.File)
#import(scala.io.Source)
<%@ var rootPath: String %>
<%@ var kronaXml: File %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="utf-8"/>
<link rel="shortcut icon" href="${rootPath}ext/img/krona/favicon.ico"/>
<!--<script id="notfound">window.onload=function(){document.body.innerHTML="Could not get resources from \"http://krona.sourceforge.net\"."}</script>-->
<script src="${rootPath}ext/js/krona-2.0.js"></script>
</head>
<body>
<img id="hiddenImage" src="${rootPath}ext/img/krona/hidden.png" style="display:none"/>
<img id="loadingImage" src="${rootPath}ext/img/krona/loading.gif" style="display:none"/>
<noscript>Javascript must be enabled to view this page.</noscript>
<div style="display:none">
<%
val reader = Source.fromFile(kronaXml)
val xml = reader.getLines().mkString("\n")
reader.close()
%>
${unescape(xml)}
</div></body></html>
......@@ -148,7 +148,7 @@
${name}
</h3>
</div>
${unescape(section.render(args))}
${unescape(section.render(args ++ Map("args" -> args)))}
</div>
#end
</div>
......
......@@ -94,7 +94,7 @@ trait PipelineCommand extends MainCommand with GatkLogging with ImplicitConversi
}
if (!args.contains("-retry") && !args.contains("--retry_failed")) {
val retry: Int = globalConfig(pipelineName, Nil, "retry", default = 5)
logger.info("No retry flag found, ")
logger.info(s"No retry flag found, set to default value of '$retry'")
argv ++= List("-retry", retry.toString)
}
BiopetQCommandLine.main(argv)
......
package nl.lumc.sasc.biopet.extensions
import java.io.File
import nl.lumc.sasc.biopet.core.{ Version, BiopetCommandLineFunction }
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.Input
import scala.util.matching.Regex
/**
* Created by pjvanthof on 16/12/15.
*/
class Flash(val root: Configurable) extends BiopetCommandLineFunction with Version {
executable = config("exe", default = "flash", freeVar = false)
/** Command to get version of executable */
def versionCommand: String = executable + " --version"
/** Regex to get version from version command output */
def versionRegex: Regex = """FLASH (v.*)""".r
@Input(required = true)
var fastqR1: File = _
@Input(required = true)
var fastqR2: File = _
var minOverlap: Option[Int] = config("min_overlap")
var maxOverlap: Option[Int] = config("max_overlap")
var maxMismatchDensity: Option[Double] = config("max_mismatch_density")
var allowOuties: Boolean = config("allow_outies", default = false)
var phredOffset: Option[Int] = config("phred_offset")
var readLen: Option[Int] = config("read_len")
var fragmentLen: Option[Int] = config("fragment_len")
var fragmentLenStddev: Option[Int] = config("fragment_len_stddev")
var capMismatchQuals: Boolean = config("cap_mismatch_quals", default = false)
var interleavedInput: Boolean = config("interleaved_input", default = false)
var interleavedOutput: Boolean = config("interleaved_output", default = false)
var interleaved: Boolean = config("interleaved", default = false)
var tabDelimitedInput: Boolean = config("tab_delimited_input", default = false)
var tabDelimitedOutput: Boolean = config("tab_delimited_output", default = false)
var outputPrefix: String = config("output_prefix", default = "out")
var outputDirectory: File = _
var compress: Boolean = config("compress", default = false)
var compressProg: Option[String] = config("compress_prog")
var compressProgArgs: Option[String] = config("compress_prog_args")
var outputSuffix: Option[String] = config("output_suffix")
private def suffix = outputSuffix.getOrElse("fastq") + (if (compress) ".gz" else "")
def combinedFastq = new File(outputDirectory, s"$outputPrefix.extendedFrags.$suffix")
def notCombinedR1 = new File(outputDirectory, s"$outputPrefix.notCombined_1.$suffix")
def notCombinedR2 = new File(outputDirectory, s"$outputPrefix.notCombined_2.$suffix")
def outputHistogramTable = new File(outputDirectory, s"$outputPrefix.hist")
def outputHistogram = new File(outputDirectory, s"$outputPrefix.histogram")
override def beforeGraph(): Unit = {
super.beforeGraph()
outputFiles :::= combinedFastq :: notCombinedR1 ::
notCombinedR2 :: outputHistogramTable :: outputHistogram :: Nil
}
def cmdLine = executable +
optional("-m", minOverlap) +
optional("-M", maxOverlap) +
optional("-x", maxMismatchDensity) +
conditional(allowOuties, "--allow-outies") +
optional("--phred-offset", phredOffset) +
optional("--read-len", readLen) +
optional("--fragment-len", fragmentLen) +
optional("--fragment-len-stddev", fragmentLenStddev) +
conditional(capMismatchQuals, "--cap-mismatch-quals") +
conditional(interleavedInput, "--interleaved-input") +
conditional(interleavedOutput, "--interleaved-output") +
conditional(interleaved, "--interleaved") +
conditional(tabDelimitedInput, "--tab-delimited-input") +
conditional(tabDelimitedOutput, "--tab-delimited-output") +
optional("--output-prefix", outputPrefix) +
required("--output-directory", outputDirectory) +
conditional(compress, "--compress") +
optional("--compress-prog", compressProg) +
optional("--compress-prog-args", compressProgArgs) +
optional("--output-suffix", outputSuffix) +
optional("--threads", threads) +
required(fastqR1) +
required(fastqR2)
}
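The output file names declared by the `Flash` wrapper above follow from `outputPrefix`, `outputSuffix`, and `compress`. The following is a minimal sketch of that naming rule, assuming plain strings instead of `File` objects. The object and method names (`FlashOutputNames`, `suffix`, `combinedName`, `notCombinedName`) are hypothetical and are not part of Biopet.

```scala
// Hypothetical helper that mirrors the naming logic of the Flash wrapper.
// It is an illustration only and is not used by the pipeline.
object FlashOutputNames {
  /** Suffix: the configured value or "fastq", plus ".gz" when compressing. */
  def suffix(outputSuffix: Option[String], compress: Boolean): String =
    outputSuffix.getOrElse("fastq") + (if (compress) ".gz" else "")

  /** Name of the file holding the merged (extended) fragments. */
  def combinedName(prefix: String, outputSuffix: Option[String], compress: Boolean): String =
    s"${prefix}.extendedFrags.${suffix(outputSuffix, compress)}"

  /** Name of the file holding reads of the given mate (1 or 2) that were not merged. */
  def notCombinedName(prefix: String, mate: Int, outputSuffix: Option[String], compress: Boolean): String =
    s"${prefix}.notCombined_${mate}.${suffix(outputSuffix, compress)}"
}
```

With the default prefix `out`, no explicit suffix, and compression enabled, `combinedName` yields `out.extendedFrags.fastq.gz`, which is the path the wrapper would register as an output file under `outputDirectory`.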
......@@ -49,52 +49,32 @@ class Ln(val root: Configurable) extends InProcessFunction with Configurable {
/** return commandline to execute */
lazy val cmd: String = {
lazy val inCanonical: String = {
val inCanonical: String = {
// need to remove "/~" to correctly expand path with tilde
input.getAbsolutePath.replace("/~", "")
}
lazy val outCanonical: String = output.getAbsolutePath.replace("/~", "")
val outCanonical: String = output.getAbsolutePath.replace("/~", "")
lazy val inToks: Array[String] = inCanonical.split(File.separator)
if (relative) {
val inToks: Array[String] = inCanonical.split(File.separator)
lazy val outToks: Array[String] = outCanonical.split(File.separator)
val outToks: Array[String] = outCanonical.split(File.separator)
lazy val commonPrefixLength: Int = {
val maxLength = scala.math.min(inToks.length, outToks.length)
var i: Int = 0
while (i < maxLength && inToks(i) == outToks(i)) i += 1
i
}
val commonPrefixLength: Int = {
val maxLength = scala.math.min(inToks.length, outToks.length)
var i: Int = 0
while (i < maxLength && inToks(i) == outToks(i)) i += 1
i
}
lazy val inUnique: String = {
inToks.slice(commonPrefixLength, inToks.length).mkString(File.separator)
}
val inUnique = inToks.slice(commonPrefixLength, inToks.length)
lazy val outUnique: String = {
outToks.slice(commonPrefixLength, outToks.length).mkString(File.separator)
}
val outUnique = outToks.slice(commonPrefixLength, outToks.length)
lazy val inRelative: String = {
// calculate 'distance' from output directory to input
// which is the number of directory walks required to get to the inUnique directory from outDir
val dist =
// relative path differs depending on which of the input or target is in the 'higher' directory
if (inToks.length > outToks.length)
scala.math.max(0, inUnique.split(File.separator).length - 1)
else
scala.math.max(0, outUnique.split(File.separator).length - 1)
val result =
if (dist == 0 || inToks.length > outToks.length)
inUnique
else
((".." + File.separator) * dist) + inUnique
result
}
val inRelative: String =
((".." + File.separator) * (outUnique.length - 1)) + inUnique.mkString(File.separator)
if (relative) {
// workaround until we have `ln` that works with relative path (i.e. `ln -r`)
"ln -s " + inRelative + " " + outCanonical
} else {
......
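The simplified relative-path computation in `Ln` can be illustrated in isolation. This is a sketch under stated assumptions: both paths are absolute POSIX paths that are already normalised, and `RelativeLinkSketch` and `relativeTarget` are hypothetical names introduced here, not Biopet API.

```scala
// Hypothetical stand-alone version of the relative link target logic.
// Assumes absolute, normalised POSIX paths separated by '/'.
object RelativeLinkSketch {
  def relativeTarget(input: String, output: String): String = {
    val inToks = input.split('/')
    val outToks = output.split('/')
    // Number of leading path components shared by both paths.
    val common = inToks.zip(outToks).takeWhile { case (a, b) => a == b }.length
    val inUnique = inToks.drop(common)
    val outUnique = outToks.drop(common)
    // One ".." per directory between the link location and the common prefix;
    // the last component of the output is the link name itself, hence the -1.
    val ups = math.max(0, outUnique.length - 1)
    ("../" * ups) + inUnique.mkString("/")
  }
}
```

For an input of `/data/in/a.fq` and a link at `/data/out/run/a.fq`, the common prefix is `/data`, the link sits two directories below it, and the resulting target is `../../in/a.fq`.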
......@@ -54,10 +54,10 @@ object Zcat {
zcat
}
def apply(root: Configurable, input: List[File], output: File): Zcat = {
def apply(root: Configurable, input: List[File], output: File = null): Zcat = {
val zcat = new Zcat(root)
zcat.input = input
zcat.output = output
if (output != null) zcat.output = output
zcat
}
}
\ No newline at end of file
......@@ -53,7 +53,7 @@ class Kraken(val root: Configurable) extends BiopetCommandLineFunction with Vers
def versionCommand = executable + " --version"
override def defaultCoreMemory = 8.0
override def defaultCoreMemory = 15.0
override def defaultThreads = 4
......
......@@ -32,7 +32,11 @@ class KrakenReport(val root: Configurable) extends BiopetCommandLineFunction wit
override def defaultCoreMemory = 4.0
override def defaultThreads = 1
def versionCommand = new File(new File(executable).getParent, "kraken").getAbsolutePath + " --version"
def versionCommand = {
val exe = new File(new File(executable).getParent, "kraken")
if (exe.exists()) exe.getAbsolutePath + " --version"
else executable + " --version"
}
var db: File = config("db")
var show_zeros: Boolean = config("show_zeros", default = false)
......@@ -43,10 +47,9 @@ class KrakenReport(val root: Configurable) extends BiopetCommandLineFunction wit
@Output(doc = "Output path kraken report")
var output: File = _
def cmdLine: String = {
val cmd: String = required(executable) + "--db " + required(db) +
conditional(show_zeros, "--show-zeros") +
required(input.getAbsolutePath) + " > " + required(output.getAbsolutePath)
cmd
}
def cmdLine: String = required(executable) +
required("--db", db) +
conditional(show_zeros, "--show-zeros") +
required(input) +
" > " + required(output)
}
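The fallback in `versionCommand` above prefers a `kraken` binary next to the report executable and otherwise queries the executable itself. A minimal sketch of that decision, assuming only `java.io.File`; the names `VersionCommandSketch` and `sibling` are hypothetical and chosen for illustration.

```scala
import java.io.File

// Hypothetical stand-alone version of the fallback used for the version command.
object VersionCommandSketch {
  def versionCommand(executable: String, sibling: String = "kraken"): String = {
    // getParent may be null for a bare command name; File(null, child)
    // then resolves the child relative to the working directory.
    val candidate = new File(new File(executable).getParent, sibling)
    val chosen = if (candidate.exists()) candidate.getAbsolutePath else executable
    chosen + " --version"
  }
}
```

When no sibling binary exists, for example because the executable is resolved through `PATH`, the command falls back to calling the configured executable with `--version`.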
package nl.lumc.sasc.biopet.extensions.qiime
import java.io.File
import nl.lumc.sasc.biopet.core.{ Version, BiopetCommandLineFunction }
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.Input
/**
* Created by pjvan_thof on 12/4/15.
*/
class AssignTaxonomy(val root: Configurable) extends BiopetCommandLineFunction with Version {
executable = config("exe", default = "assign_taxonomy.py")
@Input(required = true)
var inputFasta: File = _
@Input(required = false)
var read_1_seqs_fp: Option[File] = None
@Input(required = false)
var read_2_seqs_fp: Option[File] = None
@Input(required = false)
var id_to_taxonomy_fp: Option[File] = config("id_to_taxonomy_fp")
@Input(required = false)
var reference_seqs_fp: Option[File] = config("reference_seqs_fp")
@Input(required = false)
var training_data_properties_fp: Option[File] = config("training_data_properties_fp")
var single_ok: Boolean = config("single_ok", default = false)
var no_single_ok_generic: Boolean = config("no_single_ok_generic", default = false)
var amplicon_id_regex: Option[String] = config("amplicon_id_regex")
var header_id_regex: Option[String] = config("header_id_regex")
var assignment_method: Option[String] = config("assignment_method")
var sortmerna_db: Option[String] = config("sortmerna_db")
var sortmerna_e_value: Option[String] = config("sortmerna_e_value")
var sortmerna_coverage: Option[String] = config("sortmerna_coverage")
var sortmerna_best_N_alignments: Option[String] = config("sortmerna_best_N_alignments")
var sortmerna_threads: Option[String] = config("sortmerna_threads")
var blast_db: Option[String] = config("blast_db")
var confidence: Option[String] = config("confidence")
var min_consensus_fraction: Option[String] = config("min_consensus_fraction")
var similarity: Option[String] = config("similarity")
var uclust_max_accepts: Option[String] = config("uclust_max_accepts")
var rdp_max_memory: Option[String] = config("rdp_max_memory")
var blast_e_value: Option[String] = config("blast_e_value")
var outputDir: File = _
def versionCommand = executable + " --version"
def versionRegex = """Version: (.*)""".r
override def defaultCoreMemory = 4.0
override def beforeGraph(): Unit = {
super.beforeGraph()
require(outputDir != null)
}
def cmdLine = executable +
required("-i", inputFasta) +
optional("--read_1_seqs_fp", read_1_seqs_fp) +
optional("--read_2_seqs_fp", read_2_seqs_fp) +
optional("-t", id_to_taxonomy_fp) +
optional("-r", reference_seqs_fp) +
optional("-p", training_data_properties_fp) +
optional("--amplicon_id_regex", amplicon_id_regex) +
optional("--header_id_regex", header_id_regex) +
optional("--assignment_method", assignment_method) +
optional("--sortmerna_db", sortmerna_db) +
optional("--sortmerna_e_value", sortmerna_e_value) +
optional("--sortmerna_coverage", sortmerna_coverage) +
optional("--sortmerna_best_N_alignments", sortmerna_best_N_alignments) +
optional("--sortmerna_threads", sortmerna_threads) +
optional("--blast_db", blast_db) +
optional("--confidence", confidence) +
optional("--min_consensus_fraction", min_consensus_fraction) +
optional("--similarity", similarity) +
optional("--uclust_max_accepts", uclust_max_accepts) +
optional("--rdp_max_memory", rdp_max_memory) +
optional("--blast_e_value", blast_e_value) +
required("--output_dir", outputDir) +
conditional(single_ok, "--single_ok") +
conditional(no_single_ok_generic, "--no_single_ok_generic")
}
package nl.lumc.sasc.biopet.extensions.qiime
import java.io.File
import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Version }
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
/**
* Created by pjvan_thof on 12/10/15.
*/
class MergeOtuMaps(val root: Configurable) extends BiopetCommandLineFunction with Version {
executable = config("exe", default = "merge_otu_maps.py")
def versionCommand = executable + " --version"
def versionRegex = """Version: (.*)""".r
@Input(required = true)
var input: List[File] = Nil
@Output(required = true)
var outputFile: File = _
var failures_fp: Option[File] = None
override def beforeGraph(): Unit = {
super.beforeGraph()
require(input.nonEmpty)
require(outputFile != null)
}
def cmdLine = executable +
(input match {
case l: List[_] if l.nonEmpty => required("-i", l.mkString(","))
case _ => ""
}) +
required("-o", outputFile) +
optional("--failures_fp", failures_fp)
}
\ No newline at end of file
package nl.lumc.sasc.biopet.extensions.qiime
import java.io.File
import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Version }
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.{ Output, Input }
/**
* Created by pjvan_thof on 12/10/15.
*/
class MergeOtuTables(val root: Configurable) extends BiopetCommandLineFunction with Version {
executable = config("exe", default = "merge_otu_tables.py")
def versionCommand = executable + " --version"
def versionRegex = """Version: (.*)""".r
@Input(required = true)
var input: List[File] = Nil
@Output(required = true)
var outputFile: File = _
override def beforeGraph(): Unit = {
super.beforeGraph()
require(input.nonEmpty)
require(outputFile != null)
}
def cmdLine = executable +
(input match {
case l: List[_] if l.nonEmpty => required("-i", l.mkString(","))
case _ => ""
}) +
required("-o", outputFile)
}
\ No newline at end of file
package nl.lumc.sasc.biopet.extensions.qiime
import java.io.File
import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Version }
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.Input
/**
* Created by pjvan_thof on 12/4/15.
*/
class PickClosedReferenceOtus(val root: Configurable) extends BiopetCommandLineFunction with Version {
executable = config("exe", default = "pick_closed_reference_otus.py")
@Input(required = true)
var inputFasta: File = _
var outputDir: File = null
override def defaultThreads = 2
override def defaultCoreMemory = 10.0
def versionCommand = executable + " --version"
def versionRegex = """Version: (.*)""".r
@Input(required = false)
var parameter_fp: Option[File] = config("parameter_fp")
@Input(required = false)
var reference_fp: Option[File] = config("reference_fp")
@Input(required = false)
var taxonomy_fp: Option[File] = config("taxonomy_fp")
var assign_taxonomy: Boolean = config("assign_taxonomy", default = false)
var force: Boolean = config("force", default = false)
var print_only: Boolean = config("print_only", default = false)
var suppress_taxonomy_assignment: Boolean = config("suppress_taxonomy_assignment", default = false)