Commit f3779b52 authored by Peter van 't Hof's avatar Peter van 't Hof
Browse files

Merge branch 'feature-small_rna' into 'develop'

Feature small rna

smallRNA pipeline TinyCap

closes #237 
closes #258 
closes #265 
includes fixes for !306

See merge request !312
parents ad79e773 17afe6c0
......@@ -13,3 +13,4 @@ target/
public/target/
protected/target/
site/
*.sc
\ No newline at end of file
......@@ -30,6 +30,7 @@
<module>../public/toucan</module>
<module>../public/shiva</module>
<module>../public/basty</module>
<module>../public/tinycap</module>
<module>../public/biopet-utils</module>
<module>../public/biopet-tools</module>
<module>../public/biopet-tools-extensions</module>
......
......@@ -12,7 +12,7 @@ The sample config should be in [__JSON__](http://www.json.org/) or [__YAML__](ht
#### Example sample config
###### yaml:
###### YAML:
``` yaml
output_dir: /home/user/myoutputdir
......@@ -24,7 +24,7 @@ samples:
R2: R2.fastq.gz
```
###### json:
###### JSON:
``` json
{
......
# TinyCap
## Introduction
``TinyCap`` is an analysis pipeline meant to process smallRNA captures. We use a fixed aligner in this pipeline: `bowtie` .
By default, we allow one fragment to align up to 5 different locations on the genome. In most of the cases, the shorter
the sequence, the less 'unique' the pattern is. Multiple **"best"** alignments is in these cases possible.
To avoid **'first-occurence found and align-to'** bias towards the reference genome, we allow the aligner
to report more alignment positions.
After alignment, `htseq-count` is responsible for the quantification of transcripts.
One should supply 2 annotation-files for this to happen:
- mirBase GFF3 file with all annotated and curated miRNA for the genome of interest. [visit mirBase](http://www.mirbase.org/ftp.shtml)
- Ensembl (Gene sets) in GTF format. [visit Ensembl](http://www.ensembl.org/info/data/ftp/index.html)
Count tables are generated per sample and and aggregation per (run)project is created in the top level folder of the project.
## Starting the pipeline
```bash
biopet pipelines tinycap [options] \
-config `<path-to>/settings_tinycap.json`
-config `<path-to>/sample_sheet.json` \
-l DEBUG \
-qsub \
-jobParaEnv BWA \
-run
```
## Example
Note that one should first create the appropriate [configs](../general/config.md).
The pipeline specific (minimum) config looks like:
```json
{
"output_dir": "<path-to>/outputdirectory",
"reference_name": "GRCh38",
"species": "H.sapiens",
"annotation_gff": "<path-to>/data/annotation/mirbase-21-hsa.gff3",
"annotation_refflat": "<path-to>/data/annotation/ucsc_ensembl_83_38.refFlat",
"annotation_gtf": "<path-to>/data/annotation/ucsc_ensembl_83_38.gtf"
}
```
### Advanced config:
One can specify other options such as: `bowtie` (alignment) options, clipping and trimming options `sickle` and `cutadapt`.
```json
"bowtie": {
"chunkmbs": 256, # this is a performance option, keep it high (256) as many alternative alignments are possible
"seedmms": 3,
"seedlen": 25,
"k": 5, # take and report best 5 alignments
"best": true # sort by best hit
},
"sickle": {
"lengthThreshold": 8 # minimum length to keep after trimming
},
"cutadapt": {
"error_rate": 0.2, # recommended: 0.2, allow more mismatches in adapter to be clipped of (ratio)
"minimum_length": 8, # minimum length to keep after clipping, setting lower will cause multiple alignments afterwards
"q": 30, # minimum quality over the read after the clipping in order to keep and report the read
"default_clip_mode": "both", # clip from: front/end/both (5'/3'/both). Depending on the protocol. Setting `both` makes clipping take more time, but is safest to do on short sequences such as smallRNA.
"times": 2 # in cases of chimera reads/adapters, how many times should cutadapt try to remove am adapter-sequence
}
```
The settings above is quite strict and aggressive on the clipping with `cutadapt`. By default the option `sensitiveAdapterSearch` is turned on in the TinyCap pipeline:
```json
"fastqc": {
"sensitiveAdapterSearch": true
}
```
This setting is turned on to enable aggressive adapter trimming. e.g. found (partial) adapters in `FastQC`
are all used in `Cutadapt`. Depending on the sequencing technique and sample preparation, e.g. short
sequences (76bp - 100bp). Turning of this option will still produce sensible results.
## Examine results
### Result files
- `counttables_smallrna.tinycap.tsv`
- `counttables_mirna.tinycap.tsv`
### Tested setups
The pipeline is designed and tested with sequences produced by: Illumina HiSeq 2000/2500, Illumina MiSeq. Both on single-end sequences.
Whenever a run is performed in Paired End mode, one should use the `R1` only. For analysis of (long) non-coding RNA, one should use `Gentrap`, this pipeline is optimized for Paired End RNA analysis.
Wetlab-Protocol: NEB SmallRNA kit and TruSeq SmallRNA kits were used for the data generated to test this pipeline.
## References
- [Cutadapt](https://github.com/marcelm/cutadapt)
- [HTSeqCount](http://www-huber.embl.de/HTSeq/doc/overview.html)
- [Bowtie1](http://bowtie-bio.sourceforge.net/index.shtml)
......@@ -17,6 +17,7 @@ pages:
- Mapping (Alignment): 'pipelines/mapping.md'
- Sage: 'pipelines/sage.md'
- Shiva (variantcalling): 'pipelines/shiva.md'
- TinyCap (smallRNA): 'pipelines/tinycap.md'
- Toucan (Annotation): 'pipelines/toucan.md'
- Tools:
- SamplesTsvToJson: 'tools/SamplesTsvToJson.md'
......
......@@ -6,9 +6,9 @@
package nl.lumc.sasc.biopet.pipelines.gatk
import nl.lumc.sasc.biopet.core.PipelineCommand
import nl.lumc.sasc.biopet.utils.config.Configurable
import nl.lumc.sasc.biopet.extensions.gatk.broad._
import nl.lumc.sasc.biopet.pipelines.shiva.{ ShivaVariantcallingTrait, ShivaTrait }
import nl.lumc.sasc.biopet.pipelines.shiva.ShivaTrait
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.queue.QScript
/**
......
......@@ -28,7 +28,7 @@
#if (sampleId.isDefined && libId.isDefined)
Here we show basic <a href="https://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Alignment">alignment</a> statistics for this run for sample ${sampleId} with library ${libId}. Total number of reads, number of alignments reads and number of duplicate reads are given, and the percentages thereof as a percentage of total.
#elseif(sampleId.isDefined && showPlot)
The following plot shows basic <a href="https://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Alignment">alignment</a> statistics for this run for sample ${sampleId}. Every library is represented by a multi-color bar. Red represents the total number of properly mapped reads for this sample. Green represents the total number of duplicates reads, which is usually caused by <a href="http://www.cureffi.org/2012/12/11/how-pcr-duplicates-arise-in-next-generation-sequencing/">PCR duplicates</a>. Blue denotes the number of unmapped reads, and purple denotes reads flagged <em>secondary</em> (this is dependent on the aligner used). A table showing similar statistics, including values represented as percent of total, can be downloaded as a tab-delimited file.
The following plot shows basic <a href="https://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Alignment">alignment</a> statistics for this run for sample ${sampleId}. Every library is represented by a multi-color bar. Red represents the total number of properly mapped reads for this sample. Green represents the total number of duplicates reads, which is usually caused by <a href="http://www.cureffi.org/2012/12/11/how-pcr-duplicates-arise-in-next-generation-sequencing/">PCR duplicates</a>. Blue denotes the number of unmapped reads, and purple denotes reads flagged <em>secondary</em> (this depends on the aligner used). A table showing similar statistics, including values represented as percent of total, can be downloaded as a tab-delimited file.
#elseif(sampleId.isDefined && !showPlot)
Here we show basic <a href="https://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Alignment">alignment</a> statistics for this run for every library of sample ${sampleId}. Total number of reads, number of alignments reads and number of duplicate reads are given, and the percentages thereof as a percentage of total.
#else
......@@ -46,16 +46,18 @@
</div>
<div class="panel-footer">
#if (showTable)
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#alignmentSummaryTable">Hide table</button>
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#alignmentSummaryTable">
<i class="glyphicon glyphicon-eye-close"></i> Hide table</button>
#else
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#alignmentSummaryTable">Show table</button>
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#alignmentSummaryTable">
<i class="glyphicon glyphicon-eye-open"></i> Show table</button>
#end
<i class="glyphicon glyphicon-file"></i> <a href="alignmentSummary.tsv">tsv file</a>
<a href="alignmentSummary.tsv"><button type="button" class="btn btn-info"><i class="glyphicon glyphicon-cloud-download"></i> TSV file</button></a>
</div>
#end
<div class="panel-body collapse #if (showTable)in#end" id="alignmentSummaryTable">
<!-- Table -->
<table class="table sortable-theme-bootstrap" data-sortable>
<table class="table">
<thead><tr>
<th data-sorted="true" data-sorted-direction="ascending">Sample</th>
#if (!sampleLevel) <th>Library</th> #end
......@@ -70,8 +72,8 @@
#{
val libs = (libId, sampleLevel) match {
case (_, true) => List("")
case (Some(libId), _) => List(libId.toString)
case _ => summary.libraries(sample).toList
case (Some(libId), _) => List(libId.toString).sorted
case _ => summary.libraries(sample).toList.sorted
}
}#
<tr><td rowspan="${libs.size}"><a href="${rootPath}Samples/${sample}/index.html">${sample}</a></td>
......
......@@ -30,12 +30,13 @@
<div class="col-md-6">
<p>
#if (sampleId.isDefined && libId.isDefined)
something
This plot shows the insert size distribution for all libraries combined in sample <b>${sampleId}</b>.
#elseif(sampleId.isDefined)
This plot shows the insert size distribution for the libraries of sample ${sampleId}. <a href="http://thegenomefactory.blogspot.nl/2013/08/paired-end-read-confusion-library.html">Insert size</a> denotes the size of the so-called <em>insert</em> between two read pairs in a paired-end sequencing run. This should correspond to the length of the sequence between the sequencing adaptors. The provided table shows mean and median insert size for each sample, together with the standard deviation.
This plot shows the insert size distribution for the libraries of sample <b>${sampleId}</b>.
#else
This plot shows the insert size distribution for each of the ${samples.size} samples. <a href="http://thegenomefactory.blogspot.nl/2013/08/paired-end-read-confusion-library.html">Insert size</a> denotes the size of the so-called <em>insert</em> between two read pairs in a paired-end sequencing run. This should correspond to the length of the sequence between the sequencing adaptors. The provided table shows mean and median insert size for each sample, together with the standard deviation.
This plot shows the insert size distribution for each of the <b>${samples.size}</b> samples.
#end
<a href="http://thegenomefactory.blogspot.nl/2013/08/paired-end-read-confusion-library.html">Insert size</a> denotes the size of the so-called <em>insert</em> between two read pairs in a paired-end sequencing run. This should correspond to the length of the sequence between the sequencing adaptors. The provided table shows mean and median insert size for each sample, together with the standard deviation.
</p>
</div>
</div>
......@@ -49,17 +50,20 @@
</div>
<div class="panel-footer">
#if (showTable)
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#insertsizeTable">Hide table</button>
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#insertsizeTable">
<i class="glyphicon glyphicon-eye-close"></i> Hide table</button>
#else
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#insertsizeTable">Show table</button>
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#insertsizeTable">
<i class="glyphicon glyphicon-eye-open"></i> Show table</button>
#end
<i class="glyphicon glyphicon-file"></i> <a href="insertsize.tsv">tsv file</a>
<button type="button" class="btn btn-info"><i class="glyphicon glyphicon-cloud-download"> <a href="insertsize.tsv">tsv file</a></i></button>
</div>
#end
<div class="panel-body collapse #if (showTable)in#end" id="insertsizeTable">
<!-- Table -->
<table class="table sortable-theme-bootstrap" data-sortable>
<table class="table">
<thead><tr>
<th data-sorted="true" data-sorted-direction="ascending">Sample</th>
#if (!sampleLevel) <th>Library</th> #end
......@@ -84,7 +88,7 @@
val prefixPath = List("samples", sample) ::: (if (libId.isEmpty) Nil else List("libraries", libId)) ::: List("bammetrics", "stats")
val fieldValues = for (field <- fields) yield {
summary.getValue((prefixPath ::: List("CollectInsertSizeMetrics", "metrics", field.toUpperCase)):_*).getOrElse(prefixPath ::: metricsTag :: Nil)
summary.getValue((prefixPath ::: List("CollectInsertSizeMetrics", "metrics", field.toUpperCase)):_*).getOrElse("N/A")
}
}#
#for (value <- fieldValues)
......
......@@ -29,7 +29,7 @@
<div class="col-md-1"></div>
<div class="col-md-6">
<p>
This Show the relative coverage for all transcripts. De data here is generated by picard CollectRnaMetrics
This shows the relative coverage for all transcripts using Picard CollectRnaMetrics
</p>
</div>
</div>
......@@ -39,7 +39,7 @@
#{ BammetricsReport.rnaHistogramPlot(outputDir, "rna", summary, !sampleLevel, sampleId = sampleId, libId = libId) }#
<div class="panel-body">
<img src="rna.png" class="img-responsive" />
<img src="rna.png" class="img-responsive" />
</div>
<div class="panel-footer">
#if (showTable)
......@@ -47,7 +47,7 @@
#else
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#rnaTable">Show table</button>
#end
<i class="glyphicon glyphicon-file"></i> <a href="rna.tsv">tsv file</a>
<a href="rna.tsv"><button type="button" class="btn btn-info"><i class="glyphicon glyphicon-cloud-download"></i> TSV file</button></a>
</div>
#end
......
......@@ -47,7 +47,7 @@
#else
<button type="button" class="btn btn-info" data-toggle="collapse" data-target="#wgsTable">Show table</button>
#end
<i class="glyphicon glyphicon-file"></i> <a href="wgs.tsv">tsv file</a>
<a href="wgs.tsv"><button type="button" class="btn btn-info"><i class="glyphicon glyphicon-cloud-download"></i> TSV file</button></a>
</div>
#end
......
......@@ -419,7 +419,7 @@ object BammetricsReport extends ReportBuilder {
val plot = new LinePlot(null)
plot.input = tsvFile
plot.output = pngFile
plot.xlabel = Some("Reletive position")
plot.xlabel = Some("Relative position")
plot.ylabel = Some("Coverage")
plot.width = Some(1200)
plot.removeZero = true
......
......@@ -18,13 +18,10 @@ package nl.lumc.sasc.biopet.pipelines.bammetrics
import java.io.{ File, FileOutputStream }
import com.google.common.io.Files
import nl.lumc.sasc.biopet.utils.config.Config
import nl.lumc.sasc.biopet.extensions.bedtools.{ BedtoolsCoverage, BedtoolsIntersect }
import nl.lumc.sasc.biopet.extensions.picard._
import nl.lumc.sasc.biopet.extensions.samtools.SamtoolsFlagstat
import nl.lumc.sasc.biopet.pipelines.bammetrics.scripts.CoverageStats
import nl.lumc.sasc.biopet.extensions.tools.BiopetFlagstat
import nl.lumc.sasc.biopet.utils.ConfigUtils
import nl.lumc.sasc.biopet.utils.config.Config
import org.apache.commons.io.FileUtils
import org.broadinstitute.gatk.queue.QSettings
import org.scalatest.Matchers
......
......@@ -6,7 +6,7 @@
<table class="table">
<thead><tr><th>Libraries</th></tr></thead>
<tbody>
#for (lib <- summary.libraries(sampleId.get))
#for (lib <- summary.libraries(sampleId.get).toList.sorted)
<tr><td><a href="${rootPath}Samples/${sampleId}/Libraries/${lib}/index.html">${lib}</a></td></tr>
#end
</tbody>
......
......@@ -12,22 +12,37 @@
val buffer: StringBuffer = new StringBuffer()
if (page.subPages.nonEmpty){
buffer.append("<ul class=\"dropdown-menu\">")
buffer.append("<ul class=\"dropdown-menu list-group\">")
}
for (subPage <- page.subPages) {
for (subPage <- page.subPages.sortBy(_._1)) {
val href: String = {
if (path.isEmpty) rootPath + subPage._1 + "/index.html"
else rootPath + path.mkString("","/","/") + subPage._1 + "/index.html"
}
buffer.append("<li")
if(subPage._2.subPages.nonEmpty) buffer.append(" class=\"dropdown-submenu\"")
buffer.append("><a href=\"" + href + "\"")
if (first) buffer.append(" tabindex=\"-1\"")
buffer.append(">" + subPage._1 + "</a>")
buffer.append(createMenu(subPage._2, path ::: subPage._1 :: Nil, first = false))
buffer.append("</li>")
// buffer.append("<li")
// if(subPage._2.subPages.nonEmpty) buffer.append(" class=\"dropdown-submenu list-group-item\"")
// buffer.append("><span class=\"badge\">%d</span><a href=\"%s\"".format(subPage._2.subPages.size, href))
// if (first) buffer.append(" tabindex=\"-1\"")
// buffer.append(">%s</a>".format(subPage._1))
// buffer.append(createMenu(subPage._2, path ::: subPage._1 :: Nil, first = false))
// buffer.append("</li>")
val listSubmenu = if(subPage._2.subPages.nonEmpty) "dropdown-submenu" else ""
// val subMenuBadgeCount = if(subPage._2.subPages.nonEmpty && first) "<span class='badge'>%d</span>".format(subPage._2.subPages.size) else ""
val tabIndex = if (first) " tabindex='-1'" else ""
// val listGroupA = if(subPage._2.subPages.nonEmpty) "list-group-item" else ""
var menuItem: String = "<li class='%s'>".format(listSubmenu) +
"<a href='%s' class='%s'%s>".format(href, "", tabIndex) +
"%s".format(subPage._1) +
"</a>" +
createMenu(subPage._2, path ::: subPage._1 :: Nil, first = false) +
"</li>"
buffer.append(menuItem)
}
if(page.subPages.nonEmpty) {
buffer.append("</ul>\n")
......@@ -74,7 +89,7 @@
Sortable.init()
$('body').scrollspy({
target: '.bs-sidebar',
target: '.bs-sidebar'
});
});
......@@ -113,14 +128,14 @@
<a href="${rootPath}index.html">Home
#if (indexPage.subPages.nonEmpty) <b class="caret"></b> #end
</a>
${unescape(createMenu(indexPage))}
${unescape(createMenu(indexPage, Nil, false))}
</li>
#else
<li class="root #if (t == path.size) active #end">
<a href="${rootPath}${path.slice(0,t).mkString("", "/", "/")}index.html">${path( t - 1 )}
#if (getSubPage(path.slice(0, t)).subPages.nonEmpty) <b class="caret"></b> #end
</a>
${unescape(createMenu(getSubPage(path.slice(0, t)), path.slice(0, t)))}
${unescape(createMenu(getSubPage(path.slice(0, t)), path.slice(0, t), false))}
</li>
#end
#end
......
......@@ -5,7 +5,7 @@
<%@ var pipeline: String %>
#{
val contigs = summary.getValue(pipeline, "settings", "reference", "contigs").get.asInstanceOf[Map[String, Map[String, Any]]]
val contigs = summary.getValue(pipeline, "settings", "reference", "contigs").getOrElse(Map.empty).asInstanceOf[Map[String, Map[String, Any]]]
}#
<table class="table">
......
......@@ -5,7 +5,7 @@
<table class="table sortable-theme-bootstrap" data-sortable>
<thead><tr><th data-sorted="true" data-sorted-direction="ascending">Sample</th></tr></thead>
<tbody>
#for (sample <- summary.samples)
#for (sample <- summary.samples.toList.sorted)
<tr><td><a href="${rootPath}Samples/${sample}/index.html">${sample}</a></td></tr>
#end
</tbody>
......
......@@ -16,6 +16,14 @@ trait AnnotationGtf extends BiopetQScript { qscript: QScript =>
file
}
}
trait AnnotationGff extends BiopetQScript { qscript: QScript =>
/** GFF reference file in GFF3 format */
lazy val annotationGff: File = {
val file: File = config("annotation_gff", freeVar = true)
inputFiles :+ InputFile(file, config("annotation_gff_md5", freeVar = true))
file
}
}
trait AnnotationRefFlat extends BiopetQScript { qscript: QScript =>
/** GTF reference file */
......
......@@ -36,7 +36,7 @@ trait MultisampleReportBuilder extends ReportBuilder {
def libraryPage(sampleId: String, libraryId: String, args: Map[String, Any]): ReportPage
/** Default list of libraries, can be override */
def libririesSections: List[(String, ReportSection)] = {
def librariesSections: List[(String, ReportSection)] = {
List(
("Libraries", ReportSection("/nl/lumc/sasc/biopet/core/report/librariesList.ssp"))
)
......@@ -60,6 +60,6 @@ trait MultisampleReportBuilder extends ReportBuilder {
val libPages = summary.libraries(sampleId)
.map(libId => libId -> libraryPage(sampleId, libId, args ++ Map("libId" -> Some(libId))))
.toList
ReportPage(libPages, libririesSections, args)
ReportPage(libPages, librariesSections, args)
}
}
......@@ -17,9 +17,9 @@ package nl.lumc.sasc.biopet.extensions
import java.io.File
import nl.lumc.sasc.biopet.core.{ Version, BiopetCommandLineFunction }
import nl.lumc.sasc.biopet.utils.config.Configurable
import nl.lumc.sasc.biopet.core.summary.Summarizable
import nl.lumc.sasc.biopet.core.{ BiopetCommandLineFunction, Version }
import nl.lumc.sasc.biopet.utils.config.Configurable
import org.broadinstitute.gatk.utils.commandline.{ Input, Output }
import scala.collection.mutable
......@@ -165,27 +165,65 @@ class Cutadapt(val root: Configurable) extends BiopetCommandLineFunction with Su
/** Output summary stats */
def summaryStats: Map[String, Any] = {
val trimR = """.*Trimmed reads: *(\d*) .*""".r
val tooShortR = """.*Too short reads: *(\d*) .*""".r
val tooLongR = """.*Too long reads: *(\d*) .*""".r
val adapterR = """Adapter '([C|T|A|G]*)'.*trimmed (\d*) times.""".r
val stats: mutable.Map[String, Int] = mutable.Map("trimmed" -> 0, "tooshort" -> 0, "toolong" -> 0)
val adapter_stats: mutable.Map[String, Int] = mutable.Map()
if (statsOutput.exists) for (line <- Source.fromFile(statsOutput).getLines()) {
line match {
case trimR(m) => stats += ("trimmed" -> m.toInt)
case tooShortR(m) => stats += ("tooshort" -> m.toInt)
case tooLongR(m) => stats += ("toolong" -> m.toInt)
case adapterR(adapter, count) => adapter_stats += (adapter -> count.toInt)
case _ =>
/**
* The following regex is specific for Cutadapt 1.7+
*/
val processedReads = """Total reads processed: *([,\d]+).*""".r
val withAdapters = """.* with adapters: *([,\d]+) .*""".r
val readsPassingFilters = """.* written \(passing filters\): *([,\d]+) .*""".r
val tooShortR = """.* that were too short: *([,\d]+) .*""".r
val tooLongR = """.* that were too long: *([,\d]+) .*""".r
val tooManyN = """.* with too many N: *([,\d]+) .*""".r
val adapterR = """Sequence ([C|T|A|G]*);.*Trimmed: ([,\d]+) times.""".r
val basePairsProcessed = """Total basepairs processed: *([,\d]+) bp""".r
val basePairsWritten = """Total written \(filtered\): *([,\d]+) bp .*""".r
val stats: mutable.Map[String, Long] = mutable.Map(
"processed" -> 0,
"withadapters" -> 0,
"passingfilters" -> 0,
"tooshort" -> 0,
"toolong" -> 0,
"bpinput" -> 0,
"bpoutput" -> 0,
"toomanyn" -> 0
)
val adapter_stats: mutable.Map[String, Long] = mutable.Map()
if (statsOutput.exists) {
val statsFile = Source.fromFile(statsOutput)
for (line <- statsFile.getLines()) {
line match {
case processedReads(m) => stats("processed") = m.replaceAll(",", "").toLong
case withAdapters(m) => stats("withadapters") = m.replaceAll(",", "").toLong
case readsPassingFilters(m) => stats("passingfilters") = m.replaceAll(",", "").toLong
case tooShortR(m) => stats("tooshort") = m.replaceAll(",", "").toLong
case tooLongR(m) => stats("toolong") = m.replaceAll(",", "").toLong
case tooManyN(m) => stats("toomanyn") = m.replaceAll(",", "").toLong
case basePairsProcessed(m) => stats("bpinput") = m.replaceAll(",", "").toLong
case basePairsWritten(m) => stats("bpoutput") = m.replaceAll(",", "").toLong
case adapterR(adapter, count) => adapter_stats += (adapter -> count.toLong)
case _ =>
}
}
}
Map("num_reads_affected" -> stats("trimmed"),
val cleanReads = stats("processed") - stats("withadapters")
val trimmed = stats("passingfilters") - cleanReads
Map("num_reads_affected" -> trimmed,
"num_reads_input" -> stats("processed"),
"num_reads_with_adapters" -> stats("withadapters"),
"num_reads_output" -> stats("passingfilters"),
"num_reads_discarded_too_short" -> stats("tooshort"),
"num_reads_discarded_too_long" -> stats("toolong"),
"num_reads_discarded_many_n" -> stats("toomanyn"),
"num_bases_input" -> stats("bpinput"),
"num_based_output" -> stats("bpoutput"),
adaptersStatsName -> adapter_stats.toMap
)
}
......
......@@ -64,6 +64,11 @@
<artifactId>Gentrap</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>nl.lumc.sasc</groupId>
<artifactId>TinyCap</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>nl.lumc.sasc</groupId>
<artifactId>Sage</artifactId>
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment