v4.0.0 · Tags · biowdl / tasks

d1e2d6e5 · set version in changelog to stable version · Aug 05, 2020

version 4.0.0
---------------------------
+ Picard MergeVcf now uses compression level 1 by default.
+ bwa mem, bwa mem+kit and hisat2 have their samtools sort threads tweaked. The
number of threads is now related to the number of threads on the aligner.
Using more threads reduces the chance of the samtools sort pipe getting
blocked if it's full.
+ Renamed a few inputs in centrifuge.wdl, isoseq3.wdl, talon.wdl,
transcriptclean.wdl to be more descriptive.
+ Renamed outputs of tasks used in the TALON-WDL, PacBio-subreads-processing &
sequence-classification pipelines.
+ Reworked bcf2vcf task into bcftools view task.
+ Removed the redundant format flag from the htseq interface. This is
autodetected in newer versions of htseq.
+ Update docker images for samtools, bcftools, picard, GATK, cutadapt, htseq
and chunked-scatter.
+ Default docker images for bwa, bwakit and hisat2 updated to include samtools
1.10.
+ Alignment tasks (STAR, Hisat2, BWA) now produce BAM files at level 1
compression.
+ Hisat2 task has added controls for samtools.
+ Alignment tasks no longer produce BAM indexes as these are not needed
by the markduplicates step.
+ Picard Markduplicates now uses 7G of RAM just like in GATK's best practice
example pipeline.
+ Picard SortSam added as a task.
+ Md5 files are no longer created by default on Picard tasks that generate
BAM files.
+ Changed PicardMarkduplicates to use COMPRESSION_LEVEL=1 by default with
the htsjdk deflater.
This makes the task finish in 32% less time at the cost of an 8% larger BAM
file.
+ Added sambamba markdup and sambamba sort. NOTE: samtools sort is more
efficient and is recommended.
+ Correctly represent samtools' inconsistent use of the threads flag.
Sometimes it means 'threads', sometimes it means 'additional threads'.
BioWDL tasks now use only threads. The `threads - 1` conversion is
applied where necessary for samtools tools that use additional threads.
+ Updated BWA MEM and BWA KIT tasks to use samtools sort version 1.10 for
sorting the BAM file.
+ Updated memory requirements on bcftools Stats, bwa mem, bwakit, GATK
ApplyBQSR, GATK BaseRecalibrator, GATK GatherBqsrReports, GATK
HaplotypeCaller, Picard CollectMultipleMetrics, Picard GatherBamFiles,
samtools Flagstat and samtools sort.
+ TALON: Update `FilterTalonTranscripts` to new version, which removes the
pairingsFile and replaces this with datasetsFile.
+ TALON: Add `GetSpliceJunctions` & `LabelReads` tasks.
+ TALON: Update to version 5.0.
+ Add tasks for pbmm2, the PacBio wrapper for minimap2.
+ Update the image for chunked-scatter and make use of new features from 0.2.0.
+ Tuned resource requirements for GATK VariantEval, MultiQC, Picard metrics and
STAR.
+ Added a new task for [scatter-regions](https://github.com/biowdl/chunked-scatter)
that replaces biopet-scatterregions.
+ The FastQC task now calls Java directly instead of using the included
Perl wrapper for FastQC. This has the advantage that memory and threads can
be set independently. A rather high maximum heap size of 1750MB (Xmx1750M)
was set, as OOM errors occurred frequently on some fastqs.
+ STAR: Add options for the alignment score (including ones relative to read
length) for tuning when processing rRNA-depleted samples.
+ TALON: Update `minimumIdentity` to correct type (float, was integer)
& set new default according to developers (0.8, was 0).
+ Added GATK VariantEval task.
+ Added a log output for STAR.
+ Added report output to Hisat2.
+ Added output with all reports to gffcompare.
+ Change MultiQC inputs. It now accepts an array of reports files. It does not
need access to a folder with the reports anymore. MultiQC can now be used
as a normal WDL task without hacks.
+ Picard: Make all outputs in `CollectMultipleMetrics` optional. This will make sure the
task will not fail if one of the metrics is set to false.
+ The struct `BowtieIndex` was removed, as it has become obsolete.
+ The task `ReorderGlobbedScatters` was removed, as it has become obsolete.
+ Adjusted the memory settings of many tools, especially java tools.
They should now more accurately represent actual memory usage (as
opposed to virtual memory).
+ Added `-XX:ParallelGCThreads=1` to the java options of java tasks.
+ Added `timeMinutes` input to many tasks. This indicates the maximum
number of minutes that the job will run. The associated runtime
attribute is `time_minutes`, which can be used to inform
a scheduler (e.g. Slurm) of the run time of the job.
+ Added STAR GenomeGenerate task.
+ GATK.HaplotypeCaller: Add `--dont-use-soft-clipped-bases` and
`--standard-min-confidence-threshold-for-calling` options. These are
required for RNA seq variant calling according to GATK best practices.
+ Samtools: Fix quotations in sort command.
+ Samtools SortByName is now called Sort.
+ Generalize sort task to now also sort by position, instead of just read name.
+ Add CreateSequenceDictionary task to picard.
+ Add faidx task to samtools.
+ Isoseq3: Remove dirname command from output folder creation step.
+ Isoseq3: Now requires more memory by default (2G).
+ Isoseq3: Remove cp commands and other bash magic; file naming is now handled by the pipeline.
+ Lima: Replace mv command with cp.
+ Add WDL task for smoove (lumpy) sv-caller.
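
The `threads - 1` conversion mentioned in the samtools threads entry above can be illustrated with a small sketch. This is not taken from the BioWDL task definitions: the function name `samtools_thread_argument` and the `counts_additional_threads` flag are hypothetical, and which samtools subcommands count additional threads should be checked against the samtools documentation.

```python
def samtools_thread_argument(threads: int, counts_additional_threads: bool) -> int:
    """Translate a task's total thread count into the value for a thread flag.

    `threads` is the total number of threads the task may use.
    `counts_additional_threads` (hypothetical flag) marks tools whose thread
    option means "threads in addition to the main thread".
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # A tool that counts additional threads already runs one main thread,
    # so the total is reduced by one before it is passed on.
    if counts_additional_threads:
        return threads - 1
    return threads


if __name__ == "__main__":
    # A task that was given 4 threads in total.
    print(samtools_thread_argument(4, counts_additional_threads=True))
    print(samtools_thread_argument(4, counts_additional_threads=False))
```

With this convention a task input always states the total thread count, and the subtraction happens in one place rather than being left to the user.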