Commits · 51781bb6f6551500f8d7c5b08e74199d3b0afacf · Klinische Genetica / capture-lumc / hutspot

Mar 10, 2021
- Switch to chunked-scatter instead of biopet · bb0f1582
  van den Berg authored 4 years ago
  
  bb0f1582
Mar 02, 2021
- Add option to generate a true multisample VCF file · d5f5d0a1
  van den Berg authored 4 years ago
  
  d5f5d0a1
Jan 07, 2021
- Add option to create a merged multi sample VCF · 7bc45284
  van den Berg authored 4 years ago
  
  7bc45284
Dec 10, 2020

van den Berg authored 4 years ago

MultiQC can run out of /tmp space on execution nodes on shark, leading
to incorrect results or crashes. This commit fixes this problem by
specifying a default location for the python tempfiles that MultiQC uses
via the environment variable TMPFILE inside of the shell block for the
MultiQC rule.
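
As a rough illustration of the mechanism (not the pipeline's actual rule): Python's standard `tempfile` module picks its default directory from environment variables, consulting `TMPDIR`, `TEMP`, and `TMP` in that order. The sketch below assumes `TMPDIR` is the variable in play; the helper name `tempdir_with_env` and the `scratch` directory are hypothetical stand-ins.

```python
import os
import tempfile


def tempdir_with_env(directory):
    """Return the directory tempfile would use when TMPDIR points at `directory`."""
    old = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = directory
    # tempfile caches its choice, so clear the cache to force a fresh lookup.
    tempfile.tempdir = None
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the previous environment and clear the cache again.
        if old is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = old
        tempfile.tempdir = None


# Hypothetical scratch location; any existing, writable directory works.
scratch = tempfile.mkdtemp()
chosen = tempdir_with_env(scratch)
print(chosen)
```

In a shell block the same effect would come from exporting the variable before the tool starts, so that every temporary file the tool creates lands in the chosen location.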

b1baa6de

Aug 28, 2020

Do not write MarkDuplicates PG tag for every read · 5d4fca8a

van den Berg authored 4 years ago

This is supposed to speed up MarkDuplicates by up to 16%, see
https://github.com/broadinstitute/picard/issues/902 for details.
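
A minimal sketch of how such a command line might be assembled, assuming Picard's `ADD_PG_TAG_TO_READS` option is the switch this commit refers to. The helper name, the `picard` wrapper executable, and the file names are hypothetical and not taken from the pipeline.

```python
def markduplicates_command(bam_in, bam_out, metrics, add_pg_tag=False):
    """Build a Picard MarkDuplicates argument list (legacy KEY=value syntax)."""
    return [
        "picard", "MarkDuplicates",
        f"INPUT={bam_in}",
        f"OUTPUT={bam_out}",
        f"METRICS_FILE={metrics}",
        # Assumption: this option controls whether a PG tag is written per read.
        f"ADD_PG_TAG_TO_READS={'true' if add_pg_tag else 'false'}",
    ]


cmd = markduplicates_command("sample.bam", "sample.markdup.bam", "sample.metrics")
print(" ".join(cmd))
```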

5d4fca8a

Aug 27, 2020
- Samtools does not utilise the additional threads · 86d3356e
  van den Berg authored 4 years ago
  
  86d3356e
- Revert "Switch to bwa version 2" · 5e202f55
  van den Berg authored 4 years ago
  
  This reverts commit 0cf92f4b. bwa-mem2 can crash when multiple processes are running on the same system. See https://github.com/bwa-mem2/bwa-mem2/issues/88 for details.
  5e202f55
Aug 25, 2020
- Switch to bwa version 2 · 0cf92f4b
  van den Berg authored 4 years ago
  
  0cf92f4b
- Scripts are relative to the Snakefile · 7dce875c
  van den Berg authored 4 years ago
  
  7dce875c
- No longer hide script paths in config · ea8629eb
  van den Berg authored 4 years ago
  
  ea8629eb
- Add localrules if rules take less than 5 seconds · ae68936e
  van den Berg authored 4 years ago
  
  ae68936e
Aug 24, 2020
- Only run FastQC on the trimmed file · 8a6c6da6
  van den Berg authored 4 years ago
  
  8a6c6da6
- Remove custom statistics in favor of picard · c6cadb60
  van den Berg authored 4 years ago
  
  c6cadb60
- Use samtools to sort bamfile · d563632b
  van den Berg authored 4 years ago
  
  d563632b
Aug 21, 2020
- Remove python imports from the main snakemake file · ecc12cc2
  van den Berg authored 4 years ago
  
  ecc12cc2
- Make formatting more consistent · 5f215391
  van den Berg authored 4 years ago
  
  - Start every entry on a new indented line
  - Make the order of the entries consistent in every rule
  - Ensure equal signs are surrounded by spaces
  5f215391
- Move final functions to separate file · e1e83b13
  van den Berg authored 4 years ago
  
  e1e83b13
- Clean up input for multiqc · 9ab0debe
  van den Berg authored 4 years ago
  
  9ab0debe
- Move cutadapt summary files to separate file · 88c13809
  van den Berg authored 4 years ago
  
  88c13809
- Move gathering after scatter to separate file · 79de4b37
  van den Berg authored 4 years ago
  
  79de4b37
- Add test for input multiple readgroups · 4f5695d8
  van den Berg authored 4 years ago
  
  Add test to make sure the markdup and baserecal groups receive the correct inputs when a sample has multiple readgroups.
  4f5695d8
- Move markdup inputs to separate file · 1ce11c64
  van den Berg authored 4 years ago
  
  1ce11c64
- Clean up input for rule 'all' · f16a49c1
  van den Berg authored 4 years ago
  
  f16a49c1
- Begin moving python code into separate file · 383b838f
  van den Berg authored 4 years ago
  
  383b838f
- Use python f-string for formatting · 0dff1b37
  van den Berg authored 4 years ago
  
  0dff1b37
- Remove explicit path from params for linting · aa37905b
  van den Berg authored 4 years ago
  
  aa37905b
- More hiding for the linter · f1f323f5
  van den Berg authored 4 years ago
  
  f1f323f5
- Hide the absolute path for GATK for linting · b1780fe9
  van den Berg authored 4 years ago
  
  b1780fe9
- Add separate logfiles for different commands · a948a75c
  van den Berg authored 4 years ago
  
  a948a75c
- Add log to every rule, this allows for linting · 515c9743
  van den Berg authored 4 years ago
  
  515c9743
Aug 20, 2020
- Add resources to tasks · 665d6d1f
  van den Berg authored 4 years ago
  
  665d6d1f
Aug 12, 2020

Remove directory output from fastqc · 89da42b6

van den Berg authored 4 years ago

The directory output for the fastqc tasks is causing issues on the
shared file system of the cluster, since it cannot properly determine
the age of the folder. As a result, it re-runs the fastqc tasks every
time a workflow is restarted, regardless of whether the task has already
completed.

To prevent this, a single dummy output file '.done' has been added to
the fastqc tasks which will be written when fastqc exits successfully.
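
The sentinel-file idea can be sketched in plain Python as follows. This is an illustration under assumptions, not the rule from the pipeline: `run_with_sentinel` is a hypothetical helper, and a trivial Python subprocess stands in for the fastqc command.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_sentinel(command, sentinel):
    """Run `command`; create the `sentinel` file only if it exits with status 0."""
    sentinel = Path(sentinel)
    result = subprocess.run(command)
    if result.returncode == 0:
        sentinel.parent.mkdir(parents=True, exist_ok=True)
        sentinel.touch()
        return True
    return False


workdir = Path(tempfile.mkdtemp())
# Stand-ins for a succeeding and a failing fastqc invocation.
ok = run_with_sentinel([sys.executable, "-c", "pass"], workdir / "ok" / ".done")
failed = run_with_sentinel(
    [sys.executable, "-c", "raise SystemExit(1)"], workdir / "bad" / ".done"
)
print(ok, failed)
```

Because the workflow then tracks a single regular file rather than a directory, its timestamp is well defined and a restart can tell that the task already finished.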

89da42b6

Aug 07, 2020

Add an option to restrict BaseRecalibration · 095305f0

van den Berg authored 4 years ago

The base recalibration step of the pipeline can take up to 7 hours for
WGS samples, which is a significant part of the total run time.

The developers of GATK state that BQSR requires at least 100M bases per
read group: "We usually expect to see more than 100M bases per read
group; as a rule of thumb, larger numbers will work better."

A human WGS sample with an average read depth of 43x has almost 1300
times that number of bases. The analysis of these samples would be sped
up greatly by restricting BQSR to a single chromosome.
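
The arithmetic behind this estimate can be checked with a short calculation. The genome size (about 3.0 billion bases) and the chromosome 1 length (about 249 million bases) are rounded assumptions added for illustration, and the sketch treats the sample as a single read group.

```python
GENOME_SIZE = 3.0e9    # assumed approximate human genome size, in bases
CHR1_SIZE = 2.49e8     # assumed approximate length of chromosome 1, in bases
READ_DEPTH = 43        # average read depth from the commit message
BQSR_GUIDELINE = 1e8   # 100M bases per read group, from the GATK quote

# Sequenced bases in the whole sample, relative to the guideline.
genome_ratio = GENOME_SIZE * READ_DEPTH / BQSR_GUIDELINE

# Sequenced bases on chromosome 1 alone, relative to the guideline.
chr1_ratio = CHR1_SIZE * READ_DEPTH / BQSR_GUIDELINE

print(f"whole genome: about {genome_ratio:.0f} times the guideline")
print(f"chromosome 1: about {chr1_ratio:.0f} times the guideline")
```

Under these assumptions a single large chromosome still supplies far more than 100M bases, which is why restricting BQSR to one chromosome is plausible.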

095305f0

Speed up cutadapt · b585ff80

van den Berg authored 4 years ago

Use 8 cores instead of just 1; according to the documentation, speed
should scale almost linearly with the number of cores provided.

Also reduce the compression level on the output file, since most time
for cutadapt is spent re-compressing the data after trimming the
adapters.
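
The trade-off between compression level and run time can be sketched with Python's standard `gzip` module. The random DNA string is a synthetic stand-in for read data, so the absolute numbers say nothing about cutadapt itself; the helper `compress_timed` is hypothetical.

```python
import gzip
import random
import time

random.seed(0)  # deterministic stand-in data, not real reads
data = "".join(random.choice("ACGT") for _ in range(200_000)).encode()


def compress_timed(payload, level):
    """Compress `payload` at the given gzip level; return (bytes, seconds)."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    return compressed, time.perf_counter() - start


fast, fast_seconds = compress_timed(data, 1)
small, small_seconds = compress_timed(data, 9)
print(f"level 1: {len(fast)} bytes in {fast_seconds:.4f}s")
print(f"level 9: {len(small)} bytes in {small_seconds:.4f}s")
```

Both outputs decompress to the same data; the lower level typically trades a somewhat larger file for less CPU time.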

b585ff80

Jul 29, 2020
- Mark FastQC input for MultiQC as directory · 3481d929
  van den Berg authored 4 years ago
  
  3481d929
Jul 28, 2020
- Add memory settings for hs_metrics and multiple_metrics · aebad7df
  van den Berg authored 4 years ago
  
  aebad7df
Jul 24, 2020
- Revert "Remove explicit tmp folder" · 17aa79ad
  van den Berg authored 4 years ago
  
  This reverts commit 1b7d807f
  17aa79ad
- Increase memory for bed_to_interval · a4e1aadb
  van den Berg authored 4 years ago
  
  a4e1aadb
Jul 22, 2020

Remove explicit tmp folder · 1b7d807f

van den Berg authored 4 years ago

This should no longer be needed on the slurm cluster, where each task
can request the amount of tmp space it requires explicitly.

Moving the tmp folder from the shared filesystem back onto the host
running the analysis should also improve the performance of the pipeline.

1b7d807f

Update to temporary version of gvcf2coverage · d6bb9229
van den Berg authored 4 years ago

d6bb9229