- Aug 21, 2020
van den Berg authored
Add a test to make sure the markdup and baserecal groups receive the correct inputs when a sample has multiple read groups.
- Aug 12, 2020
van den Berg authored
The directory output of the fastqc tasks is causing issues on the shared file system of the cluster, since the age of a directory cannot be determined reliably there. As a result, the fastqc tasks are re-run every time the workflow is restarted, regardless of whether they have already completed. To prevent this, a single dummy output file '.done' has been added to the fastqc tasks; it is written when fastqc exits successfully.
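As a rough illustration of the flag-file approach, a rule could look like the sketch below. This assumes a Snakemake-style rule; the rule name, wildcards, and paths are placeholders rather than the pipeline's actual definitions.

```
rule fastqc:
    input:
        fq="{sample}/{readgroup}.fastq.gz",
    params:
        outdir="{sample}/fastqc/{readgroup}",
    # touch() creates the empty '.done' marker only after the command has
    # exited successfully, so the scheduler checks the timestamp of a plain
    # file instead of the output directory.
    output:
        done=touch("{sample}/fastqc/{readgroup}/.done"),
    shell:
        "mkdir -p {params.outdir} && fastqc --outdir {params.outdir} {input.fq}"
```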
- Aug 07, 2020
van den Berg authored
The base recalibration (BQSR) step of the pipeline can take up to 7 hours for WGS samples, which is a significant part of the total run time. The developers of GATK state that BQSR requires at least 100M bases per read group: "We usually expect to see more than 100M bases per read group; as a rule of thumb, larger numbers will work better." A human WGS sample with an average read depth of 43x has almost 1300 times that number of bases. The analysis of these samples would therefore be sped up greatly by restricting BQSR to a single chromosome.
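As a sketch of what restricting the recalibration model to one contig could look like (assuming GATK4's BaseRecalibrator and a Snakemake-style rule; the chromosome name, file paths, and known-sites resource are placeholders):

```
rule baserecal:
    input:
        bam="{sample}/markdup.bam",
        ref="reference.fasta",
        known="known_sites.vcf.gz",
    output:
        table="{sample}/bqsr.table",
    # -L limits the recalibration model to a single chromosome; the resulting
    # table can still be applied to the full BAM afterwards.
    shell:
        "gatk BaseRecalibrator -I {input.bam} -R {input.ref} "
        "--known-sites {input.known} -L chr1 -O {output.table}"
```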
van den Berg authored
Use 8 cores instead of just 1; according to the documentation, speed should scale almost linearly with the number of cores provided. Also reduce the compression level on the output files, since most of cutadapt's time is spent re-compressing the data after trimming the adapters.
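A rough sketch of the corresponding cutadapt invocation, assuming a Snakemake-style rule; the adapter sequences and paths are placeholders, and -Z is cutadapt's shortcut for gzip compression level 1 on the output files:

```
rule cutadapt:
    input:
        r1="{sample}/{readgroup}_R1.fastq.gz",
        r2="{sample}/{readgroup}_R2.fastq.gz",
    output:
        r1="{sample}/trimmed/{readgroup}_R1.fastq.gz",
        r2="{sample}/trimmed/{readgroup}_R2.fastq.gz",
    threads: 8
    # -j spreads the trimming over all provided cores; -Z writes the gzipped
    # output at compression level 1 so less time is spent re-compressing.
    shell:
        "cutadapt -j {threads} -Z -a AGATCGGAAGAG -A AGATCGGAAGAG "
        "-o {output.r1} -p {output.r2} {input.r1} {input.r2}"
```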
- Jul 24, 2020
van den Berg authored
Revert "Remove explicit tmp folder" See merge request !17
van den Berg authored
This reverts commit 1b7d807f
- Jul 22, 2020
van den Berg authored
This should no longer be needed on the Slurm cluster, where each task can explicitly request the amount of tmp space it requires. Moving the tmp directory off the shared filesystem and back onto the host running the analysis should also improve the performance of the pipeline.
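For illustration, a per-task tmp request could look roughly like the sketch below; this assumes Snakemake-style rules submitted to Slurm, and the rule, paths, and resource name are placeholders, not the pipeline's actual configuration.

```
rule markdup:
    input:
        bam="{sample}/mapped.bam",
    output:
        bam="{sample}/markdup.bam",
        metrics="{sample}/markdup_metrics.txt",
    # The requested scratch space on the execution host; a cluster submission
    # wrapper can forward this to Slurm, for example as sbatch --tmp.
    resources:
        tmp_gb=50,
    shell:
        "picard MarkDuplicates I={input.bam} O={output.bam} "
        "M={output.metrics} TMP_DIR=$TMPDIR"
```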
van den Berg authored
This way, the same version of picard is used in all tasks in the pipeline. This fixes issue #38.
- Jun 26, 2020
van den Berg authored
Add optional bed coverage output files. See merge request !16.