1. 21 Aug, 2020 10 commits
  2. 12 Aug, 2020 1 commit
    • Remove directory output from fastqc · 89da42b6
      van den Berg authored
      The directory output for the fastqc tasks is causing issues on the
      shared file system of the cluster, since it cannot properly determine
      the age of the folder. As a result, it re-runs the fastqc tasks every
      time a workflow is restarted, regardless of whether the task has already
      completed.
      
      To prevent this, a single dummy output file '.done' has been added to
      the fastqc tasks which will be written when fastqc exits successfully.
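The sentinel-file idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual Snakemake rule: the helper name `run_with_done_marker` and the example paths are hypothetical.

```python
import subprocess
from pathlib import Path


def run_with_done_marker(command, outdir):
    """Run `command`; create `<outdir>/.done` only if it exits with status 0.

    Returns True when the marker file was written.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    marker = outdir / ".done"
    # Remove a stale marker so a failed re-run is not mistaken for success.
    if marker.exists():
        marker.unlink()
    result = subprocess.run(command)
    if result.returncode == 0:
        marker.touch()
    return marker.exists()


# Hypothetical usage (file names are placeholders):
# run_with_done_marker(["fastqc", "--outdir", "qc", "sample.fastq.gz"], "qc")
```

A workflow manager can then track the single `.done` file, whose modification time is well defined, instead of the output directory.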
  3. 07 Aug, 2020 2 commits
    • Add an option to restrict BaseRecalibration · 095305f0
      van den Berg authored
      The base recalibration step of the pipeline can take up to 7 hours for
      WGS samples, which is a significant part of the total run time.
      
      The developers of GATK state that BQSR requires at least 100M bases per
      read group: "We usually expect to see more than 100M bases per read
      group; as a rule of thumb, larger numbers will work better."
      
      A human WGS sample with an average read depth of 43x has almost 1300
      times that number of bases. The analysis of these samples would be sped
      up greatly by restricting BQSR to a single chromosome.
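The arithmetic behind this estimate can be checked with a short calculation. The genome and chromosome sizes below are rounded assumptions rather than figures from the commit, chromosome 1 is picked purely for illustration, and all reads are assumed to belong to a single read group.

```python
# Rounded, assumed sizes (not taken from the commit message).
GENOME_SIZE = 3_000_000_000  # approximate human genome size in bases
CHR1_SIZE = 248_000_000      # approximate length of chromosome 1 in bases
MIN_BASES = 100_000_000      # the 100M-bases rule of thumb quoted above
DEPTH = 43                   # average read depth from the commit message


def fold_over_minimum(region_size, depth=DEPTH, minimum=MIN_BASES):
    """Return how many times the sequenced bases exceed the minimum."""
    return region_size * depth / minimum


whole_genome = fold_over_minimum(GENOME_SIZE)  # roughly 1300-fold
chr1_only = fold_over_minimum(CHR1_SIZE)       # still roughly 100-fold
```

Under these assumptions, even a single large chromosome supplies about a hundred times the suggested minimum.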
    • Speed up cutadapt · b585ff80
      van den Berg authored
      Use 8 cores instead of just 1; according to the documentation, speed
      should scale almost linearly with the number of cores provided.
      
      Also reduce the compression level on the output file, since most time
      for cutadapt is spent re-compressing the data after trimming the
      adapters.
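As a rough sketch of the resulting invocation, the helper below assembles a single-end cutadapt command line using its `--cores` and `--compression-level` options. The function name, file names, and adapter sequence are hypothetical, and the compression level of 1 is an assumption, since the commit message does not state the value chosen.

```python
def cutadapt_command(fastq_in, fastq_out, adapter, cores=8, compression_level=1):
    """Build an argument list for a single-end cutadapt run."""
    return [
        "cutadapt",
        "--cores", str(cores),                          # worker processes
        "--compression-level", str(compression_level),  # lower is faster, larger output
        "-a", adapter,                                  # 3' adapter to trim
        "-o", fastq_out,
        fastq_in,
    ]


# Placeholder inputs for illustration only.
command = cutadapt_command("sample.fastq.gz", "trimmed.fastq.gz", "AGATCGGAAGAG")
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where cutadapt is installed.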
  4. 29 Jul, 2020 1 commit
  5. 28 Jul, 2020 1 commit
  6. 24 Jul, 2020 2 commits
  7. 22 Jul, 2020 4 commits
  8. 24 Jun, 2020 3 commits
  9. 23 Jun, 2020 1 commit
    • Add picard DuplicationMetrics to stats.json · 50e50edd
      van den Berg authored
      Unfortunately, this adds a limitation on the sample names that can be
      used with Hutspot, since the naming of the samples in the multiQC parsed
      output of picard MarkDuplicates is partly ambiguous.
      
      This limitation has been added to the readme, and a check has been added
      to the pipeline snakefile to throw an error when overlapping sample
      names are detected.
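A check of this kind might look like the sketch below. The commit does not define "overlapping", so this example assumes it means that one sample name is a proper prefix of another; the function names are hypothetical and the pipeline's actual check may differ.

```python
from itertools import permutations


def find_overlapping_names(samples):
    """Return (shorter, longer) pairs where one name is a prefix of another."""
    unique = sorted(set(samples))
    return [(a, b) for a, b in permutations(unique, 2) if b.startswith(a)]


def check_sample_names(samples):
    """Raise ValueError if any sample names overlap (assumed: prefix overlap)."""
    overlaps = find_overlapping_names(samples)
    if overlaps:
        pairs = ", ".join(f"{a!r} and {b!r}" for a, b in overlaps)
        raise ValueError(f"Overlapping sample names: {pairs}")
```

For example, `check_sample_names(["sample1", "sample10"])` raises an error under this definition, while `["sample01", "sample10"]` passes.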
  10. 17 Jun, 2020 1 commit
  11. 03 Jun, 2020 1 commit
    • Rename bedfile to targetsfile · d82c4932
      van den Berg authored
      Hutspot now supports two bed files to calculate coverage. One is the
      `targetsfile`, which was called `bedfile` before, and holds the targets
      of the capture kit. The other one is `baitsfile`, which holds the bait
      locations of the capture kit.
      
      It is possible to specify only the `targetsfile`, but if you specify the
      `baitsfile`, the `targetsfile` must be specified as well, since both are
      required by picard HsMetrics.
      
      Also added a test for invalid configuration files, and shortened the
      jsonschema validation error to only show the human-readable message.
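The dependency between the two settings can be expressed as a small check. This is a plain-Python sketch of the rule only; the pipeline itself validates its configuration with jsonschema, and the function name and error text here are hypothetical.

```python
def check_bed_files(config):
    """Raise ValueError when `baitsfile` is set without `targetsfile`."""
    if "baitsfile" in config and "targetsfile" not in config:
        raise ValueError("baitsfile specified without targetsfile")


# Valid: no bed files at all, targets only, or both together.
check_bed_files({})
check_bed_files({"targetsfile": "targets.bed"})
check_bed_files({"targetsfile": "targets.bed", "baitsfile": "baits.bed"})
```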
  12. 02 Jun, 2020 7 commits
  13. 29 May, 2020 5 commits
    • Make formatting more consistent · d91fb0ff
      van den Berg authored
       - Use spaces around equal signs
       - Limit lines to 79 characters or fewer
       - Use 'container' instead of 'singularity'
    • Switch to using rule-based dependencies · e16cb6ec
      van den Berg authored
      Since Snakemake 2.4.8, it is possible to use rule-based dependencies
      instead of relying on file pattern matching. This reduces clutter, makes
      it clearer which rules depend on one another, and makes it easier to move
      output files around.
    • Rename final bamfile to {sample}.bam · b83bb9c0
      van den Berg authored
      When using multiqc, the sample names for the picard statistics are
      determined by simply removing the '.bam' part of the filename. This
      would cause all sample names to be set to {sample}.markdup, which makes
      later parsing of the statistics difficult.
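The naming behaviour described above can be mimicked in a few lines. This imitates the described effect only and is not MultiQC's actual code; the function name and file names are hypothetical.

```python
from pathlib import Path


def sample_name_from_bam(filename):
    """Derive a sample name by dropping the directory and a trailing '.bam'."""
    name = Path(filename).name
    if name.endswith(".bam"):
        name = name[: -len(".bam")]
    return name


old_name = sample_name_from_bam("sample1.markdup.bam")  # keeps the '.markdup' suffix
new_name = sample_name_from_bam("sample1.bam")          # plain sample name
```

With the final file named `{sample}.bam`, the derived name matches the sample name directly.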
    • Add picard multiple metrics · 33faba14
      van den Berg authored
       - Update picard to 2.22 since earlier versions have bugs with metrics
       - Add tests for new functionality
       - Add explicit inputs to multiqc
    • Enable json output for multiqc · 067315ba
      van den Berg authored
  14. 30 Apr, 2020 1 commit