1. 21 Apr, 2020 8 commits
  2. 20 Apr, 2020 5 commits
  3. 18 Apr, 2020 2 commits
      Add summary output for cutadapt · b8934d0a
      van den Berg authored
      The cutadapt summary file contains statistics we want to report, such as
      the number of reads and bases before and after trimming. This way, we do
      not need to compute these statistics after the fact, which would require
      parsing the large fastq files an additional time.
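
To illustrate the idea in this commit message, here is a minimal sketch of reading read and base counts back out of a plain-text summary instead of re-parsing the fastq files. It assumes the summary is a text report with `label: number` lines; the function name `parse_summary` and the sample excerpt are hypothetical and are not taken from the pipeline.

```python
import re

# Matches lines such as "Total read pairs processed:   1,000".
# Assumption: counts appear as "label: number" with optional thousands separators.
LINE = re.compile(r"^\s*(?P<label>[^:]+):\s+(?P<value>\d[\d,]*)")


def parse_summary(text):
    """Collect 'label: number' lines from a plain-text report into a dict."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            label = match["label"].strip()
            stats[label] = int(match["value"].replace(",", ""))
    return stats


# Hypothetical excerpt, invented for illustration only.
SAMPLE = """\
Total read pairs processed:          1,000
Pairs written (passing filters):       950 (95.0%)
Total basepairs processed:       300,000 bp
Total written (filtered):        280,500 bp (93.5%)
"""

stats = parse_summary(SAMPLE)
```

Lines without a numeric value after the colon are skipped, so free-text header lines in the report do not end up in the result.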
      No longer merge fastq files · 8670b32a
      van den Berg authored
      Instead of merging fastq files as the first step of the pipeline, merge
      as late as possible to make better use of parallelism, and to prevent
      unnecessary reading/writing of all data. Currently, reads are trimmed
      and mapped per read group, and are merged in the picard MarkDuplicates
      step. Samples are therefore merged as a side effect of a task that
      has to be performed anyway.
      
      Additionally, fastq processing is now done in a single step using
      cutadapt, instead of using both sickle and cutadapt sequentially.
      
      As part of these changes, the following changes were made:
       - Use cutadapt to trim both adapters and low quality reads
       - Run bwa align on each readgroup independently
       - Run fastqc on each readgroup independently
       - Pass multiple bam files to picard MarkDuplicates
       - Remove safe_fastqc.sh script
       - Remove fastqc_stats
       - Remove fastqc coverage from covstats
       - Update test data for slight differences in output vcf files
       - Add tests for fastqc zip files
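
The late merge described above can be sketched as two command builders: one cutadapt call per read group that trims adapters and low-quality ends in a single step, and one picard MarkDuplicates call that takes every read-group bam file of a sample as input. This is only an illustration; the function names, file names, adapter sequence and quality threshold are assumptions and do not come from the pipeline.

```python
def cutadapt_command(r1, r2, out1, out2, adapter="AGATCGGAAGAG", min_quality=20):
    """Trim adapters and low-quality ends of one paired-end read group.

    The adapter sequence and quality cutoff are placeholder assumptions.
    """
    return [
        "cutadapt",
        "-a", adapter,           # adapter on the forward read
        "-A", adapter,           # adapter on the reverse read
        "-q", str(min_quality),  # quality-trim read ends
        "-o", out1,
        "-p", out2,
        r1, r2,
    ]


def markduplicates_command(bams, output, metrics):
    """Mark duplicates and merge all read-group bam files of one sample."""
    if not bams:
        raise ValueError("at least one bam file is required")
    command = ["picard", "MarkDuplicates"]
    # One INPUT= argument per read group; picard merges them into OUTPUT.
    command += [f"INPUT={bam}" for bam in bams]
    command += [f"OUTPUT={output}", f"METRICS_FILE={metrics}"]
    return command


# Hypothetical file names for a sample with two read groups.
cmd = markduplicates_command(["rg1.bam", "rg2.bam"], "sample.bam", "sample.metrics")
```

Either list can be passed to `subprocess.run(cmd, check=True)`. Because each read group is trimmed and mapped independently, those jobs can run in parallel before this single merging step.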
  4. 17 Apr, 2020 2 commits
  5. 16 Apr, 2020 7 commits
  6. 15 Apr, 2020 1 commit
      Remove global python variables · 7da15fdb
      van den Berg authored
      To make the pipeline more robust, the global python variables for
      various settings were removed where possible. Their values have been
      moved to the configuration json file, and a jsonschema validation has
      been added to the pipeline to make sure the configuration is valid.
      
      The downsampling step using seqtk has been removed since it was not used.
      
      The following additional changes were made:
       - Remove all --config values except CONFIG_JSON
       - Extend the config schema with the required and optional files that
         are supported
       - Add jsonschema validation of CONFIG_JSON
       - Remove global variables for scripts, add them to settings
         dictionary
       - Remove global variable for SAMPLES, use the settings dictionary
         instead
       - Remove support for multiple bed files
       - Remove support for multiple refFlat files
       - Remove support for downsampling of reads
       - Add json and jsonschema to the requirements
       - Update tests to work with the new config file
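
As a rough sketch of what validating a configuration file with jsonschema can look like, the example below checks a parsed JSON document against a small schema before it is used. The schema and its keys (`reference`, `samples`) and the helper names are hypothetical; the actual schema of the pipeline is not shown in this commit message.

```python
import json

import jsonschema

# Hypothetical schema: the real pipeline schema is not shown here.
SCHEMA = {
    "type": "object",
    "properties": {
        "reference": {"type": "string"},
        "samples": {"type": "object"},
    },
    "required": ["reference", "samples"],
}


def load_config(text):
    """Parse a JSON configuration and validate it against SCHEMA.

    Raises jsonschema.ValidationError if the configuration is invalid.
    """
    config = json.loads(text)
    jsonschema.validate(instance=config, schema=SCHEMA)
    return config


def is_valid(text):
    """Return True if the configuration text passes validation."""
    try:
        load_config(text)
    except jsonschema.ValidationError:
        return False
    return True
```

Validating once at start-up means a missing or mistyped setting is reported before any job runs, rather than as a failure partway through the pipeline.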
  7. 08 Apr, 2020 1 commit
  8. 07 Apr, 2020 2 commits
  9. 06 Apr, 2020 2 commits
  10. 03 Apr, 2020 2 commits
  11. 01 Apr, 2020 1 commit
  12. 31 Mar, 2020 5 commits
  13. 30 Mar, 2020 2 commits