Initial release of jovian-screener

jovian-screener is an extension to Jovian that facilitates screening metagenomes for species and genes of interest. This initial release features the following components:

  • an import script for the metagenomics output of Jovian (Python)
  • conda environment YAML files listing the software used
  • a Snakemake workflow that:
    • reads user-defined configurations from a YAML file
    • filters species of interest from Jovian's all_taxClassified.tsv output table
    • compares classified scaffolds between samples (experimental feature: likely to fail)
    • searches scaffolds for sequences of interest (for instance genes) with BLAST and filters the results by a minimum length (default: half the length of the given sequence)
    • searches trimmed and filtered reads for sequences (genes) of interest (with BWA MEM; default parameters)
    • quantifies species and sequences of interest by the number of reads mapped to them
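
The minimum-length filter on BLAST results can be illustrated with a minimal sketch. It assumes that "length" refers to the alignment length of a hit and that hits are available as simple records; the function name, the field names (`query`, `scaffold`, `alignment_length`) and the example values are hypothetical and not taken from the workflow itself.

```python
def filter_hits_by_length(hits, query_lengths, min_fraction=0.5):
    """Keep hits whose alignment covers at least min_fraction of the query.

    hits: list of dicts with hypothetical keys 'query' and 'alignment_length'
    query_lengths: dict mapping query name to its sequence length
    """
    kept = []
    for hit in hits:
        # Default threshold: half the length of the sequence of interest
        min_length = query_lengths[hit["query"]] * min_fraction
        if hit["alignment_length"] >= min_length:
            kept.append(hit)
    return kept


# Made-up example: one near-complete hit and one short fragment
query_lengths = {"geneA": 900}
hits = [
    {"query": "geneA", "scaffold": "scaffold_1", "alignment_length": 880},
    {"query": "geneA", "scaffold": "scaffold_7", "alignment_length": 120},
]
filtered = filter_hits_by_length(hits, query_lengths)
```

With the default fraction of 0.5, only hits of at least 450 positions would be kept for a 900-nucleotide query in this sketch.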

Note that the current version assumes paired-end Illumina reads as input and uses both the paired and unpaired reads yielded by Trimmomatic for mapping and quantification of sequences of interest. After mapping, reads are deduplicated using Picard's MarkDuplicates. Quantifications are stored separately for 1) reads containing duplicates and 2) deduplicated reads, as well as for A) paired reads and B) unpaired reads; paired and unpaired read counts are summed to create 'total' read counts.
It is advised to use the deduplicated, summed quantifications, such as those in the output file data/processed/Deduplicated_read_counts.tsv.
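
As a rough illustration of how the 'total' read counts relate to the paired and unpaired counts, the sketch below sums two per-target count tables. The function name, the dictionary layout and the example numbers are assumptions made for illustration; the actual column layout of the output tables is not described here.

```python
def sum_read_counts(paired, unpaired):
    """Combine paired and unpaired read counts into per-target totals.

    paired, unpaired: dicts mapping a target (species or sequence of
    interest) to the number of reads mapped to it. Targets missing from
    one of the two inputs are counted as 0 there.
    """
    totals = {}
    for target in sorted(set(paired) | set(unpaired)):
        n_paired = paired.get(target, 0)
        n_unpaired = unpaired.get(target, 0)
        totals[target] = {
            "paired": n_paired,
            "unpaired": n_unpaired,
            "total": n_paired + n_unpaired,
        }
    return totals


# Made-up deduplicated counts for two hypothetical targets
paired_counts = {"geneA": 120, "geneB": 8}
unpaired_counts = {"geneA": 15}
total_counts = sum_read_counts(paired_counts, unpaired_counts)
```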

This wealth of output files is intended for experimental purposes and mostly helps answer the following questions:

  • do unpaired reads add any information that is missed when using paired reads only?
  • are there any (many?) duplicate reads that influence the number of mapped reads and depth of coverage?
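
One possible way to put numbers on these two questions is sketched below: the share of mapped reads that come from unpaired reads, and the fraction of mapped reads removed by deduplication. Both helper functions and the example counts are hypothetical and not part of the workflow.

```python
def unpaired_share(paired, unpaired):
    """Fraction of all mapped reads that are unpaired reads."""
    total = paired + unpaired
    return unpaired / total if total else 0.0


def duplicate_fraction(with_duplicates, deduplicated):
    """Fraction of mapped reads that were removed as duplicates."""
    if with_duplicates == 0:
        return 0.0
    return (with_duplicates - deduplicated) / with_duplicates


# Made-up counts for a single sequence of interest
share = unpaired_share(paired=90, unpaired=10)
dup = duplicate_fraction(with_duplicates=200, deduplicated=150)
```

A share close to zero would suggest that unpaired reads add little information, while a high duplicate fraction would suggest that the read counts containing duplicates overestimate the depth of coverage.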