DBSNP, ONETHOUSAND and HAPMAP database files should not be hardcoded
Currently, the pipeline expects these three SNP databases to be specified, as they are used by the baserecal rule. However, the GATK BaseRecalibrator can accept an arbitrary number of known SNP files, including zero. Therefore, the pipeline should gracefully handle zero or more files with known SNPs.
Note: using BaseRecalibrator without any known SNPs will overestimate the error rate of your sequencing run, because real variants are then counted as sequencing errors, and will artificially reduce the base quality scores of the reads. If possible, output a warning when no known SNPs are specified.
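A minimal sketch of how the baserecal rule could build its arguments from a list of zero or more files, and warn when the list is empty. The helper name `known_sites_args` is hypothetical, and the `--known-sites` flag assumes GATK4 syntax (GATK3 spells it `-knownSites`); neither is taken from the current pipeline code.

```python
import shlex
import warnings


def known_sites_args(known_sites):
    """Build the known-sites part of a BaseRecalibrator command line.

    `known_sites` is a list of zero or more VCF paths (or None).
    Assumes the GATK4 flag `--known-sites`; adjust for other versions.
    """
    files = list(known_sites or [])
    if not files:
        # Warn, but do not fail: the user may have no SNP database
        # for this organism (for example, mouse).
        warnings.warn(
            "No known SNP files specified; BaseRecalibrator will "
            "overestimate the error rate and lower base qualities."
        )
    # One flag per file; quote each path so spaces are handled safely.
    return " ".join(f"--known-sites {shlex.quote(path)}" for path in files)


print(known_sites_args(["dbsnp.vcf.gz", "hapmap.vcf.gz"]))
# prints: --known-sites dbsnp.vcf.gz --known-sites hapmap.vcf.gz
```

In a Snakemake rule, the returned string could be passed through `params` and interpolated into the shell command, so the same rule works for any number of configured files.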
This is relevant because of a request by GenomeScan to analyse mouse NGS data using hutspot. The expected SNP databases do not exist for mice.