Update rule 'compare_species_fastas' to prevent errors with lowly abundant or absent species
Currently the rule runs pyANI's average_nucleotide_identity.py on anything you put in. However, if the input is empty or contains too little overlap to make meaningful comparisons, the script fails. In turn, the output expected by Snakemake is not generated and the pipeline halts by default. As a workaround I suggested adding the following note to the README:
Also note that the scaffolds of the species of interest will be compared to one another with pyANI. This only works when there is enough sequence overlap, which usually means the species is sufficiently abundant. When using species that are not expected to be highly abundant, the pipeline is likely to return an error. To ignore this and continue with all other steps, add the parameter --keep-going to your snakemake command. (See below for an example.)
I think it would be better not to let the pipeline crash, but to catch the failure by touching the output files that Snakemake expects, e.g. with touch -r (which copies the timestamp of a reference file) for results/pyANI-{species}/ANIm_percentage_identity.png. (Also see the example in Jovian with trimmomatic: https://github.com/DennisSchmitz/Jovian/blob/bcdcd5bc476dbca56b8d514d216b5cbdb69c8263/bin/rules/CleanData.smk#L35)
This way there should be no error. The README would probably still require a note on empty output files: tell users that these may occur, that this probably means there was too little overlap for good comparisons, and that the reason should be explained in the log file log/compare_species_fastas-{species}.txt.
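
The fallback proposed above could look roughly like the following shell sketch. It is an illustration under assumptions, not the rule's actual code: run_or_touch is a hypothetical helper name, plain touch is used instead of touch -r for simplicity, and the pyANI options in the usage comment are only indicative. Inside a Snakemake shell directive, literal curly braces would also need to be doubled.

```shell
# Sketch (assumption): run a command and, if it fails, create an empty
# placeholder for the output that Snakemake expects, noting the reason
# in the log. run_or_touch is a hypothetical name, not part of pyANI
# or Snakemake.
run_or_touch() {
    local log="$1" expected="$2"
    shift 2
    mkdir -p "$(dirname "$log")"
    # Run the remaining arguments as the command; append its output to the log.
    if ! "$@" >> "$log" 2>&1; then
        echo "Command failed, probably too little sequence overlap; creating empty placeholder: $expected" >> "$log"
        mkdir -p "$(dirname "$expected")"
        touch "$expected"
    fi
}

# Possible call (paths taken from this issue; INPUT_DIR, SPECIES and the
# pyANI options are placeholders to adapt):
# run_or_touch log/compare_species_fastas-SPECIES.txt \
#     results/pyANI-SPECIES/ANIm_percentage_identity.png \
#     average_nucleotide_identity.py -i INPUT_DIR -o results/pyANI-SPECIES -m ANIm -g
```

With this pattern the rule always leaves behind the file Snakemake is waiting for, and an empty file plus the log message signals that no meaningful comparison was possible.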