bin/read_sequence_lengths.sh only works with certain IDs
The current script is very simple and assumes that sequence IDs are 32 characters long. In many cases this is probably fine, but it is not very flexible. A better approach may be to replace this script with a Python script that uses Biopython (SeqIO) to parse the FASTA file and return the IDs that way. For now the most important file to match is the BLAST output, so that should be check number 1 when implementing this. Later I might want to match the IDs to other tables/scripts as well, e.g. to link them with custom gene names. (Custom names can make figures easier to read than GenBank accession IDs.)
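A minimal sketch of what such a replacement could look like. This is not the existing script: the function names `read_sequence_lengths` and `ids_missing_from_blast` are hypothetical, and the BLAST check assumes tab-separated output with the query ID in the first column, which still needs to be verified against the actual BLAST files. The plain-Python fallback is only there so the sketch runs without Biopython installed.

```python
import io

try:
    from Bio import SeqIO  # Biopython, as proposed above
except ImportError:  # fallback so the sketch still runs without Biopython
    SeqIO = None


def _plain_fasta_lengths(handle):
    """Stdlib fallback: yield (id, length) for each FASTA record."""
    seq_id, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            # The ID is the header text up to the first whitespace.
            fields = line[1:].split()
            seq_id, length = (fields[0] if fields else ""), 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def read_sequence_lengths(handle):
    """Return {sequence ID: sequence length} for IDs of any length."""
    if SeqIO is not None:
        return {rec.id: len(rec.seq) for rec in SeqIO.parse(handle, "fasta")}
    return dict(_plain_fasta_lengths(handle))


def ids_missing_from_blast(lengths, blast_lines):
    """Return FASTA IDs that never occur as a query ID in the BLAST lines.

    Assumption: tab-separated BLAST output, query ID in column 1.
    """
    blast_ids = {
        line.split("\t")[0]
        for line in blast_lines
        if line.strip() and not line.startswith("#")
    }
    return sorted(set(lengths) - blast_ids)


if __name__ == "__main__":
    # Made-up example data, only to show the intended use.
    fasta = io.StringIO(">AB123456.1 some gene\nACGTACGT\nACGT\n>short_id\nACG\n")
    lengths = read_sequence_lengths(fasta)
    for seq_id, length in lengths.items():
        print(f"{seq_id}\t{length}")
    print(ids_missing_from_blast(lengths, ["AB123456.1\thit_1\t98.5"]))
```

Both code paths take the ID as the header text up to the first whitespace, so the ID length no longer matters. Whether that matches the IDs in the BLAST output exactly is the first thing to check.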