Fix for unexpected batch processor crash
Created by: mihailefter
Problem
It looks like the batch processor crashes in the __alterBatchEntries function. Consider an input file with the following contents:
- NM_024690:c.40638_40638delinsCTGGA
- NM_024690:c.3609_3788AAGTAATATTCCAACAAGTGGTGCCATAGGAAAAAGCACCCTGGTTCCCTTGGACACTCCATCTCCAGCCACATCATTGGAGGCATCAGAAGGGGGACTTCCACCCTCAGCACCTACCCTGAATCAACAAACACACCCAGCATCCACCTCGGAGCACACGCTAGTTCAGAAAGTCCG
During the processing of the first job (which proceeds without crashing), Mutalyzer fetches the most recent version for NM_024690, which is NM_024690.2. Next, it tries to update any other entries in the batch_queue_items database table that use only the unversioned NM_024690 to the most recent version, in order to speed up the batch process when those jobs are reached. In this case it tries to update the second job. The value stored in the item column of the batch_queue_items table for the second job is 200 characters long, and the replacement produces a larger value of 202 characters. Since this exceeds the maximum allowed length, the query fails with:
(psycopg2.DataError) value too long for type character varying(200)
It seems that an input line is automatically truncated to 200 characters when it is added to the database, so no error appears at that point, but the truncation is not performed during the replace operation.
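The length arithmetic behind the error can be reproduced with a short sketch (the accession and column size are taken from this report; the sequence filler is shortened for illustration):

```python
# Illustration of why the in-place accession update overflows VARCHAR(200).
OLD_ACCESSION = 'NM_024690'
NEW_ACCESSION = 'NM_024690.2'   # most recent version fetched by Mutalyzer
COLUMN_LIMIT = 200              # batch_queue_items.item is character varying(200)

# A stored item that was silently truncated to exactly 200 characters:
prefix = 'NM_024690:c.3609_3788'
stored_item = prefix + 'A' * (COLUMN_LIMIT - len(prefix))
assert len(stored_item) == COLUMN_LIMIT

# Replacing the unversioned accession with the versioned one adds two
# characters, so PostgreSQL rejects the UPDATE instead of truncating:
updated_item = stored_item.replace(OLD_ACCESSION, NEW_ACCESSION, 1)
assert len(updated_item) == COLUMN_LIMIT + 2    # 202 > 200 -> DataError
```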
Possible solutions
- Change the item column type in the batch_queue_items table to a variable-length type without a limit. PostgreSQL supports this as type text, but other SQL database management systems have not been checked.
- Do not process (skip) entries for which the updated value would be longer than 200 characters, i.e. do not perform the replacement query on them (see the sketch after this list).
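A minimal sketch of the second option, assuming SQLAlchemy and a hypothetical BatchQueueItem model mapped to batch_queue_items; the actual Mutalyzer model, session handling, and function names may differ. For the first option, the item column would instead be declared with an unlimited type such as SQLAlchemy's Text (PostgreSQL text).

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()
COLUMN_LIMIT = 200  # current size of batch_queue_items.item


class BatchQueueItem(Base):
    """Minimal stand-in for the real batch_queue_items mapping."""
    __tablename__ = 'batch_queue_items'
    id = Column(Integer, primary_key=True)
    item = Column(String(COLUMN_LIMIT))


def update_queued_accessions(session, old, new):
    """Replace the unversioned accession `old` with the versioned `new`
    in queued items, skipping entries that would exceed the column limit."""
    growth = len(new) - len(old)
    for entry in session.query(BatchQueueItem).filter(
            BatchQueueItem.item.contains(old)):
        if len(entry.item) + growth > COLUMN_LIMIT:
            # Skip: the entry keeps the unversioned accession; the version
            # will simply be fetched again when its own job is processed.
            continue
        entry.item = entry.item.replace(old, new)
    session.commit()
```

Skipped entries cost one extra version lookup when their job runs, but the batch no longer crashes on oversized items.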