
Fix for unexpected batch processor crash

Mihai Lefter requested to merge fix-batch-crash-on-db-alter into master

Created by: mihailefter

Problem

It looks like the batch processor crashes in the __alterBatchEntries function. Consider an input file with the following contents:

  • NM_024690:c.40638_40638delinsCTGGA
  • NM_024690:c.3609_3788AAGTAATATTCCAACAAGTGGTGCCATAGGAAAAAGCACCCTGGTTCCCTTGGACACTCCATCTCCAGCCACATCATTGGAGGCATCAGAAGGGGGACTTCCACCCTCAGCACCTACCCTGAATCAACAAACACACCCAGCATCCACCTCGGAGCACACGCTAGTTCAGAAAGTCCG

During the processing of the first job (which completes without crashing), Mutalyzer fetches the most recent version for NM_024690, which is NM_024690.2. It then tries to update any other entries in the batch_queue_items database table that use only the unversioned NM_024690, so that those jobs do not have to resolve the version again when they are reached. In this case it tries to update the second job. The value stored in the item column of batch_queue_items for the second job is 200 characters long, and the replacement value is 202 characters long. Since this exceeds the column's maximum length, the query fails with:

(psycopg2.DataError) value too long for type character varying(200)

It seems that an input line is automatically truncated to 200 characters when it is added to the database, so no error appears at insert time, but the replacement query performs no such truncation.
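
A minimal sketch of the situation, assuming an SQLAlchemy model roughly like the one below (the model, function, and session names are illustrative, not Mutalyzer's actual code; only the VARCHAR(200) column width is taken from the error message):

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class BatchQueueItem(Base):
    """Hypothetical model for the batch_queue_items table."""
    __tablename__ = 'batch_queue_items'

    id = Column(Integer, primary_key=True)
    # VARCHAR(200), matching the limit in the error message.
    item = Column(String(200))


def alter_batch_entries(session, old, new):
    """Replace the unversioned accession (e.g. NM_024690) with the versioned
    one (e.g. NM_024690.2) in all queued items that still use it.

    If a stored item is already 200 characters long, the replacement yields
    202 characters and PostgreSQL rejects the UPDATE with
    'value too long for type character varying(200)'.
    """
    query = session.query(BatchQueueItem).filter(
        BatchQueueItem.item.contains(old))
    for entry in query:
        entry.item = entry.item.replace(old, new)
    session.commit()  # the DataError is raised here for the oversized value
```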

Possible solutions

  1. Change the item column type in the batch_queue_items table to an unlimited-length variable type. PostgreSQL supports this as type text, but other SQL database management systems have not been checked (see the migration sketch after this list).
  2. Skip entries that are longer than 200 characters and do not perform the replacement query on them (see the sketch after this list).
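
Rough sketches of both options, under the same assumptions as above. For option 1, if the schema is managed with Alembic, the migration could look roughly like this (revision identifiers omitted):

```python
import sqlalchemy as sa
from alembic import op


def upgrade():
    # Widen the item column from VARCHAR(200) to unlimited-length TEXT.
    op.alter_column('batch_queue_items', 'item',
                    existing_type=sa.String(length=200),
                    type_=sa.Text())


def downgrade():
    # Note: narrowing back would fail if any stored value exceeds 200 chars.
    op.alter_column('batch_queue_items', 'item',
                    existing_type=sa.Text(),
                    type_=sa.String(length=200))
```

Option 2 would keep the column as is and simply leave oversized entries untouched, reusing the hypothetical BatchQueueItem model from the sketch above:

```python
MAX_ITEM_LENGTH = 200  # matches the current VARCHAR(200) column


def safe_alter_batch_entries(session, old, new):
    """Like alter_batch_entries, but skip items for which the versioned
    replacement would no longer fit in the column."""
    query = session.query(BatchQueueItem).filter(
        BatchQueueItem.item.contains(old))
    for entry in query:
        replaced = entry.item.replace(old, new)
        if len(replaced) > MAX_ITEM_LENGTH:
            # Leave the entry as is; it keeps the unversioned accession and
            # the version is simply resolved again when its own job runs.
            continue
        entry.item = replaced
    session.commit()
```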
