Inconsistent number of fields in batch output
Created by: mutalyzerbot
Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/176
Original date: 2014/10/10
Original reporter: ken DOT doig AND petermac DOT org
Submitting mutations to the batch Name Checker through the JSON interface produces a variable number of output fields per mutation.
For example:
curl -d 'process=NameChecker' -d 'argument=hg19' -d "data=$(echo 'NM_001093772.1:c.1656_1658del,NM_001276760.1:c.673-36G>C,NM_005228.3:c.2239_2250del' | base64)" 'https://mutalyzer.nl/json/submitBatchJob'
Input:
NM_001093772.1:c.1656_1658del NM_001276760.1:c.673-36G>C NM_005228.3:c.2239_2250del
Output:
Input Errors and warnings AccNo Genesymbol Variant Reference Sequence Start Descr. Coding DNA Descr. Protein Descr. GeneSymbol Coding DNA Descr. GeneSymbol Protein Descr. Genomic Reference Coding Reference Protein Reference Affected Transcripts Affected Proteins Restriction Sites Created Restriction Sites Deleted
NM_001093772.1:c.1656_1658del (variantchecker): Sequence "GTG" at position 1743_1745 was given, however, the HGVS notation prescribes that on the forward strand it should be "TGG" at position 1744_1746. NM_001093772.1 KIT_v001 c.1656_1658del n.1744_1746del c.1657_1659del p.(Trp553del) KIT_v001:c.1657_1659del KIT_v001:p.(Trp553del) NM_001093772.1 NP_001087241.1 NM_001093772.1(KIT_v001):c.1657_1659del NM_001093772.1(KIT_i001):p.(Trp553del) BtsIMutI,HpyCH4III,TspRI
NM_001276760.1:c.673-36G>C (variantchecker): Intronic position given for a non-genomic reference sequence.
NM_005228.3:c.2239_2250del NM_005228.3 EGFR_v001 c.2239_2250del n.2485_2496del c.2239_2250del p.(Leu747_Ala750del) EGFR_v001:c.2239_2250del EGFR_v001:p.(Leu747_Ala750del) NM_005228.3 NP_005219.2 NM_005228.3(EGFR_v001):c.2239_2250del NM_005228.3(EGFR_i001):p.(Leu747_Ala750del) MseI
The first mutation has an error message but still returns 17 fields (matching the header columns). The second returns only 4 fields, with an error. The third returns 17 fields, without an error.
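A quick way to see the mismatch is to count the tab-separated fields in each record. The sketch below is a hypothetical helper (`count_fields` is not part of Mutalyzer) and assumes the output has been saved with one record per line; it cannot help if records are themselves separated by tabs.

```shell
# Hypothetical helper (not part of Mutalyzer): print each record's line
# number and the number of tab-separated fields it contains.
# Assumes newline-terminated records; reads the files given as arguments,
# or standard input if none are given.
count_fields() {
    # FS is a single tab, so spaces inside a field such as
    # "Errors and warnings" do not split it, and empty fields still count.
    awk -F'\t' '{ printf "%d\t%d\n", NR, NF }' "$@"
}
```

Running `count_fields batch-output.tsv` prints each line number followed by its field count, so rows that disagree with the 17-column header stand out.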
Because the record separator (tab) is the same as the field separator (tab), the output is very difficult to parse. This behaviour changed with the server upgrade; before it, all mutations with errors consistently returned 4 fields.
Could all mutations, whether they have errors or not, return 17 fields, with a record separator distinct from the field separator?
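Until the server does this, one possible client-side workaround is to pad short records with empty trailing fields. This is a sketch under two assumptions: records are newline-separated, and the fields missing from error rows are the trailing ones (the ticket does not say which 4 fields error rows contain). `pad_fields` is a hypothetical name, not a Mutalyzer tool.

```shell
# Hypothetical workaround (not part of Mutalyzer): pad every record to a
# fixed number of tab-separated fields by appending empty fields.
# Usage: pad_fields WIDTH [FILE...]; reads standard input if no FILE is given.
# Assumes newline-terminated records and that short rows are missing
# their trailing columns. Rows already at or above WIDTH are left unchanged.
pad_fields() {
    n=$1
    shift
    awk -F'\t' -v OFS='\t' -v n="$n" '{
        # Assigning past NF extends the record; awk rebuilds $0 joined by OFS.
        for (i = NF + 1; i <= n; i++) $i = ""
        print
    }' "$@"
}
```

For example, `pad_fields 17 < batch-output.tsv > padded.tsv` gives every row the same number of columns as the header, which makes the file loadable by ordinary TSV readers.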