Support nomenclature of repeat sequences
Created by: mutalyzerbot
Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/110 Original date: 2012/08/02 Original reporter: I DOT F DOT A DOT C DOT Fokkema AND LUMC DOT nl
Mutalyzer currently does not correctly support the HGVS nomenclature for repeat sequences. It would be very useful to have at least the syntax checker and position converter support it. At a later stage, it would also be very useful to have it working in the name checker (for unambiguous variants), especially for the protein effect prediction.
Please note that my previous explanation today of the correct syntax was not accurate. According to the nomenclature, the following examples are correct HGVS syntax:
-Short repeats, sequence given:*[[BR]] g.123TG[4][[BR]] This individual has exactly 4 repeats of "TG", starting at position g.123. This means Mutalyzer will have to analyze the reference sequence starting at g.123 to see how many TG repeats are wildtype, to determine whether this is a deletion, an insertion, or wildtype itself.[[BR]] Right now, the syntax check works, but the position converter returns an HTTP 500 (internal server error).
g.123TG(3_6)[[BR]] The TG repeat, starting at g.123, is repeated 3 to 6 times in the individuals described. I don't expect the name checker to give me anything useful here, but at least having the position converter would be nice.[[BR]] Right now, the syntax check works, but the position converter returns an HTTP 500 (internal server error).
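To illustrate the analysis needed for g.123TG[4], here is a minimal sketch, not Mutalyzer code. It assumes the reference is available as a plain string and that positions are 1-based; the function names `count_tandem_repeats` and `classify_repeat` are hypothetical.

```python
def count_tandem_repeats(reference: str, start: int, unit: str) -> int:
    """Count consecutive copies of `unit` in `reference`, beginning at the
    1-based position `start`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    if start < 1:
        raise ValueError("positions are 1-based")
    index = start - 1  # convert to a 0-based string index
    count = 0
    while reference.startswith(unit, index):
        count += 1
        index += len(unit)
    return count


def classify_repeat(reference: str, start: int, unit: str, observed: int) -> str:
    """Compare the observed repeat count with the wildtype count found in
    the reference."""
    wildtype = count_tandem_repeats(reference, start, unit)
    if wildtype == 0:
        raise ValueError("repeat unit not found at the given position")
    if observed == wildtype:
        return "wildtype"
    return "insertion" if observed > wildtype else "deletion"


# Toy reference: "TG" occurs three times starting at position 3.
toy_reference = "AATGTGTGCC"
print(classify_repeat(toy_reference, 3, "TG", 4))
```

With this toy reference, an observed count of 4 is one repeat more than the reference holds, so the variant would be reported as an insertion.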
-Long repeats; no sequence given, but sizes:*[[BR]] g.123_456[4][[BR]] This individual has the sequence g.123_456 repeated exactly 4 times. This means Mutalyzer will have to look up the reference sequence at g.123_456 to determine the repeat unit, and then analyze the reference sequence starting at position g.457 to see how many of these repeats are wildtype, to determine whether this is a deletion, an insertion, or wildtype itself.[[BR]] Right now, the syntax check works, but the position converter removes the [4] when returning the results.
g.123_456(3_6)[[BR]] The sequence g.123_456 is repeated 3 to 6 times in the individuals described. I don't expect the name checker to give me anything useful here, but at least having the position converter would be nice.[[BR]] Right now, the syntax check doesn't work, so the position converter doesn't return any results either.
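For the range-based form g.123_456[4], the two steps described above (read the repeat unit from the range, then count further copies from the next position) could be sketched as follows. Again, this is only an illustration under the same assumptions (reference as a plain string, 1-based inclusive positions); `wildtype_count_for_range` is a hypothetical name, not an existing Mutalyzer function.

```python
def wildtype_count_for_range(reference: str, start: int, end: int):
    """Return (unit, count): the repeat unit spanning the 1-based inclusive
    range start..end, and the number of consecutive copies in the reference,
    counting the range itself as the first copy."""
    if not (1 <= start <= end <= len(reference)):
        raise ValueError("range lies outside the reference sequence")
    unit = reference[start - 1:end]
    count = 1    # the range itself is the first copy
    index = end  # 0-based index of the position right after the range
    while reference.startswith(unit, index):
        count += 1
        index += len(unit)
    return unit, count


# Toy reference: positions 3..5 hold "CAG", followed by two more copies.
unit, wildtype = wildtype_count_for_range("GGCAGCAGCAGTT", 3, 5)
observed = 4
if observed == wildtype:
    result = "wildtype"
elif observed > wildtype:
    result = "insertion"
else:
    result = "deletion"
print(unit, wildtype, result)
```

For the (3_6) forms, the same wildtype count could be compared against both ends of the range, but no single insertion or deletion can be derived from it, which matches the expectation that the name checker has little to report there.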