LRG transcript mappings ignore 1:n transcript:protein links
In LRG records, transcript:protein links can be n:m, rather than the 1:1 found in ordinary RefSeq records (and assumed by Mutalyzer). See #340 for a proposal on how to handle this syntactically.
In the position converter we only deal with `c.`, not with `p.`, so I would say that 1:n links would be the main concern. An example is TP53, where we should report `c.` positions on both `LRG_321t1p1` and `LRG_321t1p8` (their CDS differs, affecting `c.` positioning). Currently, this is not supported:
1. The HGVS grammar does not allow selecting both a transcript and a protein (see #340).
2. Our mapping database does not allow specifying a protein.
3. The code we use to parse EBI LRG transcript map files (e.g., GRCh37/hg19) ignores the protein.
Point 1 is only relevant for accepting `c.` input, since reporting `c.` for `g.` input does not depend on grammar support.
Point 2 means that currently only one mapping per transcript is supported. Point 3 means a random one is picked if there are multiple in the input file (e.g., p1 and p8 for `LRG_321t1`).
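To illustrate points 2 and 3, here is a minimal Python sketch. It is not Mutalyzer's actual code: the `Mapping` record layout and the CDS start values are made up for illustration and do not reflect the real EBI map file format or our database schema. It only shows that a lookup keyed on the transcript alone keeps a single mapping, and which one survives depends on input order.

```python
from collections import namedtuple

# Hypothetical record layout (assumption); not the real EBI map file
# format or the Mutalyzer mapping table.
Mapping = namedtuple("Mapping", "lrg transcript protein cds_start")

# Made-up CDS start values, for illustration only.
rows = [
    Mapping("LRG_321", "t1", "p1", 100),
    Mapping("LRG_321", "t1", "p8", 250),
]

# Keyed on transcript only: the later row silently replaces the earlier one.
by_transcript = {(m.lrg, m.transcript): m for m in rows}

# Keyed on transcript and protein: both mappings are kept.
by_transcript_protein = {(m.lrg, m.transcript, m.protein): m for m in rows}

print(len(by_transcript), "mapping(s) when keyed on transcript")
print(len(by_transcript_protein), "mapping(s) when keyed on transcript and protein")
```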
A first and relatively easy step could be to work on points 2 and 3, so all mappings are reported correctly. This would not yet allow the user to specify the protein when entering `c.` as input.
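A rough Python sketch of what reporting all mappings for `g.` input could look like, once one mapping per transcript:protein pair is stored. Everything here is an assumption for illustration: `CdsMapping`, `coding_positions` and the coordinates are hypothetical, and the toy model ignores introns, strand and UTR positions. The output labels are simply the transcript and protein identifiers concatenated, not a proposed syntax.

```python
from typing import NamedTuple


class CdsMapping(NamedTuple):
    """Hypothetical mapping row, one per transcript:protein pair."""
    transcript: str  # e.g. "LRG_321t1"
    protein: str     # e.g. "p1"
    cds_start: int   # made-up g. position of the first coding base


def coding_positions(g_pos, mappings):
    """Return one c. description per transcript:protein pair.

    Toy model: forward strand, no introns, g_pos inside every CDS.
    """
    results = []
    for m in mappings:
        offset = g_pos - m.cds_start + 1
        if offset < 1:
            raise ValueError("toy model only handles positions inside the CDS")
        results.append(f"{m.transcript}{m.protein}:c.{offset}")
    return results


mappings = [
    CdsMapping("LRG_321t1", "p1", 100),
    CdsMapping("LRG_321t1", "p8", 250),
]
for description in coding_positions(300, mappings):
    print(description)
```

With a different CDS start per protein, the same `g.` position yields a different `c.` position for each pair, which is why both need to be reported.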