Promoting INS to DUP issue
Created by: drtconway
Hi Mutalyzer,
Thanks for your awesome work! We use Mutalyzer on well over 1000 research & clinical assays per month and it's great!
We've run into an issue which I think is actually a flaw in the HGVS specification, which Mutalyzer is faithfully implementing.
We have an annotation pipeline which has the following steps (mixed in with other stuff):
- transliterate variants from the VCF to HGVSg
- use Mutalyzer to do position conversion, from which we pick a preferred HGVSc name
- use the name checker to normalise the nomenclature
- convert the HGVSc back to "normalised" HGVSg
- rewrite the VCF in "normalised" form (with the HGVS{g,c,p} in the info field) for downstream tools
We've recently had a couple of cases where this blows up in our faces.
Consider the transliterated HGVSg for a 5bp insertion:
chr1:g.33479001_33479002insATGTC
This is converted by Mutalyzer to the following HGVSc:
NM_013411.4:c.500_501insGACAT
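The inserted bases differ between the two names because the coding-level insertion is the reverse complement of the genomic one, which is what you would expect for a transcript on the reverse strand. A minimal Python check of that relationship (the helper `reverse_complement` is my own illustration, not a Mutalyzer function):

```python
# Hypothetical helper for illustration only; not part of Mutalyzer.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


genomic_insert = "ATGTC"  # from chr1:g.33479001_33479002insATGTC
coding_insert = "GACAT"   # from NM_013411.4:c.500_501insGACAT

# True when the two descriptions insert the same bases on opposite strands.
same_bases = reverse_complement(genomic_insert) == coding_insert
print(same_bases)
```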
Which in turn is normalised by Mutalyzer to:
NM_013411.4:c.496_500dup
And the corresponding HGVSg produced by Mutalyzer for this is:
chr1:g.33479002_33480125dup
What started out as a 5bp insertion has turned into a 1124bp duplication (insertion)!
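The size change can be checked from the two names alone. Here is a rough Python sketch; `range_length` is a hypothetical helper that only understands plain integer `start_end` ranges, not a full HGVS parser:

```python
import re


def range_length(hgvs: str) -> int:
    """Length in bases of the start_end range in a simple HGVS description.

    Only plain integer positions such as 'g.100_105dup' are handled;
    intronic offsets like 'c.498+1' are out of scope for this sketch.
    """
    match = re.search(r"[gc]\.(\d+)_(\d+)", hgvs)
    if match is None:
        raise ValueError(f"no simple start_end range in {hgvs!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return end - start + 1


coding_dup = range_length("NM_013411.4:c.496_500dup")
genomic_dup = range_length("chr1:g.33479002_33480125dup")

# Bases in the genomic range that have no counterpart in the coding range.
extra_bases = genomic_dup - coding_dup
print(coding_dup, genomic_dup, extra_bases)  # prints: 5 1124 1119
```

The 1119 extra bases are presumably the intron between the two exons, which the genomic range has to cross to cover both ends of the coding range.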
The reason is clear when we examine (e.g. in the web interface) the HGVSc form. In particular, the coding positions of the 5th and 6th exons:
- exon 5: c.426 to c.498
- exon 6: c.499 to c.694
The original HGVSc is an insertion just after the splice junction joining exons 5 & 6. Just by coincidence, the inserted sequence happens to be the same as the preceding 5bp of the transcript (i.e. the last 3bp of exon 5 and the first 2bp of exon 6); the chance of that is 1/(4^5)=1/1024, so I guess we shouldn't be too shocked. Accordingly, and following the prioritisation rules, the INS variant is promoted to a DUP variant. However, like 3' shifting variants onto or across splice junctions, this is misleading: the range c.496_500 is contiguous in the transcript, but on the genome it spans the intron between the two exons.
It seems to me that just as the 3' shifting rule has explicit exceptions relating to splice junctions, the prioritisation rules for duplications and inversions should also disallow promotion from insertion or deletion-insertion. Both duplications and inversions describe variants in terms of the sequence content of a reference sequence, not just position in the reference. As such, it is problematic when the implied sequence contains a splice junction.
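To make the suggestion concrete, here is a minimal Python sketch of the kind of guard I have in mind. The names `spans_exon_junction` and `may_promote_to_dup` are hypothetical and say nothing about how Mutalyzer is actually structured; the exon table holds only the two exons quoted above, whereas a real check would need the full exon list for the transcript.

```python
from typing import List, Tuple

# Coding-position ranges (start, end) for exons 5 and 6, as quoted above.
# Assumption: a real implementation would load every exon of the transcript.
EXONS: List[Tuple[int, int]] = [(426, 498), (499, 694)]


def spans_exon_junction(start: int, end: int,
                        exons: List[Tuple[int, int]]) -> bool:
    """True if the coding range start..end does not fit inside one exon."""
    return not any(ex_start <= start and end <= ex_end
                   for ex_start, ex_end in exons)


def may_promote_to_dup(start: int, end: int,
                       exons: List[Tuple[int, int]]) -> bool:
    """Allow ins -> dup promotion only when the copied range stays in one exon."""
    return not spans_exon_junction(start, end, exons)


# c.496_500 straddles the exon 5/6 junction, so the insertion would be kept.
print(may_promote_to_dup(496, 500, EXONS))  # prints: False
# A range wholly inside exon 6 could still be promoted.
print(may_promote_to_dup(500, 504, EXONS))  # prints: True
```

The same containment test would apply to delins->inv promotion, since an inversion also refers to reference sequence content over a range.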
Would you consider modifying Mutalyzer to avoid promoting ins->dup and delins->inv in such circumstances?
Thanks, Tom.