A few tools are available to check sequence variation nomenclature:
The Mutalyzer sequence variation nomenclature checker has been developed as the first part of an integrated, modular package of tools intended to let users obtain information about the effect of sequence variations in genes associated with human disease. The Mutalyzer package (under development) should assist decision-making in molecular diagnosis based on a sequence change detected in a patient's DNA, and should prevent misdiagnosis caused either by missing the pathogenic effect of a sequence variation reported as "polymorphic" or, more seriously, by reporting a polymorphic change as pathogenic.
For this exercise, it is most convenient to right-click the Mutalyzer link and open it in a separate window. The exercise demonstrates the different functions of the Mutalyzer sequence variation nomenclature checker. Its main functions can be selected from the index page or from the list on the left side of any Mutalyzer window. For more information, please check the help file. Although the performance of Mutalyzer has been checked using the complete contents of sequence variation databases, some changes might not be processed correctly. Please report any unexpected results to mutalyzer@humgen.nl.
Mutalyzer needs the annotation of a GenBank reference sequence file to retrieve the basic information for the check. Its functionality is best appreciated with well-annotated genomic reference sequences, which contain information about all the transcripts and proteins encoded by the reference sequence.
For many disease genes, LSDB curators should have specified a reference sequence, preferably by listing its GenBank accession number. In many cases, this will be a RefSeq record with an accession number starting with NM_ (or XM_). These records describe transcript sequences only, so Mutalyzer cannot use them to describe intron variants. If you need to check intron variants, please ask the LSDB curator to specify a well-annotated genomic reference sequence (for example, AL449423.14 or one of NCBI's new RefSeqGene records). Alternatively, you can use the GenBank Uploader options described below to obtain a suitable genomic reference sequence. If the annotation of this genomic record contains the transcript specified by the NM_ number, Mutalyzer can use the genomic record to describe any variant relative to that transcript. Always combine an accession number (AL449423) with a specific version number (14); otherwise, Mutalyzer automatically uses the latest version and may give unexpected results.
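If you prepare many descriptions by hand, it can help to check that every reference is pinned to a version before submitting it. The sketch below is a hypothetical helper, not part of Mutalyzer; its accession pattern is a simplified assumption (one or two capital letters, an optional underscore as in NM_ or NC_, digits, and an optional ".version") and does not cover every GenBank accession format.

```python
import re
from typing import Optional, Tuple

# Simplified accession pattern (an assumption, not NCBI's full rules):
# one or two capital letters, an optional underscore (RefSeq prefixes such
# as NM_, NC_, XM_), digits, then an optional ".version" suffix.
ACCESSION_RE = re.compile(r"^([A-Z]{1,2}_?\d+)(?:\.(\d+))?$")


def split_accession(ref: str) -> Tuple[str, Optional[int]]:
    """Split 'AL449423.14' into ('AL449423', 14); version is None if absent."""
    match = ACCESSION_RE.match(ref.strip())
    if match is None:
        raise ValueError(f"unrecognised accession: {ref!r}")
    accession, version = match.groups()
    return accession, int(version) if version is not None else None


def is_versioned(ref: str) -> bool:
    """True if the reference pins a specific version, e.g. 'NM_000787.3'."""
    return split_accession(ref)[1] is not None


for ref in ["AL449423.14", "AL449423", "NM_000787.3"]:
    if not is_versioned(ref):
        print(f"{ref}: no version given; the latest version would be used")
```

Flagging unversioned references up front avoids the situation described above, where the latest version is silently used.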
1) The Mutalyzer Name Generator
The Mutalyzer Name Generator provides an easy check of a sequence variation by returning the correct name of the variation according to the HGVS nomenclature guidelines, together with information about the sequence surrounding the variation and the transcripts and proteins it affects. In principle, Mutalyzer accepts any GenBank sequence as a reference, but its functionality is best appreciated with well-annotated genomic reference sequences.
Enter the accession number AL449423.14 as a reference sequence, select "genomic DNA", enter start position "1", stop position "1", select "deletion" and press submit.
The legend of the results describes the transcripts and proteins annotated in the reference sequence.
On the Mutalyzer page, now select "coding DNA", enter one of the gene symbols from the legend, and press submit to see the effect of a change on the major transcript (optional: enter, in addition to the gene symbol, the number of a transcript variant, e.g., 2 or v002 for the second transcript annotated in the record).
Repeat this for the other gene symbols, additional sequence variation types, changes in introns, changes across intron-exon boundaries, etc. For inspiration, you may consult the HGVS nomenclature guidelines or sequence variation databases.
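To see how the form fields (reference, sequence type, positions, variation type) map onto an HGVS description, here is a minimal sketch of a hypothetical `hgvs_description` helper. It is an illustration, not how Mutalyzer works internally, and it assumes plain integer positions only: intronic offsets (such as c.88+1), upstream/downstream positions and transcript selectors are out of scope.

```python
from typing import Optional


def _span(start: int, end: int) -> str:
    """HGVS writes a single position as '5' and a range as '5_7'."""
    return str(start) if start == end else f"{start}_{end}"


def hgvs_description(ref: str, coord: str, kind: str, start: int,
                     end: Optional[int] = None, ref_nt: str = "",
                     alt_nt: str = "") -> str:
    """Build a simple HGVS-style description, e.g. 'AL449423.14:g.1del'.

    coord is the coordinate prefix ('g' for genomic, 'c' for coding DNA).
    """
    end = start if end is None else end
    if kind == "substitution":
        if start != end:
            raise ValueError("a substitution applies to a single position")
        change = f"{start}{ref_nt}>{alt_nt}"
    elif kind == "deletion":
        change = f"{_span(start, end)}del"
    elif kind == "duplication":
        change = f"{_span(start, end)}dup"
    elif kind == "insertion":
        # An insertion is described between two adjacent flanking positions.
        if end != start + 1:
            raise ValueError("an insertion needs two adjacent positions")
        change = f"{start}_{end}ins{alt_nt}"
    else:
        raise ValueError(f"unsupported variation type: {kind}")
    return f"{ref}:{coord}.{change}"


print(hgvs_description("AL449423.14", "g", "deletion", 1))
print(hgvs_description("NM_000787.3", "c", "substitution", 61,
                       ref_nt="A", alt_nt="G"))
```

Comparing the output of such a sketch with what the Name Generator returns is a quick way to spot where real descriptions need more than this simplified grammar.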
2) The Mutalyzer Name Checker
The Name Checker is most convenient for checking a few sequence variations that have already been described in a presumably correct format. A description made by the Name Generator can be submitted to the Name Checker to regenerate the additional information. Once you get a feel for it, the Name Checker also lets you change and check descriptions quickly.
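A description submitted to the Name Checker has three parts: the reference, the coordinate prefix, and the change. The sketch below splits a description along those lines; `parse_description` is a hypothetical helper with a deliberately simplified grammar (an assumption), which unpacks only plain substitutions such as 61A>G and leaves every other change as an unparsed string.

```python
import re
from typing import Dict, Union

# Simplified grammar (an assumption): "<reference>:<prefix>.<change>", where the
# prefix is one of the HGVS coordinate types c, g, m, n, r or p.
DESCRIPTION_RE = re.compile(
    r"^(?P<ref>[A-Za-z0-9_.()]+):(?P<coord>[cgmnrp])\.(?P<change>\S+)$"
)
# Only plain substitutions such as 61A>G are unpacked further here.
SUBSTITUTION_RE = re.compile(r"^(?P<pos>\d+)(?P<ref_nt>[ACGT])>(?P<alt_nt>[ACGT])$")


def parse_description(text: str) -> Dict[str, Union[str, int]]:
    """Split a description into reference, coordinate prefix and change."""
    match = DESCRIPTION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in <reference>:<prefix>.<change> form: {text!r}")
    parts: Dict[str, Union[str, int]] = dict(match.groupdict())
    sub = SUBSTITUTION_RE.match(match["change"])
    if sub is not None:
        parts.update(kind="substitution", position=int(sub["pos"]),
                     ref_nt=sub["ref_nt"], alt_nt=sub["alt_nt"])
    return parts


print(parse_description("NM_000787.3:c.61A>G"))
```

Separating the reference from the change makes it easy to resubmit the same change against a different reference version, as in the exercise below.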
Submit one of the descriptions generated by the Mutalyzer Name Generator by pasting it into the submission box and pressing the submit button.
Repeat this after changing the positions or nucleotide(s).
Submit these descriptions to see the importance of a version number: NM_000787.3:c.61A>G and NM_000787.2:c.61A>G.
Compare the CDS annotation of the corresponding files NM_000787.3 and NM_000787.2 to see why Mutalyzer gives a different result.
How does NM_000787.3:c.61A>G compare to NM_000787.2:c.19A>G?
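The version number matters because coding DNA (c.) numbering counts from the A of the ATG start codon (c.1). If two versions annotate the CDS start at different places, the same nucleotide receives a different c. number. The sketch below shows that arithmetic under two assumptions: both versions are annotated on the same sequence coordinates (only the CDS start differs), and only positions inside the CDS are handled. The CDS start values are hypothetical, chosen only for illustration; compare the real CDS features of the two records to answer the question.

```python
def c_to_n(c_pos: int, cds_start: int) -> int:
    """Map a coding position (c.1 = A of the ATG start codon) to a sequence
    position, where cds_start is the sequence position of c.1.

    Only positions inside the CDS (c_pos >= 1) are handled; upstream (c.-N),
    downstream (c.*N) and intronic (c.N+M) positions are out of scope.
    """
    if c_pos < 1:
        raise ValueError("this sketch only handles c. positions >= 1")
    return cds_start + c_pos - 1


def n_to_c(n_pos: int, cds_start: int) -> int:
    """Inverse of c_to_n for positions at or after the start codon."""
    if n_pos < cds_start:
        raise ValueError("position lies upstream of the start codon")
    return n_pos - cds_start + 1


# Hypothetical CDS starts for two versions annotated on the same sequence.
CDS_START_NEWER = 101
CDS_START_OLDER = 143

pos = c_to_n(61, CDS_START_NEWER)     # 161
print(n_to_c(pos, CDS_START_OLDER))   # 19
```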
3) The Mutalyzer SNP converter
The SNP converter has been developed to convert dbSNP identifiers to HGVS-compliant variant descriptions, based on the GenBank nucleotide reference sequence specified by the user.
Submit rs9919552 as SNP Accession number and NM_003002 as Nucleotide Accession number.
Open the UCSC Human Genome Browser in a new window to see additional SNPs for this gene.
4) The Mutalyzer sequence variation description batch checker
The batch checker has been developed for database curators, but can be used with any list of sequence variations provided that they are submitted in the correct format: a tab-delimited text file.
Open the WordPad file containing the sequence variation descriptions that you have checked before.
Duplicate some of the entries and introduce a few mistakes in numbering, variation type (e.g., replace a duplication with an insertion), nucleotides, and description format.
Submit the file together with your e-mail address (lower case only!) to see the alerts and automatic corrections created by Mutalyzer.
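If you prefer to assemble the batch file with a script rather than in WordPad, the sketch below writes tab-delimited text and lists exact duplicate entries before submission. The column names in `HEADER` are hypothetical; check the batch checker's help page for the layout it actually expects. The gene symbol used in the example is a placeholder.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical column layout; check the batch checker's help page for the
# format it actually expects.
HEADER = ("AccNo", "Genesymbol", "Mutation")


def to_batch_tsv(rows: Iterable[Sequence[str]]) -> str:
    """Serialise rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


def duplicate_rows(rows: Iterable[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Return each row that occurs more than once (exact matches only)."""
    counts = Counter(tuple(row) for row in rows)
    return [row for row, n in counts.items() if n > 1]


entries = [
    ("NM_000787.3", "GENE1", "c.61A>G"),  # GENE1 is a placeholder symbol
    ("NM_000787.3", "GENE1", "c.61A>G"),
]
print(duplicate_rows(entries))
print(to_batch_tsv(entries), end="")
```

Only exact duplicates are caught; descriptions that differ in format but refer to the same change are left for Mutalyzer's own alerts.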
5) The Mutalyzer reference sequence uploader
Users can upload their own reference sequence file in GenBank format, retrieve the genomic sequence of a gene with its flanking regions, or specify a chromosomal range for use as a reference sequence. Mutalyzer checks whether the file is in valid GenBank format. If it is, Mutalyzer stores the file locally for a limited time and returns a unique UD identifier that can be used with all forms of the Mutalyzer Sequence Variation Nomenclature Checker except the SNP converter. This option lets users work with reference files that are not present in GenBank, or add information to an existing GenBank file about alternative transcripts or proteins, or about additional genes contained within or derived from the reference sequence. We strongly recommend limiting your use of this option and sending a request for a new RefSeqGene record to rsgene@ncbi.nlm.nih.gov. Alternatively, you can submit annotation updates and corrections for existing GenBank files following these instructions.
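Before uploading a file, a quick local sanity check can catch obvious problems such as a truncated download. The sketch below is a hypothetical, minimal heuristic based on the general layout of a GenBank flat file (a LOCUS line first, FEATURES and ORIGIN sections, and a closing "//"); it is not Mutalyzer's validator and does not check the annotation itself.

```python
def looks_like_genbank(text: str) -> bool:
    """Rough structural check of a GenBank flat file (heuristic only)."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith("LOCUS"):
        return False
    if lines[-1] != "//":  # a complete record ends with '//'
        return False
    has_features = any(line.startswith("FEATURES") for line in lines)
    has_origin = any(line.startswith("ORIGIN") for line in lines)
    return has_features and has_origin


record = (
    "LOCUS       TEST 10 bp DNA linear\n"
    "FEATURES             Location/Qualifiers\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
)
print(looks_like_genbank(record))
```

For a stricter check, Biopython's `Bio.SeqIO.read(handle, "genbank")` parses the record fully and raises an error on malformed input.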
Click the GenBank Uploader link in the list on the left side of any Mutalyzer window to see its options.
If you already have a well-annotated GenBank file on your computer or stored on a web server, the first two GenBank Uploader options make uploading easy and return the unique UD identifier. This identifier can be used in the Name Generator or Name Checker, which will then provide a link to download the record. This record can also be used as input for the LOVD reference sequence parser.
If you do not have a well-annotated GenBank file, the last two options can provide one. Curators of LOVD databases can use these options to generate well-annotated genomic reference sequence files for import into LOVD2.0 using the Reference Sequence Parser 2.0.
Option 3: Retrieving a well-annotated genomic reference sequence file using an (HGNC-approved) gene symbol in combination with the name of the organism (e.g., human or mouse).
Click the corresponding radio button and try this for your favorite gene.
In some cases, this may not work for your gene because of ambiguous gene symbols. If Mutalyzer reports other problems, please let us know! In those cases, or when you would like to include additional flanking sequence (e.g., promoters), you can instead specify a chromosomal range using the fourth option. The easiest way to find a chromosomal range is to search Entrez Gene using the gene symbol in combination with the name of the organism.
Open Entrez Gene in a new window and search with the gene symbol of your favorite gene in combination with the name of the organism. Open the correct entry in the list of results and click the link to the reference sequence details under the heading "Genomic regions, transcripts and products".
Under the header "RefSeqs of Annotated Genomes", you will find an accession number starting with NC_, which refers to a chromosomal reference sequence. The positions listed under Range refer to the start of the most upstream exon and the end of the most downstream exon, respectively.
Click the GenBank link to view the sequence and its annotation. You can save the file by changing "Send to" in the Entrez menu bar to "Save".
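The Entrez steps above can also be scripted with NCBI's E-utilities. The sketch below only builds request URLs (nothing is downloaded); `gene_search_url` and `region_genbank_url` are hypothetical helper names, and the positions you pass in should come from the Range shown in Entrez Gene. Fetching the resulting EFetch URL (for example with `urllib.request.urlopen`) would return a GenBank flat file for that part of the NC_ sequence.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def gene_search_url(symbol: str, organism: str) -> str:
    """ESearch URL for Entrez Gene, restricted to a symbol and an organism."""
    term = f"{symbol}[sym] AND {organism}[orgn]"
    return EUTILS + "esearch.fcgi?" + urlencode({"db": "gene", "term": term})


def region_genbank_url(accession: str, start: int, stop: int,
                       minus_strand: bool = False) -> str:
    """EFetch URL returning a GenBank flat file for part of a sequence."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "gb",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": 2 if minus_strand else 1,  # 1 = plus strand, 2 = minus strand
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


print(gene_search_url("BRCA1", "human"))
```

Please keep NCBI's E-utilities usage guidelines (request rate limits) in mind if you script many requests.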
Option 4: Retrieving a well-annotated genomic reference sequence file using chromosomal positions
Click the corresponding radio button and enter the NC_ Accession number and the range positions corresponding to the gene of interest.
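To include flanking sequence such as a promoter, the Range from Entrez Gene has to be widened before you enter it. Which end counts as "upstream" depends on the strand the gene lies on. The sketch below is a hypothetical helper illustrating that bookkeeping; the flank sizes are up to you, and the upper end is not checked against the chromosome length.

```python
from typing import Tuple


def flanked_range(start: int, stop: int, upstream: int, downstream: int,
                  minus_strand: bool = False) -> Tuple[int, int]:
    """Widen a gene's chromosomal range by upstream/downstream flanks.

    Returns (low, high) chromosome coordinates. For a plus-strand gene the
    upstream flank (e.g. a promoter) lies at lower coordinates; for a
    minus-strand gene it lies at higher coordinates. The low end is clipped
    at 1; the high end is NOT checked against the chromosome length.
    """
    low, high = min(start, stop), max(start, stop)
    if minus_strand:
        return max(1, low - downstream), high + upstream
    return max(1, low - upstream), high + downstream


# Hypothetical exon range, widened by 500 nt upstream and 200 nt downstream.
print(flanked_range(1000, 5000, 500, 200))                     # (500, 5200)
print(flanked_range(1000, 5000, 500, 200, minus_strand=True))  # (800, 5500)
```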
You can modify the genomic reference sequence file obtained via Entrez Gene (or its annotation) using network-aware Sequin.
If you have any comments or suggestions, please let us know at mutalyzer@humgen.nl.