Frequently Asked Questions

How should I refer to Mutalyzer?

Mutalyzer is described in:

Wildeman M, van Ophuizen E, den Dunnen JT, Taschner PE. Improving sequence variant descriptions in mutation databases and literature using the MUTALYZER sequence variation nomenclature checker. Hum Mutat 29:6-13 (2008) [PMID: 18000842].

What does Mutalyzer do?

Mutalyzer checks sequence variant descriptions given a certain reference sequence and, if necessary, tries to correct them according to the HGVS nomenclature guidelines. Descriptions of its functionality can be found in the Help file.

Where can I find examples of Mutalyzer input data?

Most Mutalyzer web pages show an example of the input data accepted. Additional descriptions of input data formats can be found in the Help file.

Can Mutalyzer analyze sequence traces?

No, Mutalyzer only checks sequence variant descriptions. Sequence variant descriptions can be generated from sequence traces using third-party software, e.g., MutationSurveyor.

Why does Mutalyzer only accept GenBank Accession Numbers or files in GenBank format?

Mutalyzer has been developed to extract sequence and annotation information from files in GenBank format. This extraction of information will not work properly with other formats.

Why does Mutalyzer not work directly with GenBank Accession Numbers starting with NC_ or NT_?

GenBank Accession Numbers starting with NC_ or NT_ refer to contigs assembled from smaller sequences, potentially interspersed with gaps. Mutalyzer will try to retrieve the underlying sequences to check the sequence variant, but it may lose track of the corresponding positions due to the different levels of assembly and return errors. Users are advised to circumvent this problem by using the GenBank uploader when (part of) these NC_ or NT_ references are used. The Mutalyzer exercise provides more detailed information.

Why does Mutalyzer not accept positions outside exons when using a coding DNA reference sequence?

Mutalyzer uses the reference sequence to check for the presence of the nucleotides at the positions specified. Since promoter sequences, intron sequences and intergenic sequences are not included in a coding DNA reference sequence, Mutalyzer is unable to check these and will issue an "Out of bounds" error. We strongly suggest using genomic reference sequences to describe changes in promoter sequences, intron sequences and intergenic sequences.

I am using a genomic reference sequence, but I am still unable to check intron variants

Mutalyzer checks the annotation of the genomic reference sequence for information about the genes, their exons and protein coding sequence. When these features are not annotated, Mutalyzer is unable to use the coding DNA numbering scheme and will return an error. Please note that Mutalyzer ignores non-coding transcripts, because the coding DNA numbering scheme cannot be applied in the absence of a start codon. The HGVS sequence variation nomenclature guidelines do not yet provide guidance on this issue.

How do I find the correct reference sequence?

Although any file in GenBank format can be used, curated RefSeq sequences are preferred (See HGVS Reference Sequence discussion). Most locus-specific mutation databases (LSDBs) specify reference sequences for the genes of interest. In many cases, these will be coding DNA reference sequences. If you want to check changes in promoter sequences, intron sequences and intergenic sequences, you should contact the curator of the LSDB to get a genomic DNA reference sequence.

If you are the curator of an LSDB in need of an appropriate genomic reference sequence, you can use the options on the GenBank uploader page to select a genomic reference sequence. More information about the selection and modification of reference sequences can be found in the Mutalyzer exercise.

I am using the correct RefSeq Accession number. Why is the position of most or all sequence variants on transcript or protein level different from what I expected?

Every GenBank file has an Accession number and a version number (e.g. AB026906.1). If the version number has not been specified, Mutalyzer will use the latest version of this file. The sequence annotation of the latest version may differ from that of an earlier version, leading to new or changed transcript and protein sequence information, which Mutalyzer automatically uses to describe the variants. Most locus-specific mutation databases (LSDBs) specify the accession numbers of reference sequences for the genes of interest, but they should also include the version number to prevent unexpected Mutalyzer results. You can check the influence of the version number on Mutalyzer's analysis by specifying a previous version number. If this solves the problem, please ask the curator of the LSDB to specify the correct version of the reference sequence.
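As an illustration of the accession/version distinction, the following minimal Python sketch splits an identifier such as AB026906.1 into its two parts and reports whether a version was specified. The function names split_accession and is_pinned are hypothetical helpers written for this FAQ; they are not part of Mutalyzer.

```python
def split_accession(identifier):
    """Return (accession, version); version is None when it is not specified.

    Hypothetical helper for illustration only, not Mutalyzer's own parser.
    """
    accession, dot, version = identifier.strip().partition(".")
    if not accession:
        raise ValueError("empty accession number")
    if not dot:
        # No ".N" suffix: the latest version of the record would be used.
        return accession, None
    if not version.isdigit():
        raise ValueError(f"invalid version number in {identifier!r}")
    return accession, int(version)


def is_pinned(identifier):
    """True when the identifier names one specific version of the record."""
    return split_accession(identifier)[1] is not None


print(split_accession("AB026906.1"))  # ('AB026906', 1)
print(is_pinned("NM_003002"))         # False
```

A check like is_pinned could be run over the reference sequences listed in an LSDB to find entries that lack a version number.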

Why does Mutalyzer fail to recognize my SNP descriptions?

Mutalyzer checks if the nucleotide changed is present in the reference sequence. SNPs are commonly indicated in dbSNP as two or more possible alleles at the same position of the sequence. According to the HGVS nomenclature guidelines, only alleles which differ from the reference can be described. As a result, NM_003002.1:c.204C>T is approved, but NM_003002.1:c.204T>C will result in an error. When the dbSNP identifier is known, the SNP converter can be used to generate the correct description (see the Help file for more information).
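The kind of check described here can be sketched in a few lines of Python. This is a simplified illustration, not Mutalyzer's implementation: check_substitution is a hypothetical function, and TOY_REFERENCE is a made-up six-nucleotide sequence, not the NM_003002.1 record.

```python
def check_substitution(reference, position, ref_base, alt_base):
    """Check a substitution such as 4C>T against a reference sequence.

    `position` is 1-based. Returns "ok" or a short error message.
    Hypothetical helper for illustration only.
    """
    if not 1 <= position <= len(reference):
        return "position out of bounds"
    found = reference[position - 1].upper()
    if found != ref_base.upper():
        return f"expected {ref_base.upper()} at position {position}, found {found}"
    if alt_base.upper() == found:
        return "no change: variant allele equals the reference"
    return "ok"


# Toy reference sequence (NOT a real record); position 4 is a C.
TOY_REFERENCE = "ATGCGT"

print(check_substitution(TOY_REFERENCE, 4, "C", "T"))  # ok
print(check_substitution(TOY_REFERENCE, 4, "T", "C"))  # expected T at position 4, found C
```

The second call fails for the same reason as the c.204T>C example: the description names the variant allele as if it were the reference.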

Why does the GenBank uploader not work with my local file or the URL provided?

Mutalyzer checks if it gets a GenBank flat file in both cases. Other formats cannot be processed correctly and will generate an error.

Why do large deletions seem shorter using coding DNA position numbering than genomic position numbering?

The difference in deletion size is caused by our intention to reflect the effect of variations at the transcript level when coding DNA position numbering is used. Therefore, ranges of deleted nucleotides are limited to positions present in the coding DNA reference sequence.
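The size difference can be made concrete with a small worked example in Python. It is a simplified sketch under stated assumptions: the exon coordinates are invented, exons stand in for "positions present in the coding DNA reference sequence", and deleted_transcript_bases is a hypothetical helper rather than Mutalyzer code.

```python
def deleted_transcript_bases(deletion, exons):
    """Count deleted positions that fall inside exons.

    `deletion` and each exon are (start, end) pairs in 1-based,
    inclusive genomic coordinates. Hypothetical helper for illustration.
    """
    del_start, del_end = deletion
    total = 0
    for exon_start, exon_end in exons:
        overlap_start = max(del_start, exon_start)
        overlap_end = min(del_end, exon_end)
        if overlap_start <= overlap_end:
            total += overlap_end - overlap_start + 1
    return total


# Invented coordinates: two exons of 100 nucleotides separated by an intron.
exons = [(101, 200), (501, 600)]
deletion = (151, 550)

genomic_length = deletion[1] - deletion[0] + 1
transcript_length = deleted_transcript_bases(deletion, exons)

print(genomic_length)     # 400
print(transcript_length)  # 100
```

In this example the deletion removes 400 genomic nucleotides, but only 100 of them (50 from each exon) are present in the transcript, so the deletion appears shorter in coding DNA numbering.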

Can Mutalyzer analyze sequence variant descriptions from other organisms?

Yes, Mutalyzer can check sequence variant descriptions from other organisms as long as a proper reference sequence is provided and the HGVS sequence variation nomenclature guidelines are applied. Mutalyzer will check the reference sequence annotation to determine which codon table should be used for proper translation of coding sequences.

How do I create a batch checker file?

The easiest way to create a batch checker file is to download the Example file. Please right-click the link and select "Save as" to download the example file for modification. Open the batchtest.txt file in Excel. The Text Import Wizard window will open to guide you through the import procedure. Click "Finish" to finalize the import. The Excel spreadsheet created should have three columns and a header row containing the column names. You can add your data to the batchtest.txt file by typing or pasting the required information into the appropriate fields. When you are finished, select "Save as" from the File menu and save your file under a different name (without spaces!) as type: Text (Tab delimited) (*.txt).

In case of problems, step through the import wizard instead of clicking "Finish" right away: verify that the correct selections are made in step 1 before clicking the "Next" button to go to step 2. Click the "Next" button again to go to step 3, then click "Finish" to finalize the import.
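For users who prefer not to use Excel, a batch checker file with the layout described above can also be produced by a short script. The following Python sketch is an assumption-laden illustration: format_batch and write_batch_file are hypothetical helpers, the output file name is invented, and the example row only reuses a variant description mentioned elsewhere in this FAQ.

```python
import os

# Column names of the header row, as required by the batch checker.
HEADER = ("AccNo", "Genesymbol", "Mutation")


def format_batch(rows):
    """Return tab-delimited text with a header row and one line per variant."""
    lines = ["\t".join(HEADER)]
    for acc_no, gene_symbol, mutation in rows:
        fields = (acc_no, gene_symbol, mutation)
        if any("\t" in field or "\n" in field for field in fields):
            raise ValueError("fields must not contain tabs or line breaks")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def write_batch_file(path, rows):
    """Write the batch text to `path`; the file name must not contain spaces."""
    if " " in os.path.basename(path):
        raise ValueError("file name must not contain spaces")
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(format_batch(rows))


# Illustrative row; the Genesymbol field is left blank but its tab is kept.
rows = [("NM_003002.1", "", "c.204C>T")]
print(format_batch(rows), end="")
```

Calling write_batch_file("my_variants.txt", rows) would then save the same text to a file (the name my_variants.txt is only an example).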

I am having trouble with the batch checker. What should I do?

The batch checker is relatively sensitive to unexpected file and file name formats. Users are advised to check the following:

File name: The file name should not contain any spaces. Windows users can check the file extension: if it is not .txt, but .doc or .xls, the file is unlikely to be a tab-delimited text file.

Header row: The first line of the tab-delimited text file should be a header row containing the words AccNo, Genesymbol and Mutation, in that order, each separated by a single tab.

File format: The file should be a tab-delimited text file. Excel users can import the file to check its format. The file should have three columns with the appropriate column names, AccNo Genesymbol Mutation, respectively. The variant information should be present in the corresponding columns.

The Genesymbol column/field may be left blank when the reference sequence contains only one gene, but the tab should still be present. Please note that Mutalyzer does not check the correctness of the gene symbol in this case.
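The points in this checklist can also be verified automatically before uploading. The Python sketch below is a minimal illustration, not the batch checker's own validation: check_batch_file is a hypothetical helper, and the rules it applies are only those listed above.

```python
EXPECTED_HEADER = ["AccNo", "Genesymbol", "Mutation"]


def check_batch_file(file_name, text):
    """Return a list of problems found in a batch file; empty when none.

    Hypothetical helper for illustration only.
    """
    problems = []
    if " " in file_name:
        problems.append("file name contains spaces")
    if not file_name.lower().endswith(".txt"):
        problems.append("file extension is not .txt")

    lines = text.splitlines()
    if not lines or lines[0].split("\t") != EXPECTED_HEADER:
        problems.append("header row is not AccNo<tab>Genesymbol<tab>Mutation")

    # Data rows start on line 2; blank lines are skipped.
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                f"line {number}: expected 3 tab-separated fields, found {len(fields)}"
            )
        elif not fields[0] or not fields[2]:
            problems.append(f"line {number}: AccNo and Mutation must not be empty")
    return problems


good = "AccNo\tGenesymbol\tMutation\nNM_003002.1\t\tc.204C>T\n"
print(check_batch_file("variants.txt", good))  # []
```

A row such as "NM_003002.1<tab>c.204C>T", where the tab for the blank Genesymbol field is missing, would be reported as having two fields instead of three.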

Why can't I use the bugtracker?

The bugtracker can only be accessed via a secure connection. Normally, when you try to connect securely, sites will present trusted identification to prove that you are going to the right place. However, this site's identity can't be verified by the browser because the security certificate is self-signed (and not signed by a recognized certificate authority).
A certificate that is signed by the proper authorities costs a lot of money, so for the time being we have solved it in this way.
To get access to the bugtracker, you need to add a security exception. Usually the browser presents this option.

 


If you have any comments or suggestions, be sure to let us know!

Last modified: January 14, 2011

mutalyzer@humgen.nl