- Feb 22, 2016
-
-
Vermaat authored
Note that we explicitly support LRG references only as transcripts, i.e., using c. positioning to convert to/from chromosomal positioning. Supporting LRG references as genomic references (using g. positioning) can be future work, but converting them to/from LRG transcripts is of course already done by the name checker. Converting directly between genomic LRG positioning and chromosomal positioning is not something that can be easily supported in the current setup of the position converter.
-
- Feb 10, 2016
-
-
Vermaat authored
With the change introduced by #65 we forgot to check whether the variant RNA has an alternative downstream stop codon, and therefore always reported ext*? when the original stop codon was removed. Fixes #145
-
- Dec 19, 2015
-
-
Vermaat authored
With this change the genbank parser no longer discards incomplete genes directly but keeps them as long as they have complete features annotated. For example, the PIK3R2 gene is annotated on NC_000019.9 (or a slice) as 4973..>22328 with two RNA entries. One of these, however, is complete so it would be a shame to discard the entire gene.
-
Vermaat authored
This genbank file is incomplete and incorrect anyway, but this was not the mistake we wanted to test.
-
- Dec 18, 2015
-
-
Vermaat authored
This fixes a bug where transcripts created from CDS by construction did not show up in the legend because the legend was created before that construction.
-
- Nov 10, 2015
-
-
Vermaat authored
Partial fix for https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/188
-
Vermaat authored
-
- Oct 29, 2015
-
-
Vermaat authored
This speeds up lookup of transcript mappings by genomic position a lot. By filtering on bin index, such a query now uses the index on the bin column, where previously this would involve a sequential table scan. http://interval-binning.readthedocs.org/
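As a rough illustration of why filtering on bin index helps, here is a minimal sketch of the UCSC-style binning scheme. The function names and constants below are written out for illustration and are assumptions; they are not taken from the Mutalyzer code base, and the linked package may differ in details.

```python
# Illustrative UCSC-style binning for half-open intervals [start, end).
# Five levels: 128 kb bins at the finest level, each next level 8x larger.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17  # 2**17 = 128 kb
NEXT_SHIFT = 3    # each level up covers 8 bins of the level below


def assign_bin(start, end):
    """Return the smallest bin that fully contains [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval out of range for this binning scheme")


def overlapping_bins(start, end):
    """Return all bins that may hold features overlapping [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    bins = []
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins
```

A lookup by genomic position can then restrict itself to rows whose stored bin is in `overlapping_bins(position, position + 1)` before comparing coordinates, which lets the database use the index on the bin column.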
-
- Oct 26, 2015
- Oct 23, 2015
-
-
Vermaat authored
-
- Oct 22, 2015
- Oct 20, 2015
-
-
Vermaat authored
Caching of transcript protein links received from the NCBI Entrez service is a typical use case for Redis. This implements this cache in Redis and removes all use of our original database table.

An Alembic migration copies all existing links from the database to Redis. The original `TranscriptProteinLink` database table is not dropped. This will be done in a future migration to ensure running processes don't error and to provide a rollback scenario.

We also remove the expiration of links (originally defaulting to 30 days), since we don't expect them to ever change. Negative links (caching a 'not found' result from Entrez) *are* still expiring, but with a longer default of 30 days (was 5 days). The configuration setting for the latter was renamed, yielding the following changes in the default configuration settings.

Removed default settings:

    # Expiration time for transcript<->protein links from the NCBI (in seconds).
    PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 30

    # Expiration time for negative transcript<->protein links from the NCBI (in
    # seconds).
    NEGATIVE_PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 5

Added default setting:

    # Cache expiration time for negative transcript<->protein links from the NCBI
    # (in seconds).
    NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30
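The caching policy described here (links never expire, negative results do) can be sketched with an in-memory stand-in. This is not the Redis implementation from this commit: the `LinkCache` class, its method names, and the accession strings in the usage note are hypothetical, and with Redis the expiry would presumably be left to key time-to-live values rather than handled by hand.

```python
import time


class LinkCache:
    """In-memory stand-in for a transcript<->protein link cache (illustrative)."""

    def __init__(self, negative_expiration=60 * 60 * 24 * 30, clock=time.time):
        self._negative_expiration = negative_expiration  # seconds
        self._clock = clock
        self._links = {}     # transcript -> protein, kept indefinitely
        self._negative = {}  # transcript -> expiry timestamp of 'not found'

    def store(self, transcript, protein):
        """Cache a link, or a negative result when protein is None."""
        if protein is None:
            expires = self._clock() + self._negative_expiration
            self._negative[transcript] = expires
        else:
            self._links[transcript] = protein
            self._negative.pop(transcript, None)

    def lookup(self, transcript):
        """Return (cached, protein); protein is None for a negative link."""
        if transcript in self._links:
            return True, self._links[transcript]
        expires = self._negative.get(transcript)
        if expires is not None:
            if self._clock() < expires:
                return True, None
            del self._negative[transcript]  # negative result has expired
        return False, None
```

For example, after `cache.store('NM_EXAMPLE.1', None)` a lookup returns `(True, None)` until the negative expiration time has passed, after which it returns `(False, None)` and the caller would query Entrez again.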
-
- Oct 13, 2015
- Oct 10, 2015
-
-
Vermaat authored
-
- Oct 01, 2015
-
-
Vermaat authored
-
- Sep 30, 2015
-
-
Vermaat authored
-
- Sep 27, 2015
-
-
Vermaat authored
-
- Sep 23, 2015
-
-
Vermaat authored
The alternative variant protein sequence translated from a non-reference start codon (created by the variant) was not color-diffed as normal variant protein sequences are. In the process we also rename the `oldprotein` and `newprotein` fields in the output object to `oldProtein` and `newProtein` to be more consistent with other field names.
-
Vermaat authored
In the case of an alternative start codon (in the reference CDS), protein changes were not visualised. This is fixed and a WALTSTART warning is also issued. Also, if a new non-reference start codon is created by the variant, visualise this as such.
-
Vermaat authored
In case of an alternative start codon, the variant CDS was not translated to a protein starting with M. This caused the protein description machinery to conclude a variant affecting the start codon, hence reporting `p.?`. We fix this by always translating the start codon to M (except when the variant actually affects it). Example: `NM_024426.4:c.1107A>G` (a synonymous mutation) should yield `NM_024426.4(WT1_i001):p.(=)`, not `p.?`. The start codon for that protein is `CTG`.
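The idea of forcing the first residue to M can be sketched as follows. This is a self-contained illustration using the standard genetic code, not the code from this commit; the `translate_cds` helper and the toy sequence in the usage note are assumptions made for the example.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds, start_affected=False):
    """Translate a CDS, reporting the start codon as M unless it was changed.

    Stop codons are returned as '*'; trailing incomplete codons are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    protein = "".join(CODON_TABLE[codon] for codon in codons)
    if protein and not start_affected:
        # An alternative start codon such as CTG still initiates with Met.
        protein = "M" + protein[1:]
    return protein
```

With the toy sequence `CTGGCCTAA`, a plain table lookup gives `LA*`, while `translate_cds('CTGGCCTAA')` gives `MA*`, so reference and variant proteins both start with M and a synonymous change further downstream compares as equal.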
-
- Aug 10, 2015
-
-
Vermaat authored
-
- Aug 04, 2015
-
-
Vermaat authored
-
- Jul 15, 2015
-
-
Vermaat authored
When a variant results in a frame shift or extension and we don't see a new stop codon in the RNA, the protein description should use the notation for an uncertain stop codon, e.g., `p.(Gln730Profs*?)` instead of `p.(Gln730Profs*96)` where 96 is just the last codon in our transcript [1].

To detect this, we now use `to_stop=False` in our `.translate()` calls, since that will explicitly return `*` characters for stop codons.

We also slightly fix the coloring of changes in the protein sequence, where previously changed stop codon characters were not included.

[1] http://www.hgvs.org/mutnomen/FAQ.html#nostop
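Once stop codons show up as `*` in the translated sequence, choosing between `*N` and `*?` comes down to a string search. The sketch below assumes a hypothetical helper name and a zero-based index for the first changed amino acid; it is an illustration, not the code from this commit.

```python
def frameshift_stop_suffix(variant_protein, first_changed):
    """Return the stop part of a frame shift description, e.g. '*5' or '*?'.

    variant_protein: protein translated with stop codons kept as '*'.
    first_changed: zero-based index of the first changed amino acid.
    """
    stop = variant_protein.find("*", first_changed)
    if stop == -1:
        # No stop codon seen before the transcript ends: position unknown.
        return "*?"
    # The first changed amino acid counts as position 1 in the new frame.
    return "*{}".format(stop - first_changed + 1)
```

If the translation had been truncated at the first stop codon, or had simply run out of transcript, the two cases would look identical, which is why the `*` characters need to be kept.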
-
- Jul 09, 2015
- Jul 03, 2015
-
-
Vermaat authored
Issue #50 showed a problem in our file encoding detection, caused by our cut-off for the confidence as reported by the cchardet [1] library:

    >>> import cchardet
    >>> s = u'NM_000052.4:c.2407\u20132A>G'
    >>> b = s.encode('WINDOWS-1252')
    >>> cchardet.detect(b)
    {'confidence': 0.5, 'encoding': u'WINDOWS-1252'}

We require a confidence strictly greater than 0.5 and default to UTF-8 otherwise.

If, however, we try the same thing using the chardet [2] library, we get a higher confidence for the same string:

    >>> import chardet
    >>> chardet.detect(b)
    {'confidence': 0.73, 'encoding': 'windows-1252'}

So the two obvious ways to solve this are:

1. Lower the confidence threshold.
2. Use chardet instead of cchardet.

We implement the second solution here, since it also removes a C library dependency and we are not worried about performance.

Of course the detected encoding remains a guess which can still be wrong!

[1] https://github.com/PyYoshi/cChardet
[2] https://github.com/chardet/chardet

Fixes #50
-
- May 31, 2015
-
-
Vermaat authored
Adds an `EXTRACTOR_MAX_INPUT_LENGTH` configuration setting, defaulting to 50 Kbp.
-
- May 18, 2015
-
-
We can now compare two sequences by supplying their sequence strings, accession numbers, or uploaded files.
-
- May 01, 2015
-
-
Vermaat authored
-
- Apr 30, 2015
-
-