Commits · d9335656fa3e956fdf2702fc3c8f9346b1f28057 · Mirrors / mutalyzer

Feb 22, 2016

Support LRG transcripts in the position converter · d9335656

Vermaat authored 9 years ago

Note that we explicitly only support LRG references as transcripts,
so using c. positioning to convert to/from chromosomal positioning.

Supporting LRG references as genomic references, so using g. positioning,
can be future work, but converting them to/from LRG transcripts is of
course already done by the name checker.

Converting between genomic LRG positioning and chromosomal positioning
directly is not something that can be easily supported in the current
setup of the position converter.

d9335656

Feb 10, 2016

Don't report ext*? when variant RNA has stop codon · 9191352b

Vermaat authored 9 years ago

With the change introduced by #65 we forgot to check whether the variant
RNA has an alternative downstream stop codon and therefore always
reported ext*? when the original stop codon was removed.

Fixes #145

9191352b

Dec 19, 2015

Keep incomplete genes with complete features · 8fac2dc7

Vermaat authored 9 years ago

With this change the genbank parser no longer discards incomplete genes
directly but keeps them as long as they have complete features
annotated.

For example, the PIK3R2 gene is annotated on NC_000019.9 (or a slice) as
4973..>22328 with two RNA entries. One of these, however, is complete so
it would be a shame to discard the entire gene.

8fac2dc7

Add gene feature to genbank file without version · c1ea8bc3

Vermaat authored 9 years ago

This genbank file is incomplete and incorrect anyway, but this is
not the mistake we want to test.

c1ea8bc3

Dec 18, 2015

Create legend only after gene model enrichment · 4db71666

Vermaat authored 9 years ago

This fixes a bug where transcripts created from CDS by construction did
not show up in the legend because the legend was created before that
construction.

4db71666

Nov 10, 2015
- Parse genbank file without VERSION field · d18b5395
  Vermaat authored 9 years ago
  
  Partial fix for https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/188
  d18b5395
- Unit test for NCBI mapview file import · 5d31ce12
  Vermaat authored 9 years ago
  
  5d31ce12
Oct 29, 2015

Use interval binning scheme on transcript mappings · e0a127cf

Vermaat authored 9 years ago

This considerably speeds up lookup of transcript mappings by genomic
position. By filtering on bin index, such a query now uses the index
on the bin column, where previously it required a sequential table
scan.

http://interval-binning.readthedocs.org/
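
For illustration, the idea can be sketched in a few lines. This is not
the Mutalyzer code or the API of the package linked above; it is a
minimal version of the standard UCSC binning scheme (five levels,
smallest bins of 128 kb, positions below 512 Mb), and the function names
`assign_bin` and `overlapping_bins` are chosen for this sketch only. A
mapping is stored with the smallest bin that fully contains it, and a
query by position only has to consider the few bins that can overlap it:

```python
# Standard UCSC binning parameters (assumed here; 0-based, half-open intervals).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17  # smallest bins span 2**17 = 128 kb
NEXT_SHIFT = 3    # each level up is 8 times larger


def assign_bin(start, end):
    """Return the smallest bin that fully contains [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError('interval out of range for this binning scheme')


def overlapping_bins(start, end):
    """Return all bins that may hold intervals overlapping [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    bins = []
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins
```

A lookup would then restrict the query to rows whose bin value is in
`overlapping_bins(start, end)` before comparing the actual start and end
coordinates, so the database can use the index on the bin column.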

e0a127cf

Oct 26, 2015
- Add tests for backtranslator module · 7feaf889
  Vermaat authored 9 years ago
  
  7feaf889
- Refactor transcript-protein links to raise NoLinkError instead of None · 0f6cafe0
  Vermaat authored 9 years ago
  
  0f6cafe0
- Never load MUTALYZER_SETTINGS in tests · e13a5017
  Vermaat authored 9 years ago
  
  e13a5017
Oct 23, 2015
- Optionally include versions in transcript-protein links · 2d1771a5
  Vermaat authored 9 years ago
  
  2d1771a5
Oct 22, 2015
- Add with_references and with_links decorators for unit tests · b36be291
  Vermaat authored 9 years ago
  
  b36be291
- Add links fixture for unit tests · c97e32b9
  Vermaat authored 9 years ago
  
  c97e32b9
Oct 20, 2015

Cache transcript protein links in Redis · 473c732c

Vermaat authored 9 years ago

Caching of transcript protein links received from the NCBI Entrez
service is a typical use case for Redis. This implements this cache
in Redis and removes all use of our original database table.

An Alembic migration copies all existing links from the database to
Redis. The original `TranscriptProteinLink` database table is not
dropped. This will be done in a future migration to ensure running
processes don't error and to provide a rollback scenario.

We also remove the expiration of links (originally defaulting to 30
days), since we don't expect them to ever change. Negative links
(caching a 'not found' result from Entrez) *are* still expiring,
but with a longer default of 30 days (was 5 days).

The configuration setting for the latter was renamed, yielding the
following changes in the default configuration settings.

Removed default settings:

    # Expiration time for transcript<->protein links from the NCBI (in seconds).
    PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 30

    # Expiration time for negative transcript<->protein links from the NCBI (in
    # seconds).
    NEGATIVE_PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 5

Added default setting:

    # Cache expiration time for negative transcript<->protein links from the NCBI
    # (in seconds).
    NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30
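
As a rough sketch of the caching behaviour described above (not the
actual implementation): positive links are stored without expiration,
negative links with the configured expiration. The key layout, the
function names and the empty-string marker for a negative link are
assumptions made for this example. The client only needs `get(key)` and
`set(key, value, ex=seconds)`, as offered by redis-py; a small in-memory
stand-in is used so the example runs without a Redis server.

```python
NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30  # seconds, default from above


class FakeRedis(object):
    """In-memory stand-in exposing the two client calls used below."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, ex=None):
        # A real server would drop the key after `ex` seconds; we only record it.
        self.store[key] = (value, ex)

    def get(self, key):
        value, _ = self.store.get(key, (None, None))
        return value


def link_key(transcript):
    # Hypothetical key layout, for illustration only.
    return 'ncbi:transcript-to-protein:%s' % transcript


def cache_link(client, transcript, protein):
    """Cache a link; `protein=None` records a negative ('not found') link."""
    if protein is None:
        # Negative link: remember the result, but let it expire.
        client.set(link_key(transcript), '', ex=NEGATIVE_LINK_CACHE_EXPIRATION)
    else:
        # Positive link: not expected to change, so no expiration.
        client.set(link_key(transcript), protein)


def lookup_link(client, transcript):
    """Return (cached, protein); protein is None for a negative link."""
    value = client.get(link_key(transcript))
    if value is None:
        return False, None
    return True, value or None
```

With a real redis-py client, `get` returns bytes unless the client is
created with `decode_responses=True`.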

473c732c

Oct 13, 2015
- Add tests for ncbi module · 44cd2b31
  Vermaat authored 9 years ago
  
  44cd2b31
- Disable network in unit tests · 95c893c8
  Vermaat authored 9 years ago
  
  95c893c8
- Refactor unit tests using common py.test layout and fixtures · d94f20cf
  Vermaat authored 9 years ago
  
  d94f20cf
Oct 10, 2015
- Use MUTALYZER_TEST_REDIS_URI in unit tests · da6b5839
  Vermaat authored 9 years ago
  
  da6b5839
Oct 01, 2015
- Test the migrations on database content · fe532841
  Vermaat authored 9 years ago
  
  fe532841
Sep 30, 2015
- Test database migrations · 141aa09e
  Vermaat authored 9 years ago
  
  141aa09e
Sep 27, 2015
- Test querying transcript-protein links · 45f7d276
  Vermaat authored 9 years ago
  
  45f7d276
Sep 23, 2015

Show diff for variant protein from non-reference start codon · 3c98a1af

Vermaat authored 9 years ago

The alternative variant protein sequence translated from a
non-reference start codon (created by the variant) was not
color-diffed as normal variant protein sequences are.

In the process we also rename the `oldprotein` and `newprotein`
fields in the output object to `oldProtein` and `newProtein` to
be more consistent with other field names.

3c98a1af

Visualise protein change, also with alternative start · 851e71fe

Vermaat authored 9 years ago

In the case of an alternative start codon (in the reference CDS),
protein changes were not visualised. This is fixed and a WALTSTART
warning is also issued.

Also, if a new non-reference start codon is created by the variant,
visualise this as such.

851e71fe

Translate alternative start to M, also in variant · ae70ddfd

Vermaat authored 9 years ago

In case of an alternative start codon, the variant CDS was not
translated to a protein starting with M. This caused the protein
description machinery to conclude a variant affecting the start
codon, hence reporting `p.?`.

We fix this by always translating the start codon to M (except
when the variant actually affects it).

Example: `NM_024426.4:c.1107A>G` (a synonymous mutation) should
yield `NM_024426.4(WT1_i001):p.(=)`, not `p.?`. The start codon
for that protein is `CTG`.
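
A minimal, self-contained sketch of this rule follows. It is not the
Mutalyzer code: the function name `translate_cds` and the
`start_affected` flag are hypothetical, and the standard genetic code is
assumed. The first codon is reported as `M` whatever it encodes, unless
the caller indicates that the variant changes the start codon:

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = 'TCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG')
CODON_TABLE = {''.join(codon): amino_acid
               for codon, amino_acid
               in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(cds, start_affected=False):
    """Translate a CDS, keeping stop codons as `*`.

    Unless `start_affected` is true, the first codon is translated to M,
    also when it is an alternative start codon such as CTG.
    """
    cds = cds.upper()
    protein = ''.join(CODON_TABLE[cds[i:i + 3]]
                      for i in range(0, len(cds) - 2, 3))
    if protein and not start_affected:
        protein = 'M' + protein[1:]
    return protein
```

For a CDS starting with `CTG`, `translate_cds` yields a protein starting
with `M`, so an unchanged start codon no longer looks like a change at
the protein level.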

ae70ddfd

Aug 10, 2015
- Customizable database connection uri for unit tests · 36626215
  Vermaat authored 9 years ago
  
  36626215
Aug 04, 2015
- Fix bug in recognizing p.(=) · 6435f0cf
  Vermaat authored 9 years ago
  
  6435f0cf
Jul 15, 2015

Uncertain stop codon in protein descriptions (fs and ext) · d2f91690

Vermaat authored 9 years ago

When a variant results in a frame shift or extension and we don't
see a new stop codon in the RNA, the protein description should use
the notation for an uncertain stop codon, e.g., `p.(Gln730Profs*?)`
instead of `p.(Gln730Profs*96)` where 96 is just the last codon in
our transcript [1].

To detect this, we now use `to_stop=False` in our `.translate()`
calls, since that will explicitly return `*` characters for stop
codons.
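
The check itself can be sketched as follows, assuming the variant protein
was translated with stop codons kept as `*`. The helper name
`frameshift_suffix` is hypothetical and this is not the actual
description code; it also does not cover a variant whose first changed
amino acid is itself a stop:

```python
def frameshift_suffix(variant_tail):
    """Build the `fs*N` part of a frame shift description.

    `variant_tail` is the variant protein starting at the first changed
    amino acid, translated with stop codons kept as `*`.
    """
    stop = variant_tail.find('*')
    if stop == -1:
        # No stop codon seen in the RNA: its position is uncertain.
        return 'fs*?'
    # The first changed amino acid counts as position 1.
    return 'fs*%d' % (stop + 1)
```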

We also slightly fix the coloring of changes in the protein sequence,
where previously changed stop codon characters were not included.

[1] http://www.hgvs.org/mutnomen/FAQ.html#nostop

d2f91690

Jul 09, 2015
- Fix cache fixture in tests · f1e57a13
  Vermaat authored 9 years ago
  
  f1e57a13
- Convert DNA to uppercase when reading from plain text · 93159a0e
  Vermaat authored 9 years ago
  
  93159a0e
Jul 03, 2015

Use chardet instead of cchardet · dedad241

Vermaat authored 9 years ago

Issue #50 showed a problem in our file encoding detection, caused
by our cut-off for the confidence as reported by the cchardet [1]
library:

    >>> import cchardet
    >>> s = u'NM_000052.4:c.2407\u20132A>G'
    >>> b = s.encode('WINDOWS-1252')
    >>> cchardet.detect(b)
    {'confidence': 0.5, 'encoding': u'WINDOWS-1252'}

We require a confidence strictly greater than 0.5 and default to
UTF-8 otherwise.

If, however, we try the same thing using the chardet [2] library,
we get a higher confidence for the same string:

    >>> import chardet
    >>> chardet.detect(b)
    {'confidence': 0.73, 'encoding': 'windows-1252'}

So the two obvious ways to solve this are:

1. Lower the confidence threshold.
2. Use chardet instead of cchardet.

We implement the second solution here, since it also removes a C
library dependency and we are not worried about performance.

Of course the detected encoding remains a guess which can still
be wrong!
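
The cut-off logic described above can be sketched as below. The function
name `choose_encoding` and its parameters are made up for this example
and are not taken from the Mutalyzer code; it takes the dictionary
returned by `detect()` in either library:

```python
def choose_encoding(detection, threshold=0.5, default='utf-8'):
    """Pick an encoding from a `detect()` result, falling back to a default.

    The detected encoding is only trusted if its confidence is strictly
    greater than `threshold`.
    """
    encoding = detection.get('encoding')
    confidence = detection.get('confidence') or 0.0
    if encoding and confidence > threshold:
        return encoding
    return default
```

Applied to the two results shown above, the cchardet result (confidence
0.5) falls back to the default, while the chardet result (confidence
0.73) is accepted.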

[1] https://github.com/PyYoshi/cChardet
[2] https://github.com/chardet/chardet

Fixes #50

dedad241

May 31, 2015
- Configurable maximum input length for description extractor · ee390387
  Vermaat authored 9 years ago
  
  Adds an `EXTRACTOR_MAX_INPUT_LENGTH` configuration setting, defaulting to 50 Kbp.
  ee390387
May 18, 2015
- New description extractor web interface · 55d10b82
  Jeroen F.J. Laros authored 9 years ago and Vermaat committed 9 years ago
  
  We can now compare two sequences by supplying their sequence strings, accession numbers, or uploaded files.
  55d10b82
May 01, 2015
- Fix descriptionExtract webservice · 7d7cb6af
  Vermaat authored 9 years ago
  
  7d7cb6af
Apr 30, 2015
- Moved describe functionality to the extractor package. · 6c64e5ee
  Jeroen F.J. Laros authored 9 years ago and Vermaat committed 9 years ago
  
  6c64e5ee
- PEP8. · 57c55d0f
  Jeroen F.J. Laros authored 9 years ago and Vermaat committed 9 years ago
  
  57c55d0f
- Integrated the description extractor in the website. · 216146bb
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
  216146bb
- Some more refactoring. · 2db722ff
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
  2db722ff
- Fixed empty allele bug. · 52724cc8
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
  52724cc8
- Fixed erroneous unit tests. · b0d85531
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
  b0d85531