Commits · a98f48f17f0f43ce5bd1f617033bc63cf2570785 · Mirrors / mutalyzer

Oct 26, 2015
- Refactor transcript-protein links to raise NoLinkError instead of returning None · 0f6cafe0
  Vermaat authored 9 years ago
  
- Never load MUTALYZER_SETTINGS in tests · e13a5017
  Vermaat authored 9 years ago
  
Oct 23, 2015
- Optionally include versions in transcript-protein links · 2d1771a5
  Vermaat authored 9 years ago
  
Oct 22, 2015
- Add with_references and with_links decorators for unit tests · b36be291
  Vermaat authored 9 years ago
  
- Add links fixture for unit tests · c97e32b9
  Vermaat authored 9 years ago
  
Oct 20, 2015

Cache transcript protein links in Redis · 473c732c

Vermaat authored 9 years ago

Caching of transcript protein links received from the NCBI Entrez
service is a typical use case for Redis. This commit implements the
cache in Redis and removes all use of our original database table.

An Alembic migration copies all existing links from the database to
Redis. The original `TranscriptProteinLink` database table is not
dropped. This will be done in a future migration, so that already
running processes don't fail and a rollback remains possible.

We also remove the expiration of links (originally defaulting to 30
days), since we don't expect them to ever change. Negative links
(caching a 'not found' result from Entrez) *do* still expire, but
with a longer default of 30 days (previously 5 days).

The configuration setting for the latter was renamed, yielding the
following changes in the default configuration settings.

Removed default settings:

    # Expiration time for transcript<->protein links from the NCBI (in seconds).
    PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 30

    # Expiration time for negative transcript<->protein links from the NCBI (in
    # seconds).
    NEGATIVE_PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 5

Added default setting:

    # Cache expiration time for negative transcript<->protein links from the NCBI
    # (in seconds).
    NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30
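
A minimal sketch of the caching policy described above, assuming a Redis-like store with `get` and `set(key, value, ex=...)` (redis-py's `set` accepts an `ex` expiry in seconds). To keep it runnable without a Redis server, `InMemoryRedis` is a hypothetical stand-in; the key prefix `transcript-to-protein:` and the helpers `cache_link` and `cached_link` are illustrative names, not Mutalyzer's actual code. Positive links are stored without expiration, while a negative link is stored as an empty value that expires after `NEGATIVE_LINK_CACHE_EXPIRATION` seconds.

```python
import time

# Default from the commit message: 30 days, i.e. 2,592,000 seconds.
NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30

# Hypothetical key scheme, for illustration only.
KEY_PREFIX = 'transcript-to-protein:'


class InMemoryRedis:
    """Hypothetical stand-in for a Redis client (GET and SET with EX only)."""

    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ex=None):
        # Like Redis SET ... EX: a key stored without `ex` never expires.
        expires_at = None if ex is None else self._clock() + ex
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if the key is absent
            return None
        return value


def cache_link(store, transcript, protein):
    """Cache a transcript->protein link; protein=None caches 'not found'."""
    if protein is None:
        # Negative link: an empty value that expires.
        store.set(KEY_PREFIX + transcript, '', ex=NEGATIVE_LINK_CACHE_EXPIRATION)
    else:
        # Positive link: no expiration, links are not expected to change.
        store.set(KEY_PREFIX + transcript, protein)


def cached_link(store, transcript):
    """Return (hit, protein); protein is None for a cached negative link."""
    value = store.get(KEY_PREFIX + transcript)
    if value is None:
        return False, None  # cache miss: the caller would query Entrez
    return True, value or None


# Usage with a fake clock and made-up accessions.
now = [0]
store = InMemoryRedis(clock=lambda: now[0])
cache_link(store, 'NM_000000.1', 'NP_000000.1')
cache_link(store, 'NM_000001.1', None)
print(cached_link(store, 'NM_000001.1'))  # (True, None): cached 'not found'
now[0] += NEGATIVE_LINK_CACHE_EXPIRATION
print(cached_link(store, 'NM_000001.1'))  # (False, None): negative link expired
print(cached_link(store, 'NM_000000.1'))  # (True, 'NP_000000.1'): never expires
```

Storing the negative result as an empty value (rather than not storing it) is what lets a cache hit say "Entrez has no link" without a new request, until the expiry forces a re-check.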

Oct 13, 2015
- Add tests for ncbi module · 44cd2b31
  Vermaat authored 9 years ago
  
- Disable network in unit tests · 95c893c8
  Vermaat authored 9 years ago
  
- Refactor unit tests using common py.test layout and fixtures · d94f20cf
  Vermaat authored 9 years ago
  
Oct 10, 2015
- Use MUTALYZER_TEST_REDIS_URI in unit tests · da6b5839
  Vermaat authored 9 years ago
  
Oct 01, 2015
- Test the migrations on database content · fe532841
  Vermaat authored 9 years ago
  
Sep 30, 2015
- Test database migrations · 141aa09e
  Vermaat authored 9 years ago
  
Sep 27, 2015
- Test querying transcript-protein links · 45f7d276
  Vermaat authored 9 years ago
  
Sep 23, 2015

Show diff for variant protein from non-reference start codon · 3c98a1af

Vermaat authored 9 years ago

The alternative variant protein sequence translated from a
non-reference start codon (created by the variant) was not
color-diffed like normal variant protein sequences.

In the process we also rename the `oldprotein` and `newprotein`
fields in the output object to `oldProtein` and `newProtein` to
be more consistent with other field names.
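
The color diff can be pictured as marking the region between the longest common prefix and the longest common suffix of the old and new protein. The sketch below is one simple way to compute that region and is an assumption, not Mutalyzer's actual implementation; `split_changed` and `protein_diff` are hypothetical names, square brackets stand in for the coloring, and the output uses the renamed `oldProtein` and `newProtein` field names.

```python
def split_changed(old, new):
    """Split both sequences into (common prefix, changed part, common suffix)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The common suffix may not overlap the common prefix.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    def parts(sequence):
        end = len(sequence) - suffix
        return sequence[:prefix], sequence[prefix:end], sequence[end:]

    return parts(old), parts(new)


def protein_diff(old, new):
    """Mark the changed region of both proteins with square brackets."""
    def render(prefix, changed, suffix):
        return prefix + '[' + changed + ']' + suffix

    old_parts, new_parts = split_changed(old, new)
    return {'oldProtein': render(*old_parts), 'newProtein': render(*new_parts)}


print(protein_diff('MAKLV*', 'MAKPV*'))
# {'oldProtein': 'MAK[L]V*', 'newProtein': 'MAK[P]V*'}
```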

Visualise protein change, also with alternative start · 851e71fe

Vermaat authored 9 years ago

In the case of an alternative start codon (in the reference CDS),
protein changes were not visualised. This is fixed, and a WALTSTART
warning is now issued as well.

Also, if the variant creates a new non-reference start codon, this
is now visualised as such.

Translate alternative start to M, also in variant · ae70ddfd

Vermaat authored 9 years ago

In the case of an alternative start codon, the variant CDS was not
translated to a protein starting with M. This caused the protein
description machinery to conclude that the variant affected the
start codon, hence reporting `p.?`.

We fix this by always translating the start codon to M (except
when the variant actually affects it).

Example: `NM_024426.4:c.1107A>G` (a synonymous mutation) should
yield `NM_024426.4(WT1_i001):p.(=)`, not `p.?`. The start codon
for that protein is `CTG`.
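
A minimal sketch of the rule, using a hand-rolled standard codon table rather than Mutalyzer's actual translation code; `translate` and its `start_affected` flag are hypothetical names. The toy CDS starts with `CTG`, like the WT1 example, and the variant is a synonymous `GCC>GCA` change, so both proteins should come out identical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = 'TCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))


def translate(cds, start_affected=False):
    """Translate a CDS, reading its first codon as M unless the variant hits it."""
    protein = ''.join(CODON_TABLE[cds[i:i + 3]]
                      for i in range(0, len(cds) - len(cds) % 3, 3))
    if protein and not start_affected:
        protein = 'M' + protein[1:]  # alternative start codons still encode Met
    return protein


reference = 'CTGGCCAAATAA'  # CTG start, Ala, Lys, stop
variant = 'CTGGCAAAATAA'    # synonymous GCC>GCA

print(translate(reference))                     # MAK*
print(translate(variant))                       # MAK*: identical, so p.(=)
print(translate(variant, start_affected=True))  # LAK*: literal CTG translation
```

Without the rule, the variant would be translated literally (`LAK*`) while the reference starts with M, and the first residue would look changed, which is exactly the spurious `p.?` described above.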

Aug 10, 2015
- Customizable database connection uri for unit tests · 36626215
  Vermaat authored 9 years ago
  
Aug 04, 2015
- Fix bug in recognizing p.(=) · 6435f0cf
  Vermaat authored 9 years ago
  
Jul 15, 2015

Uncertain stop codon in protein descriptions (fs and ext) · d2f91690

Vermaat authored 9 years ago

When a variant results in a frame shift or extension and we don't
see a new stop codon in the RNA, the protein description should use
the notation for an uncertain stop codon, e.g., `p.(Gln730Profs*?)`
instead of `p.(Gln730Profs*96)` where 96 is just the last codon in
our transcript [1].

To detect this, we now use `to_stop=False` in our `.translate()`
calls, since that explicitly returns `*` characters for stop
codons.

We also slightly fix the coloring of changes in the protein sequence,
where previously changed stop codon characters were not included.

[1] http://www.hgvs.org/mutnomen/FAQ.html#nostop
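
Why `to_stop=False` matters: if translation silently ends at the first stop codon, a protein that ends because the transcript ran out looks the same as one that ends at a real stop. The hypothetical helper `fs_stop_suffix` below assumes a translation that keeps `*` for stop codons and computes the stop part of an `fs` description, counting the first changed amino acid as position 1, as in the HGVS frame shift notation. It is a sketch, not Mutalyzer's code.

```python
def fs_stop_suffix(variant_protein, first_changed):
    """Return the stop part ('*N' or '*?') of a frame shift description.

    variant_protein: translation that keeps '*' for stop codons.
    first_changed: 0-based index of the first changed amino acid.
    """
    stop = variant_protein.find('*', first_changed)
    if stop == -1:
        # No new stop codon within the transcript: the stop is uncertain.
        return '*?'
    # Count the first changed amino acid as position 1.
    return '*%d' % (stop - first_changed + 1)


print(fs_stop_suffix('MKPLQ*', 2))  # *4: the new stop is the 4th residue from P
print(fs_stop_suffix('MKPLQR', 2))  # *?: the sequence ends without a stop
print(fs_stop_suffix('MKPLQ', 2))   # *? as well: a to_stop=True result lost the stop
```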

Jul 09, 2015
- Fix cache fixture in tests · f1e57a13
  Vermaat authored 9 years ago
  
- Convert DNA to uppercase when reading from plain text · 93159a0e
  Vermaat authored 9 years ago
  
Jul 03, 2015

Use chardet instead of cchardet · dedad241

Vermaat authored 9 years ago

Issue #50 showed a problem in our file encoding detection, caused
by our cut-off on the confidence reported by the cchardet [1]
library:

    >>> import cchardet
    >>> s = u'NM_000052.4:c.2407\u20132A>G'
    >>> b = s.encode('WINDOWS-1252')
    >>> cchardet.detect(b)
    {'confidence': 0.5, 'encoding': u'WINDOWS-1252'}

We require a confidence strictly greater than 0.5 and default to
UTF-8 otherwise.

If, however, we try the same thing using the chardet [2] library,
we get a higher confidence for the same string:

    >>> import chardet
    >>> chardet.detect(b)
    {'confidence': 0.73, 'encoding': 'windows-1252'}

So the two obvious ways to solve this are:

1. Lower the confidence threshold.
2. Use chardet instead of cchardet.

We implement the second solution here, since it also removes a C
library dependency and we are not worried about performance.

Of course the detected encoding remains a guess which can still
be wrong!
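
The confidence cut-off can be sketched as below. `choose_encoding` is a hypothetical helper, and the two detection results are copied from the sessions above rather than computed, so the sketch runs without either library installed. The en dash becomes byte `0x96` in Windows-1252, which is not valid UTF-8 on its own, so the UTF-8 fallback cannot decode the input.

```python
def choose_encoding(detected, threshold=0.5):
    """Use the detected encoding only if its confidence is strictly above threshold."""
    encoding = detected.get('encoding')
    if encoding and detected.get('confidence', 0) > threshold:
        return encoding
    return 'utf-8'


s = u'NM_000052.4:c.2407\u20132A>G'
b = s.encode('WINDOWS-1252')

# Detection results as shown in the sessions above.
cchardet_result = {'confidence': 0.5, 'encoding': 'WINDOWS-1252'}
chardet_result = {'confidence': 0.73, 'encoding': 'windows-1252'}

print(choose_encoding(cchardet_result))  # utf-8: 0.5 is not > 0.5
print(choose_encoding(chardet_result))   # windows-1252
print(b.decode(choose_encoding(chardet_result)) == s)  # True

try:
    b.decode(choose_encoding(cchardet_result))
except UnicodeDecodeError:
    print('UTF-8 fallback cannot decode the en dash byte')
```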

[1] https://github.com/PyYoshi/cChardet
[2] https://github.com/chardet/chardet

Fixes #50

May 31, 2015
- Configurable maximum input length for description extractor · ee390387
  Vermaat authored 9 years ago
  
  Adds an `EXTRACTOR_MAX_INPUT_LENGTH` configuration setting, defaulting to 50 Kbp.
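
A sketch of how such a limit might be enforced, assuming it applies to each input sequence separately (the commit message does not say) and reading 50 Kbp as 50,000 bases. `check_extractor_input` and its `reference`/`observed` parameters are hypothetical names.

```python
EXTRACTOR_MAX_INPUT_LENGTH = 50 * 1000  # 50 Kbp, the default from the commit message


def check_extractor_input(reference, observed,
                          max_length=EXTRACTOR_MAX_INPUT_LENGTH):
    """Raise ValueError if either input sequence exceeds max_length bases."""
    for name, sequence in (('reference', reference), ('observed', observed)):
        if len(sequence) > max_length:
            raise ValueError('%s sequence is %d bp, maximum is %d bp'
                             % (name, len(sequence), max_length))


check_extractor_input('A' * 50000, 'ACGT')  # exactly at the limit: accepted
try:
    check_extractor_input('ACGT', 'A' * 50001)
except ValueError as error:
    print(error)  # observed sequence is 50001 bp, maximum is 50000 bp
```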
May 18, 2015
- New description extractor web interface · 55d10b82
  Jeroen F.J. Laros authored 9 years ago and Vermaat committed 9 years ago
  
  We can now compare two sequences by supplying their sequence strings, accession numbers, or uploaded file.
May 01, 2015
- Fix descriptionExtract webservice · 7d7cb6af
  Vermaat authored 9 years ago
  
Apr 30, 2015
- Moved describe functionality to the extractor package. · 6c64e5ee
  Jeroen F.J. Laros authored 9 years ago and Vermaat committed 9 years ago
  
- PEP8. · 57c55d0f
  Jeroen F.J. Laros authored 9 years ago and Vermaat committed 9 years ago
  
- Integrated the description extractor in the website. · 216146bb
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
- Some more refactoring. · 2db722ff
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
- Fixed empty allele bug. · 52724cc8
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
- Fixed erroneous unit tests. · b0d85531
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
- Made the inserted and deleted sequences uniform. · 49534102
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
- Checked the generated positions. · 036fc241
  Laros authored 10 years ago and Vermaat committed 9 years ago
  
- Use new extract package for the description extractor · 534a41fe
  Vermaat authored 11 years ago
  
  This is a work in progress, as there still seem to be some bugs. For example, some unit tests fail due to incorrectly generated descriptions and others fail due to a crash.
- Add some JSON and SOAP service tests · 100f53b2
  Vermaat authored 9 years ago
  
Jan 30, 2015

Discard incomplete genes in GenBank reference files · 73c0862f

Vermaat authored 10 years ago

Many GenBank reference files contain more than one gene, especially
slices from an assembly. Some of these genes may be incomplete in
the reference file (i.e., their start or end lies beyond the outer
coordinates of the record). We cannot really do anything with these
genes, so we discard them during parsing.
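
In GenBank feature tables, a location that extends past the ends of the record is marked as partial with `<` before the start or `>` before the end, e.g. `<1..1200`. A minimal sketch of the filtering, assuming genes arrive as (name, location string) pairs with simple, optionally complemented, ranges; `complete_genes` and the gene names are hypothetical, and this is not Mutalyzer's actual parser.

```python
import re

# Simple range, optionally partial: '<' before the start, '>' before the end.
SIMPLE_RANGE = re.compile(r'^(<?)(\d+)\.\.(>?)(\d+)$')


def complete_genes(genes):
    """Keep (name, start, end) for genes lying completely within the record."""
    kept = []
    for name, location in genes:
        if location.startswith('complement(') and location.endswith(')'):
            location = location[len('complement('):-1]
        match = SIMPLE_RANGE.match(location)
        if match is None:
            continue  # join(...) and other forms: this sketch simply skips them
        partial_start, start, partial_end, end = match.groups()
        if partial_start or partial_end:
            continue  # gene extends beyond the record: discard it
        kept.append((name, int(start), int(end)))
    return kept


genes = [('GENE1', '<1..1200'),
         ('GENE2', '3000..8000'),
         ('GENE3', 'complement(9000..>12000)'),
         ('GENE4', 'complement(15000..17000)')]
print(complete_genes(genes))  # [('GENE2', 3000, 8000), ('GENE4', 15000, 17000)]
```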

Fix broken DMD reference in unit tests · 51d8cc50

Vermaat authored 10 years ago

Add getGeneLocation webservice method · e06452a1

Vermaat authored 10 years ago

Given a gene symbol and optional genome build, this returns the location
of the gene.

The primary motivation for this is LOVD, where it will be used in combination
with sliceChromosome as an alternative to sliceChromosomeByGene, which only
works on the fixed Ensembl genome build.

Nov 24, 2014
- Fix form buttons and general language issues · 9e6ca731
  Vermaat authored 10 years ago
  
- Many fixes in templates · 5fc78480
  Vermaat authored 10 years ago
  