Commits · d94f20cf326b4c9116e8ea672298621e6f96d36b · Mirrors / mutalyzer

Oct 13, 2015
- Refactor unit tests using common py.test layout and fixtures · d94f20cf
  Vermaat authored 9 years ago
  
  d94f20cf
Sep 23, 2015

Show diff for variant protein from non-reference start codon · 3c98a1af

Vermaat authored 9 years ago

The alternative variant protein sequence translated from a
non-reference start codon (created by the variant) was not
color-diffed as normal variant protein sequences are.

In the process we also rename the `oldprotein` and `newprotein`
fields in the output object to `oldProtein` and `newProtein` to
be more consistent with other field names.

3c98a1af

Visualise protein change, also with alternative start · 851e71fe

Vermaat authored 9 years ago

In the case of an alternative start codon (in the reference CDS),
protein changes were not visualised. This is fixed and a WALTSTART
warning is also issued.

Also, if a new non-reference start codon is created by the variant,
visualise this as such.

851e71fe

Translate alternative start to M, also in variant · ae70ddfd

Vermaat authored 9 years ago

In case of an alternative start codon, the variant CDS was not
translated to a protein starting with M. This caused the protein
description machinery to conclude a variant affecting the start
codon, hence reporting `p.?`.

We fix this by always translating the start codon to M (except
when the variant actually affects it).

Example: `NM_024426.4:c.1107A>G` (a synonymous mutation) should
yield `NM_024426.4(WT1_i001):p.(=)`, not `p.?`. The start codon
for that protein is `CTG`.
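
A minimal sketch of the idea, not Mutalyzer's actual code: `translate_cds` and its `start_affected` flag are hypothetical names, and the small codon table stands in for BioPython's translation.

```python
from itertools import product

# Standard genetic code in TCAG order (illustrative stand-in for BioPython).
BASES = 'TCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
CODON_TABLE = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(cds, start_affected=False):
    """Translate a CDS, forcing the start codon to M unless the variant hit it."""
    usable = len(cds) - len(cds) % 3  # Ignore a trailing partial codon.
    protein = ''.join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))
    if protein and not start_affected:
        # Alternative start codons such as CTG still initiate with methionine.
        protein = 'M' + protein[1:]
    return protein
```

With a `CTG` start, `translate_cds('CTGGCATAA')` starts with `M`, so the reference and variant proteins agree on the first residue and a synonymous change further on is no longer mistaken for a start codon change.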

ae70ddfd

Aug 04, 2015
- Fix bug in recognizing p.(=) · 6435f0cf
  Vermaat authored 9 years ago
  
  6435f0cf
Jul 15, 2015

Uncertain stop codon in protein descriptions (fs and ext) · d2f91690

Vermaat authored 9 years ago

When a variant results in a frame shift or extension and we don't
see a new stop codon in the RNA, the protein description should use
the notation for an uncertain stop codon, e.g., `p.(Gln730Profs*?)`
instead of `p.(Gln730Profs*96)` where 96 is just the last codon in
our transcript [1].

To detect this, we now use `to_stop=False` in our `.translate()`
calls, since that will explicitly return `*` characters for stop
codons.
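
As a rough sketch of the detection (the helper name `stop_suffix` is hypothetical and this is not the actual implementation): once the translation keeps `*` characters, the absence of a `*` after the first changed amino acid signals an uncertain stop codon.

```python
def stop_suffix(changed_protein):
    """Return the stop codon part of a frame shift or extension description.

    `changed_protein` is the variant protein from the first changed amino
    acid onwards, translated without truncating at stop codons, so a stop
    codon shows up as a literal `*`.
    """
    stop = changed_protein.find('*')
    if stop == -1:
        # We ran off the end of the transcript without seeing a stop codon.
        return '*?'
    # Count from the first changed amino acid (position 1) up to the stop.
    return '*{}'.format(stop + 1)
```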

We also slightly fix the coloring of changes in the protein sequence
where previously changed stop codon characters were not included.

[1] http://www.hgvs.org/mutnomen/FAQ.html#nostop

d2f91690

Jul 09, 2015
- Fix cache fixture in tests · f1e57a13
  Vermaat authored 9 years ago
  
  f1e57a13
Jan 30, 2015

Discard incomplete genes in genbank reference files · 73c0862f

Vermaat authored 10 years ago

Many genbank reference files contain more than one gene, especially
slices from an assembly. Some of these genes may be incomplete in
the reference file (i.e., either start or end exceeds the outer
coordinates). We cannot really do anything with these genes, so we
discard them during parsing.
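
A simplified illustration, assuming incompleteness shows up as the `<` and `>` partial markers that the GenBank feature table format uses in location strings. Mutalyzer parses through BioPython, so its real check may look different; the function names here are hypothetical.

```python
def is_complete(location):
    """True if a GenBank location string has no partial (`<` or `>`) ends."""
    return '<' not in location and '>' not in location


def discard_incomplete(genes):
    """Keep only genes whose location lies fully inside the reference."""
    return {name: location for name, location in genes.items()
            if is_complete(location)}
```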

73c0862f

Fix broken DMD reference in unit tests · 51d8cc50
Vermaat authored 10 years ago

51d8cc50

Oct 20, 2014

Use unicode strings · 2a4dc3c1

Vermaat authored 10 years ago

Don't fix what ain't broken. Unfortunately, string handling in Mutalyzer
really is broken. So we fix it.

Internally, all strings should be represented by unicode strings as much as
possible. The main exceptions are large reference sequence strings. These can
often better be BioPython sequence objects, since that is how we usually get
them in the first place.

These changes will hopefully make Mutalyzer more reliable in working with
incoming data. As a bonus, they're a first (small) step towards Python 3
compatibility [1].

Our strategy is as follows:

1. We use `from __future__ import unicode_literals` at the top of every file.
2. All incoming strings are decoded to unicode (if necessary) as soon as
   possible.
3. Outgoing strings are encoded to UTF8 (if necessary) as late as possible.
4. BioPython sequence objects can be based on byte strings as well as unicode
   strings.
5. In the database, everything is UTF8.
6. We worry about uploaded and downloaded reference files and batch jobs in a
   later commit.

Point 1 will ensure that all string literals in our source code will be
unicode strings [2].
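
Points 2 and 3 could be sketched as a pair of boundary helpers. The names `to_text` and `to_bytes` are hypothetical, and the code is written so that it runs on both Python 2 (where `bytes` is `str`) and Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Decode incoming bytes to a text string; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Encode outgoing text to bytes; pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)
```

Calling `to_text` right where data enters and `to_bytes` right where it leaves keeps everything in between free of encoding concerns.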

As for point 4, sometimes this may even change under our eyes (e.g., calling
`.reverse_complement()` will change it to a byte string). We don't care as
long as they're BioPython objects; only when we get the sequence out must we
have it as a unicode string. Their contents are always in the ASCII range
anyway.

Although `Bio.Seq.reverse_complement` works fine on Python byte strings (and
we used to rely on that), it crashes on a Python unicode string. So we take
care to only use it on BioPython sequence objects and wrote our own reverse
complement function for unicode strings (`mutalyzer.util.reverse_complement`).
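
For illustration only, a reverse complement for text strings might look like the following. This is not the code in `mutalyzer.util`, and it only covers the four plain bases, leaving any other character unchanged.

```python
# Complement table keyed by code point, as text `.translate()` expects.
COMPLEMENT = {ord(base): comp for base, comp in zip(u'ACGTacgt', u'TGCAtgca')}


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence given as a text string."""
    return sequence.translate(COMPLEMENT)[::-1]
```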

As for point 5, SQLAlchemy already does a very good job of decoding from and
encoding to UTF8 for us.

The Spyne documentation has the following to say about their `String` and
`Unicode` types [3]:

> There are two string types in Spyne: `spyne.model.primitive.Unicode` and
> `spyne.model.primitive.String` whose native types are `unicode` and `str`
> respectively.
>
> Unlike the Python `str`, the Spyne `String` is not for arbitrary byte
> streams. You should not use it unless you are absolutely, positively sure
> that you need to deal with text data with an unknown encoding. In all other
> cases, you should just use the `Unicode` type. They actually look the same
> from outside, this distinction is made just to properly deal with the quirks
> surrounding Python-2's `unicode` type.
>
> Remember that you have the `ByteArray` and `File` types at your disposal
> when you need to deal with arbitrary byte streams.
>
> The `String` type will be just an alias for `Unicode` once Spyne gets ported
> to Python 3. It might even be deprecated and removed in the future, so make
> sure you are using either `Unicode` or `ByteArray` in your interface
> definitions.

So let's not ignore that and never use `String` anymore in our webservice
interface.

For the command line interface it's a bit more complicated, since there seems
to be no reliable way to get the encoding of command line arguments. We use
`sys.stdin.encoding` as a best guess.
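
A sketch of that best guess, with a hypothetical helper name. The UTF8 fallback is an assumption added here for the case where `sys.stdin.encoding` is not set (for example when input is piped on Python 2).

```python
import sys


def decode_argument(argument, encoding=None):
    """Decode a command line argument, guessing the encoding from stdin."""
    if not isinstance(argument, bytes):
        return argument  # Already text, as is always the case on Python 3.
    # Assumed fallback: use UTF8 when the terminal encoding is unknown.
    guess = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return argument.decode(guess)
```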

For us to interpret a sequence of bytes as text, it's key to be aware of their
encoding. Once decoded, a text string can be safely used without having to
worry about bytes. Without unicode we're nothing, and nothing will help
us. Maybe we're lying, then you better not stay. But we could be safer, just
for one day. Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day.

[1] https://docs.python.org/2.7/howto/pyporting.html
[2] http://python-future.org/unicode_literals.html
[3] http://spyne.io/docs/2.10/manual/03_types.html#strings

2a4dc3c1

Aug 27, 2014
- Move from nose to pytest for unit tests · e6f19d1c
  Vermaat authored 10 years ago
  
  See http://pytest.org/
  e6f19d1c
Mar 01, 2014

Reverse complement range insertions/insertion-deletions · 57120a89

Vermaat authored 11 years ago

The name checker supports reverse complement ranges in insertions
and insertions-deletions, for example `3_4ins8_12inv`.

Reverse complement range insertions and insertion-deletions are not
part of the current HGVS nomenclature, but will be proposed.

57120a89

Feb 28, 2014

Range and compound insertions/insertion-deletions · 31b2f13a

Vermaat authored 11 years ago

The name checker supports ranges in insertions and insertion-
deletions, for example `3_4ins8_12`, and compound insertions and
insertion-deletions, for example `3_4ins[ATC;8_12]`.
The inserted sequences are accepted and concatenated before any
further processing, so reported descriptions show only the
concatenated sequences.
The support for ranges is limited to genomic descriptions.
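
The concatenation step could be sketched like this, assuming 1-based inclusive positions on the reference. `resolve_insertion` is a hypothetical helper that takes the text between the brackets; the real name checker works from a parsed description instead.

```python
def resolve_insertion(reference, parts):
    """Concatenate the parts of a compound insertion into one sequence.

    Each part is either a literal sequence (`ATC`) or a 1-based, inclusive
    range on the reference (`8_12`).
    """
    inserted = []
    for part in parts.split(';'):
        if '_' in part:
            first, last = (int(position) for position in part.split('_'))
            inserted.append(reference[first - 1:last])
        else:
            inserted.append(part)
    return ''.join(inserted)
```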

The position converter supports compound insertions and
insertion-deletions, not ranges.

Compound insertions and insertion-deletions are not part of the
current HGVS nomenclature, but will be proposed.

31b2f13a

Jan 22, 2014

Use fixtures in the unit tests · c49d49f0

Vermaat authored 11 years ago

This is The Good Stuff. The entire test suite can now be run without
having to set up a database or run the batch checker, any of the web
services, or the website. It even passes without an internet connection.
In, like, 30 seconds! Awesome!

This means tests don't randomly fail after some reference sequence
changes on the NCBI server and it doesn't take an entire configured
server with mapping database setup to run the tests. Those are things
of the past! No more frustrations, Mutalyzer is testable!

Going down now...

The mountain screamed three times today
I guess it thought it'd like to play
How much does one have to pay
To fry a peak and melt away
Launching titan's breath on mine
The sweating measure lands on time

And the old man, down by the river
Well he walks up and he walks on down
To the spaceship that's parked at your doorstep
And it's waiting to take you away now

Goin' down now
Goin' down now

Looking for the rate that crowed
He's hooked up down in Mexico
Slap my nerve now give me more
It's my disaster friend, not yours

And the old man, down by the river
Well he walks up and he walks on down
To the spaceship that's parked at your doorstep
And it's waiting to take you away now

And the last one, it's down by the river
Where he gets up and he walks on down
To the spaceship that's parked at your doorstep
And it's waiting to take you away now

It's down by the river, it's always this way now
It's down by the river, it's always this way now

Going down now
Going down now
now, now, now

down, down, down

c49d49f0

Jan 04, 2014
- Some fixes for running the unit tests · f2a6cc59
  Vermaat authored 11 years ago
  
  f2a6cc59
- Temporarily skip tests using AL449423.14 (no longer valid) · 323a8be1
  Vermaat authored 11 years ago
  
  323a8be1
Dec 23, 2013

Fix unit tests with SQLAlchemy · 94df7c07

Vermaat authored 11 years ago

This involves making the SQLAlchemy session reconfigurable at run-time,
which is done automatically on updating the Mutalyzer configuration using
configuration update callbacks.
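
The callback mechanism might be pictured roughly as follows. The `Settings` class and its method names are hypothetical, and a plain list stands in for the SQLAlchemy session, which in real code would be rebound to a new engine inside the callback.

```python
class Settings(object):
    """Hypothetical settings holder that notifies listeners on update."""

    def __init__(self, **values):
        self._values = dict(values)
        self._callbacks = []

    def on_update(self, callback):
        """Register `callback(values)` to run after every update."""
        self._callbacks.append(callback)

    def configure(self, **values):
        """Update settings and notify every registered callback."""
        self._values.update(values)
        for callback in self._callbacks:
            callback(dict(self._values))


settings = Settings(DATABASE_URI='sqlite://')
bound = []  # Stand-in for rebinding a session to a new engine.
settings.on_update(lambda values: bound.append(values['DATABASE_URI']))
settings.configure(DATABASE_URI='sqlite:///test.db')
```

A test can then point the session at a throwaway database simply by updating the configuration.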

94df7c07

Dec 19, 2013
- Update unit tests for new style configuration · 135866c3
  Vermaat authored 11 years ago
  
  135866c3
Feb 25, 2013

Warning on non-adjacent exons in transcript reference · bd3ce1d1

Vermaat authored 12 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@671 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

bd3ce1d1

Feb 12, 2013

Use WNOMRNA_OTHER when appropriate (fixes #132) · c2b54871

Vermaat authored 12 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@667 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

c2b54871

Dec 11, 2012

Update LRG parser for updated schema (incomplete) · e84bdeb8

Vermaat authored 12 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/lrg-schema-update@654 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

e84bdeb8

Oct 04, 2012

Warn on missing positioning scheme (fixes #114) · 1ec48b68

Vermaat authored 12 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@617 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

1ec48b68

Jul 26, 2012

Unit test for issue #108 · 7e6dd859

Vermaat authored 12 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@590 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

7e6dd859

Jun 21, 2012

Do not crash on inversions (introduced in r528) (fixes #99) · acfb2415

Vermaat authored 12 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@557 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

acfb2415

First download reference in GI unit tests · 061f3d34

Vermaat authored 12 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@556 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

061f3d34

Mar 12, 2012

Use UD slices in unit tests · 451b671d

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@497 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

451b671d

Better descriptions on chromosome · d07b822a

Vermaat authored 13 years ago

For UD slices we also generate g. descriptions on the chromosome reference. We
now also apply the roll rule there and use correct ranges and sequences on the
reverse strand. Fixes #75.
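
For readers unfamiliar with it, the roll rule is taken here to mean shifting a deletion to its most 3' equivalent position. The sketch below is an illustration under that assumption with a hypothetical function name; it covers the forward strand only and is not Mutalyzer's implementation.

```python
def roll_forward(reference, first, last):
    """Count how far a deletion of first..last (1-based, inclusive) can be
    shifted in the 3' direction without changing the resulting sequence."""
    shift = 0
    while (last + shift < len(reference)
           and reference[first - 1 + shift] == reference[last + shift]):
        shift += 1
    return shift
```

For example, deleting the first `T` in `ACTTTG` gives the same result as deleting the last one, so the deletion is described at the rolled position.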

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@495 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

d07b822a

Feb 21, 2012

Fix del with deleted sequence length as argument · 1b7c9c03

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@488 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

1b7c9c03

Jan 31, 2012

Describe NOP variants with = (#88) · 0c5f32ea

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@472 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

0c5f32ea

Unit tests for GI references · 6ead8b65

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@470 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

6ead8b65

Jan 26, 2012

Do not crash on non-numeric locus tag end, fixes #81 · 38f90d91

Vermaat authored 13 years ago


git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@452 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

38f90d91

Jan 25, 2012

Fix LRG reference sequences · 5f77ecaf

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@445 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

5f77ecaf

Dec 14, 2011

Do not crash on EX positioning (fixes #79) · 2809d839

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@435 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

2809d839

Nov 24, 2011

Add strandedness to BED tracks · 325130bb

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/browser-link-branch@423 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

325130bb

Nov 14, 2011

Fix crash on variant without reference (thx Ivar) · 5fdc74c3

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@418 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

5fdc74c3

Nov 08, 2011

Retrieve full sequence for contig reference files · cff8ea36

Vermaat authored 13 years ago

Fixes #74.


git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@410 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

cff8ea36

Nov 04, 2011

Use implicit configuration object creation · ae2ae0c8

Vermaat authored 13 years ago

The old way of using the configuration file was by instantiating a Config
object which read the file. This instance was passed to every function and
object that might need it.

The new way is by simply calling config.get('name') to get the configuration
value for 'name'. This lazily reads the configuration file and the contents
are cached for future calls.

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/implicit-config-branch@408 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

ae2ae0c8

Oct 20, 2011

Implement BED tracks for Genome Browser · 8402dae3

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/browser-link-branch@394 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

8402dae3

Aug 24, 2011

Get only primary assembly mappings in position converter (naive fix for #58). · 11eb9eec

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@338 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

11eb9eec

Aug 18, 2011

Fix a bug in commit r324, allowing ordinary delins again. · 7bc44ba1

Vermaat authored 13 years ago

git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@328 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1

7bc44ba1