Commits · 00e80cddac89a376413fafd0760e9ecb02eb0767 · Mirrors / mutalyzer

Nov 04, 2014
- Typo · 00e80cdd
  Vermaat authored 10 years ago
  
  00e80cdd
- Apply AGPL and CC-by-sa licenses · e2765689
  Vermaat authored 10 years ago
  
  e2765689
Oct 22, 2014
- Update Python dependencies · 7c1947fa
  Vermaat authored 10 years ago
  
  7c1947fa
- Rename GRCh36 to NCBI36 · 8543a5bd
  Vermaat authored 10 years ago
  
  Not sure how this came to be, but NCBI36 was incorrectly named GRCh36. Changing this, however, breaks the sort order in assembly lists. So we now sort on the UCSC alias (hg18). Fixes #8
  8543a5bd
- Fix importing transcript mappings from UCSC database · 9882df54
  Vermaat authored 10 years ago
  
  Fixes #9
  9882df54
Oct 21, 2014
- Keep original remote address in reverse-proxied requests · 7c97ed0d
  Vermaat authored 10 years ago
  
  Fixes #22
  7c97ed0d
- Update changelog · c64561f6
  Vermaat authored 10 years ago
  
  c64561f6
- Don't crash on mail errors in the batch scheduler · cf157575
  Vermaat authored 10 years ago
  
  Fixes #30
  cf157575
- Merge branch 'unicode-strings' into 'master' · a35097e1
  Vermaat authored 10 years ago
  
  See merge request !25
  a35097e1
- Handle encoding for command line file arguments · 63825a47
  Vermaat authored 10 years ago
  
  63825a47
- Unit tests for unicode strings · 66629914
  Vermaat authored 10 years ago
  
  66629914
Oct 20, 2014

Developer documentation on string representations · 8bed539e
Vermaat authored 10 years ago

8bed539e
Correctly handle reference file encodings · 6f5c69bf
Vermaat authored 10 years ago

6f5c69bf
Correctly handle batch job input and output encodings · 8acb0970
Vermaat authored 10 years ago

8acb0970

Vermaat authored 10 years ago

Don't fix what ain't broken. Unfortunately, string handling in Mutalyzer
really is broken. So we fix it.

Internally, all strings should be represented by unicode strings as much as
possible. The main exception is large reference sequence strings. These are
often better kept as BioPython sequence objects, since that is how we usually
get them in the first place.

These changes will hopefully make Mutalyzer more reliable in working with
incoming data. As a bonus, they're a first (small) step towards Python 3
compatibility [1].

Our strategy is as follows:

1. We use `from __future__ import unicode_literals` at the top of every file.
2. All incoming strings are decoded to unicode (if necessary) as soon as
   possible.
3. Outgoing strings are encoded to UTF8 (if necessary) as late as possible.
4. BioPython sequence objects can be based on byte strings as well as unicode
   strings.
5. In the database, everything is UTF8.
6. We worry about uploaded and downloaded reference files and batch jobs in a
   later commit.

Point 1 will ensure that all string literals in our source code will be
unicode strings [2].
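
Points 1 to 3 can be illustrated with a minimal sketch. The function names,
the sample data and the choice of UTF-8 as the encoding are assumptions made
for illustration only; they are not taken from the Mutalyzer codebase.

```python
from __future__ import unicode_literals

import io


def read_descriptions(stream, encoding='utf-8'):
    """Decode incoming bytes to unicode as early as possible (point 2)."""
    return stream.read().decode(encoding).splitlines()


def write_descriptions(stream, descriptions, encoding='utf-8'):
    """Encode outgoing unicode to bytes as late as possible (point 3)."""
    stream.write('\n'.join(descriptions).encode(encoding))


# Hypothetical input: raw bytes as they might arrive from a file or socket.
incoming = io.BytesIO('NM_003002.2:c.274G>T\nr\u00e9f:g.1del\n'.encode('utf-8'))
descriptions = read_descriptions(incoming)

# Everything between the two boundaries only ever sees unicode strings.
stripped = [description.strip() for description in descriptions]

outgoing = io.BytesIO()
write_descriptions(outgoing, stripped)
```

Because of point 1, the literals `'\n'` and `'utf-8'` above are unicode
strings on Python 2 as well as on Python 3.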

As for point 4, the underlying string type may even change under our eyes
(e.g., calling `.reverse_complement()` will change it to a byte string). We
don't care as long as they're BioPython objects; only when we get the sequence
out must we have it as a unicode string. Their contents are always in the
ASCII range anyway.

Although `Bio.Seq.reverse_complement` works fine on Python byte strings (and
we used to rely on that), it crashes on a Python unicode string. So we take
care to only use it on BioPython sequence objects and wrote our own reverse
complement function for unicode strings (`mutalyzer.util.reverse_complement`).
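
A reverse complement for unicode strings can be sketched as follows. This is
an illustration only, not the actual `mutalyzer.util.reverse_complement`
implementation, and it assumes that only the plain nucleotides `ACGT` (in
either case) need complementing; any other character is left unchanged.

```python
from __future__ import unicode_literals

# Translation table from code point to complementary nucleotide. Keys are
# ordinals, which is what the unicode `translate` method expects.
_COMPLEMENT = {ord(base): complement
               for base, complement in zip('ACGTacgt', 'TGCAtgca')}


def reverse_complement(sequence):
    """Return the reverse complement of a unicode nucleotide sequence."""
    # Complement every base, then reverse the result.
    return sequence.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement('ATCG')` complements the sequence to `TAGC`
and then reverses it to `CGAT`.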

As for point 5, SQLAlchemy already does a very good job at decoding from and
encoding to UTF8 for us.

The Spyne documentation has the following to say about their `String` and
`Unicode` types [3]:

> There are two string types in Spyne: `spyne.model.primitive.Unicode` and
> `spyne.model.primitive.String` whose native types are `unicode` and `str`
> respectively.
>
> Unlike the Python `str`, the Spyne `String` is not for arbitrary byte
> streams. You should not use it unless you are absolutely, positively sure
> that you need to deal with text data with an unknown encoding. In all other
> cases, you should just use the `Unicode` type. They actually look the same
> from outside, this distinction is made just to properly deal with the quirks
> surrounding Python-2's `unicode` type.
>
> Remember that you have the `ByteArray` and `File` types at your disposal
> when you need to deal with arbitrary byte streams.
>
> The `String` type will be just an alias for `Unicode` once Spyne gets ported
> to Python 3. It might even be deprecated and removed in the future, so make
> sure you are using either `Unicode` or `ByteArray` in your interface
> definitions.

So let's not ignore that and never use `String` anymore in our webservice
interface.

For the command line interface it's a bit more complicated, since there seems
to be no reliable way to get the encoding of command line arguments. We use
`sys.stdin.encoding` as a best guess.
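
A minimal sketch of that best guess is shown below. The helper name
`decode_argument` and the UTF-8 fallback are assumptions for illustration;
the fallback is there because `sys.stdin.encoding` can be `None`, for example
when standard input is a pipe.

```python
from __future__ import unicode_literals

import sys


def decode_argument(argument, encoding=None):
    """Return a command line argument as a unicode string."""
    if not isinstance(argument, bytes):
        # Already a text string (always the case on Python 3).
        return argument
    # Best guess: the encoding of standard input, falling back to UTF-8.
    encoding = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return argument.decode(encoding)
```

It could be applied to all arguments at once with
`[decode_argument(a) for a in sys.argv[1:]]`.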

For us to interpret a sequence of bytes as text, it's key to be aware of their
encoding. Once decoded, a text string can be safely used without having to
worry about bytes. Without unicode we're nothing, and nothing will help
us. Maybe we're lying, then you better not stay. But we could be safer, just
for one day. Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day.

[1] https://docs.python.org/2.7/howto/pyporting.html
[2] http://python-future.org/unicode_literals.html
[3] http://spyne.io/docs/2.10/manual/03_types.html#strings

2a4dc3c1

Use unicode strings in webservice response definitions · 95d9f52e
Vermaat authored 10 years ago

95d9f52e

Use unicode string arguments in webservice interface definitions · 0e1c2d92

Vermaat authored 10 years ago

This stops Spyne from crashing on POST requests to the HTTP/RPC+JSON
webservice.

Note that all return values still use byte strings. Changing those will
touch a larger part of the codebase, and will be done in another commit.

As per [1]:

> Unlike the Python str, the Spyne String is not for arbitrary byte
> streams. You should not use it unless you are absolutely, positively
> sure that you need to deal with text data with an unknown encoding. In
> all other cases, you should just use the Unicode type. They actually
> look the same from outside, this distinction is made just to properly
> deal with the quirks surrounding Python-2’s unicode type.
>
> Remember that you have the ByteArray and File types at your disposal
> when you need to deal with arbitrary byte streams.
>
> The String type will be just an alias for Unicode once Spyne gets
> ported to Python 3. It might even be deprecated and removed in the
> future, so make sure you are using either Unicode or ByteArray in your
> interface definitions.

[1] http://spyne.io/docs/2.10/manual/03_types.html#strings

0e1c2d92

Open development for 2.0.4 · d299dbe0
Vermaat authored 10 years ago

d299dbe0
Bump version to 2.0.3 · ec73cf85
Vermaat authored 10 years ago

Tag: v2.0.3

ec73cf85
Update changelog · 04c73405
Vermaat authored 10 years ago

04c73405

Oct 15, 2014

Fix several error cases in LOVD2 getGS call · bcef1633

Vermaat authored 10 years ago

The `getGS` website view for LOVD2 would report "transcript not found" if
the genomic reference had multiple transcripts annotated or if the variant
description raised an error in the variant checker.

bcef1633

Oct 09, 2014
- Mirror magic-python to be more secure · ea86a42c
  Vermaat authored 10 years ago
  
  ea86a42c
- Open development for 2.0.3 · 54d419da
  Vermaat authored 10 years ago
  
  54d419da
- Bump version to 2.0.2 · a1c5efaf
  Vermaat authored 10 years ago
  
  Tag: v2.0.2
  
  a1c5efaf
- Update changelog · e7e9260b
  Vermaat authored 10 years ago
  
  e7e9260b
Oct 08, 2014
- Fix GRCm38 chromosome accession number versions · 542e61b7
  Vermaat authored 10 years ago
  
  542e61b7
Oct 04, 2014
- Fix sync of local cache with remote cache · 5ca4d216
  Vermaat authored 10 years ago
  
  5ca4d216
- Fix crash in position converter batch job · 55ca04e1
  Vermaat authored 10 years ago
  
  Fixes Trac#174
  55ca04e1
Oct 03, 2014
- Update links to issues in changelog · 310f5b07
  Vermaat authored 10 years ago
  
  310f5b07
Oct 02, 2014

Remove old announcement before setting new one · f72b678f

Vermaat authored 10 years ago

This prevents the case where the old announcement had a url set and the
new one does not (Redis would keep the existing url).
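
The idea can be sketched as follows. The key names and the `FakeStore` class
are hypothetical: `FakeStore` is a small in-memory stand-in that only mimics
the `get`, `set` and `delete` calls of a Redis client, so that the sketch
runs without a Redis server. It is not the actual Mutalyzer code.

```python
class FakeStore(object):
    """In-memory stand-in for the Redis client calls used below."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, *keys):
        for key in keys:
            self._data.pop(key, None)


def set_announcement(client, body, url=None):
    """Store a new announcement, dropping every field of the old one."""
    # Remove the old announcement first, so a stale url cannot survive
    # when the new announcement has no url.
    client.delete('announcement:body', 'announcement:url')
    client.set('announcement:body', body)
    if url:
        client.set('announcement:url', url)


store = FakeStore()
set_announcement(store, 'Maintenance tonight', url='http://example.com/info')
set_announcement(store, 'Maintenance finished')
```

After the second call, the url of the first announcement is gone.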

f72b678f

Sep 27, 2014

Upgrade Spyne from 2.10.10 to 2.11.0 · 8193dfd2

Vermaat authored 10 years ago

This fixes uploading base64 encoded data to the JSON webservice. For
example:

    printf 'NM_003002.2:c.274delT\nXXX:g.1del\n' | base64 > test.base64
    curl \
      -d 'process=SyntaxChecker' \
      -d 'argument=hg19' \
      --data-urlencode 'data@test.base64' \
      'http://127.0.0.1:8082/submitBatchJob'
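
The base64 payload can also be prepared in Python. This is a minimal sketch
that assumes the batch data is UTF-8 encoded text with one variant description
per line; the variable names are for illustration only.

```python
import base64

# One variant description per line, as in the shell example.
batch_lines = ['NM_003002.2:c.274delT', 'XXX:g.1del']

# Encode the text to bytes first, then base64 encode those bytes.
raw = '\n'.join(batch_lines).encode('utf-8') + b'\n'
encoded = base64.b64encode(raw).decode('ascii')
```

The resulting `encoded` string is what would be sent as the `data` argument.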

8193dfd2

Fix typo in model representation · d526dd67
Vermaat authored 10 years ago

d526dd67
Open development for 2.0.2 · 8fda3e51
Vermaat authored 10 years ago

8fda3e51
Bump version to 2.0.1 · e2c2a69c
Vermaat authored 10 years ago

Tag: v2.0.1

e2c2a69c

Fix POST requests to the HTTP/RPC+JSON webservice · 5849fd76

Vermaat authored 10 years ago

Upstream Spyne crashes on POST requests to the HTTP/RPC+JSON webservice.
We patched it in a rather hacky way.

This was a regression from the old codebase, where we installed Spyne
separately from our LUMC GitHub mirror. The mirror is now also referenced
in the requirements.txt file.

Thanks to Ken Doig for reporting the issue.

5849fd76

Sep 26, 2014
- Open development for 2.0.1 · deb4629c
  Vermaat authored 10 years ago
  
  deb4629c
- Bump version to 2.0.0 · c7e8179b
  Vermaat authored 10 years ago
  
  Tag: v2.0.0
  
  c7e8179b
- Update changelog · f665c1ad
  Vermaat authored 10 years ago
  
  f665c1ad
- Fix unit test for renaming in parent commit · ae685116
  Vermaat authored 10 years ago
  
  ae685116
Sep 23, 2014

Rename upLoadGenBankLocalFile to uploadGenBankLocalFile · 3a7a0e1a

Vermaat authored 10 years ago

Rename this webservice method. Note the capital letter L in the old
name. Also add a short note to the documentation that data arguments
must be base64 encoded.

3a7a0e1a