- Oct 20, 2014
-
-
Vermaat authored
Don't fix what ain't broken. Unfortunately, string handling in Mutalyzer really is broken. So we fix it.

Internally, all strings should be represented by unicode strings as much as possible. The main exception is large reference sequence strings. These can often better be BioPython sequence objects, since that is how we usually get them in the first place.

These changes will hopefully make Mutalyzer more reliable in working with incoming data. As a bonus, they're a first (small) step towards Python 3 compatibility [1].

Our strategy is as follows:

1. We use `from __future__ import unicode_literals` at the top of every file.
2. All incoming strings are decoded to unicode (if necessary) as soon as possible.
3. Outgoing strings are encoded to UTF8 (if necessary) as late as possible.
4. BioPython sequence objects can be based on byte strings as well as unicode strings.
5. In the database, everything is UTF8.
6. We worry about uploaded and downloaded reference files and batch jobs in a later commit.

Point 1 will ensure that all string literals in our source code will be unicode strings [2].

As for point 4, sometimes this may even change under our eyes (e.g., calling `.reverse_complement()` will change it to a byte string). We don't care as long as they're BioPython objects; only when we get the sequence out must we have it as a unicode string. Their contents are always in the ASCII range anyway.

Although `Bio.Seq.reverse_complement` works fine on Python byte strings (and we used to rely on that), it crashes on a Python unicode string. So we take care to only use it on BioPython sequence objects and wrote our own reverse complement function for unicode strings (`mutalyzer.util.reverse_complement`).

As for point 5, SQLAlchemy already does a very good job of decoding from and encoding to UTF8 for us.
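As an illustration of points 2 and 3 and of a reverse complement that works on unicode strings, here is a minimal sketch. It is not the actual `mutalyzer.util.reverse_complement`; the helper names `to_text` and `to_bytes` and the complement table are assumptions made for this example, and only the four unambiguous DNA bases are handled.

```python
from __future__ import unicode_literals

# Hypothetical complement table for this sketch (not taken from Mutalyzer).
# Symbols that are not listed are passed through unchanged.
_COMPLEMENT = {
    'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A',
    'a': 't', 'c': 'g', 'g': 'c', 't': 'a',
}


def to_text(value, encoding='utf-8'):
    """Decode incoming bytes to text as early as possible (point 2)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Encode outgoing text to bytes as late as possible (point 3)."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def reverse_complement(sequence):
    """Return the reverse complement of a text (unicode) sequence."""
    return ''.join(_COMPLEMENT.get(base, base) for base in reversed(sequence))
```

With these helpers, a byte string coming from outside would be passed through `to_text` once at the boundary, handled as text everywhere else, and passed through `to_bytes` only when it leaves the program.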
The Spyne documentation has the following to say about their `String` and `Unicode` types [3]:

> There are two string types in Spyne: `spyne.model.primitive.Unicode` and
> `spyne.model.primitive.String` whose native types are `unicode` and `str`
> respectively.
>
> Unlike the Python `str`, the Spyne `String` is not for arbitrary byte
> streams. You should not use it unless you are absolutely, positively sure
> that you need to deal with text data with an unknown encoding. In all other
> cases, you should just use the `Unicode` type. They actually look the same
> from outside, this distinction is made just to properly deal with the quirks
> surrounding Python-2's `unicode` type.
>
> Remember that you have the `ByteArray` and `File` types at your disposal
> when you need to deal with arbitrary byte streams.
>
> The `String` type will be just an alias for `Unicode` once Spyne gets ported
> to Python 3. It might even be deprecated and removed in the future, so make
> sure you are using either `Unicode` or `ByteArray` in your interface
> definitions.

So let's not ignore that and never use `String` anymore in our webservice interface.

For the command line interface it's a bit more complicated, since there seems to be no reliable way to get the encoding of command line arguments. We use `sys.stdin.encoding` as a best guess.

For us to interpret a sequence of bytes as text, it's key to be aware of their encoding. Once decoded, a text string can be safely used without having to worry about bytes. Without unicode we're nothing, and nothing will help us. Maybe we're lying, then you better not stay. But we could be safer, just for one day. Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day.

[1] https://docs.python.org/2.7/howto/pyporting.html
[2] http://python-future.org/unicode_literals.html
[3] http://spyne.io/docs/2.10/manual/03_types.html#strings
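The best-guess decoding of command line arguments described above could be sketched as follows. The function names `guess_encoding` and `decode_argument` and the UTF-8 fallback are assumptions for this example, not code from Mutalyzer. On Python 3 the interpreter already delivers `sys.argv` as text, so the decoding step only matters for byte strings, as on Python 2.

```python
from __future__ import unicode_literals

import sys


def guess_encoding(default='utf-8'):
    """Best guess for the encoding of command line arguments.

    sys.stdin.encoding can be None (e.g., when input is piped on Python 2),
    and sys.stdin itself can be None, so we fall back to a default.
    """
    return getattr(sys.stdin, 'encoding', None) or default


def decode_argument(argument, encoding=None):
    """Decode a command line argument to text if it is a byte string."""
    if isinstance(argument, bytes):
        return argument.decode(encoding or guess_encoding())
    return argument


if __name__ == '__main__':
    # Decode all arguments once, right at the program boundary.
    arguments = [decode_argument(argument) for argument in sys.argv[1:]]
    print(arguments)
```

Since the guess can be wrong, a stricter variant might let the user pass the encoding explicitly; that choice is outside what the commit message describes.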
-
- Feb 12, 2013
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@668 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 14, 2013
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@662 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Oct 05, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@621 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Aug 21, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@601 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Aug 20, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@600 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Aug 04, 2012
-
-
Laros authored
rpc.py: - Added the function descriptionExtract(). - Standardised indentation. models.py: - Added a RawVar and an Allele class for the webservices. describe.py: - Made the RawVar class a child of models.RawVar. This is convenient for webservices since we can simply return this object. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@591 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jul 16, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@574 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jul 12, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@571 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Feb 20, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@483 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Feb 18, 2012
-
-
Laros authored
describe.py: - Module that provides the Variant Description Extractor functions. __init__.py: - Added an automated copyright year update. website.py: - Added the Variant Description Extractor web interface. templates/descriptionExtract.html: - Template page for the Variant Description Extractor. templates/snp.html: templates/menu.html: templates/converter.html: templates/index.html: templates/parse.html: - Cosmetic changes. Added a presentation. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@479 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Feb 01, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@476 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 31, 2012
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@471 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 30, 2012
-
-
Laros authored
- Fixed the bug in the insertions / duplications code. - Added documentation and pointers on how to integrate this as a module. - Added a wrapper for the recursive function. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@469 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@468 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 29, 2012
-
-
Laros authored
description of the stored raw variant. - Made a function that returns an allele description given a list of raw variants. - Removed the hgvs member variable from RawVar(). - Removed the HGVS description generation from DNA_description(). - Started refactoring the DNA_description() function. TODO: rolling for insertions and duplications is not working properly. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@466 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
Laros authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@465 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 28, 2012
-
-
Laros authored
as a list of objects. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@464 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 26, 2012
-
-
Vermaat authored
For use by LOVD (NGS variants import, Jerry Hoogenboom), implement the SOAP webservice method getTranscriptsMapping. The calling signature is the same as for the existing getTranscriptsRange method, but the new method returns more than just the transcript names. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@451 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 22, 2012
-
-
Laros authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@442 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 15, 2012
-
-
Laros authored
the length of the longest common substring in two sequences. - Changed the splitting on the longest common substring for direct slicing based on the new return values of LongestCommonSubstring(). - Added the removal of the lcs from the leftVariant and the lcp from the rightVariant in the inversion part of the DNA_description() function. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@441 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jan 13, 2012
-
-
Laros authored
sequences. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@440 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Sep 07, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@352 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@351 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Sep 05, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@346 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@345 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Aug 19, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@333 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jul 27, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/gbinfo-sync-branch@319 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jul 26, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/gbinfo-sync-branch@316 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jul 25, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/gbinfo-sync-branch@314 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jul 21, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/gbinfo-sync-branch@313 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/gbinfo-sync-branch@312 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Jul 12, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@295 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Apr 12, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@268 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Apr 08, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@263 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-
- Apr 05, 2011
-
-
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/refactor-mutalyzer-branch@246 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
-