- 08 Nov, 2021 1 commit
Mihai authored
* Update GitHub links
* Clean the about page
* Update dependencies
- 04 Dec, 2018 1 commit
Mihai authored
- 08 Nov, 2016 1 commit
Laros authored
- 25 May, 2016 1 commit
Vermaat authored
This is not perfect yet, but it is a slight improvement for input variants of a type we don't support. Fixes, for example, #375.
- 23 May, 2016 1 commit
Vermaat authored
- 11 Mar, 2016 1 commit
Vermaat authored
The position converter should raise an error when no gene or transcript is specified for accession numbers where this is required. Fixes #190.
- 23 Feb, 2016 2 commits
- 22 Feb, 2016 2 commits
Vermaat authored
Vermaat authored
Note that we explicitly only support LRG references as transcripts, that is, using c. positioning to convert to/from chromosomal positioning. Supporting LRG references as genomic references, that is, using g. positioning, can be future work, but converting them to/from LRG transcripts is of course already done by the name checker. Converting directly between genomic LRG positioning and chromosomal positioning is not something that can easily be supported in the current setup of the position converter.
- 29 Oct, 2015 1 commit
Vermaat authored
This speeds up lookup of transcript mappings by genomic position a lot. By filtering on bin index, such a query now uses the index on the bin column, where previously this would involve a sequential table scan. http://interval-binning.readthedocs.org/
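To illustrate why filtering on the bin column helps, here is a minimal, self-contained sketch of a UCSC-style binning scheme (five levels, smallest bins of 128 kb), which we assume is the kind of scheme described at the link above. It is not the API of that package, and the 0-based half-open coordinates, the row layout and the helper names are assumptions made for this example.

```python
# Standard UCSC-style binning: five levels, smallest bins of 2**17 = 128 kb.
# Intervals use 0-based, half-open coordinates: [start, stop).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17  # log2 of the smallest bin size
NEXT_SHIFT = 3    # each level up has bins 8 times larger
MAX_POSITION = 2 ** 29  # the five levels cover 512 Mb


def assign_bin(start, stop):
    """Return the smallest bin that fully contains [start, stop)."""
    if not 0 <= start < stop <= MAX_POSITION:
        raise ValueError("interval out of range for standard binning")
    start_bin = start >> FIRST_SHIFT
    stop_bin = (stop - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == stop_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        stop_bin >>= NEXT_SHIFT
    raise AssertionError("unreachable for intervals within MAX_POSITION")


def overlapping_bins(start, stop):
    """Return every bin that may hold an interval overlapping [start, stop)."""
    start_bin = start >> FIRST_SHIFT
    stop_bin = (stop - 1) >> FIRST_SHIFT
    bins = []
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + stop_bin + 1))
        start_bin >>= NEXT_SHIFT
        stop_bin >>= NEXT_SHIFT
    return bins


def mappings_at(rows, position):
    """Mimic `WHERE bin IN (...) AND start <= pos AND stop > pos` in memory."""
    candidates = set(overlapping_bins(position, position + 1))
    return [name for name, start, stop, bin_ in rows
            if bin_ in candidates and start <= position < stop]


# Each row stores its bin, computed once when the mapping is imported.
rows = [
    ("TR1", 1000, 5000, assign_bin(1000, 5000)),
    ("TR2", 200000, 300000, assign_bin(200000, 300000)),
]
print(mappings_at(rows, 2500))  # ['TR1']
```

In a database, the `bin IN (...)` condition is what lets the query use an index on the bin column; the start/stop comparison then removes false positives, because sharing a bin only means an overlap is possible.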
- 24 Sep, 2015 1 commit
Vermaat authored
- 20 Jul, 2015 1 commit
Vermaat authored
For transcripts without any UTR and CDS entries in the NCBI Mapview file (seems to happen for predicted genes), we generate one exon spanning the entire transcript.
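A minimal sketch of the fallback described above. Only the "one exon spanning the entire transcript" rule comes from the commit; the record layout (feature type, 1-based inclusive start and stop) and the merging of touching UTR/CDS features into exons are assumptions made for illustration, and the function name is hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (feature_type, start, stop), 1-based inclusive,
# where feature_type is "UTR" or "CDS" as in an NCBI Mapview-style file.
Feature = Tuple[str, int, int]
Exon = Tuple[int, int]


def exons_from_features(transcript_start: int, transcript_stop: int,
                        features: List[Feature]) -> List[Exon]:
    """Build exons from UTR/CDS features, merging ones that touch or overlap.

    Transcripts without any UTR or CDS features (as seen for some predicted
    genes) get a single exon spanning the whole transcript.
    """
    parts = sorted((start, stop) for kind, start, stop in features
                   if kind in ("UTR", "CDS"))
    if not parts:
        return [(transcript_start, transcript_stop)]
    exons = [parts[0]]
    for start, stop in parts[1:]:
        last_start, last_stop = exons[-1]
        if start <= last_stop + 1:  # adjacent or overlapping: same exon
            exons[-1] = (last_start, max(last_stop, stop))
        else:
            exons.append((start, stop))
    return exons
```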
- 22 Oct, 2014 1 commit
Vermaat authored
Fixes #9
- 21 Oct, 2014 1 commit
Vermaat authored
- 20 Oct, 2014 1 commit
Vermaat authored
Don't fix what ain't broken. Unfortunately, string handling in Mutalyzer really is broken. So we fix it.

Internally, all strings should be represented by unicode strings as much as possible. The main exceptions are large reference sequence strings. These are often better kept as BioPython sequence objects, since that is how we usually get them in the first place.

These changes will hopefully make Mutalyzer more reliable in working with incoming data. As a bonus, they're a first (small) step towards Python 3 compatibility [1].

Our strategy is as follows:

1. We use `from __future__ import unicode_literals` at the top of every file.
2. All incoming strings are decoded to unicode (if necessary) as soon as possible.
3. Outgoing strings are encoded to UTF8 (if necessary) as late as possible.
4. BioPython sequence objects can be based on byte strings as well as unicode strings.
5. In the database, everything is UTF8.
6. We worry about uploaded and downloaded reference files and batch jobs in a later commit.

Point 1 will ensure that all string literals in our source code will be unicode strings [2].

As for point 4, sometimes this may even change under our eyes (e.g., calling `.reverse_complement()` will change it to a byte string). We don't care as long as they're BioPython objects; only when we get the sequence out must we have it as a unicode string. Their contents are always in the ASCII range anyway.

Although `Bio.Seq.reverse_complement` works fine on Python byte strings (and we used to rely on that), it crashes on a Python unicode string. So we take care to only use it on BioPython sequence objects and wrote our own reverse complement function for unicode strings (`mutalyzer.util.reverse_complement`).

As for point 5, SQLAlchemy already does a very good job of decoding from and encoding to UTF8 for us.
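A minimal Python 3 sketch of points 2 and 3 ("decode early, encode late") and of a reverse complement that works on text strings. The helper names `decode_input` and `encode_output` are hypothetical, and this `reverse_complement` is an illustration only, not the actual `mutalyzer.util.reverse_complement`. In Python 2, which this commit targets, `str` below corresponds to `unicode`.

```python
# Translation table for DNA text; characters not listed are left unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def decode_input(value, encoding="utf-8"):
    """Decode incoming bytes to text as early as possible (point 2)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def encode_output(text, encoding="utf-8"):
    """Encode text to bytes as late as possible, right before output (point 3)."""
    return text.encode(encoding)


def reverse_complement(sequence):
    """Reverse complement of a DNA text string (not a BioPython object)."""
    return sequence.translate(COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`, and `encode_output("café")` gives the UTF-8 bytes `b"caf\xc3\xa9"`.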
The Spyne documentation has the following to say about their `String` and `Unicode` types [3]:

> There are two string types in Spyne: `spyne.model.primitive.Unicode` and
> `spyne.model.primitive.String` whose native types are `unicode` and `str`
> respectively.
>
> Unlike the Python `str`, the Spyne `String` is not for arbitrary byte
> streams. You should not use it unless you are absolutely, positively sure
> that you need to deal with text data with an unknown encoding. In all other
> cases, you should just use the `Unicode` type. They actually look the same
> from outside, this distinction is made just to properly deal with the quirks
> surrounding Python-2's `unicode` type.
>
> Remember that you have the `ByteArray` and `File` types at your disposal
> when you need to deal with arbitrary byte streams.
>
> The `String` type will be just an alias for `Unicode` once Spyne gets ported
> to Python 3. It might even be deprecated and removed in the future, so make
> sure you are using either `Unicode` or `ByteArray` in your interface
> definitions.

So let's not ignore that and never use `String` anymore in our webservice interface.

For the command line interface it's a bit more complicated, since there seems to be no reliable way to get the encoding of command line arguments. We use `sys.stdin.encoding` as a best guess.

For us to interpret a sequence of bytes as text, it's key to be aware of their encoding. Once decoded, a text string can be safely used without having to worry about bytes.

Without unicode we're nothing, and nothing will help us. Maybe we're lying, then you better not stay. But we could be safer, just for one day. Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day.

[1] https://docs.python.org/2.7/howto/pyporting.html
[2] http://python-future.org/unicode_literals.html
[3] http://spyne.io/docs/2.10/manual/03_types.html#strings
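A minimal sketch of the best-guess decoding of command line arguments via `sys.stdin.encoding`. The function name and the UTF-8 fallback (for when `sys.stdin` is missing or reports no encoding) are assumptions. On Python 3, `sys.argv` already holds text, so the byte-string branch only matters for Python 2-style raw arguments.

```python
import sys


def decode_argument(argument):
    """Decode a raw command line argument using a best-guess encoding.

    Uses sys.stdin.encoding when available; falling back to UTF-8 is an
    assumption of this sketch.
    """
    if isinstance(argument, str):
        return argument  # already text (always the case for sys.argv on Python 3)
    encoding = getattr(sys.stdin, "encoding", None) or "utf-8"
    return argument.decode(encoding)
```

A caller would apply it once, right at the entry point, for example `[decode_argument(a) for a in raw_arguments]`, so that the rest of the program only ever sees text.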
- 13 May, 2014 1 commit
Vermaat authored
- 28 Feb, 2014 1 commit
Vermaat authored
The name checker supports ranges in insertions and insertion-deletions, for example `3_4ins8_12`, and compound insertions and insertion-deletions, for example `3_4ins[ATC;8_12]`. The inserted sequences are accepted and concatenated before any further processing, so reported descriptions show only the concatenated sequences. The support for ranges is limited to genomic descriptions. The position converter supports compound insertions and insertion-deletions, but not ranges. Compound insertions and insertion-deletions are not part of the current HGVS nomenclature, but will be proposed.
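A minimal sketch of the "accept and concatenate" step for an inserted part such as `[ATC;8_12]` or `8_12`, with ranges read as 1-based inclusive positions in the reference. The function name and the simple regular-expression parsing are hypothetical; Mutalyzer's own parser works differently, and this sketch does not enforce the restriction of ranges to genomic descriptions.

```python
import re

RANGE = re.compile(r"(\d+)_(\d+)")
BASES = re.compile(r"[ACGTN]+")


def inserted_sequence(insertion, reference):
    """Concatenate the parts of an insertion such as '[ATC;8_12]' or '8_12'.

    Ranges are 1-based and inclusive, and are looked up in `reference`.
    """
    body = insertion
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    pieces = []
    for part in body.split(";"):
        match = RANGE.fullmatch(part)
        if match:
            first, last = int(match.group(1)), int(match.group(2))
            if not 1 <= first <= last <= len(reference):
                raise ValueError("range outside reference: " + part)
            pieces.append(reference[first - 1:last])
        elif BASES.fullmatch(part):
            pieces.append(part)
        else:
            raise ValueError("unsupported insertion part: " + part)
    return "".join(pieces)
```

With the reference `"ACGTACGTACGTAC"`, `8_12` resolves to `"TACGT"`, so `[ATC;8_12]` becomes the single inserted sequence `"ATCTACGT"`, which is what a reported description would then show.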
- 17 Feb, 2014 1 commit
Vermaat authored
Also, the value for nuclear chromosomes is now `nucleus` instead of `chromosome` for better alignment with the other value `mitochondrion`. Note that I did not bother to make an Alembic migration for this, since we don't have any installations besides my own yet anyway.
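The organelle value stored per chromosome is what selects between g. and m. positioning (as the 2012 commit further down in this log describes). A minimal sketch of that choice with the new value names; the dictionary and function name are hypothetical.

```python
# Hypothetical mapping from the stored organelle value to the HGVS
# coordinate prefix, using the value names introduced in this commit.
PREFIX_BY_ORGANELLE = {"nucleus": "g.", "mitochondrion": "m."}


def coordinate_prefix(organelle):
    """Return 'g.' for nuclear chromosomes and 'm.' for mitochondrial DNA."""
    try:
        return PREFIX_BY_ORGANELLE[organelle]
    except KeyError:
        raise ValueError("unknown organelle type: %r" % organelle) from None
```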
- 25 Jan, 2014 1 commit
Vermaat authored
- 16 Jan, 2014 1 commit
Vermaat authored
- 10 Jan, 2014 2 commits
Vermaat authored
Now that we ported the database to SQLAlchemy, we remove the obsolete Db module and all references to it.
Vermaat authored
This introduces a proper notion of genome assemblies. Transcript mappings for all genome assemblies are in the same database, which is better for maintenance. Updating transcript mappings is also simplified a lot, especially from NCBI mapview files, where we now require a preprocessing sort on the input file. Overall, this port touches a lot of Mutalyzer code, so beware.
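One reason a pre-sorted input simplifies the import: once rows are sorted, all rows of one transcript are contiguous and can be grouped in a single streaming pass. A minimal sketch under assumptions; the simplified row layout and the sort key (transcript, chromosome, start) are hypothetical and do not reflect the real Mapview columns.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, simplified Mapview-like rows:
# (transcript, chromosome, start, stop, feature_type)
rows = [
    ("NM_002", "chr1", 500, 600, "CDS"),
    ("NM_001", "chr2", 100, 200, "UTR"),
    ("NM_002", "chr1", 300, 400, "CDS"),
    ("NM_001", "chr2", 201, 300, "CDS"),
]


def transcripts(sorted_rows):
    """Yield (transcript, chromosome, features) groups from pre-sorted rows.

    groupby only merges consecutive rows, so this relies on the input being
    sorted; each group can then be processed without loading the whole file.
    """
    for (transcript, chromosome), group in groupby(sorted_rows, key=itemgetter(0, 1)):
        features = [(start, stop, kind) for _, _, start, stop, kind in group]
        yield transcript, chromosome, features


# The preprocessing step: sort by transcript, chromosome and start position.
sorted_rows = sorted(rows, key=itemgetter(0, 1, 2))
result = list(transcripts(sorted_rows))
```

On a real file, the sort would be done once before import rather than in memory as here.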
- 04 Jan, 2014 1 commit
Vermaat authored
- 23 Dec, 2013 1 commit
Vermaat authored
- 14 Jan, 2013 1 commit
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@664 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 14 Nov, 2012 2 commits
Vermaat authored
Keep organelle type ('chromosome' or 'mitochondrion') in chromosome database table and use it to choose between g. and m. positioning. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@638 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
Vermaat authored
Support genomic references in the mapping database. At the moment, this is only tested with mtDNA genes, but it should clear the way for NG_ mappings as well. Mappings for mtDNA genes can be added to the database using the command line tool mutalyzer-mapping-import. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@635 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 08 Nov, 2012 1 commit
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@630 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 12 Jul, 2012 2 commits
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@573 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@572 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 11 Jul, 2012 1 commit
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@570 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 11 May, 2012 1 commit
Vermaat authored
This fixes Trac bug #95. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@524 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 01 Mar, 2012 1 commit
Vermaat authored
During import of the NCBI transcript mappings, CDS start or stop positions were not picked up for some transcripts (where these are on exon boundaries). Bug reported by S Venkata Suresh Kumar. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@490 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 30 Jan, 2012 1 commit
Vermaat authored
Some genes (e.g. in the PAR) are mapped on both the X and Y chromosomes, but are (apart from the chromosome names) indistinguishable from transcripts that are mapped using different contigs. Transcripts of the latter type should be merged; those of the former type should not. Our fix consists of only including exons where positions are consistent with the transcript mapping and allowing transcripts to be mapped more than once, but only to two different chromosomes. This fixes #82. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@467 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
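A minimal sketch of the merging rule above, under assumptions: each hypothetical row carries the transcript, chromosome, the transcript span on that chromosome, and a subset of exons (as when a mapping is split over several contigs). Keying on (transcript, chromosome) keeps X and Y mappings of PAR genes apart while merging contig pieces, and exons outside the transcript span are dropped as inconsistent. Names and layout are illustrative, not Mutalyzer's code.

```python
def merge_mappings(rows):
    """Merge rows (transcript, chromosome, start, stop, exons) per chromosome.

    Rows for the same transcript on the same chromosome are merged; rows on
    different chromosomes stay separate mappings. Only exons lying within
    the transcript span are kept.
    """
    merged = {}
    for transcript, chromosome, start, stop, exons in rows:
        key = (transcript, chromosome)
        if key not in merged:
            merged[key] = (start, stop, set())
        span_start, span_stop, kept = merged[key]
        for exon_start, exon_stop in exons:
            if span_start <= exon_start and exon_stop <= span_stop:
                kept.add((exon_start, exon_stop))
    return {key: (start, stop, sorted(kept))
            for key, (start, stop, kept) in merged.items()}


rows = [
    ("NM_A", "X", 1000, 5000, [(1000, 1500)]),                # contig 1
    ("NM_A", "X", 1000, 5000, [(4000, 5000), (9000, 9100)]),  # contig 2
    ("NM_A", "Y", 2000, 6000, [(2000, 2500)]),                # PAR copy on Y
]
mappings = merge_mappings(rows)
```

Here the two X rows are merged into one mapping with exons (1000, 1500) and (4000, 5000), the out-of-span exon (9000, 9100) is dropped, and the Y mapping remains separate.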
- 27 Jan, 2012 1 commit
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@463 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 24 Nov, 2011 2 commits
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/browser-link-branch@423 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/browser-link-branch@422 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 23 Nov, 2011 1 commit
Vermaat authored
git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@420 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
- 04 Nov, 2011 1 commit
Vermaat authored
The old way of using the configuration file was by instantiating a Config object which read the file. This instance was passed to every function and object that might need it. The new way is by simply calling config.get('name') to get the configuration value for 'name'. This lazily reads the configuration file and the contents are cached for future calls. git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/branches/implicit-config-branch@408 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1