  1. May 18, 2015
  2. May 01, 2015
  3. Apr 30, 2015
  4. Jan 30, 2015
    • Discard incomplete genes in genbank reference files · 73c0862f
      Vermaat authored
      Many genbank reference files contain more than one gene, especially
      slices from an assembly. Some of these genes may be incomplete in
      the reference file (i.e., either start or end exceeds the outer
      coordinates). We cannot really do anything with these genes, so we
      discard them during parsing.
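
      A minimal sketch of the filtering described above, not the actual parser
      code. It assumes each gene is a hypothetical `(name, start, end)` tuple with
      1-based inclusive coordinates, and treats a gene as incomplete when its start
      or end falls outside the outer coordinates of the reference:

```python
def discard_incomplete_genes(genes, outer_start, outer_end):
    """Keep only genes that lie entirely within [outer_start, outer_end]."""
    return [
        (name, start, end)
        for name, start, end in genes
        if outer_start <= start and end <= outer_end
    ]


# Hypothetical genes on a reference slice covering positions 1..1000.
genes = [
    ("GENE_A", 100, 500),   # complete
    ("GENE_B", 900, 1500),  # end exceeds the slice
    ("GENE_C", -50, 80),    # start precedes the slice
]
complete = discard_incomplete_genes(genes, 1, 1000)
```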
    • Fix broken DMD reference in unit tests · 51d8cc50
      Vermaat authored
    • Add getGeneLocation webservice method · e06452a1
      Vermaat authored
      Given a gene symbol and optional genome build, this returns the location
      of the gene.
      
      Primary motivation for this is LOVD, where it will be used in combination
      with sliceChromosome as an alternative for sliceChromosomeByGene which only
      works on the fixed Ensembl genome build.
  5. Nov 24, 2014
  6. Oct 21, 2014
  7. Oct 20, 2014
    • 8acb0970
    • Use unicode strings · 2a4dc3c1
      Vermaat authored
      Don't fix what ain't broken. Unfortunately, string handling in Mutalyzer
      really is broken. So we fix it.
      
      Internally, all strings should be represented by unicode strings as much as
      possible. The main exception is large reference sequence strings. These can
      often better be BioPython sequence objects, since that is how we usually get
      them in the first place.
      
      These changes will hopefully make Mutalyzer more reliable in working with
      incoming data. As a bonus, they're a first (small) step towards Python 3
      compatibility [1].
      
      Our strategy is as follows:
      
      1. We use `from __future__ import unicode_literals` at the top of every file.
      2. All incoming strings are decoded to unicode (if necessary) as soon as
         possible.
      3. Outgoing strings are encoded to UTF8 (if necessary) as late as possible.
      4. BioPython sequence objects can be based on byte strings as well as unicode
         strings.
      5. In the database, everything is UTF8.
      6. We worry about uploaded and downloaded reference files and batch jobs in a
         later commit.
      
      Point 1 will ensure that all string literals in our source code will be
      unicode strings [2].
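
      Points 2 and 3 can be sketched as follows. This is an illustration rather
      than code from this commit: the helper names `decode_incoming` and
      `encode_outgoing` are hypothetical, and UTF8 is assumed for the incoming
      data. The `u` prefixes stand in for `unicode_literals` so the sketch stays
      self-contained.

```python
import io


def decode_incoming(raw, encoding="utf-8"):
    """Decode a byte string to text; text passes through unchanged."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def encode_outgoing(text, encoding="utf-8"):
    """Encode text to bytes, only at the output boundary."""
    return text.encode(encoding)


raw = b"caf\xc3\xa9"            # five bytes coming in from outside
text = decode_incoming(raw)     # four characters from here on
upper = text.upper()            # operates on characters, not bytes

out = io.BytesIO()
out.write(encode_outgoing(upper))  # bytes again, as late as possible
```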
      
      As for point 4, sometimes this may even change under our eyes (e.g., calling
      `.reverse_complement()` will change it to a byte string). We don't care as
      long as they're BioPython objects, only when we get the sequence out we must
      have it as unicode string. Their contents are always in the ASCII range
      anyway.
      
      Although `Bio.Seq.reverse_complement` works fine on Python byte strings (and
      we used to rely on that), it crashes on a Python unicode string. So we take
      care to only use it on BioPython sequence objects and wrote our own reverse
      complement function for unicode strings (`mutalyzer.util.reverse_complement`).
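
      For illustration, a reverse complement for text strings could look like the
      sketch below. This is not the actual `mutalyzer.util.reverse_complement`
      implementation; it assumes only the bases A, C, G and T (upper or lower
      case) and passes any other character through unchanged.

```python
# Translation table keyed by code point, which is what translate()
# expects for text strings.
_COMPLEMENT = {ord(a): b for a, b in zip(u"ACGTacgt", u"TGCAtgca")}


def reverse_complement(sequence):
    """Return the reverse complement of a text string of bases."""
    return sequence.translate(_COMPLEMENT)[::-1]


result = reverse_complement(u"ATCGG")
```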
      
      As for point 5, SQLAlchemy already does a very good job of decoding from
      and encoding to UTF8 for us.
      
      The Spyne documentation has the following to say about their `String` and
      `Unicode` types [3]:
      
      > There are two string types in Spyne: `spyne.model.primitive.Unicode` and
      > `spyne.model.primitive.String` whose native types are `unicode` and `str`
      > respectively.
      >
      > Unlike the Python `str`, the Spyne `String` is not for arbitrary byte
      > streams. You should not use it unless you are absolutely, positively sure
      > that you need to deal with text data with an unknown encoding. In all other
      > cases, you should just use the `Unicode` type. They actually look the same
      > from outside, this distinction is made just to properly deal with the quirks
      > surrounding Python-2's `unicode` type.
      >
      > Remember that you have the `ByteArray` and `File` types at your disposal
      > when you need to deal with arbitrary byte streams.
      >
      > The `String` type will be just an alias for `Unicode` once Spyne gets ported
      > to Python 3. It might even be deprecated and removed in the future, so make
      > sure you are using either `Unicode` or `ByteArray` in your interface
      > definitions.
      
      So let's not ignore that and never use `String` anymore in our webservice
      interface.
      
      For the command line interface it's a bit more complicated, since there seems
      to be no reliable way to get the encoding of command line arguments. We use
      `sys.stdin.encoding` as a best guess.
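
      A minimal sketch of that best guess. The helper name `decode_argument` is
      hypothetical, and the fallback to UTF8 when no encoding is reported is an
      assumption of this sketch, not something the commit prescribes.

```python
import sys


def decode_argument(arg, fallback="utf-8"):
    """Decode a command line argument given as bytes; text passes through."""
    if not isinstance(arg, bytes):
        return arg
    # sys.stdin.encoding may be None, for example when input is piped.
    encoding = getattr(sys.stdin, "encoding", None) or fallback
    return arg.decode(encoding)


decoded = decode_argument(b"NM_003002.2:c.274G>T")
```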
      
      For us to interpret a sequence of bytes as text, it's key to be aware of their
      encoding. Once decoded, a text string can be safely used without having to
      worry about bytes. Without unicode we're nothing, and nothing will help
      us. Maybe we're lying, then you better not stay. But we could be safer, just
      for one day. Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day.
      
      [1] https://docs.python.org/2.7/howto/pyporting.html
      [2] http://python-future.org/unicode_literals.html
      [3] http://spyne.io/docs/2.10/manual/03_types.html#strings
  8. Oct 15, 2014
    • Fix several error cases in LOVD2 getGS call · bcef1633
      Vermaat authored
      The `getGS` website view for LOVD2 would report "transcript not found" if
      the genomic reference has multiple transcripts annotated or if the variant
      description raises an error in the variant checker.
  9. Oct 04, 2014
  10. Sep 26, 2014
  11. Sep 22, 2014
  12. Sep 19, 2014
  13. Aug 27, 2014
  14. Jun 24, 2014
  15. Mar 01, 2014
    • Reverse complement range insertions/insertion-deletions · 57120a89
      Vermaat authored
      The name checker supports reverse complement ranges in insertions
      and insertion-deletions, for example `3_4ins8_12inv`.
      
      Reverse complement range insertions and insertion-deletions are not
      part of the current HGVS nomenclature, but will be proposed.
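
      To illustrate what `3_4ins8_12inv` means, here is a sketch on a made-up
      reference sequence. It assumes the range refers to 1-based inclusive
      positions on the same reference; the function name is hypothetical and this
      is not the name checker's code.

```python
_COMPLEMENT = {ord(a): b for a, b in zip(u"ACGT", u"TGCA")}


def insert_inverted_range(reference, after, first, last):
    """Insert the reverse complement of reference[first..last] after `after`.

    All positions are 1-based and the range is inclusive.
    """
    inserted = reference[first - 1:last].translate(_COMPLEMENT)[::-1]
    return reference[:after] + inserted + reference[after:]


reference = u"ATGCATTGCAGTCC"
# 3_4ins8_12inv: positions 8..12 are GCAGT, reverse complement ACTGC.
mutated = insert_inverted_range(reference, 3, 8, 12)
```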
  16. Feb 28, 2014
    • Range and compound insertions/insertion-deletions · 31b2f13a
      Vermaat authored
      The name checker supports ranges in insertions and insertion-
      deletions, for example `3_4ins8_12`, and compound insertions and
      insertion-deletions, for example `3_4ins[ATC;8_12]`.
      The inserted sequences are accepted and concatenated before any
      further processing, so reported descriptions show only the
      concatenated sequences.
      The support for ranges is limited to genomic descriptions.
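
      The concatenation can be sketched as follows, on a made-up reference
      sequence. The function names are hypothetical and ranges are assumed to be
      1-based inclusive positions on the same reference; this is not the name
      checker's code.

```python
def resolve_part(reference, part):
    """Return a literal sequence as is, or look up a 'first_last' range."""
    if "_" in part:
        first, last = (int(position) for position in part.split("_"))
        return reference[first - 1:last]
    return part


def compound_insertion(reference, after, parts):
    """Concatenate all parts, then insert the result after position `after`."""
    inserted = "".join(resolve_part(reference, part) for part in parts)
    mutated = reference[:after] + inserted + reference[after:]
    description = "%d_%dins%s" % (after, after + 1, inserted)
    return mutated, description


reference = "ATGCATTGCAGTCC"
# 3_4ins[ATC;8_12]: the literal ATC followed by positions 8..12 (GCAGT).
mutated, description = compound_insertion(reference, 3, ["ATC", "8_12"])
```

      Here `description` holds only the concatenated sequence, mirroring how
      reported descriptions no longer show the individual parts.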
      
      The position converter supports compound insertions and
      insertion-deletions, not ranges.
      
      Compound insertions and insertion-deletions are not part of the
      current HGVS nomenclature, but will be proposed.
  17. Feb 22, 2014
  18. Feb 17, 2014
  19. Jan 22, 2014
    • Use fixtures in the unit tests · c49d49f0
      Vermaat authored
      This is The Good Stuff. The entire test suite can now be run without
      having to set up a database or run the batch checker, any of the web
      services, or the website. It even passes without an internet connection.
      In, like, 30 seconds! Awesome!
      
      This means tests don't randomly fail after some reference sequence
      changes on the NCBI server and it doesn't take an entire configured
      server with mapping database setup to run the tests. Those are things
      of the past! No more frustrations, Mutalyzer is testable!
      
      Going down now...
      
      The mountain screamed three times today
      I guess it thought it'd like to play
      How much does one have to pay
      To fry a peak and melt away
      Launching titan's breath on mine
      The sweating measure lands on time
      
      And the old man, down by the river
      Well he walks up and he walks on down
      To the spaceship that's parked at your doorstep
      And it's waiting to take you away now
      
      Goin' down now
      Goin' down now
      
      Looking for the rate that crowed
      He's hooked up down in Mexico
      Slap my nerve now give me more
      It's my disaster friend, not yours
      
      And the old man, down by the river
      Well he walks up and he walks on down
      To the spaceship that's parked at your doorstep
      And it's waiting to take you away now
      
      And the last one, it's down by the river
      Where he gets up and he walks on down
      To the spaceship that's parked at your doorstep
      And it's waiting to take you away now
      
      It's down by the river, it's always this way now
      It's down by the river, it's always this way now
      
      Going down now
      Going down now
      now, now, now
      
      down, down, down
  20. Jan 10, 2014
    • Remove obsolete Db module · 667f39a6
      Vermaat authored
      Now that we ported the database to SQLAlchemy, we remove the obsolete Db
      module and all references to it.
    • Use Redis for stat counters · 8fa5c251
      Vermaat authored
      The Redis client automatically falls back to a mock Redis server if no
      Redis server is configured. Therefore, a Redis server is not needed to
      run Mutalyzer. You'll just not get any aggregate stat counts over
      different runs.
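
      The fallback idea can be sketched like this. It is an illustration under
      assumptions, not the code from this commit: `InMemoryCounters`,
      `create_client` and the counter key are hypothetical names, and the stand-in
      only covers the two calls used here. Unlike a real Redis server, it returns
      integers from `get` and forgets everything when the process exits.

```python
class InMemoryCounters(object):
    """Stand-in offering the small subset of the Redis API used below."""

    def __init__(self):
        self._values = {}

    def incr(self, key):
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def get(self, key):
        return self._values.get(key)


def create_client(redis_uri=None):
    """Return a real Redis client if a URI is configured, else the stand-in."""
    if redis_uri:
        import redis  # third-party redis-py, only needed when configured
        return redis.StrictRedis.from_url(redis_uri)
    return InMemoryCounters()


client = create_client()           # no Redis server configured
client.incr("counter:example")
client.incr("counter:example")
```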
    • Port Mapping database module to SQLAlchemy · e9bf1bc9
      Vermaat authored
      This introduces a proper notion of genome assemblies. Transcript
      mappings for all genome assemblies are in the same database, which
      is better for maintenance. Updating transcript mappings is also
      simplified a lot, especially from NCBI mapview files where we now
      require a preprocessing sort on the input file.
      
      Overall, this port touches a lot of Mutalyzer code, so beware.
  21. Jan 04, 2014