1. 08 Nov, 2021 1 commit
  2. 04 Dec, 2018 1 commit
  3. 08 Nov, 2016 1 commit
  4. 25 May, 2016 1 commit
  5. 23 May, 2016 1 commit
  6. 11 Mar, 2016 1 commit
  7. 23 Feb, 2016 2 commits
  8. 22 Feb, 2016 2 commits
    • 736f0bc1
    • Support LRG transcripts in the position converter · d9335656
      Vermaat authored
      Note that we explicitly only support LRG references as transcripts,
      so using c. positioning to convert to/from chromosomal positioning.
      
      Supporting LRG references as genomic references, so using g. positioning,
      can be future work, but converting them to/from LRG transcripts is of
      course already done by the name checker.
      
      Converting between genomic LRG positioning and chromosomal positioning
      directly is not something that can be easily supported in the current
      setup of the position converter.
  9. 29 Oct, 2015 1 commit
  10. 24 Sep, 2015 1 commit
  11. 20 Jul, 2015 1 commit
    • Fix transcript mappings containing no exons · 5e0d444a
      Vermaat authored
      For transcripts without any UTR and CDS entries in the NCBI Mapview
      file (seems to happen for predicted genes), we generate one exon
      spanning the entire transcript.
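
A minimal sketch of this fallback, assuming exons are held as `(start, stop)` tuples. The function name and the data layout are hypothetical and not taken from the Mutalyzer source:

```python
def exons_or_whole_transcript(transcript_start, transcript_stop, exons):
    """Return the known exons, or one exon spanning the whole transcript."""
    if exons:
        return sorted(exons)
    # No UTR or CDS entries were found for this transcript, so fall back
    # to a single exon covering the entire transcript.
    return [(transcript_start, transcript_stop)]
```

With an empty exon list, the result is a single `(transcript_start, transcript_stop)` tuple, so downstream code can always assume at least one exon.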
  12. 22 Oct, 2014 1 commit
  13. 21 Oct, 2014 1 commit
  14. 20 Oct, 2014 1 commit
    • Use unicode strings · 2a4dc3c1
      Vermaat authored
      Don't fix what ain't broken. Unfortunately, string handling in Mutalyzer
      really is broken. So we fix it.
      
      Internally, all strings should be represented by unicode strings as much as
      possible. The main exception is large reference sequence strings. These can
      often better be BioPython sequence objects, since that is how we usually get
      them in the first place.
      
      These changes will hopefully make Mutalyzer more reliable in working with
      incoming data. As a bonus, they're a first (small) step towards Python 3
      compatibility [1].
      
      Our strategy is as follows:
      
      1. We use `from __future__ import unicode_literals` at the top of every file.
      2. All incoming strings are decoded to unicode (if necessary) as soon as
         possible.
      3. Outgoing strings are encoded to UTF8 (if necessary) as late as possible.
      4. BioPython sequence objects can be based on byte strings as well as unicode
         strings.
      5. In the database, everything is UTF8.
      6. We worry about uploaded and downloaded reference files and batch jobs in a
         later commit.
      
      Point 1 will ensure that all string literals in our source code will be
      unicode strings [2].
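
Points 1 to 3 can be sketched roughly as follows. The helper names `decode_incoming` and `encode_outgoing` are hypothetical, as is the default of UTF8 for incoming data; this is an illustration of the strategy, not code from Mutalyzer:

```python
from __future__ import unicode_literals


def decode_incoming(data, encoding='utf-8'):
    """Decode a byte string to text as early as possible (hypothetical)."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    # Already text, nothing to do.
    return data


def encode_outgoing(text, encoding='utf-8'):
    """Encode text to bytes as late as possible (hypothetical)."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


# Bytes arrive at the boundary and are decoded immediately.
description = decode_incoming(b'NM_003002.2:c.274G>T')

# Internally we only work with text; this literal is a unicode string
# thanks to the `unicode_literals` import.
message = 'Checked {}'.format(description)

# Bytes are produced only when the result leaves the program.
outgoing = encode_outgoing(message)
```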
      
      As for point 4, sometimes this may even change under our eyes (e.g., calling
      `.reverse_complement()` will change it to a byte string). We don't care as
      long as they're BioPython objects; only when we get the sequence out must we
      have it as a unicode string. Their contents are always in the ASCII range
      anyway.
      
      Although `Bio.Seq.reverse_complement` works fine on Python byte strings (and
      we used to rely on that), it crashes on a Python unicode string. So we take
      care to only use it on BioPython sequence objects and wrote our own reverse
      complement function for unicode strings (`mutalyzer.util.reverse_complement`).
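
A reverse complement for unicode strings could look something like the sketch below. It is not the actual `mutalyzer.util.reverse_complement` implementation, and the complement table (A, C, G, T and N in both cases, no other IUPAC codes) is an assumption made for illustration:

```python
from __future__ import unicode_literals

# Assumed complement table, keyed by code point as `translate` on a
# unicode string expects.
COMPLEMENT = {ord(a): ord(b) for a, b in zip('ACGTNacgtn', 'TGCANtgcan')}


def reverse_complement(sequence):
    """Reverse complement of a unicode DNA string (illustrative sketch)."""
    # Complement every base, then reverse the result.
    return sequence.translate(COMPLEMENT)[::-1]
```

Characters missing from the table are passed through unchanged by `translate`, so a real implementation would probably want to extend the table or validate its input.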
      
      As for point 5, SQLAlchemy already does a very good job at decoding
      from and encoding to UTF8 for us.
      
      The Spyne documentation has the following to say about their `String` and
      `Unicode` types [3]:
      
      > There are two string types in Spyne: `spyne.model.primitive.Unicode` and
      > `spyne.model.primitive.String` whose native types are `unicode` and `str`
      > respectively.
      >
      > Unlike the Python `str`, the Spyne `String` is not for arbitrary byte
      > streams. You should not use it unless you are absolutely, positively sure
      > that you need to deal with text data with an unknown encoding. In all other
      > cases, you should just use the `Unicode` type. They actually look the same
      > from outside, this distinction is made just to properly deal with the quirks
      > surrounding Python-2's `unicode` type.
      >
      > Remember that you have the `ByteArray` and `File` types at your disposal
      > when you need to deal with arbitrary byte streams.
      >
      > The `String` type will be just an alias for `Unicode` once Spyne gets ported
      > to Python 3. It might even be deprecated and removed in the future, so make
      > sure you are using either `Unicode` or `ByteArray` in your interface
      > definitions.
      
      So let's not ignore that and never use `String` anymore in our webservice
      interface.
      
      For the command line interface it's a bit more complicated, since there seems
      to be no reliable way to get the encoding of command line arguments. We use
      `sys.stdin.encoding` as a best guess.
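
As a rough sketch of that best guess: the function name `decode_argument` and the UTF8 fallback for when `sys.stdin.encoding` is unavailable (for example when input is piped) are assumptions, not part of the original code:

```python
from __future__ import unicode_literals

import sys


def decode_argument(argument, fallback='utf-8'):
    """Decode a command line argument to text (hypothetical helper)."""
    if not isinstance(argument, bytes):
        # Already text, which is always the case on Python 3.
        return argument
    # `sys.stdin.encoding` can be None or missing; use the assumed
    # fallback encoding in that case.
    encoding = getattr(sys.stdin, 'encoding', None) or fallback
    return argument.decode(encoding)
```

It would typically be applied to every entry of `sys.argv[1:]` before any further processing.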
      
      For us to interpret a sequence of bytes as text, it's key to be aware of their
      encoding. Once decoded, a text string can be safely used without having to
      worry about bytes. Without unicode we're nothing, and nothing will help
      us. Maybe we're lying, then you better not stay. But we could be safer, just
      for one day. Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day.
      
      [1] https://docs.python.org/2.7/howto/pyporting.html
      [2] http://python-future.org/unicode_literals.html
      [3] http://spyne.io/docs/2.10/manual/03_types.html#strings
  15. 13 May, 2014 1 commit
  16. 28 Feb, 2014 1 commit
    • Range and compound insertions/insertion-deletions · 31b2f13a
      Vermaat authored
      The name checker supports ranges in insertions and insertion-
      deletions, for example `3_4ins8_12`, and compound insertions and
      insertion-deletions, for example `3_4ins[ATC;8_12]`.
      The inserted sequences are accepted and concatenated before any
      further processing, so reported descriptions show only the
      concatenated sequences.
      The support for ranges is limited to genomic descriptions.
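
The concatenation can be illustrated with a small sketch. It assumes ranges are 1-based, inclusive positions on the forward strand of the reference, and that the bracketed part has already been extracted from the description. The function name and this simplified parsing are hypothetical and do not reflect the name checker's actual grammar:

```python
import re

# A range such as `8_12`; anything else is treated as a literal sequence.
RANGE = re.compile(r'^(\d+)_(\d+)$')


def inserted_sequence(reference, parts):
    """Concatenate literal sequences and reference ranges (sketch)."""
    pieces = []
    for part in parts.split(';'):
        match = RANGE.match(part)
        if match:
            start, stop = int(match.group(1)), int(match.group(2))
            # Convert 1-based inclusive positions to a Python slice.
            pieces.append(reference[start - 1:stop])
        else:
            pieces.append(part)
    return ''.join(pieces)
```

For `3_4ins[ATC;8_12]`, the call would be `inserted_sequence(reference, 'ATC;8_12')`, yielding `ATC` followed by reference positions 8 to 12.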
      
      The position converter supports compound insertions and
      insertion-deletions, not ranges.
      
      Compound insertions and insertion-deletions are not part of the
      current HGVS nomenclature, but will be proposed.
  17. 17 Feb, 2014 1 commit
    • Rename organelle_type to organelle in chromosome model · 352c590b
      Vermaat authored
      Also, the value for nuclear chromosomes is now `nucleus` instead of
      `chromosome` for better alignment with the other value `mitochondrion`.
      
      Note that I did not bother to make an Alembic migration for this, since
      we don't have any installations besides my own yet anyway.
  18. 25 Jan, 2014 1 commit
  19. 16 Jan, 2014 1 commit
  20. 10 Jan, 2014 2 commits
    • Remove obsolete Db module · 667f39a6
      Vermaat authored
      Now that we ported the database to SQLAlchemy, we remove the obsolete Db
      module and all references to it.
    • Port Mapping database module to SQLAlchemy · e9bf1bc9
      Vermaat authored
      This introduces a proper notion of genome assemblies. Transcript
      mappings for all genome assemblies are in the same database, which
      is better for maintenance. Updating transcript mappings is also
      simplified a lot, especially from NCBI mapview files where we now
      require a preprocessing sort on the input file.
      
      Overall, this port touches a lot of Mutalyzer code, so beware.
  21. 04 Jan, 2014 1 commit
  22. 23 Dec, 2013 1 commit
  23. 14 Jan, 2013 1 commit
  24. 14 Nov, 2012 2 commits
  25. 08 Nov, 2012 1 commit
  26. 12 Jul, 2012 2 commits
  27. 11 Jul, 2012 1 commit
  28. 11 May, 2012 1 commit
  29. 01 Mar, 2012 1 commit
  30. 30 Jan, 2012 1 commit
    • Fix mapping info for genes mapped to more than one chromosome · 5896fe3b
      Vermaat authored
      Some genes (e.g. in the PAR) are mapped on both the X and Y chromosomes, but
      are (apart from the chromosome names) indistinguishable from transcripts that
      are mapped using different contigs. Transcripts of the latter type should be
      merged, those of the former type should not be merged.
      
      Our fix consists of only including exons where positions are consistent with
      the transcript mapping and allowing transcripts to be mapped more than once,
      but only to two different chromosomes.
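
One possible reading of "consistent with the transcript mapping" is sketched below: an exon is kept only if it lies on the same chromosome as the transcript mapping and within its boundaries. This interpretation, the function name, and the dictionary layout are all assumptions for illustration, not the code from this commit:

```python
def consistent_exons(transcript, exons):
    """Keep exons that fit the given transcript mapping (sketch)."""
    return [
        exon for exon in exons
        # Same chromosome, and fully inside the transcript boundaries.
        if exon['chromosome'] == transcript['chromosome']
        and transcript['start'] <= exon['start'] <= exon['stop'] <= transcript['stop']
    ]
```

Applied once per mapping, this keeps the X and Y copies of a PAR transcript apart while still collecting all matching exons for each copy.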
      
      This fixes #82.
      
      
      git-svn-id: https://humgenprojects.lumc.nl/svn/mutalyzer/trunk@467 eb6bd6ab-9ccd-42b9-aceb-e2899b4a52f1
  31. 27 Jan, 2012 1 commit
  32. 24 Nov, 2011 2 commits
  33. 23 Nov, 2011 1 commit
  34. 04 Nov, 2011 1 commit