  1. 17 Nov, 2020 1 commit
    • Improve warnings (#513) · 62e0c1e8
      Mihai authored
      * Fix #428
      
      * Improve warning messages for positions outside of the sequence range (#479).
      
      * Improve intronic positions with non-genomic references warning (#464).
      
      * Improve duplication warning (#466).
      62e0c1e8
  2. 04 Dec, 2018 1 commit
  3. 14 Jun, 2018 1 commit
  4. 11 Jun, 2018 1 commit
    • Support for LRG 1.9 · 5d54d4f5
      Mihai authored
      - Switch to new LRG location (fix for #447).
      - Extract only the gene name from the `updatable` section.
      - Fix for LRG transcripts with no coding region.
      - Adapted LRG examples on the website.
      - Extract the annotation set based on attribute type.
      - More informative error message (part of #135).
      - Makes #338 obsolete.
      5d54d4f5
  5. 24 May, 2018 1 commit
  6. 22 Jun, 2016 1 commit
  7. 15 Jun, 2016 1 commit
    • Accept accession number as transcript selector · 81b21202
      mkroon authored and Vermaat committed
      * Added failing test with accession number as transcript variant
        identifier.
      * Internally the accession number is translated to the transcript
        name (i.e. v-number) and subsequent processing is untouched.
      81b21202
  8. 12 Jun, 2016 1 commit
  9. 25 May, 2016 1 commit
  10. 18 Dec, 2015 1 commit
  11. 23 Sep, 2015 3 commits
    • Show diff for variant protein from non-reference start codon · 3c98a1af
      Vermaat authored
      The alternative variant protein sequence translated from a
      non-reference start codon (created by the variant) was not
      color-diffed the way normal variant protein sequences are.
      
      In the process we also rename the `oldprotein` and `newprotein`
      fields in the output object to `oldProtein` and `newProtein` to
      be more consistent with other field names.
      3c98a1af
    • Visualise protein change, also with alternative start · 851e71fe
      Vermaat authored
      In the case of an alternative start codon (in the reference CDS),
      protein changes were not visualised. This is fixed and a WALTSTART
      warning is also issued.
      
      Also, if a new non-reference start codon is created by the variant,
      visualise this as such.
      851e71fe
    • Translate alternative start to M, also in variant · ae70ddfd
      Vermaat authored
      In case of an alternative start codon, the variant CDS was not
      translated to a protein starting with M. This caused the protein
      description machinery to conclude a variant affecting the start
      codon, hence reporting `p.?`.
      
      We fix this by always translating the start codon to M (except
      when the variant actually affects it).
      
      Example: `NM_024426.4:c.1107A>G` (a synonymous mutation) should
      yield `NM_024426.4(WT1_i001):p.(=)`, not `p.?`. The start codon
      for that protein is `CTG`.
      ae70ddfd
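A minimal sketch of the fix described in ae70ddfd, not Mutalyzer's actual code. The helper names `translate` and `translate_cds`, the `start_affected` flag, and the toy CDS are assumptions made for illustration. The first residue is forced to `M` unless the variant affects the start codon, so a synonymous change downstream of a `CTG` start yields the same protein as the reference.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate codon by codon, keeping `*` for stop codons."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def translate_cds(cds, start_affected=False):
    """Translate a CDS, reporting the start codon as M (hypothetical helper).

    `start_affected` says whether the variant touches the start codon; only
    then is the literal translation of the first codon kept.
    """
    protein = translate(cds)
    if protein and not start_affected:
        protein = "M" + protein[1:]
    return protein


# Toy CDS with an alternative CTG start codon and a synonymous GCC>GCG change.
reference_protein = translate_cds("CTGGCCTAA")
variant_protein = translate_cds("CTGGCGTAA")
```

Because both proteins start with `M` and are otherwise identical, a comparison of the two would report no protein change rather than an affected start codon.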
  12. 15 Jul, 2015 1 commit
    • Uncertain stop codon in protein descriptions (fs and ext) · d2f91690
      Vermaat authored
      When a variant results in a frame shift or extension and we don't
      see a new stop codon in the RNA, the protein description should use
      the notation for an uncertain stop codon, e.g., `p.(Gln730Profs*?)`
      instead of `p.(Gln730Profs*96)` where 96 is just the last codon in
      our transcript [1].
      
      To detect this, we now use `to_stop=False` in our `.translate()`
      calls, since that will explicitly return `*` characters for stop
      codons.
      
      We also slightly fix the coloring of changes in the protein sequence
      where previously changed stop codon characters were not included.
      
      [1] http://www.hgvs.org/mutnomen/FAQ.html#nostop
      d2f91690
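The detection described in d2f91690 can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `frameshift_suffix` is a hypothetical helper, and the protein strings stand in for the output of a translation that keeps `*` for stop codons.

```python
def frameshift_suffix(variant_protein, first_changed):
    """Return the `fs*N` or `fs*?` suffix for a frame shift (hypothetical helper).

    variant_protein: protein translated with stop codons kept as `*`.
    first_changed:   0-based index of the first changed amino acid.
    """
    stop = variant_protein.find("*", first_changed)
    if stop == -1:
        # No stop codon was seen in the available transcript sequence.
        return "fs*?"
    # The first changed amino acid counts as position 1 of the new frame.
    return "fs*{}".format(stop - first_changed + 1)


with_stop = frameshift_suffix("MAPLK*", 2)
without_stop = frameshift_suffix("MAPLK", 2)
```

If translation stopped at the first stop codon and dropped it, the two inputs above would be indistinguishable, which is why the `*` characters need to be kept.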
  13. 20 Oct, 2014 1 commit
    • Use unicode strings · 2a4dc3c1
      Vermaat authored
      Don't fix what ain't broken. Unfortunately, string handling in Mutalyzer
      really is broken. So we fix it.
      
      Internally, all strings should be represented by unicode strings as much as
      possible. The main exception is large reference sequence strings. These can
      often better be BioPython sequence objects, since that is how we usually get
      them in the first place.
      
      These changes will hopefully make Mutalyzer more reliable in working with
      incoming data. As a bonus, they're a first (small) step towards Python 3
      compatibility [1].
      
      Our strategy is as follows:
      
      1. We use `from __future__ import unicode_literals` at the top of every file.
      2. All incoming strings are decoded to unicode (if necessary) as soon as
         possible.
      3. Outgoing strings are encoded to UTF8 (if necessary) as late as possible.
      4. BioPython sequence objects can be based on byte strings as well as unicode
         strings.
      5. In the database, everything is UTF8.
      6. We worry about uploaded and downloaded reference files and batch jobs in a
         later commit.
      
      Point 1 will ensure that all string literals in our source code will be
      unicode strings [2].
      
      As for point 4, sometimes this may even change under our eyes (e.g., calling
      `.reverse_complement()` will change it to a byte string). We don't care as
      long as they're BioPython objects, only when we get the sequence out we must
      have it as unicode string. Their contents are always in the ASCII range
      anyway.
      
      Although `Bio.Seq.reverse_complement` works fine on Python byte strings (and
      we used to rely on that), it crashes on a Python unicode string. So we take
      care to only use it on BioPython sequence objects and wrote our own reverse
      complement function for unicode strings (`mutalyzer.util.reverse_complement`).
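
A reverse complement for text strings could look roughly like the sketch below. It is an assumption about the general shape only; the actual body of `mutalyzer.util.reverse_complement` is not shown in this log. The sketch handles only A, C, G and T and leaves any other character unchanged.

```python
# Mapping of code points, which is the form text strings accept in .translate().
COMPLEMENT = {ord(base): ord(comp) for base, comp in zip("ACGTacgt", "TGCAtgca")}


def reverse_complement(sequence):
    """Reverse complement of a text (unicode) string of A, C, G and T."""
    return sequence[::-1].translate(COMPLEMENT)


example = reverse_complement("ATGC")
```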
      
      As for point 5, SQLAlchemy already does a very good job of decoding from
      and encoding to UTF8 for us.
      
      The Spyne documentation has the following to say about their `String` and
      `Unicode` types [3]:
      
      > There are two string types in Spyne: `spyne.model.primitive.Unicode` and
      > `spyne.model.primitive.String` whose native types are `unicode` and `str`
      > respectively.
      >
      > Unlike the Python `str`, the Spyne `String` is not for arbitrary byte
      > streams. You should not use it unless you are absolutely, positively sure
      > that you need to deal with text data with an unknown encoding. In all other
      > cases, you should just use the `Unicode` type. They actually look the same
      > from outside, this distinction is made just to properly deal with the quirks
      > surrounding Python-2's `unicode` type.
      >
      > Remember that you have the `ByteArray` and `File` types at your disposal
      > when you need to deal with arbitrary byte streams.
      >
      > The `String` type will be just an alias for `Unicode` once Spyne gets ported
      > to Python 3. It might even be deprecated and removed in the future, so make
      > sure you are using either `Unicode` or `ByteArray` in your interface
      > definitions.
      
      So let's not ignore that and never use `String` anymore in our webservice
      interface.
      
      For the command line interface it's a bit more complicated, since there seems
      to be no reliable way to get the encoding of command line arguments. We use
      `sys.stdin.encoding` as a best guess.
      
      For us to interpret a sequence of bytes as text, it's key to be aware of their
      encoding. Once decoded, a text string can be safely used without having to
      worry about bytes. Without unicode we're nothing, and nothing will help
      us. Maybe we're lying, then you better not stay. But we could be safer, just
      for one day. Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day.
      
      [1] https://docs.python.org/2.7/howto/pyporting.html
      [2] http://python-future.org/unicode_literals.html
      [3] http://spyne.io/docs/2.10/manual/03_types.html#strings
      2a4dc3c1
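Points 2 and 3 of the strategy in 2a4dc3c1 (decode early, encode late) can be sketched as below. The helper names are hypothetical and the code is written for Python 3, where `bytes` and `str` play the roles of the byte and unicode strings discussed in the commit. The fallback to UTF-8 when `sys.stdin.encoding` is unavailable is an assumption of this sketch, not something the commit states.

```python
import sys


def to_text(value, encoding="utf-8"):
    """Decode incoming bytes to text; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Encode outgoing text to bytes; pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def guess_argument_encoding():
    """Best guess for the encoding of command line arguments."""
    # Assumption: fall back to UTF-8 if stdin has no usable encoding.
    return getattr(sys.stdin, "encoding", None) or "utf-8"


description = to_text(b"NM_003002.2:c.274G>T")  # decode at the boundary
payload = to_bytes(description)                 # encode only when leaving
```

Everything between the two boundary calls works on text only, so no other code needs to know which encoding the input arrived in.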
  14. 15 Oct, 2014 1 commit
    • Fix several error cases in LOVD2 getGS call · bcef1633
      Vermaat authored
      The `getGS` website view for LOVD2 would report "transcript not found" if
      the genomic reference has multiple transcripts annotated or if the variant
      description raises an error in the variant checker.
      bcef1633
  15. 01 Jul, 2014 1 commit
  16. 01 Mar, 2014 1 commit
    • Reverse complement range insertions/insertion-deletions · 57120a89
      Vermaat authored
      The name checker supports reverse complement ranges in insertions
      and insertion-deletions, for example `3_4ins8_12inv`.
      
      Reverse complement range insertions and insertion-deletions are not
      part of the current HGVS nomenclature, but will be proposed.
      57120a89
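As an illustration of what a description such as `3_4ins8_12inv` means, the sketch below applies it to a toy reference. It assumes 1-based inclusive positions and that the inserted range is read from the unmodified reference. The function `insert_range` is hypothetical and is not the name checker's implementation.

```python
COMPLEMENT = {ord(base): ord(comp) for base, comp in zip("ACGT", "TGCA")}


def insert_range(reference, after, first, last, inverted=False):
    """Insert reference[first..last] between positions `after` and `after + 1`.

    Positions are 1-based and inclusive. With `inverted=True` the range is
    reverse complemented before it is inserted.
    """
    inserted = reference[first - 1:last]
    if inverted:
        inserted = inserted[::-1].translate(COMPLEMENT)
    return reference[:after] + inserted + reference[after:]


reference = "AACCGGTTACGT"
plain = insert_range(reference, 3, 8, 12)                    # 3_4ins8_12
inverted = insert_range(reference, 3, 8, 12, inverted=True)  # 3_4ins8_12inv
```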
  17. 28 Feb, 2014 1 commit
    • Range and compound insertions/insertion-deletions · 31b2f13a
      Vermaat authored
      The name checker supports ranges in insertions and insertion-
      deletions, for example `3_4ins8_12`, and compound insertions and
      insertion-deletions, for example `3_4ins[ATC;8_12]`.
      The inserted sequences are accepted and concatenated before any
      further processing, so reported descriptions show only the
      concatenated sequences.
      The support for ranges is limited to genomic descriptions.
      
      The position converter supports compound insertions and
      insertion-deletions, not ranges.
      
      Compound insertions and insertion-deletions are not part of the
      current HGVS nomenclature, but will be proposed.
      31b2f13a
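The concatenation step described in 31b2f13a can be sketched as follows. The parsing here is deliberately naive (a `;`-separated string of literal sequences and `first_last` ranges) and `resolve_inserted` is a hypothetical name. The real name checker works on a parsed description rather than on raw strings.

```python
def resolve_inserted(reference, parts):
    """Resolve the contents of `ins[...]` to one concatenated sequence.

    parts: `;`-separated items, each a literal sequence such as `ATC` or a
           1-based inclusive range on the reference such as `8_12`.
    """
    pieces = []
    for part in parts.split(";"):
        if "_" in part:
            first, last = (int(position) for position in part.split("_"))
            pieces.append(reference[first - 1:last])
        else:
            pieces.append(part)
    return "".join(pieces)


reference = "AACCGGTTACGT"
inserted = resolve_inserted(reference, "ATC;8_12")  # from 3_4ins[ATC;8_12]
```

Since only the concatenated sequence is passed on, a reported description would show `ATCTACGT` rather than the original compound notation.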
  18. 16 Jan, 2014 1 commit
    • Port website from web.py to Flask · 0abce583
      Vermaat authored
      This includes changing a lot of routes and parameter names to be more
      consistent. We try to remain backwards compatible as much as possible
      by providing redirects from old routes and parameter names.
      0abce583
  19. 10 Jan, 2014 1 commit
    • Port Mapping database module to SQLAlchemy · e9bf1bc9
      Vermaat authored
      This introduces a proper notion of genome assemblies. Transcript
      mappings for all genome assemblies are in the same database, which
      is better for maintenance. Updating transcript mappings is also
      simplified a lot, especially from NCBI mapview files where we now
      require a preprocessing sort on the input file.
      
      Overall, this port touches a lot of Mutalyzer code, so beware.
      e9bf1bc9
  20. 04 Jan, 2014 1 commit
  21. 23 Dec, 2013 1 commit
  22. 19 Dec, 2013 1 commit
    • Move from configobj to Python module based config · 7c7f19c3
      Vermaat authored
      Remove the dependency on configobj and have default values for all
      configuration settings. User settings are defined in a Python module
      pointed to by the MUTALYZER_SETTINGS environment variable.
      
      We also clean up many configuration settings and remove some that
      are no longer used.
      7c7f19c3
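A module-based configuration of the kind described in 7c7f19c3 could work roughly like the sketch below. Only the `MUTALYZER_SETTINGS` variable name comes from the commit message. The setting names, the defaults, the `load_settings` function, and the rule that only upper-case names count as settings are assumptions for illustration.

```python
import os
import runpy

# Hypothetical defaults; the real setting names are not listed in this log.
DEFAULT_SETTINGS = {"DEBUG": False, "CACHE_DIR": "/tmp/cache"}


def load_settings(environ=None):
    """Return the defaults, overridden by the user's settings module if set."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULT_SETTINGS)
    path = environ.get("MUTALYZER_SETTINGS")
    if path:
        # Execute the user's Python file and keep only its upper-case names.
        module_globals = runpy.run_path(path)
        settings.update(
            {name: value for name, value in module_globals.items() if name.isupper()}
        )
    return settings
```

With a settings file containing `DEBUG = True`, running with `MUTALYZER_SETTINGS` pointing at that file would flip `DEBUG` while every other setting keeps its default.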
  23. 25 Mar, 2013 1 commit
  24. 12 Feb, 2013 1 commit
  25. 15 Dec, 2012 1 commit
  26. 29 Nov, 2012 1 commit
  27. 26 Nov, 2012 1 commit
  28. 04 Oct, 2012 2 commits
  29. 26 Jul, 2012 1 commit
  30. 21 Jun, 2012 1 commit
  31. 21 May, 2012 1 commit
  32. 20 Mar, 2012 1 commit
  33. 21 Feb, 2012 1 commit
  34. 18 Feb, 2012 1 commit
  35. 31 Jan, 2012 3 commits