  1. Feb 22, 2016
    • Support LRG transcripts in the position converter · d9335656
      Vermaat authored
      Note that we explicitly only support LRG references as transcripts,
      i.e., using c. positioning to convert to/from chromosomal positioning.
      
      Supporting LRG references as genomic references, i.e., using g.
      positioning, is left as future work. Converting them to/from LRG
      transcripts is of course already done by the name checker.
      
      Converting between genomic LRG positioning and chromosomal positioning
      directly is not something that can be easily supported in the current
      setup of the position converter.
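
      A minimal sketch of the restriction described above, assuming the
      converter receives the reference accession and the coordinate system
      ('c' or 'g') as separate values. The function name
      `check_lrg_support` and the regular expression are hypothetical,
      not the position converter's actual code.

```python
import re

# Hypothetical pattern for LRG transcript references such as "LRG_1t1".
LRG_TRANSCRIPT = re.compile(r'^LRG_\d+t\d+$')


def check_lrg_support(reference, coordinate_system):
    """Raise ValueError for LRG usages the converter does not support.

    Only LRG transcripts combined with c. positioning are accepted;
    non-LRG references are not checked here.
    """
    if not reference.startswith('LRG_'):
        return  # Not an LRG reference, nothing to check.
    if not LRG_TRANSCRIPT.match(reference):
        # E.g. a genomic LRG reference such as "LRG_1".
        raise ValueError('only LRG transcripts (e.g. LRG_1t1) are supported')
    if coordinate_system != 'c':
        raise ValueError('LRG transcripts only support c. positioning')
```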
  2. Feb 10, 2016
    • Don't report ext*? when variant RNA has stop codon · 9191352b
      Vermaat authored
      With the change introduced by #65 we forgot to check whether the
      variant RNA has an alternative downstream stop codon, and therefore
      always reported ext*? when the original stop codon was removed.
      
      Fixes #145
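
      A sketch of the check that was missing, assuming the variant protein
      is available as a string with explicit `*` characters for stop
      codons. The function name is hypothetical, and numbering the new
      stop codon by its distance from the former one is an assumption
      made for illustration.

```python
def extension_suffix(variant_protein, ref_stop_index):
    """Return the ext suffix for a protein whose stop codon was lost.

    `variant_protein` is the variant translation with '*' for stop codons
    (translated without truncating at the first stop codon) and
    `ref_stop_index` is the 0-based index of the original stop codon.
    """
    new_stop = variant_protein.find('*', ref_stop_index + 1)
    if new_stop == -1:
        # No downstream stop codon in the variant RNA: uncertain stop.
        return 'ext*?'
    # Assumption: the new stop is numbered by its distance from the
    # former stop codon.
    return 'ext*%d' % (new_stop - ref_stop_index)
```

      For a reference `MA*` (stop at index 2), the variant `MAQLK*` yields
      `ext*3`, while `MAQLK` without any stop codon yields `ext*?`.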
  3. Dec 19, 2015
    • Keep incomplete genes with complete features · 8fac2dc7
      Vermaat authored
      With this change the genbank parser no longer discards incomplete genes
      directly but keeps them as long as they have complete features
      annotated.
      
      For example, the PIK3R2 gene is annotated on NC_000019.9 (or a slice) as
      4973..>22328 with two RNA entries. One of these, however, is complete,
      so it would be a shame to discard the entire gene.
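
      A minimal sketch of the new filtering rule, using a hypothetical
      `Gene` record that holds GenBank location strings. GenBank marks
      partial locations with `<` or `>` (as in `4973..>22328`); the real
      parser works on its own feature objects, not on these strings.

```python
from dataclasses import dataclass, field


def is_complete(location):
    """A GenBank location is partial if it contains '<' or '>'."""
    return '<' not in location and '>' not in location


@dataclass
class Gene:
    """Hypothetical gene record: its own location plus its RNA locations."""
    name: str
    location: str
    rna_locations: list = field(default_factory=list)


def keep_gene(gene):
    """Keep a gene if it is complete or has at least one complete RNA."""
    if is_complete(gene.location):
        return True
    return any(is_complete(rna) for rna in gene.rna_locations)
```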
    • Add gene feature to genbank file without version · c1ea8bc3
      Vermaat authored
      This genbank file is incomplete and incorrect anyway, but that is
      not the mistake we want to test.
  4. Dec 18, 2015
  5. Nov 10, 2015
  6. Oct 29, 2015
  7. Oct 26, 2015
  8. Oct 23, 2015
  9. Oct 22, 2015
  10. Oct 20, 2015
    • Cache transcript protein links in Redis · 473c732c
      Vermaat authored
      Caching of transcript protein links received from the NCBI Entrez
      service is a typical use case for Redis. This commit implements the
      cache in Redis and removes all use of our original database table.
      
      An Alembic migration copies all existing links from the database to
      Redis. The original `TranscriptProteinLink` database table is not
      dropped. This will be done in a future migration to ensure running
      processes don't error and to provide a rollback scenario.
      
      We also remove the expiration of links (originally defaulting to 30
      days), since we don't expect them to ever change. Negative links
      (caching a 'not found' result from Entrez) *do* still expire, but
      with a longer default of 30 days (was 5 days).
      
      The configuration setting for the latter was renamed, yielding the
      following changes in the default configuration settings.
      
      Removed default settings:
      
          # Expiration time for transcript<->protein links from the NCBI (in seconds).
          PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 30
      
          # Expiration time for negative transcript<->protein links from the NCBI (in
          # seconds).
          NEGATIVE_PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 5
      
      Added default setting:
      
          # Cache expiration time for negative transcript<->protein links from the NCBI
          # (in seconds).
          NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30
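
      A minimal sketch of the expiration policy described above, written
      against a redis-py style client (`set(name, value, ex=...)` and
      `get(name)`). The key format, the function names and marking a
      negative link with an empty string are assumptions for
      illustration, not necessarily what this commit stores.

```python
# Same default as NEGATIVE_LINK_CACHE_EXPIRATION: 30 days = 2592000 seconds.
NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30

# Hypothetical key format.
KEY_TEMPLATE = 'ncbi:transcript-to-protein:%s'


def cache_link(client, transcript, protein):
    """Store a transcript->protein link; `protein=None` caches 'not found'."""
    key = KEY_TEMPLATE % transcript
    if protein is None:
        # Negative links expire, so Entrez is eventually queried again.
        client.set(key, '', ex=NEGATIVE_LINK_CACHE_EXPIRATION)
    else:
        # Positive links are not expected to change and never expire.
        client.set(key, protein)


def cached_link(client, transcript):
    """Return (found_in_cache, protein_or_None)."""
    value = client.get(KEY_TEMPLATE % transcript)
    if value is None:
        return False, None  # Nothing cached: ask Entrez.
    if isinstance(value, bytes):
        value = value.decode('utf-8')  # redis-py returns bytes by default.
    # An empty string is a cached negative link.
    return True, (value or None)
```

      With redis-py this would be used as `cache_link(redis.Redis(),
      transcript, protein)`.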
  11. Oct 13, 2015
  12. Oct 10, 2015
  13. Oct 01, 2015
  14. Sep 30, 2015
  15. Sep 27, 2015
  16. Sep 23, 2015
    • Show diff for variant protein from non-reference start codon · 3c98a1af
      Vermaat authored
      The alternative variant protein sequence translated from a
      non-reference start codon (created by the variant) was not
      color-diffed like normal variant protein sequences are.
      
      In the process we also rename the `oldprotein` and `newprotein`
      fields in the output object to `oldProtein` and `newProtein` to
      be more consistent with other field names.
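
      A sketch of one way to find the region to color when comparing
      `oldProtein` with `newProtein`: strip the longest common prefix and
      suffix. The function names and the `[`/`]` markers (standing in for
      colored HTML spans) are hypothetical; whether the view computes the
      diff exactly this way is an assumption.

```python
def changed_span(old, new):
    """Return (start, old_end, new_end) of the region where two sequences differ.

    Strips the longest common prefix, then the longest common suffix
    (without overlapping the prefix).
    """
    start = 0
    while start < len(old) and start < len(new) and old[start] == new[start]:
        start += 1
    old_end, new_end = len(old), len(new)
    while (old_end > start and new_end > start
           and old[old_end - 1] == new[new_end - 1]):
        old_end -= 1
        new_end -= 1
    return start, old_end, new_end


def highlight(seq, start, end, open_tag='[', close_tag=']'):
    """Wrap seq[start:end] in markers (stand-ins for color spans)."""
    return seq[:start] + open_tag + seq[start:end] + close_tag + seq[end:]
```

      For example, `changed_span('MAQLK*', 'MARLK*')` gives `(2, 3, 3)`, so
      `highlight('MARLK*', 2, 3)` marks the changed residue as `MA[R]LK*`.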
    • Visualise protein change, also with alternative start · 851e71fe
      Vermaat authored
      In the case of an alternative start codon (in the reference CDS),
      protein changes were not visualised. This is fixed and a WALTSTART
      warning is also issued.
      
      Also, if a new non-reference start codon is created by the variant,
      visualise this as such.
    • Translate alternative start to M, also in variant · ae70ddfd
      Vermaat authored
      In case of an alternative start codon, the variant CDS was not
      translated to a protein starting with M. This caused the protein
      description machinery to conclude that the variant affected the
      start codon, hence reporting `p.?`.
      
      We fix this by always translating the start codon to M (except
      when the variant actually affects it).
      
      Example: `NM_024426.4:c.1107A>G` (a synonymous mutation) should
      yield `NM_024426.4(WT1_i001):p.(=)`, not `p.?`. The start codon
      for that protein is `CTG`.
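
      A minimal sketch of the rule, assuming the CDS has already been
      translated with the standard table, so that a `CTG` start codon
      shows up as `L`. Applying the same function to both the reference
      and the variant protein makes an unaffected start codon compare
      equal. The function name is hypothetical.

```python
def set_start_to_methionine(protein, start_codon_affected):
    """Translate the start codon to M unless the variant affects it.

    `protein` is the plain translation of the CDS, in which an alternative
    start codon such as CTG appears as its regular amino acid (L).
    """
    if start_codon_affected or not protein:
        return protein
    return 'M' + protein[1:]
```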
  17. Aug 10, 2015
  18. Aug 04, 2015
  19. Jul 15, 2015
    • Uncertain stop codon in protein descriptions (fs and ext) · d2f91690
      Vermaat authored
      When a variant results in a frame shift or extension and we don't
      see a new stop codon in the RNA, the protein description should use
      the notation for an uncertain stop codon, e.g., `p.(Gln730Profs*?)`
      instead of `p.(Gln730Profs*96)` where 96 is just the last codon in
      our transcript [1].
      
      To detect this, we now use `to_stop=False` in our `.translate()`
      calls, since that will explicitly return `*` characters for stop
      codons.
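
      A sketch of the detection for a frame shift, assuming the variant
      protein string was translated with `to_stop=False` (with Biopython,
      `str(Seq(cds).translate(to_stop=False))` keeps the `*` characters).
      The function name is hypothetical; the numbering counts the first
      changed amino acid as position 1.

```python
def frameshift_suffix(variant_protein, first_changed):
    """Return 'fs*N' or 'fs*?' for a frame shift starting at `first_changed`.

    `variant_protein` must contain explicit '*' characters for stop codons
    and `first_changed` is the 0-based index of the first changed residue.
    """
    stop = variant_protein.find('*', first_changed)
    if stop == -1:
        # No stop codon in the rest of our transcript: uncertain stop.
        return 'fs*?'
    # The first changed residue counts as position 1.
    return 'fs*%d' % (stop - first_changed + 1)
```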
      
      We also slightly fix the coloring of changes in the protein sequence,
      where previously changed stop codon characters were not included.
      
      [1] http://www.hgvs.org/mutnomen/FAQ.html#nostop
  20. Jul 09, 2015
  21. Jul 03, 2015
    • Use chardet instead of cchardet · dedad241
      Vermaat authored
      Issue #50 showed a problem in our file encoding detection, caused
      by our cut-off for the confidence as reported by the cchardet [1]
      library:
      
          >>> import cchardet
          >>> s = u'NM_000052.4:c.2407\u20132A>G'
          >>> b = s.encode('WINDOWS-1252')
          >>> cchardet.detect(b)
          {'confidence': 0.5, 'encoding': u'WINDOWS-1252'}
      
      We require a confidence strictly greater than 0.5 and default to
      UTF-8 otherwise.
      
      If, however, we try the same thing using the chardet [2] library,
      we get a higher confidence for the same string:
      
          >>> import chardet
          >>> chardet.detect(b)
          {'confidence': 0.73, 'encoding': 'windows-1252'}
      
      So the two obvious ways to solve this are:
      
      1. Lower the confidence threshold.
      2. Use chardet instead of cchardet.
      
      We implement the second solution here, since it also removes a C
      library dependency and we are not worried about performance.
      
      Of course the detected encoding remains a guess which can still
      be wrong!
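
      A minimal sketch of the threshold logic described above, taking a
      detection result shaped like the dictionaries shown earlier
      (`encoding` and `confidence` keys). The function names are
      hypothetical; passing the detector in keeps the sketch independent
      of whether chardet or cchardet is installed.

```python
def choose_encoding(detection, threshold=0.5, default='utf-8'):
    """Pick an encoding from a chardet-style detection result.

    The detected encoding is only trusted if its confidence is strictly
    greater than `threshold`; otherwise `default` is used.
    """
    encoding = detection.get('encoding')
    confidence = detection.get('confidence') or 0
    if encoding and confidence > threshold:
        return encoding
    return default


def decode_upload(data, detect):
    """Decode bytes, using `detect` (e.g. chardet.detect) to guess the encoding."""
    return data.decode(choose_encoding(detect(data)))
```

      With the cchardet result above (confidence 0.5) this falls back to
      UTF-8, while the chardet result (confidence 0.73) selects
      windows-1252; in practice the call would be
      `decode_upload(b, chardet.detect)`.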
      
      [1] https://github.com/PyYoshi/cChardet
      [2] https://github.com/chardet/chardet
      
      Fixes #50
  22. May 31, 2015
  23. May 18, 2015
  24. May 01, 2015
  25. Apr 30, 2015