  1. Oct 26, 2015
  2. Oct 23, 2015
  3. Oct 22, 2015
  4. Oct 20, 2015
    • Cache transcript protein links in Redis · 473c732c
      Vermaat authored
      Caching of transcript protein links received from the NCBI Entrez
      service is a typical use case for Redis. This implements this cache
      in Redis and removes all use of our original database table.
      
      An Alembic migration copies all existing links from the database to
      Redis. The original `TranscriptProteinLink` database table is not
      dropped. This will be done in a future migration to ensure running
      processes don't error and to provide a rollback scenario.
      
      We also remove the expiration of links (originally defaulting to 30
      days), since we don't expect them to ever change. Negative links
      (caching a 'not found' result from Entrez) *are* still expiring,
      but with a longer default of 30 days (was 5 days).
      
      The configuration setting for the latter was renamed, yielding the
      following changes in the default configuration settings.
      
      Removed default settings:
      
          # Expiration time for transcript<->protein links from the NCBI (in seconds).
          PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 30
      
          # Expiration time for negative transcript<->protein links from the NCBI (in
          # seconds).
          NEGATIVE_PROTEIN_LINK_EXPIRATION = 60 * 60 * 24 * 5
      
      Added default setting:
      
          # Cache expiration time for negative transcript<->protein links from the NCBI
          # (in seconds).
          NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30
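
A minimal sketch of the caching scheme described above, not the actual implementation. The key layout, the function names, and the `FakeRedis` stand-in are assumptions made for illustration; only the setting name and its default value come from this commit. `FakeRedis` mimics the `set`, `setex` and `get` calls of redis-py (argument order as in `StrictRedis` / redis-py 3+), so the example runs without a server.

```python
# Default from the new configuration (in seconds).
NEGATIVE_LINK_CACHE_EXPIRATION = 60 * 60 * 24 * 30


class FakeRedis(object):
    """In-memory stand-in for the small subset of redis-py used below."""

    def __init__(self):
        self.data = {}
        self.ttl = {}

    def set(self, key, value):
        self.data[key] = value
        self.ttl.pop(key, None)

    def setex(self, key, time, value):
        self.data[key] = value
        self.ttl[key] = time

    def get(self, key):
        return self.data.get(key)


def _key(transcript):
    # Hypothetical key layout, chosen for this example only.
    return 'ncbi:transcript-to-protein:%s' % transcript


def cache_link(client, transcript, protein):
    """Cache a link; protein=None records a 'not found' result."""
    if protein is None:
        # Negative link: expires, so Entrez is asked again later.
        client.setex(_key(transcript), NEGATIVE_LINK_CACHE_EXPIRATION, '')
    else:
        # Positive link: stored without expiration.
        client.set(_key(transcript), protein)


def lookup_link(client, transcript):
    """Return (cached, protein); protein is None for a negative link."""
    value = client.get(_key(transcript))
    if value is None:
        return False, None
    return True, (value or None)
```

With a real redis-py client, `get` returns bytes unless the client is created with `decode_responses=True`, so the caller would have to decode the value.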
  5. Oct 13, 2015
  6. Oct 10, 2015
  7. Oct 01, 2015
  8. Sep 30, 2015
  9. Sep 27, 2015
  10. Sep 23, 2015
    • Show diff for variant protein from non-reference start codon · 3c98a1af
      Vermaat authored
      The alternative variant protein sequence translated from a
      non-reference start codon (created by the variant) was not
      color-diffed the way normal variant protein sequences are.
      
      In the process we also rename the `oldprotein` and `newprotein`
      fields in the output object to `oldProtein` and `newProtein` to
      be more consistent with other field names.
    • Visualise protein change, also with alternative start · 851e71fe
      Vermaat authored
      In the case of an alternative start codon (in the reference CDS),
      protein changes were not visualised. This is fixed and a WALTSTART
      warning is also issued.
      
      Also, if a new non-reference start codon is created by the variant,
      visualise this as such.
    • Translate alternative start to M, also in variant · ae70ddfd
      Vermaat authored
      In case of an alternative start codon, the variant CDS was not
      translated to a protein starting with M. This caused the protein
      description machinery to conclude a variant affecting the start
      codon, hence reporting `p.?`.
      
      We fix this by always translating the start codon to M (except
      when the variant actually affects it).
      
      Example: `NM_024426.4:c.1107A>G` (a synonymous mutation) should
      yield `NM_024426.4(WT1_i001):p.(=)`, not `p.?`. The start codon
      for that protein is `CTG`.
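
The rule above can be sketched as follows. This is an illustration under assumptions, not the code from this commit: `translate_cds` and its `start_affected` flag are hypothetical names, and the standard genetic code is built by hand so the example has no dependencies.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = 'TCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
CODON_TABLE = dict(zip((''.join(c) for c in product(BASES, repeat=3)),
                       AMINO_ACIDS))


def translate_cds(cds, start_affected=False):
    """Translate a CDS, reading the start codon as M unless the variant
    affects it (hypothetical helper, stop codons are kept as '*')."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    protein = ''.join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))
    if protein and not start_affected:
        # An alternative start codon such as CTG still initiates with M.
        protein = 'M' + protein[1:]
    return protein
```

For a toy CDS starting with `CTG`, `translate_cds('CTGGCATAA')` gives `MA*`, while passing `start_affected=True` gives the literal translation `LA*`.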
  11. Aug 10, 2015
  12. Aug 04, 2015
  13. Jul 15, 2015
    • Uncertain stop codon in protein descriptions (fs and ext) · d2f91690
      Vermaat authored
      When a variant results in a frame shift or extension and we don't
      see a new stop codon in the RNA, the protein description should use
      the notation for an uncertain stop codon, e.g., `p.(Gln730Profs*?)`
      instead of `p.(Gln730Profs*96)` where 96 is just the last codon in
      our transcript [1].
      
      To detect this, we now use `to_stop=False` in our `.translate()`
      calls, since that will explicitly return `*` characters for stop
      codons.
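
A rough sketch of why the `*` characters matter, assuming the variant protein has already been translated with stop codons kept as `*`. The function name and arguments are hypothetical and only the frame shift case is shown.

```python
def frameshift_suffix(variant_protein, first_changed):
    """Return the 'fs*N' part of a frame shift description.

    variant_protein: translation of the variant with '*' for stop codons.
    first_changed:   0-based index of the first changed amino acid.
    """
    tail = variant_protein[first_changed:]
    stop = tail.find('*')
    if stop == -1:
        # No stop codon seen before the transcript ends: uncertain.
        return 'fs*?'
    # The first changed amino acid counts as position 1.
    return 'fs*%d' % (stop + 1)
```

Had the translation been cut off at the first stop codon, a sequence ending in a real stop and one that simply runs out of transcript would look the same, and the two cases could not be told apart.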
      
      We also fix the coloring of changes in the protein sequence,
      which previously did not include changed stop codon characters.
      
      [1] http://www.hgvs.org/mutnomen/FAQ.html#nostop
  14. Jul 09, 2015
  15. Jul 03, 2015
    • Use chardet instead of cchardet · dedad241
      Vermaat authored
      Issue #50 showed a problem in our file encoding detection, caused
      by our cut-off for the confidence as reported by the cchardet [1]
      library:
      
          >>> import cchardet
          >>> s = u'NM_000052.4:c.2407\u20132A>G'
          >>> b = s.encode('WINDOWS-1252')
          >>> cchardet.detect(b)
          {'confidence': 0.5, 'encoding': u'WINDOWS-1252'}
      
      We require a confidence strictly greater than 0.5 and default to
      UTF-8 otherwise.
      
      If, however, we try the same thing using the chardet [2] library,
      we get a higher confidence for the same string:
      
          >>> import chardet
          >>> chardet.detect(b)
          {'confidence': 0.73, 'encoding': 'windows-1252'}
      
      So the two obvious ways to solve this are:
      
      1. Lower the confidence threshold.
      2. Use chardet instead of cchardet.
      
      We implement the second solution here, since it also removes a C
      library dependency and we are not worried about performance.
      
      Of course the detected encoding remains a guess which can still
      be wrong!
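
The cut-off logic described above might look roughly like this. It is a sketch, not the project's code: `guess_encoding` and its parameters are hypothetical, and the detector is injectable so the threshold behaviour can be tried without chardet installed.

```python
def guess_encoding(data, detect=None, threshold=0.5, default='utf-8'):
    """Guess the encoding of a byte string, falling back to a default.

    detect is any callable returning a dict with 'encoding' and
    'confidence' keys, as chardet.detect does.
    """
    if detect is None:
        import chardet  # third-party; only needed if no detector is given
        detect = chardet.detect
    result = detect(data)
    encoding = result.get('encoding')
    confidence = result.get('confidence') or 0.0
    # Strictly greater than the threshold, so exactly 0.5 is rejected.
    if encoding and confidence > threshold:
        return encoding
    return default
```

With the results quoted in this message, a confidence of 0.5 falls back to the default, while 0.73 is accepted and yields `windows-1252`.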
      
      [1] https://github.com/PyYoshi/cChardet
      [2] https://github.com/chardet/chardet
      
      Fixes #50
  16. May 31, 2015
  17. May 18, 2015
  18. May 01, 2015
  19. Apr 30, 2015
  20. Jan 30, 2015
    • Discard incomplete genes in genbank reference files · 73c0862f
      Vermaat authored
      Many genbank reference files contain more than one gene, especially
      slices from an assembly. Some of these genes may be incomplete in
      the reference file (i.e., either start or end exceeds the outer
      coordinates). We cannot really do anything with these genes, so we
      discard them during parsing.
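
As an illustration of the filtering step, here is a simplified sketch. It assumes genes are available as records with 1-based inclusive coordinates relative to the reference file, which is a made-up representation; the actual parser works on genbank features.

```python
from collections import namedtuple

# Hypothetical, simplified gene record (1-based, inclusive coordinates).
Gene = namedtuple('Gene', ['symbol', 'start', 'end'])


def complete_genes(genes, reference_length):
    """Keep only genes that lie entirely within the reference."""
    return [gene for gene in genes
            if gene.start >= 1 and gene.end <= reference_length]
```

For a reference of length 1000, a gene spanning 300 to 800 is kept, while genes spanning -50 to 200 or 900 to 1200 are discarded.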
    • Fix broken DMD reference in unit tests · 51d8cc50
      Vermaat authored
    • Add getGeneLocation webservice method · e06452a1
      Vermaat authored
      Given a gene symbol and optional genome build, this returns the location
      of the gene.
      
      The primary motivation for this is LOVD, where it will be used in combination
      with sliceChromosome as an alternative to sliceChromosomeByGene, which only
      works on the fixed Ensembl genome build.
  21. Nov 24, 2014