Skip to content
Snippets Groups Projects
  1. Jul 03, 2015
    • Vermaat's avatar
      Use chardet instead of cchardet · dedad241
      Vermaat authored
      Issue #50 showed a problem in our file encoding detection, caused
      by our cut-off for the confidence as reported by the cchardet [1]
      library:
      
          >>> import cchardet
          >>> s = u'NM_000052.4:c.2407\u20132A>G'
          >>> b = s.encode('WINDOWS-1252')
          >>> cchardet.detect(b)
          {'confidence': 0.5, 'encoding': u'WINDOWS-1252'}
      
      We require a confidence stictly greater than 0.5 and default to
      UTF8 otherwise.
      
      If, however, we try the same thing using the chardet [2] library,
      we get a higher confidence for the same string:
      
          >>> import chardet
          >>> chardet.detect(b)
          {'confidence': 0.73, 'encoding': 'windows-1252'}
      
      So the two obvious ways to solve this are:
      
      1. Lower the confidence threshold.
      2. Use chardet instead of cchardet.
      
      We implement the second solution here, since it also removes a C
      library dependency and we are not worried by performance.
      
      Of course the detected encoding remains a guess which can still
      be wrong!
      
      [1] https://github.com/PyYoshi/cChardet
      [2] https://github.com/chardet/chardet
      
      Fixes #50
      dedad241
    • Vermaat's avatar
      Merge pull request #51 from mutalyzer/ng-example · e0490337
      Vermaat authored
      Add NG example to name checker website form
      e0490337
    • Vermaat's avatar
      Add NG example to name checker website form · 654393ea
      Vermaat authored
      654393ea
  2. May 31, 2015
  3. May 27, 2015
  4. May 26, 2015
  5. May 18, 2015
  6. May 04, 2015
  7. May 01, 2015
  8. Apr 30, 2015
Loading