    Filtering and aggregation in Samplestats · a3e610e8
    Hoogenboom, Jerry authored
    Fixed:
    * When converting STR allele names to sequences, FDSTools would reject
      any prefix variants with a false message stating that the variant does
      not match the reference sequence.
    * The Samplestats tool would not allow the -b/--min-per-strand option to
      be set to zero.
    
    Improved:
    * Moved the flags generated by BGCorrect to a new column named
      correction_flags. Some of the values have been renamed for clarity,
      and this column now always contains a value.
      * The Samplestats tool will no longer add the not_corrected flag to
        each sequence, as it does not add the correction_flags column.
    * The Samplestats tool now supports filtering sequences. For filtering,
      the same set of options is available as those used for marking
      alleles. The filtering options use upper case letters and have '-filt'
      appended to their long name. The new -a/--filter-action option defines
      what should be done with filtered sequences. 'off', the default,
      disables filtering; 'combine' replaces filtered sequences with a new
      line containing aggregated data; 'delete' removes filtered sequences
      without leaving a trace.
      * The seqconvert tool is aware of the special 'Other sequences' value
        produced by Samplestats with -a/--filter-action set to 'combine'.
        Other tools will give an informative error message when the input
        contains this special value.
    * The Samplestats tool now accepts non-integer and negative numbers for
      -n/--min-reads and -b/--min-per-strand, because read counts are no
      longer necessarily non-negative integers after correction.
    * The forward_correction and reverse_correction columns of Samplestats
      will now contain 0 if the sequence had exactly 0 reads both before and
      after correction (previously, this was -100).
    * Renamed the _mp columns of Samplestats to _mp_sum ("per-marker
      percentage of the sum") and introduced _mp_max columns ("per-marker
      percentage of the maximum").
    * Samplestats and Samplevis HTML visualisations will now mark a sequence
      as 'allele' if the minimum amount of correction OR the minimum number
      of recovered reads is reached (as opposed to AND). This allows alleles
      on stutter positions to be detected.
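
The three -a/--filter-action modes described above can be illustrated with a minimal sketch. This is not the Samplestats implementation: the row fields (marker, sequence, forward, reverse), the keep predicate, and the choice to aggregate filtered rows per marker are assumptions made for illustration only. The 'Other sequences' value is the one named in these notes.

```python
from collections import defaultdict

# Special sequence value named in the notes above.
OTHER_SEQUENCES = "Other sequences"


def apply_filter_action(rows, keep, action="off"):
    """Apply a filter action to a list of row dicts.

    rows   -- dicts with hypothetical keys: marker, sequence, forward, reverse
    keep   -- hypothetical predicate; True means the row passes the filters
    action -- 'off', 'combine' or 'delete'
    """
    if action == "off":
        # Filtering disabled: every row is passed through unchanged.
        return list(rows)

    kept = [row for row in rows if keep(row)]
    if action == "delete":
        # Filtered rows are dropped without leaving a trace.
        return kept

    if action == "combine":
        # Assumption: filtered rows are summed into one line per marker.
        totals = defaultdict(lambda: {"forward": 0, "reverse": 0})
        for row in rows:
            if not keep(row):
                totals[row["marker"]]["forward"] += row["forward"]
                totals[row["marker"]]["reverse"] += row["reverse"]
        for marker, counts in totals.items():
            kept.append({"marker": marker, "sequence": OTHER_SEQUENCES, **counts})
        return kept

    raise ValueError("unknown filter action: %r" % (action,))


if __name__ == "__main__":
    example = [
        {"marker": "A", "sequence": "seq1", "forward": 10, "reverse": 12},
        {"marker": "A", "sequence": "seq2", "forward": 1, "reverse": 0},
    ]
    for row in apply_filter_action(
            example, lambda r: r["forward"] + r["reverse"] >= 5, "combine"):
        print(row)
```

With 'combine', the total read count is preserved while low-count sequences collapse into a single aggregated line, which is why downstream tools need to recognise the special value.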
    
    Changed:
    * The -r/--min-recovery option of Samplestats has been renamed to
      -y/--min-recovery, analogous to the new -Y/--min-recovery-filt.
    
    Visualisations:
    * Updated Vega to version 2.4.1.
    * Replaced the regular expression-based filters in all visualisations
      with a much simpler syntax. The new syntax uses space-separated search
      terms, defaulting to a 'contains'-type search method. If any search
      term is preceded by an equals sign, that term must be matched exactly.
      (The search terms themselves are actually still matched as regexes!)
    * Added 'show negative alleles' option (default on) to Samplevis. When
      enabled, the graph filtering options work on abs(value) instead of the
      value itself.
    * When sorting alleles in Samplevis, the allele name is now used as the
      final tiebreaker instead of the primary sorting column.
    * HTML visualisations no longer re-render the entire graph when changing
      the width. The same holds true for the height setting of Allelevis.
    * The tables in Samplevis HTML visualisations will now contain the
      information from BGCorrect's correction_flags column in the Notes
      column.
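
The simplified filter syntax described above can be sketched roughly as follows. The visualisations themselves run in the browser, so this Python version only illustrates the matching rules. The function name is hypothetical, and two behaviours are assumptions not stated in these notes: that a name passes when it matches any one of the terms, and that an empty filter shows everything.

```python
import re


def name_matches_filter(filter_text, name):
    """Return True if name passes the space-separated search terms.

    Plain terms use a 'contains' search; a term preceded by '=' must
    match the whole name. Terms are still interpreted as regexes.
    """
    terms = filter_text.split()
    if not terms:
        # Assumption: an empty filter does not hide anything.
        return True
    for term in terms:
        if term.startswith("="):
            # Exact match: the pattern must cover the entire name.
            if re.fullmatch(term[1:], name):
                return True
        elif re.search(term, name):
            # 'Contains' match: the pattern may occur anywhere.
            return True
    # Assumption: terms are alternatives, so no match on any term fails.
    return False


if __name__ == "__main__":
    for candidate in ("CSF1PO", "TPOX", "D18S51", "D21S11"):
        print(candidate, name_matches_filter("D1 =TPOX", candidate))
```

Because the terms remain regexes, a term such as D1[0-9] still works as a pattern, while characters like '.' or '+' keep their regex meaning.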