Filtering and aggregation in Samplestats
Fixed:
* When converting STR allele names to sequences, FDSTools would wrongly reject prefix variants with a message stating that the variant does not match the reference sequence.
* The Samplestats tool would not allow the -b/--min-per-strand option to be set to zero.

Improved:
* Moved the flags generated by BGCorrect to a new column named correction_flags. Some of the values have been renamed for clarity, and this column now always contains a value.
* The Samplestats tool no longer adds the not_corrected flag to each sequence, because it does not add the correction_flags column.
* The Samplestats tool now supports filtering sequences. Filtering offers the same set of options as allele marking; the filtering options use upper-case letters and have '-filt' appended to their long names. The new -a/--filter-action option defines what is done with filtered sequences: 'off', the default, disables filtering; 'combine' replaces the filtered sequences with a new line containing their aggregated data; 'delete' removes filtered sequences without leaving a trace.
* The seqconvert tool is aware of the special 'Other sequences' value produced by Samplestats with -a/--filter-action set to 'combine'. Other tools will give an informative error message when the input contains this special value.
* The Samplestats tool now accepts non-integer and negative numbers for -n/--min-reads and -b/--min-per-strand, because after correction, read counts are no longer necessarily non-negative integers.
* The forward_correction and reverse_correction columns of Samplestats will now contain 0 if the sequence had exactly 0 reads both before and after correction (previously, this was -100).
* Renamed the _mp columns of Samplestats to _mp_sum ("per-marker percentage of the sum") and introduced _mp_max columns ("per-marker percentage of the maximum").
* Samplestats and Samplevis HTML visualisations will now mark a sequence as 'allele' if the minimum amount of correction OR the minimum number of recovered reads is reached (as opposed to AND). This allows alleles on stutter positions to be detected.

Changed:
* The -r/--min-recovery option of Samplestats has been renamed to -y/--min-recovery, analogous to the new -Y/--min-recovery-filt.

Visualisations:
* Updated Vega to version 2.4.1.
* Replaced the regular-expression-based filters in all visualisations with a much simpler syntax. The new syntax uses space-separated search terms, defaulting to a 'contains'-type search method. If a search term is preceded by an equals sign, that term must be matched exactly. (The search terms themselves are actually still matched as regexes!)
* Added a 'show negative alleles' option (default on) to Samplevis. When enabled, the graph filtering options work on abs(value) instead of the value itself.
* When sorting alleles in Samplevis, the allele name is now used as the final tiebreaker instead of the primary sorting column.
* HTML visualisations no longer re-render the entire graph when changing the width. The same holds true for the height setting of Allelevis.
* The tables in Samplevis HTML visualisations will now contain the information from BGCorrect's correction_flags column in the Notes column.
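The three values of -a/--filter-action described in this changelog can be illustrated with a minimal Python sketch. It is not FDSTools code: each sequence is assumed to be a hypothetical dict with 'sequence', 'forward' and 'reverse' keys, a caller-supplied predicate stands in for the upper-case -filt options, and summing the read counts into the 'Other sequences' line is an assumption about what "aggregated data" means.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Special value that, per the changelog, Samplestats writes when
# -a/--filter-action is set to 'combine'.
OTHER_SEQUENCES = "Other sequences"


def apply_filter_action(rows: List[Row], is_filtered: Callable[[Row], bool],
                        action: str = "off") -> List[Row]:
    """Illustrate the 'off', 'combine' and 'delete' filter actions.

    Each row is a hypothetical dict with 'sequence', 'forward' and
    'reverse' keys; is_filtered stands in for the -filt options.
    """
    if action == "off":
        # Filtering is disabled: every row is kept unchanged.
        return list(rows)
    if action not in ("combine", "delete"):
        raise ValueError(f"unknown filter action: {action!r}")

    kept = [row for row in rows if not is_filtered(row)]
    removed = [row for row in rows if is_filtered(row)]
    if action == "delete" or not removed:
        # 'delete' drops filtered rows without leaving a trace.
        # (With 'combine' and nothing filtered, no line is added: an assumption.)
        return kept

    # 'combine' replaces the filtered rows with one aggregated line.
    # Summing the read counts is an assumption of this sketch.
    return kept + [{
        "sequence": OTHER_SEQUENCES,
        "forward": sum(row["forward"] for row in removed),
        "reverse": sum(row["reverse"] for row in removed),
    }]
```

For example, with a predicate that filters sequences with fewer than 10 reads in total, 'combine' turns two low-count rows into a single 'Other sequences' row carrying their summed counts, while 'delete' simply drops them.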
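To make the difference between the renamed _mp_sum columns and the new _mp_max columns concrete, the following Python sketch computes both definitions from a hypothetical list of (marker, sequence, value) tuples. It only illustrates the two definitions given in the changelog; the function name is hypothetical, and returning 0 when the per-marker sum or maximum is 0 is an assumption (negative values after correction are not treated specially here).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def per_marker_percentages(
    rows: List[Tuple[str, str, float]],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(marker, sequence): (mp_sum, mp_max)} for hypothetical rows.

    mp_sum: value as a percentage of the sum of all values of that marker.
    mp_max: value as a percentage of the largest value of that marker.
    """
    totals: Dict[str, float] = defaultdict(float)
    maxima: Dict[str, float] = {}
    for marker, _sequence, value in rows:
        totals[marker] += value
        maxima[marker] = max(maxima.get(marker, value), value)

    result: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for marker, sequence, value in rows:
        # Returning 0 when the denominator is 0 is an assumption.
        mp_sum = 100.0 * value / totals[marker] if totals[marker] else 0.0
        mp_max = 100.0 * value / maxima[marker] if maxima[marker] else 0.0
        result[(marker, sequence)] = (mp_sum, mp_max)
    return result
```

For a marker with sequences at 60, 30 and 10 reads, the _mp_sum values are 60, 30 and 10 percent, while the _mp_max values are 100, 50 and about 16.7 percent, so the most abundant sequence of each marker always reads 100 in an _mp_max column.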
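The switch from AND to OR in allele marking can be sketched as a single boolean check. This Python sketch covers only the correction and recovered-reads minimums named in the changelog; the function and parameter names are hypothetical, the other allele-marking thresholds (such as -n/--min-reads) are not modelled, and reading "reached" as greater than or equal is an assumption.

```python
def meets_correction_or_recovery(correction: float, recovered: float,
                                 min_correction: float, min_recovery: float,
                                 require_both: bool = False) -> bool:
    """Sketch of the changed allele-marking rule (hypothetical names).

    New behaviour: either minimum suffices (OR).
    require_both=True reproduces the previous behaviour (AND).
    Other allele-marking thresholds are not modelled here.
    """
    # "Reached" is interpreted as greater than or equal (an assumption).
    meets_correction = correction >= min_correction
    meets_recovery = recovered >= min_recovery
    if require_both:
        return meets_correction and meets_recovery
    return meets_correction or meets_recovery
```

Under this rule, a sequence that reaches the recovered-reads minimum but not the correction minimum now passes, where previously it did not.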
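The new search syntax of the visualisations can be approximated in Python as follows. The visualisations themselves run in the browser, so this is only an illustration of the rules stated in the changelog: terms are split on whitespace, each term is still a regular expression, a plain term is a 'contains' match and a term preceded by '=' must match the whole value. The function name is hypothetical, and keeping a value when any term matches, as well as treating an empty query as "keep everything", are assumptions.

```python
import re


def matches_filter(value: str, query: str) -> bool:
    """Hypothetical sketch of the space-separated search syntax.

    Each term is treated as a regular expression. A plain term matches if it
    occurs anywhere in the value ('contains'); a term preceded by '=' must
    match the whole value. The value is kept if ANY term matches, which is
    an assumption of this sketch.
    """
    terms = query.split()
    if not terms:
        return True  # an empty filter keeps everything (assumption)
    for term in terms:
        if term.startswith("="):
            # Exact match: the regex must cover the entire value.
            if re.fullmatch(term[1:], value):
                return True
        elif re.search(term, value):
            # Default 'contains'-type match anywhere in the value.
            return True
    return False
```

With this sketch, the query `TH` matches the marker name TH01, whereas `=TH` does not; `=TH01` and `=TH0[0-9]` both match it exactly.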