  1. 19 Mar, 2019 1 commit
    • TSSV v2.0.0 · bf7a9aff
      Hoogenboom, Jerry authored
        - Removed dependency on external tssv package (it is no longer compatible).
        - Greatly increased performance by deduplicating the input reads.
        - Removed the -q/--is-fastq option in favour of automatic detection.
        - Changed the default value for -m/--mismatches from 0.08 to 0.1.
        - Changed the default value for -n/--indel-score from 1 to 2.
        - Added the -X/--no-deduplicate option to disable deduplication.
        - Fixed potential crash that could occur under very specific circumstances.
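The deduplication entry above can be illustrated with a minimal sketch. It assumes that identical read sequences always yield the same analysis result, so the expensive step only needs to run once per distinct sequence. The names `analyse_unique` and `analyse` are hypothetical and do not come from the TSSV tool itself.

```python
from collections import Counter


def analyse_unique(reads, analyse):
    """Run `analyse` once per distinct read and keep the read multiplicity.

    `reads` is any iterable of sequence strings; `analyse` is a caller-supplied
    function (hypothetical here) that is assumed to be deterministic.
    Returns a dict mapping each distinct sequence to (result, count).
    """
    counts = Counter(reads)
    return {seq: (analyse(seq), n) for seq, n in counts.items()}
```

With many duplicate reads, as is common in amplicon sequencing data, the number of calls to `analyse` drops from the number of reads to the number of distinct sequences.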
  2. 03 Jul, 2018 1 commit
  3. 08 Mar, 2017 1 commit
    • FDSTools v1.1.0.dev3: Fixes and pipelining enhancements · be8dbe46
      Hoogenboom, Jerry authored
      * General changes in v1.1.0.dev3:
        * Allele name heuristics: don't produce insertions at the end of the
          prefix or at the beginning of the suffix; just include extra STR
          blocks.
        * FDSTools will no longer crash with a 'column not found' error when
          an input file is empty. This situation is now treated as if the
          expected columns existed, but no lines of actual data were present.
          This greatly helps in tracking down issues in pipelines involving
          multiple tools, as tools will now shut down gracefully if an upstream
          tool fails to write output.
      * Allelefinder v1.0.1:
        * Fixed crash that occurred when converting sequences to allele names
          format while no library file was provided.
        * Don't crash when output pipe is closed.
      * BGAnalyse v1.0.1:
        * Don't crash when output pipe is closed.
      * BGCorrect v1.0.2:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * BGEstimate v1.1.2:
        * Don't crash when output pipe is closed.
      * BGHomRaw v1.0.1:
        * Clarified the 'Allele x of marker y has 0 reads' error message with
          the sample tag.
        * Don't crash when output pipe is closed.
      * BGHomStats v1.0.1:
        * Error messages about the input data now contain the sample tag of
          the sample that triggered the error.
        * Don't crash when output pipe is closed.
      * BGMerge v1.0.3:
        * Don't crash when output pipe is closed.
      * BGPredict v1.0.2:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * FindNewAlleles v1.0.1:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * Libconvert v1.1.2:
        * Don't crash when output pipe is closed.
      * Library v1.0.3:
        * Don't crash when output pipe is closed.
      * Seqconvert v1.0.2:
        * Don't crash when output pipe is closed.
      * Samplestats v1.1.1:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * Stuttermark v1.5.1:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * Stuttermodel v1.1.2:
        * Don't crash when output pipe is closed.
      * TSSV v1.1.0 (additionally):
        * When running analysis in parallel, make tasks of 1 million alignments.
          Previously, this was 10k reads, with the number of alignments per task
          depending on the size of the library file. This caused memory issues for
          huge libraries like whole mt interval libraries.
        * Don't crash when output pipe is closed.
      * Vis v1.0.4:
        * Don't crash when output pipe is closed.
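The TSSV task-size change above is essentially arithmetic: a fixed number of reads per task yields more alignments as the library grows, whereas a fixed number of alignments per task keeps the workload bounded. The following is a rough sketch of that idea only. The parameter `alignments_per_read` is a hypothetical stand-in for whatever the tool derives from the library file, and the batching logic is an assumption rather than the tool's actual code.

```python
from itertools import islice

ALIGNMENTS_PER_TASK = 1_000_000  # target task size named in the changelog


def reads_per_task(alignments_per_read, target=ALIGNMENTS_PER_TASK):
    """Number of reads per task so that a task holds about `target` alignments."""
    return max(1, target // max(1, alignments_per_read))


def make_tasks(reads, alignments_per_read):
    """Yield lists of reads, each sized to stay near the alignment target."""
    size = reads_per_task(alignments_per_read)
    iterator = iter(reads)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Under these assumptions, a library that needs 400 alignments per read gives tasks of 2500 reads, and a very large library simply gets smaller tasks instead of larger memory use.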
  4. 09 Feb, 2017 1 commit
  5. 21 Dec, 2016 1 commit
    • FDSTools v1.0.1.dev3 · 14b17206
      Hoogenboom, Jerry authored
      * General changes in v1.0.1:
        * Fixed crash that occurred when using the -i option to run the same command
          on multiple input files.
        * The usage string now always starts with 'fdstools', even if FDSTools was
          invoked using some other command (e.g. on Windows, FDSTools gets invoked
          through a file called 'fdstools-script.py').
        * Fixed bug with the -d/--debug option being ignored if placed before the
          tool name on systems running Python 2.7.9 or later.
        * FDSTools library files may now contain IUPAC ambiguous bases in the
          prefix and suffix sequences of STR markers (except the first sequence,
          as it is used as the reference).  Additionally, optional bases may be
          represented by lowercase letters.
        * If no explicit prefix/suffix is given for an alias, the prefix/suffix of
          the corresponding marker is assumed instead. This situation was not
          handled correctly when converting from raw sequences to TSSV or allelename
          format, which resulted in the alias remaining unused.
      * Stuttermodelvis v2.0.2:
        * Added filtering option for the stutter amount (-1, +1, -2, etc.).
        * Added filtering option for the coefficient of determination (r squared
          value) of the fit functions.
      * Libconvert v1.1.1:
        * Adjustments for supporting IUPAC notation in prefix and suffix sequences
          when converting from FDSTools to TSSV library format.
      * Library v1.0.2:
        * Added documentation for IUPAC support to the descriptive comment of the
          [prefix] section.
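To make the IUPAC entry above concrete, here is one possible way to match a flanking sequence that contains ambiguous bases, with lowercase letters treated as optional bases. This is a sketch that converts the pattern to a regular expression; FDSTools may well implement it differently, and the function name `iupac_to_regex` is hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern; lowercase letters mark optional bases."""
    parts = []
    for char in pattern:
        bases = IUPAC[char.upper()]
        part = bases if len(bases) == 1 else "[" + bases + "]"
        if char.islower():
            part += "?"  # optional base
        parts.append(part)
    return re.compile("".join(parts))
```

For example, `iupac_to_regex("ARt")` accepts `AGT`, `AAT`, `AG`, and `AA`, but not `ACT`.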
  6. 26 Oct, 2016 1 commit
    • FDSTools v1.0.1.dev2 · 98c434ed
      Hoogenboom, Jerry authored
      * Samplevis v2.1.2 (additionally):
        * The net effect of the allele calling thresholds (table filtering options)
          is now visualised in the graphs as a dashed vertical red line.
        * Fixed issue with allele calling thresholds not working anymore after having
          used the 'Save page' link in HTML visualisations.
  7. 13 Oct, 2016 1 commit
    • FDSTools v1.0.1.dev1 · f2ccd67d
      Hoogenboom, Jerry authored
      * Samplevis v2.1.2:
        * Added 'Save page' link to HTML visualisations, which offers for download a
          copy of the entire HTML visualisation including the user's changes.
        * Added automatic allele calling to static visualisations.
      * Pipeline v1.0.2:
        * Added -A/--in-allelelist option to the pipeline tool to provide an existing
          allele list file when running the ref-db analysis, bypassing Allelefinder.
      * Vis v1.0.3:
        * The -n/--min-abs and -s/--min-per-strand options now accept non-integer
          values as well.
        * Added six options to control the Table Filtering Options of Samplevis.
        * The Display Options now have a separate option group on the command line.
  8. 03 Oct, 2016 1 commit
    • FDSTools v1.0.0 Release Candidate 1 · de3593a8
      Hoogenboom, Jerry authored
      * General changes in v1.0.0rc1:
        * Fixed bug that caused variant descriptions in allele names of
          non-STR markers to be prepended with plus signs similar to suffix variants
          in STR markers.  When attempting to convert these allele names back to raw
          sequences, FDSTools would crash with an 'Invalid allele name' error.
      * Allelevis v2.0.1 (additionally):
        * In the tooltip in HTML visualisations, a line break may now only be
          inserted in allele names after an underscore character (_) or after a
          repeat block in STR allele names.  If the input file contains raw
          sequences, line breaks may now be introduced anywhere in the sequence.
      * Samplevis v2.1.1:
        * Added tooltip support to HTML visualisations.  Moving the mouse pointer
          over one of the alleles in the graph now displays a tooltip giving
          per-strand read counts of that allele.  The tooltip may include a
          'new allele' note if the input sample was analysed with FindNewAlleles.
        * The allele t...
  9. 20 Sep, 2016 1 commit
    • Developing towards v0.0.6 (v0.0.6.dev1) · 5a2addb7
      Hoogenboom, Jerry authored
      * General changes in v0.0.6.dev1:
        * Tools that take a list of files as their argument (through the -i option or
          as positionals) now explicitly support glob patterns.  This means they will
          interpret '*' and '?' characters as wildcards for 'zero or more characters'
          and 'any one character', respectively.  On Unix-like systems this is
          generally done by the shell, but on Windows one had to specify every file
          name completely.
      * BGEstimate v1.1.1:
        * Added option -p/--profiles which can be used to provide a previously
          created background noise profiles file.  BGEstimate will read starting
          values from this file instead of assuming zero noise.
      * BGMerge v1.0.2:
        * Small code changes to facilitate explicit glob pattern matching support.
      * Pipeline v1.0.1:
        * The Pipeline tool will no longer check the existence of the files specified
          for the -S/--in-samples option; instead, this is left to the downstream
          tools to find out, consistent with how this works with the other input file
          options.
      * Allelevis v2.0.1:
        * Added tooltip support to HTML visualisations.  Moving the mouse pointer
          over a node or edge in the graph now displays a tooltip giving allele names
          and sample counts.
      * Stuttermodelvis v2.0.1:
        * Changed the unit in the horizontal axis title from 'bp' to 'nt'.
      * Library v1.0.1:
        * Updated some of the comments describing the sections.
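The explicit glob support described in the general changes of this commit can be sketched with Python's standard `glob` module, which interprets `*` and `?` the way the entry describes. The helper name `expand_file_args` is hypothetical. Keeping an argument unchanged when nothing matches is an assumption, made so that a downstream tool can still report the missing file.

```python
import glob


def expand_file_args(args):
    """Expand '*' and '?' wildcards in a list of file name arguments."""
    files = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Assumption: arguments that match nothing are passed through as-is.
        files.extend(matches if matches else [arg])
    return files
```

On Unix-like systems the shell usually expands such patterns before the program starts, so a helper like this mostly matters on Windows or when patterns are quoted.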
  10. 06 Sep, 2016 1 commit
    • FDSTools v0.0.5: new tools, changed defaults · abba1c04
      Hoogenboom, Jerry authored
      * General changes in v0.0.5:
        * The TSSV tool now depends on version 0.4.0 of TSSV.
        * Added new Pipeline tool that runs one of three default analysis pipelines
          automatically given a configuration file with tool options and input/output
          file names. The three available pipeline options are 'reference-sample',
          analysing a single reference sample with TSSV and Stuttermark;
          'reference-database', analysing a collection of reference samples with
          BGEstimate and Stuttermodel; and 'case-sample', analysing a single case
          sample with TSSV, BGPredict, BGMerge, BGCorrect, and Samplestats.
        * Added new Library tool that creates an empty FDSTools library file. Users
          may optionally specify the intended use of the library (STR markers,
          non-STR-markers, or both). Only the sections that apply to the given types
          of markers will be included in the output. The [aliases] section is not
          included by default, but an option is available to add it...
  11. 26 Jul, 2016 1 commit
    • Various bug fixes and refinements throughout FDSTools · 08cf6ddd
      Hoogenboom, Jerry authored
      * Global changes in v0.0.4:
        * FDSTools will now print profiling information to stdout when the -d/--debug
          option was specified.
        * Fixed bug where specifying '-' as the output filename would be taken
          literally, while it should have been interpreted as 'write to standard out'
          (Affected tools: BGCorrect, Samplestats, Seqconvert, Stuttermark).
        * Added more detailed license information to FDSTools.
      * BGEstimate v1.1.0:
        * Added a new option -g/--min-genotypes (default: 3). Only alleles that occur
          in at least this number of unique heterozygous genotypes will be
          considered. This is to avoid 'contamination' of the noise profile of one
          allele with the noise of another. If homozygous samples are available for
          an allele, this filter is not applied to that allele. Setting this option
          to 1 effectively disables it. This option has the same cascading effect as
          the -s/--min-samples option, that is, if one allele does not meet...
  12. 22 Mar, 2016 2 commits
    • Updates to Samplevis HTML visualisations (mostly the tables) · 3a495653
      Hoogenboom, Jerry authored
      * Added Noise column to the allele tables.
      * Added number of reads before correction to the allele tables.
      * Added raw numbers of reads to the Correction and Recovery columns of
        the allele tables.
      * Fixed issue with Samplevis HTML visualisations in Firefox and Internet
        Explorer that caused an unnecessary horizontal scroll bar in the
        options panel.
    • PctRecovery relative to number of reads after correction · 0065d2cb
      Hoogenboom, Jerry authored
      Changed:
      * The PctRecovery as used for automatic allele selection in Samplevis
        HTML visualisations as well as Samplestats is now computed w.r.t. the
        number of reads after correction, instead of the number of reads
        before correction.
      
      Added:
      * Added X_recovery columns to the output of the Samplestats tool. The
        value is equal to X_add / X_corrected * 100.
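The formula above, X_add / X_corrected * 100, translates directly into code. The function name and the handling of a zero denominator are assumptions for illustration; the commit message does not say what Samplestats reports when the corrected read count is zero.

```python
def pct_recovery(added, corrected):
    """Percentage of the corrected read count that was added by correction.

    Mirrors the X_recovery definition: X_add / X_corrected * 100.
    """
    if corrected == 0:
        return 0.0  # assumption: avoid division by zero
    return added / corrected * 100
```

For instance, 25 recovered reads out of 200 reads after correction gives a recovery of 12.5 percent.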
  13. 21 Mar, 2016 1 commit
    • Fixed Samplevis allele selection glitch, removed TSSV -L · 9565c837
      Hoogenboom, Jerry authored
      Fixed:
      * Fixed a glitch in Samplevis HTML visualisations, where it would fail
        to correctly maintain the user-(de)selected alleles when switching
        Split Markers on or off.
      
      Removed:
      * Removed -L/--check-length option from the TSSV tool, because it had no
        effect. Instead, the TSSV tool will always enforce the expected allele
        lengths specified in the library file. Behaviour has not changed.
  14. 09 Mar, 2016 1 commit
  15. 08 Mar, 2016 1 commit
    • Improved sequence alignment (variant calling) quality · e1bd3e41
      Hoogenboom, Jerry authored
      FDSTools would sometimes produce suboptimal alignments. Most notably, it
      would produce multiple smaller insertions/deletions when the
      difference between two sequences could be described by one larger
      insertion/deletion in combination with a base substitution. The latter
      description is often more biologically sound and also usually results in
      a shorter allele name.
      
      * Fixed a bug that sometimes caused FDSTools to choose an incorrect path
        through the alignment matrix, producing a suboptimal alignment.
      * Tweaked the alignment parameters to produce more meaningful results.
  16. 29 Feb, 2016 1 commit
    • Expected allele lengths, and more · 29fcc171
      Hoogenboom, Jerry authored
      Added:
      * Added a new section expected_allele_length to the FDSTools library
        format. In this section, the minimum and (optionally) maximum allele
        length of each marker can be specified.
      * Added -L/--check-length option to the TSSV tool. If specified, the
        tool will use the expected_allele_length values to filter the results.
      * Samplevis can now truncate long allele names to a given number of
        characters (defaulting to 70).
      * Added an option to Samplestats to keep negatives when filtering (abs
        filter).
      
      Changed:
      * Renamed the --aggregate-below-minimum option of the TSSV tool to
        --aggregate-filtered.
      
      Improved:
      * Added an option to read_sample_data_file such that other code can
        request or require that the X_corrected columns are used.
      * Samplestats will now round to 4 or 5 significant digits if a value is
        above 1000 or 10000, respectively.
      * BGHomRaw will no longer round the forward, reverse, and total columns.
      * When generating mtDNA allele names, FDSTools will now try to avoid
        creating gaps in the alignment of the sequences against the reference.
      * Grouped the filtering options of the TSSV tool in its help text.
      * Cleaned up some leftover code for special sequence value handling
        (more specifically: code that expected ensure_sequence_format to
        return False for special sequence values, which it no longer does).
      * Cleaned up some dead legacy code in reduce_read_counts.
  17. 25 Feb, 2016 1 commit
    • Introducing findnewalleles · 4148bb50
      Hoogenboom, Jerry authored
      New tool findnewalleles:
      * Given a list of known sequences, this tool can go through sample data
        files to mark all sequences that are not on the list.
      
      Fixed:
      * BGHomRaw, BGEstimate, BGHomStats, Stuttermodel, and Blame did not
        ignore the 'Other sequences' and 'No data' values that may occur in
        the place of a sequence as they were supposed to.
      
      Improved:
      * BGHomRaw will now include the sample tag in the "Missing allele X of
        marker Y" error message.
        
      Changed:
      * The -F/--sequence-format argument from BGHomRaw now defaults to "raw".
      
      Visualisations:
      * Updated Vega to version 2.5.0.
      * The new version of Vega allowed the sorting to be fixed in Samplevis,
        Profilevis, BGRawvis, and Stuttermodelvis.
      * Samplevis:
        * The 'Other sequences' bars are now drawn with an outline only.
        * STR alleles are now sorted by allele length by default (this can be
          toggled with a checkbox in HTML visualisations, and with an option
          in the Vis tool).
        * Fixed the clipping of the start of long allele names when printing
          SVGs from Google Chrome.
        * Added a note (as '?' help tooltip) to the Common axis range option
          in the HTML visualisation, to inform the user of the fact that the
          Split markers option needs to be off for it to work.
  18. 22 Feb, 2016 1 commit
    • Fixed a crash in Samplestats, and minor improvements · 13e0d781
      Hoogenboom, Jerry authored
      Fixed:
      * Fixed crash in Samplestats. It would crash if BGCorrect columns were
        present.
      * Fixed glitch in Samplevis that allowed clicking the 'Other sequences'
        bars if the input data already contained the 'Other sequences' entry.
      
      Improved:
      * The TSSV tool will now drop any sequences that contain anything other
        than A, C, T, and G. If the -A option is given, these sequences will
        still be added to the marker aggregates. Many other tools will fail
        when confronted with such invalid sequences, especially when allele
        names need to be generated.
      * In Samplevis, the sequences are now consistently sorted (except for
        some inconsistency caused by a bug in Vega). The sorting is based on
        read counts and is the same as used for the allele tables in Samplevis
        HTML visualisations.
      * Added a comment line that mentions genome build GRCh38 and rCRS to the
        genome_position block in the libconvert output. This is mainly for
        documentation purposes; users are free to change this line if they use
        a different reference.
      * Minor styling changes to Samplevis HTML visualisations.
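The sequence validity rule from the TSSV entry in this commit (drop anything that is not purely A, C, T, and G, but optionally keep its reads in the marker aggregate) might look like the sketch below. The function names and the dict-based data layout are hypothetical and chosen for illustration only.

```python
import re

VALID_SEQUENCE = re.compile(r"[ACGT]+")


def is_valid_sequence(sequence):
    """True if the sequence consists solely of the bases A, C, G, and T."""
    return VALID_SEQUENCE.fullmatch(sequence) is not None


def split_valid(counts):
    """Split {sequence: reads} into valid entries and a dropped read total.

    The dropped total is what an aggregate line could absorb when the
    aggregation option is enabled.
    """
    kept = {seq: n for seq, n in counts.items() if is_valid_sequence(seq)}
    dropped = sum(n for seq, n in counts.items() if seq not in kept)
    return kept, dropped
```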
  19. 15 Feb, 2016 1 commit
    • Redesigned Samplevis HTML visualisations · 8d67efec
      Hoogenboom, Jerry authored
      * Samplevis now features a responsive design. The options have been
        moved from the overlay into a menu bar that changes place and shape
        depending on the width and height of the viewport.
      * All option labels are now clickable. When clicked, the corresponding
        option receives focus.
      * The 'Save image', 'Save table', and 'Clear manually added/removed'
        links are now always visible, but change appearance when unavailable.
      * When a 'No data' line is found for one or more markers, a warning is
        displayed at the bottom of the screen.
      * Fixed bug that caused user-selected and user-removed alleles to get
        lost when the corresponding marker is filtered out using the marker
        name filter.
      * Fixed bug in the printing stylesheet that caused conforming browsers
        to break pages between the graph and the table of a marker, instead of
        avoiding such page breaks.
      * In HTML visualisations with embedded data, the name of the sample data
        file is now shown in the place of the file selection element.
      
      Other Samplevis fixes and improvements:
      * Added option to show sequences that are filtered from the graphs as a
        single 'Other sequences' aggregate entry per marker (default: on).
      * Alleles that end up at a negative read count after correction now
        have a strand balance line in the 'overlap' portion of their bar only.
      * The strand bias mark is now correctly positioned when using the square
        root scale.
      
      Improved:
      * HTML visualisations with embedded data will now use a proper filename
        for the 'Save graph' and 'Save table' options.
  20. 04 Feb, 2016 1 commit
    • Updated handling of 'No data' and 'Other sequences' · 9ed2f3d1
      Hoogenboom, Jerry authored
      Improved:
      * Added -A/--aggregate-below-minimum option to the TSSV tool. This will
        add a line with 'Other sequences' to the output summing all sequences
        that were not reported because they had fewer reads than was specified
        with the -a/--minimum option.
      * Clarified the help text for the -D/--dir option of the TSSV tool.
      
      Fixed:
      * Updated all tools to consistently handle cases where 'No data' or
        'Other sequences' occurs in place of a sequence.
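The aggregation behaviour of the -A/--aggregate-below-minimum option can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function name is hypothetical, read counts are simplified to a single total per sequence, and the aggregate line is only added when it would be non-zero.

```python
OTHER_SEQUENCES = "Other sequences"  # special value named in the changelog


def apply_minimum(counts, minimum, aggregate=False):
    """Report sequences with at least `minimum` reads.

    With `aggregate=True`, the reads of all unreported sequences are summed
    into a single 'Other sequences' entry.
    """
    reported = {seq: n for seq, n in counts.items() if n >= minimum}
    if aggregate:
        rest = sum(n for n in counts.values() if n < minimum)
        if rest:  # assumption: omit an empty aggregate line
            reported[OTHER_SEQUENCES] = rest
    return reported
```

Because such special values appear where a sequence is expected, downstream code has to recognise them explicitly, which is what the fix in this commit addresses.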
  21. 02 Feb, 2016 1 commit
    • Big update: Bumped version to v0.0.3 · ebf700a7
      Hoogenboom, Jerry authored
      Updated Stuttermark to v1.5. WARNING: This version of Stuttermark is
      INCOMPATIBLE with output from previous versions of FDSTools and TSSV.
      
      Introducing TSSV-Lite
      * New tool tssv acts as a wrapper around TSSV-Lite (tssvl). Its primary
        purpose is to allow running TSSV-Lite without having to convert the
        FDSTools library to TSSV format, and to offer allelename output. Like
        all other tools in FDSTools, it also works with TSSV library files but
        its allele name generation capabilities are limited in that case.
      
      Changed:
      * TSSV-Lite and the new TSSV tool in FDSTools have two columns renamed
        w.r.t. the original TSSV program: 'name' has been changed to 'marker',
        and 'allele' has been changed to 'sequence'. All tools in FDSTools
        have been updated to use the new column names. This change affects
        Allelefinder, BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict,
        Blame, Samplestats, Samplevis, Stuttermark, Stuttermodel, and
        Seqconvert. Note that this change will BREAK COMPATIBILITY of these
        tools with old data files.
      
      Fixed:
      * In Samplevis HTML visualisations, the "percentage recovery" table
        filtering option used the absolute number of recovered reads instead.
      * Added PctRecovery to the tables in Samplevis HTML visualisations.
      * BGPredict will now print a nice error message if the -n/--min-pct
        option is set to zero or a negative number, to avoid division by zero.
      * Samplestats would crash if the input file contained the flags column.
      * FDSTools would crash when trying to convert sequences to allele names
        using a TSSV library.
      
      Improved:
      * Libconvert will no longer include duplicate sequences in the STR
        definition when converting to TSSV format and the reference sequence
        of one of the markers is the same as one of its aliases, or when
        aliases of one marker share one or more prefix or suffix sequences.
      * Updated add_input_output_args() such that the output file is a
        positional argument (instead of -o) for tools that have a single input
        file and no support for batches.
      * Updated add_sequence_format_args() such that the library file can be
        made a required argument.
      * Refined the FDSTools package description, since FDSTools does more
        than just noise filtering.
      * FDSTools will now do a marginally better job at producing allele names
        for sequences that do not exactly match the provided STR pattern. When
        seeking the longest matching portion of the sequence, it will now also
        test the reversed sequence with a reversed pattern, which sometimes
        yields a longer match. It is still not optimal, though, but some
        refactoring has been done to move away from regular expressions.
      * BGCorrect will now also fill in correction_flags for newly added
        sequences.
      * Adjusted the help text of Samplestats to include the fact that the -c
        and -y options have an OR relation instead of an AND relation.
      * BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict, and
        Stuttermodel will now ignore special values that may appear in the
        place of a sequence (currently: 'Other sequences' and 'No data').
      
      Removed:
      * The -m/--marker-column and -a/--allele-column arguments of BGPredict
        had no effect and have been removed.
      
      Visualisations:
      * Updated bundled D3 to v3.5.12.
      * In HTML visualisations, if the page is scrolled to the right edge when
        an option is changed that causes the graphs to become wider, the page
        now remains scrolled to the right.
      * Samplevis HTML visualisations:
        * Added 'Clear manually added/removed' link to the table filtering.
        * Reduced flicker of the mouse cursor in Internet Explorer.
        * Added 'Common axis range' checkbox (only available when 'Split
          markers' is off).
        * Added 'Save table' link to save the table of selected alleles to a
          tab-separated file.
        * Added 'PctRecovery' column to the tables of selected alleles.
        * An alert box is now shown when a data file is loaded that contains
          markers that have 'No data'.
        * Added 'Percentage of total reads' to the graph filtering options.
        * Added a note to the table filtering options to explain that the
          minimum percentage correction and recovery have an OR relation.
  22. 18 Jan, 2016 1 commit
    • Various fixes and improvements · 4f9286e4
      Hoogenboom, Jerry authored
      Fixed:
      * Fixed a crash in BGMerge.
      * Fixed bug in BGCorrect that resulted in incorrect values in the
        *_add and *_corrected columns (yes, you, 8685a304).
      * Fixed a glitch in BGCorrect that prevented it from ever writing
        corrected_bgestimate in the correction_flags column.
      
      Improved:
      * BGEstimate will now include the sample tag in the error messages for
        missing alleles and alleles with 0 reads.
      * Strand bias lines in Samplevis are now clamped to the 0-100% range.
        BGCorrect may cause forward read percentages outside this range.
      
      Visualisations:
      * Updated Vega to version 2.4.2.
      * Fixed drag-'n-drop behaviour for HTML visualisations in Internet
        Explorer and Firefox.
      * Fixed the Save Image link when viewing HTML visualisations in
        Internet Explorer 10 and above.
      * Added http-equiv="X-UA-Compatible" content="IE=edge" meta-tag to all
        visualisations to prevent Internet Explorer from entering quirks mode.
      * Samplevis:
        * Fixed glitch that would sometimes cause a second horizontal scroll
          bar to appear.
        * Graphs now render much more quickly when 'Split markers' is on, and
          Chrome no longer crashes on large sample files with this option set.
  23. 09 Dec, 2015 1 commit
    • Filtering and aggregation in Samplestats · a3e610e8
      Hoogenboom, Jerry authored
      Fixed:
      * When converting STR allele names to sequences, FDSTools would reject
        any prefix variants with a false message stating that the variant does
        not match the reference sequence.
      * The Samplestats tool would not allow the -b/--min-per-strand option to
        be set to zero.
      
      Improved:
      * Moved the flags generated by BGCorrect to a new column named
        correction_flags. Some of the values have been renamed for clarity,
        and this column now always contains a value.
        * The Samplestats tool will no longer add the not_corrected flag to
          each sequence, as it does not add the correction_flags column.
      * The Samplestats tool now supports filtering sequences. For filtering,
        the same set of options is available as those used for marking
        alleles. The filtering options use upper case letters and have '-filt'
        appended to their long name. The new -a/--filter-action option defines
        what should be done with filtered sequences. 'off', the default,
        disables filtering; 'combine' replaces filtered sequences with a new
        line containing aggregated data; 'delete' removes filtered sequences
        without leaving a trace.
        * The seqconvert tool is aware of the special 'Other sequences' value
          produced by Samplestats with -a/--filter-action set to 'combine'.
          Other tools will give an informative error message when the input
          contains this special value.
      * The Samplestats tool now accepts non-integer and negative numbers for
        -n/--min-reads and -b/--min-per-strand because after correction read
        counts are not necessarily nonnegative integers anymore.
      * The forward_correction and reverse_correction columns of Samplestats
        will now contain 0 if the sequence had exactly 0 reads both before and
        after correction (previously, this was -100).
      * Renamed the _mp columns of Samplestats to _mp_sum ("per-marker
        percentage of the sum") and introduced _mp_max columns ("per-marker
        percentage of the maximum").
      * Samplestats and Samplevis HTML visualisations will now mark a sequence
        as 'allele' if the minimum amount of correction OR the minimum number
        of recovered reads is reached (as opposed to AND). This allows alleles
        on stutter positions to be detected.
      
      Changed:
      * The -r/--min-recovery option of Samplestats has been renamed to
        -y/--min-recovery, analogous to the new -Y/--min-recovery-filt.
      
      Visualisations:
      * Updated Vega to version 2.4.1.
      * Replaced the regular expression-based filters in all visualisations
        with a much simpler syntax. The new syntax uses space-separated search
        terms, defaulting to a 'contains'-type search method. If any search
        term is preceded by an equals sign, that term must be matched exactly.
        (The search terms themselves are actually still matched as regexes!)
      * Added 'show negative alleles' option (default on) to Samplevis. When
        enabled, the graph filtering options work on abs(value) instead of the
        value itself.
      * When sorting alleles in Samplevis, the allele name is now used as the
        final tiebreaker instead of the primary sorting column.
      * HTML visualisations no longer re-render the entire graph when changing
        the width. The same holds true for the height setting of Allelevis.
      * The tables in Samplevis HTML visualisations will now contain the
        information from BGCorrect's correction_flags column in the Notes
        column.
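
      The filter syntax described above can be sketched in Python (the
      visualisations themselves run in the browser, so this is not the
      actual implementation). Two assumptions are made here that the commit
      message does not state: a value passes when ANY of the terms matches,
      and matching is case-sensitive. The function names are hypothetical.

```python
import re


def build_filter(query):
    """Compile a space-separated filter query into one regular expression."""
    parts = []
    for term in query.split():
        if term.startswith("=") and len(term) > 1:
            # '=term' must match the whole value.
            parts.append("^(?:" + term[1:] + ")$")
        else:
            # Plain terms use a 'contains'-type search.
            parts.append("(?:" + term + ")")
    # Assumption: terms are alternatives; an empty query matches everything.
    return re.compile("|".join(parts))


def matches(query, value):
    """Return True if the value passes the filter query."""
    return build_filter(query).search(value) is not None
```

      For example, the query 'D1' matches the name D13S317, whereas '=D1'
      does not. Because each term is still a regex, '=D1.*' matches it again.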
  24. 03 Dec, 2015 1 commit
    • Bug fixes and improvements in allele name gen and auto allele selection · 7820cad0
      Hoogenboom, Jerry authored
      Fixed:
      * In Samplevis HTML visualisations, the automatic allele selection was
        only checking the number of reverse reads for the 'minimum number of
        reads per orientation' setting.
      * In Samplevis HTML visualisations, automatic allele selection would
        fail to select alleles that had exactly the given minimum number of
        reads.
      * FDSTools would sometimes calculate incorrect and even negative repeat
        counts when producing TSSV-style sequences and allele names for
        sequences that did not exactly fit the STR structure given in the
        library.
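
      The first two fixes above both concern the 'minimum number of reads
      per orientation' check. A minimal sketch of the corrected behaviour,
      written in Python for illustration (the real check lives in the
      visualisation's browser code; the function and parameter names are
      hypothetical):

```python
def passes_min_reads_per_orientation(forward, reverse, min_per_strand):
    """Check the read count of BOTH orientations against the minimum.

    The comparison is inclusive, so a sequence with exactly the minimum
    number of reads in each orientation still passes.
    """
    return min(forward, reverse) >= min_per_strand
```

      A check that only looked at the reverse reads would accept 3 forward
      and 10 reverse reads at a minimum of 5. This version rejects that
      sequence and accepts one with exactly 5 reads in each orientation.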
      
      Improved:
      * The Samplestats tool now offers the same possibilities to mark alleles
        as Samplevis HTML visualisations do.
      * In Samplevis HTML visualisations, user-removed alleles now have a line
        through their table row.
      * Added a reference to https://docs.python.org/howto/regex in the sample
        tag parsing options section of the help text of many tools.
      * FDSTools will now do a better job of finding the longest possible
        match of the STR repeat definition to produce TSSV-style sequences and
        allele names for sequences that do not exactly fit the STR structure
        given in the library.
      
      Added:
      * New visualisation type 'allele'. With Allelevis, you can generate a
        graph of the alleles of the reference samples (output from
        Allelefinder). (Known bug: it has a 'funny' amount of padding.)
  25. 01 Dec, 2015 1 commit
    • Grand update to all visualisations, especially Samplevis · e7517bbd
      Hoogenboom, Jerry authored
      Fixed:
      * The Vis tool no longer crashes if you specify '-' as the input file
        without piping data in from another program. It will just produce a
        visualisation file with no embedded data instead.
      * FDSTools would crash when generating an allele name for a sequence of
        an STR marker that contained the prefix and suffix of the marker but
        not the actual STR (yes, this happened).
      * Stuttermodelvis would draw all 'All data' fits in the graphs of all
        repeat unit sequences, instead of just the 'All data' fit that was
        fitted to the data of a particular repeat sequence.
      
      Improved:
      * BGHomStats, BGHomRaw, and Samplestats now round their output to three
        significant digits.
      * BGCorrect now rounds its output to three decimal places.
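
      The two rounding modes mentioned above behave quite differently for
      very large and very small numbers. The sketch below illustrates the
      distinction. It is not taken from the FDSTools source, and the helper
      name round_significant is made up for this example.

```python
def round_significant(value, digits=3):
    """Round a number to the given number of significant digits."""
    # The 'g' format keeps `digits` significant digits.
    return float(f"{value:.{digits}g}")


def round_decimals(value, places=3):
    """Round a number to a fixed number of digits after the decimal point."""
    return round(value, places)
```

      For 1234.5678, rounding to three significant digits gives 1230.0 and
      rounding to three decimal places gives 1234.568. For 0.00123456, the
      results are 0.00123 and 0.001, respectively.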
      
      Various enhancements to Samplevis HTML visualisations:
      * Added a whole new set of options which are used to automatically
        select the true alleles in a sample.
      * Added an option to split the graphs and the table up per marker.
      * The selected alleles are no longer lost when the graphs are
        re-rendered due to changed options.
      * Added some more columns to the table of selected alleles and made the
        table prettier.
      * Added a dedicated stylesheet for printing, which transforms the web
        page into a nicely formatted report when printed.
      * Option groups can now be hidden separately.
      * Filtering options are now based on the read numbers after correction.
      * The mouse cursor now changes to a 'pointer' style cursor (usually a
        hand with stretched index finger) when hovered over the clickable
        portion of the graph.
      
      Visualisations:
      * Updated Vega to version 2.4.0 and d3 to version 3.5.10.
      * All visualisations now use signals to set the options. This allows
        them to be updated without re-parsing the entire graph spec in most
        cases, which is much faster.
      * Using new cross-and-filter capabilities in bgrawvis, profilevis,
        samplevis, and stuttermodelvis. This greatly reduces Vega's memory
        usage and speeds up rendering.
      * The name of the currently loaded data file is prepended to the page
        title in HTML visualisations.
      * If a file is loaded into an HTML visualisation by drag-and-drop, the
        name of the loaded file is displayed on the file input element.
      * A new -T/--title option for the Vis tool allows for specifying
        something that should be prepended to the page title of HTML
        visualisations. This is particularly useful when data is piped in,
        because no file name is available in that case.
      * Asynchronous rendering of visualisations is now cancelled if a new
        asynchronous rendering task has already been scheduled (HTML
        visualisations only).
  26. 23 Nov, 2015 1 commit
    • Introducing Samplestats · 559ee083
      Hoogenboom, Jerry authored
      * New tool Samplestats computes various sequence-centric statistics for
        sample data files. Most statistics relate to correction amounts and
        are thus only included if the input file contains BGCorrect columns.
      * The starting position can now be omitted from the [genome_position]
        in FDSTools library files. A default value of 1 will be used in this
        case.
      * The setup.py script can now also be run without explicitly specifying
        Python as the interpreter (it now has a shebang line).
  27. 16 Nov, 2015 1 commit
    • Various fixes and improvements · 313867bc
      Hoogenboom, Jerry authored
      Fixed:
      * The 'to' base in variants called on mtDNA was incorrect. This bug could also cause FDSTools to crash.
      * FDSTools would crash if you tried to generate an allele name for a primer dimer of an mtDNA marker. (Now, you get an insane but entirely accurate allele name instead.)
      * Fixed bug that caused some perfectly valid mtDNA allele names to be rejected when attempting to convert them back to raw sequences.
      
      Improved:
      * You can now also specify the ending position of the markers in the FDSTools library. If you do, you may additionally specify a second start position (and optionally a second end position, and so on). FDSTools will then treat the marker as the concatenation of these fragments. This was primarily introduced to support mtDNA fragments that contain (somewhere in the middle) the origin of mtDNA base numbering.
      * More helpful error message when format violations are detected while parsing the library file.
      * More helpful error message when the -e/--tag-expr regular expression could not be compiled.
      * Added a paragraph about sequence alignment caching to the help text of Seqconvert.
      * Added a 'flags' column to BGCorrect output, which gives information about the data that was used to do the correction.
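
      The multi-fragment marker positions described above can be sketched as follows. This is an illustration only, not the FDSTools parser. It assumes that a position value is a comma-separated list starting with the chromosome name, followed by alternating start and end positions (1-based, inclusive), and it treats a missing start as 1. Both function names are hypothetical.

```python
def parse_genome_position(value):
    """Parse 'chromosome[, start[, end[, start2[, end2, ...]]]]'.

    Returns the chromosome name and a list of (start, end) fragments.
    An end of None means the fragment has no explicit end position.
    """
    fields = [field.strip() for field in value.split(",")]
    chromosome = fields[0]
    # Assumption: a missing starting position means position 1.
    numbers = [int(field) for field in fields[1:]] or [1]
    fragments = []
    for i in range(0, len(numbers), 2):
        start = numbers[i]
        end = numbers[i + 1] if i + 1 < len(numbers) else None
        fragments.append((start, end))
    return chromosome, fragments


def extract_reference(reference, fragments):
    """Concatenate the given 1-based, inclusive fragments of a sequence."""
    return "".join(reference[start - 1:end] for start, end in fragments)
```

      For a circular reference of 10 bases, the value 'M, 8, 10, 1, 3' describes a marker that runs from position 8 through the origin up to position 3, so its reference sequence is bases 8 to 10 followed by bases 1 to 3.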
      
      Background noise profiles:
      * Removed -C/--cross-tabular option from BGEstimate, BGPredict, and BGMerge and also removed the ability to read files in this format.
      * BGEstimate, BGHomStats, and BGPredict now add a column 'tool' with their name to the output.
  28. 10 Sep, 2015 1 commit
    • Finishing StuttermodelVis · 4eee1a33
      jhoogenboom authored
      * Properly implemented the options on the StuttermodelVis HTML
        visualisation.
      * Added filtering options for marker and repeat unit to
        StuttermodelVis.
      * Added StuttermodelVis to the Vis tool.
      
      General visualisation changes:
      * Updated Vega to v2.2.4.
      * Fixed glitch that caused mouseover events in HTML visualisations
        to stop working after the renderer was switched.
      * The file name suggested by the Save Image link in HTML
        visualisations is now derived from the name of the loaded data
        file.
  29. 03 Sep, 2015 1 commit
    • Introducing StuttermodelVis (not complete yet) · e0eef88d
      jhoogenboom authored
      * Added StuttermodelVis HTML file and JSON spec. The rendering
        works, but some of the options are not implemented yet. It is
        also not yet added to the Vis tool.
      * Changed the order of stuttermodel's coefficients: 'a' used to be
        the most significant coefficient, now it is the least significant
        coefficient (the shift). The benefit of this is that when moving
        to higher-order polynomials, the extra coefficients do not change
        the meaning of the others. So 'a' is now always the shift, 'b' is
        the linear component, 'c' the quadratic, etc.
      * Added some development notes (including todo list) that I had
        kept outside of the project until now.
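
      The new coefficient order described above can be illustrated with a
      small polynomial evaluator. This is a sketch, not the stuttermodel
      code, and the function name is made up. It takes the coefficients in
      the order a, b, c, ... with 'a' as the constant term.

```python
def evaluate_stutter_polynomial(coefficients, x):
    """Evaluate a + b*x + c*x**2 + ... for coefficients [a, b, c, ...]."""
    result = 0.0
    # Horner's method: start from the highest-order coefficient.
    for coefficient in reversed(coefficients):
        result = result * x + coefficient
    return result
```

      With this order, [1, 2] and [1, 2, 0] describe the same line
      1 + 2x: appending a higher-order coefficient does not change what
      'a' and 'b' mean.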