1. 08 Mar, 2017 1 commit
    • FDSTools v1.1.0.dev3: Fixes and pipelining enhancements · be8dbe46
      Hoogenboom, Jerry authored
      * General changes in v1.1.0.dev3:
        * Allele name heuristics: don't produce insertions at the end of the
          prefix or at the beginning of the suffix; just include extra STR
          blocks.
        * FDSTools will no longer crash with a 'column not found' error when
          an input file is empty. This situation is now treated as if the
          expected columns existed, but no lines of actual data were present.
          This greatly helps in tracking down issues in pipelines involving
          multiple tools, as tools will now shut down gracefully if an upstream
          tool fails to write output.
      * Allelefinder v1.0.1:
        * Fixed crash that occurred when converting sequences to allele names
          format while no library file was provided.
        * Don't crash when output pipe is closed.
      * BGAnalyse v1.0.1:
        * Don't crash when output pipe is closed.
      * BGCorrect v1.0.2:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * BGEstimate v1.1.2:
        * Don't crash when output pipe is closed.
      * BGHomRaw v1.0.1:
        * Clarified the 'Allele x of marker y has 0 reads' error message with
          the sample tag.
        * Don't crash when output pipe is closed.
      * BGHomStats v1.0.1:
        * Error messages about the input data now contain the sample tag of
          the sample that triggered the error.
        * Don't crash when output pipe is closed.
      * BGMerge v1.0.3:
        * Don't crash when output pipe is closed.
      * BGPredict v1.0.2:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * FindNewAlleles v1.0.1:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * Libconvert v1.1.2:
        * Don't crash when output pipe is closed.
      * Library v1.0.3:
        * Don't crash when output pipe is closed.
      * Seqconvert v1.0.2:
        * Don't crash when output pipe is closed.
      * Samplestats v1.1.1:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * Stuttermark v1.5.1:
        * Don't crash on empty input files.
        * Don't crash when output pipe is closed.
      * Stuttermodel v1.1.2:
        * Don't crash when output pipe is closed.
      * TSSV v1.1.0 (additionally):
        * When running analysis in parallel, make tasks of 1 million alignments.
          Previously, this was 10k reads, with the number of alignments per task
          depending on the size of the library file. This caused memory issues for
          huge libraries like whole mt interval libraries.
        * Don't crash when output pipe is closed.
      * Vis v1.0.4:
        * Don't crash when output pipe is closed.
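The empty-input behaviour listed under 'General changes' could look roughly like the sketch below. It is an illustration only, not FDSTools' actual code: the function name `read_table` and its `expected_columns` parameter are hypothetical, and the input is assumed to be a tab-separated file with a header line.

```python
import io


def read_table(handle, expected_columns):
    """Return (column_index, rows) for a tab-separated file with a header.

    Hypothetical helper: an entirely empty file is treated as if the
    expected columns were present but no data lines followed.
    """
    header_line = handle.readline()
    if not header_line.strip():
        # Empty input: pretend the header held exactly the expected columns.
        return {name: i for i, name in enumerate(expected_columns)}, []

    header = header_line.rstrip("\r\n").split("\t")
    missing = [name for name in expected_columns if name not in header]
    if missing:
        # A non-empty file that lacks a column is still an error.
        raise ValueError("column not found: " + ", ".join(missing))

    column_index = {name: header.index(name) for name in expected_columns}
    rows = [line.rstrip("\r\n").split("\t") for line in handle if line.strip()]
    return column_index, rows
```

With this approach a downstream tool simply sees zero data rows and can write an empty result of its own, so a failure upstream surfaces at the tool that actually failed.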
  2. 26 Jul, 2016 1 commit
    • Various bug fixes and refinements throughout FDSTools · 08cf6ddd
      Hoogenboom, Jerry authored
      * Global changes in v0.0.4:
        * FDSTools will now print profiling information to stdout when the -d/--debug
          option was specified.
        * Fixed bug where specifying '-' as the output filename would be taken
          literally, while it should have been interpreted as 'write to standard out'
          (Affected tools: BGCorrect, Samplestats, Seqconvert, Stuttermark).
        * Added more detailed license information to FDSTools.
      * BGEstimate v1.1.0:
        * Added a new option -g/--min-genotypes (default: 3). Only alleles that occur
          in at least this number of unique heterozygous genotypes will be
          considered. This is to avoid 'contamination' of the noise profile of one
          allele with the noise of another. If homozygous samples are available for
          an allele, this filter is not applied to that allele. Setting this option
          to 1 effectively disables it. This option has the same cascading effect as
          the -s/--min-samples option, that is, if one allele does not meet the
          threshold, the samples with this allele are excluded which may cause some
          of the other alleles of these samples to fall below the threshold as well.
      * Stuttermodel v1.1.0:
        * Stuttermodel will now only output a fit for one strand if it could also
          obtain a fit for the other strand (for the same marker, unit, and stutter
          depth). This new behaviour can be disabled with a new -O/--orphans option.
        * Fixed bug that caused Stuttermodel to output only the raw data points for
          -1 and +1 stutter when normal output was suppressed.
      * BGCorrect v1.0.1:
        * Added new column 'weight' to the output. The value in this column expresses
          the number of times that the noise profile of that allele fitted in the
          sample.
      * Samplestats v1.0.1:
        * Samplestats will now round to 4 or 5 significant digits if a value is
          above 1000 or 10000, respectively. Previously, this was only done for the
          combined 'Other sequences' values.
        * The 'Other sequences' lines will now also include values for
          total_recovery, forward_recovery, and reverse_recovery.
        * The total_recovery, forward_recovery, and reverse_recovery columns are no
          longer placed to the left of all the other columns generated by
          Samplestats.
        * The help text for Samplestats erroneously listed the X_recovery_pct instead
          of X_recovery.
        * Added support for the new 'weight' column produced by BGCorrect when the
          -a/--filter-action option is set to 'combine'.
      * BGPredict v1.0.1:
        * Greatly reduced memory usage.
        * BGPredict will now output nonzero values below the threshold set by
          -n/--min-pct if the predicted noise ratio of the same stutter on the other
          strand is above the threshold. Previously, values below the threshold were
          clipped to zero, which may cause unnecessarily high strand bias in the
          predicted profile.
      * BGMerge v1.0.1:
        * Reduced memory usage.
      * TSSV v1.0.1:
        * Renamed the '--is_fastq' option to '--is-fastq'. It was the only option
          with an underscore instead of a hyphen in FDSTools.
        * Fixed crash that would occur if -F/--sequence-format was set to anything
          other than 'raw'.
      * Libconvert v1.0.1:
        * Specifying '-' as the first positional argument to libconvert will now
          correctly interpret this as "read from stdin" instead of throwing a "file
          not found" error (or reading from a file named "-" if it exists).
      * Seqconvert v1.0.1:
        * Internal naming of the first positional argument was changed from 'format'
          to 'sequence-format'. This was done for consistency with the
          -F/--sequence-format option in other tools, giving it the same name in
          Pipeline configuration files.
      * Vis v1.0.1:
        * Added -j/--jitter option for Stuttermodelvis (default: 0.25).
        * Vis would not allow the -n/--min-abs and the -s/--min-per-strand options to
          be set to 0.
      * Stuttermodelvis v1.0.0beta2:
        * HTML visualisations now support drawing raw data points on top of the fit
          functions. The points can be drawn with an adjustable jitter to reduce
          overlap.
        * Fixed a JavaScript crash that would occur in HTML visualisations if the
          Repeat unit or Marker name filter resulted in an invalid regular expression
          (e.g., when the entered value ends with a backslash).
        * Reduced Vega graph spec complexity by using the new Rank transform to
          position the subgraphs.
        * HTML visualisations made with the -O/--online option of the Vis tool will
          now contain https URLs instead of http.
      * Samplevis v1.0.1:
        * Fixed a JavaScript crash that would occur in HTML visualisations if the
          Marker name filter resulted in an invalid regular expression (e.g., when
          the entered value ends with a backslash).
        * Reduced Vega graph spec complexity by using the new Rank transform to
          position the subgraphs.
        * Fixed a glitch where clicking the 'Truncate sequences to' label would
          select the marker spacing input.
        * The 'Notes' table cells with 'BGPredict' in them now get a light orange
          background to warn the user that their background profile was computed.
          If a sequence was explicitly 'not corrected', 'not in ref db', or
          'corrected as background only', the same colour is used.
        * The message bar at the bottom of Samplevis HTML visualisations will now
          grow no larger than 3 lines. A scroll bar will appear as needed.
        * HTML visualisations made with the -O/--online option of the Vis tool will
          now contain https URLs instead of http.
      * BGRawVis v1.0.1:
        * Fixed a JavaScript crash that would occur in HTML visualisations if the
          Marker name filter resulted in an invalid regular expression (e.g., when
          the entered value ends with a backslash).
        * Reduced Vega graph spec complexity by using the new Rank transform to
          position the subgraphs.
        * HTML visualisations made with the -O/--online option of the Vis tool will
          now contain https URLs instead of http.
      * Profilevis v1.0.1:
        * Fixed a JavaScript crash that would occur in HTML visualisations if the
          Marker name filter resulted in an invalid regular expression (e.g., when
          the entered value ends with a backslash).
        * Reduced Vega graph spec complexity by using the new Rank transform to
          position the subgraphs.
        * HTML visualisations made with the -O/--online option of the Vis tool will
          now contain https URLs instead of http.
      * Allelevis v1.0.0beta2:
        * Fixed potential crash/corruption that could occur with very unfortunate
          combinations of sample names and marker names.
        * HTML visualisations made with the -O/--online option of the Vis tool will
          now contain https URLs instead of http.
        * Added two more colours to the legend, such that a maximum of 22 markers is
          now supported without re-using colours.
      * Updated bundled D3 to v3.5.17.
      * Updated bundled Vega to v2.6.0.
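The cascading effect of the -g/--min-genotypes option of BGEstimate, described above, can be illustrated with a small sketch. This is not the BGEstimate implementation: the function name, the data layout (one marker, each sample's genotype as a frozenset of allele names) and the loop structure are all assumptions made for the example.

```python
def filter_min_genotypes(genotypes, min_genotypes=3):
    """Drop samples until every remaining allele passes the threshold.

    genotypes: dict mapping sample name -> frozenset of allele names
    (one marker only; a set with a single allele is a homozygous sample).
    """
    remaining = dict(genotypes)
    while True:
        homozygous = set()
        het_genotypes = {}  # allele -> set of unique heterozygous genotypes
        for alleles in remaining.values():
            if len(alleles) == 1:
                homozygous.update(alleles)
            else:
                for allele in alleles:
                    het_genotypes.setdefault(allele, set()).add(alleles)

        # Alleles with a homozygous sample are exempt from the filter.
        failing = {
            allele
            for allele, seen in het_genotypes.items()
            if allele not in homozygous and len(seen) < min_genotypes
        }
        if not failing:
            return remaining
        # Excluding samples may push other alleles below the threshold,
        # so the counts are recomputed on the next pass.
        remaining = {s: a for s, a in remaining.items() if not (a & failing)}
```

In this sketch a threshold of 1 never rejects anything, because every allele seen in a heterozygous sample occurs in at least one genotype, which matches the statement that setting the option to 1 effectively disables it.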
  3. 08 Mar, 2016 1 commit
    • Version numbering for everything · da8d1d1b
      Hoogenboom, Jerry authored
      * Turned all '0.1dev' tool version numbers to '1.0.0'.
      * Changed Stuttermark's version number from '1.5' to '1.5.0'.
      * Added version numbers to the visualisations.
      * Updated README.rst to include all tools, but removed the usage details
        of Stuttermark because it is highly impractical to include usage
        details for all tools in the README file. I'll leave that to the
        -h/--help option and the yet-to-be-written FDSTools User's Handbook.
  4. 04 Feb, 2016 1 commit
    • Updated handling of 'No data' and 'Other sequences' · 9ed2f3d1
      Hoogenboom, Jerry authored
      Improved:
      * Added -A/--aggregate-below-minimum option to the TSSV tool. This will
        add a line with 'Other sequences' to the output, summing all sequences
        that were not reported because they had fewer reads than was specified
        with the -a/--minimum option.
      * Clarified the help text for the -D/--dir option of the TSSV tool.
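The idea behind -A/--aggregate-below-minimum can be sketched as follows. The function and its arguments are hypothetical, read counts are reduced to a single total per sequence, and it is an assumption of this sketch that the 'Other sequences' line is only added when at least one read was aggregated.

```python
def aggregate_below_minimum(counts, minimum):
    """Keep sequences with at least `minimum` reads; sum the rest.

    counts: dict mapping sequence -> number of reads (illustrative layout).
    """
    reported = {seq: n for seq, n in counts.items() if n >= minimum}
    other = sum(n for n in counts.values() if n < minimum)
    if other:
        # Assumption: no 'Other sequences' line when nothing was dropped.
        reported["Other sequences"] = other
    return reported
```

The reads of unreported sequences stay in the totals this way, so downstream percentages are computed against the true number of reads for the marker.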
      
      Fixed:
      * Updated all tools to consistently handle cases where 'No data' or
        'Other sequences' occurs in place of a sequence.
  5. 02 Feb, 2016 1 commit
    • Big update: Bumped version to v0.0.3 · ebf700a7
      Hoogenboom, Jerry authored
      Updated Stuttermark to v1.5. WARNING: This version of Stuttermark is
      INCOMPATIBLE with output from previous versions of FDSTools and TSSV.
      
      Introducing TSSV-Lite
      * New tool tssv acts as a wrapper around TSSV-Lite (tssvl). Its primary
        purpose is to allow running TSSV-Lite without having to convert the
        FDSTools library to TSSV format, and to offer allelename output. Like
        all other tools in FDSTools, it also works with TSSV library files but
        its allele name generation capabilities are limited in that case.
      
      Changed:
      * TSSV-Lite and the new TSSV tool in FDSTools have two columns renamed
        w.r.t. the original TSSV program: 'name' has been changed to 'marker',
        and 'allele' has been changed to 'sequence'. All tools in FDSTools
        have been updated to use the new column names. This change affects
        Allelefinder, BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict,
        Blame, Samplestats, Samplevis, Stuttermark, Stuttermodel, and
        Seqconvert. Note that this change will BREAK COMPATIBILITY of these
        tools with old data files.
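Old data files can in principle be brought in line with the new column names by rewriting their header line. The snippet below is a minimal sketch of that idea, not a tool shipped with FDSTools: `upgrade_header` is a hypothetical helper and the file is assumed to be tab-separated.

```python
# Column renames described above: old name -> new name.
RENAMED_COLUMNS = {"name": "marker", "allele": "sequence"}


def upgrade_header(header_line):
    """Return the header line with old TSSV column names replaced."""
    columns = header_line.rstrip("\r\n").split("\t")
    return "\t".join(RENAMED_COLUMNS.get(column, column) for column in columns)
```

Only the header changes in this sketch. It does not address the separate Stuttermark v1.5 incompatibility mentioned at the top of this commit.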
      
      Fixed:
      * In Samplevis HTML visualisations, the "percentage recovery" table
        filtering option used the absolute number of recovered reads instead.
      * Added PctRecovery to the tables in Samplevis HTML visualisations.
      * BGPredict will now print a nice error message if the -n/--min-pct
        option is set to zero or a negative number, to avoid division by zero.
      * Samplestats would crash if the input file contained the flags column.
      * FDSTools would crash when trying to convert sequences to allele names
        using a TSSV library.
      
      Improved:
      * Libconvert will no longer include duplicate sequences in the STR
        definition when converting to TSSV format and the reference sequence
        of one of the markers is the same as one of its aliases, or when
        aliases of one marker share one or more prefix or suffix sequences.
      * Updated add_input_output_args() such that the output file is a
        positional argument (instead of -o) for tools that have a single input
        file and no support for batches.
      * Updated add_sequence_format_args() such that the library file can be
        made a required argument.
      * Refined the FDSTools package description, since FDSTools does more
        than just noise filtering.
      * FDSTools will now do a marginally better job at producing allele names
        for sequences that do not exactly match the provided STR pattern. When
        seeking the longest matching portion of the sequence, it will now also
        test the reversed sequence with a reversed pattern, which sometimes
        yields a longer match. The result is still not optimal, but some
        refactoring has been done to move away from regular expressions.
      * BGCorrect will now also fill in correction_flags for newly added
        sequences.
      * Adjusted the help text of Samplestats to include the fact that the -c
        and -y options have an OR relation instead of an AND relation.
      * BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict, and
        Stuttermodel will now ignore special values that may appear in the
        place of a sequence (currently: 'Other sequences' and 'No data').
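Ignoring the special values that may appear in place of a sequence amounts to a simple filter over the data rows. The sketch below assumes rows are dictionaries keyed by column name; the names `SPECIAL_VALUES` and `real_sequences` are made up for this example.

```python
# Placeholders that may occur where a sequence is expected.
SPECIAL_VALUES = ("No data", "Other sequences")


def real_sequences(rows, sequence_column="sequence"):
    """Yield only the rows that hold an actual sequence."""
    for row in rows:
        if row[sequence_column] not in SPECIAL_VALUES:
            yield row
```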
      
      Removed:
      * The -m/--marker-column and -a/--allele-column arguments of BGPredict
        had no effect and have been removed.
      
      Visualisations:
      * Updated bundled D3 to v3.5.12.
      * In HTML visualisations, if the page is scrolled to the right edge when
        an option is changed that causes the graphs to become wider, the page
        now remains scrolled to the right.
      * Samplevis HTML visualisations:
        * Added 'Clear manually added/removed' link to the table filtering.
        * Reduced flicker of the mouse cursor in Internet Explorer.
        * Added 'Common axis range' checkbox (only available when 'Split
          markers' is off).
        * Added 'Save table' link to save the table of selected alleles to a
          tab-separated file.
        * Added 'PctRecovery' column to the tables of selected alleles.
        * An alert box is now shown when a data file is loaded that contains
          markers that have 'No data'.
        * Added 'Percentage of total reads' to the graph filtering options.
        * Added a note to the table filtering options to explain that the
          minimum percentage correction and recovery have an OR relation.
  6. 23 Nov, 2015 1 commit
    • Introducing Samplestats · 559ee083
      Hoogenboom, Jerry authored
      * New tool Samplestats computes various sequence-centric statistics for
        sample data files. Most statistics relate to correction amounts and
        are thus only included if the input file contains BGCorrect columns.
      * The starting position can now be omitted from the [genome_position]
        in FDSTools library files. A default value of 1 will be used in this
        case.
      * The setup.py script can now also be run without explicitly specifying
        Python as the interpreter (it now has a shebang line).
  7. 16 Nov, 2015 1 commit
    • Various fixes and improvements · 313867bc
      Hoogenboom, Jerry authored
      Fixed:
      * The 'to' base in variants called on mtDNA was incorrect. This bug could also cause FDSTools to crash.
      * FDSTools would crash if you tried to generate an allele name for a primer dimer of an mtDNA marker. (Now, you get an insane but entirely accurate allele name instead.)
      * Fixed bug that caused some perfectly valid mtDNA allele names to be rejected when attempting to convert them back to raw sequences.
      
      Improved:
      * You can now also specify the ending position of the markers in the FDSTools library. If you do, you may additionally specify a second start position (and optionally a second end position, and so on). FDSTools will then treat the marker as the concatenation of these fragments. This was primarily introduced to support mtDNA fragments that contain (somewhere in the middle) the origin of mtDNA base numbering.
      * More helpful error message when format violations are detected while parsing the library file.
      * More helpful error message when the -e/--tag-expr regular expression could not be compiled.
      * Added a paragraph about sequence alignment caching to the help text of Seqconvert.
      * Added a 'flags' column to BGCorrect output, which gives information about the data that was used to do the correction.
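The multi-fragment marker positions described under 'Improved' can be pictured with the sketch below. The exact library syntax is not shown in this log, so the value layout (`chromosome, start, end, start2, end2, ...`, comma-separated) is an assumption, as are both function names.

```python
def parse_genome_position(value):
    """Split a position value into (chromosome, [(start, end), ...]).

    Positions are 1-based and inclusive; an omitted final end is None.
    """
    fields = [field.strip() for field in value.split(",")]
    chromosome = fields[0]
    numbers = [int(field) for field in fields[1:]]
    if not numbers:
        raise ValueError("a start position is required")
    fragments = []
    for i in range(0, len(numbers), 2):
        end = numbers[i + 1] if i + 1 < len(numbers) else None
        fragments.append((numbers[i], end))
    return chromosome, fragments


def marker_reference(reference, fragments):
    """Concatenate the fragments of a 1-based reference sequence."""
    return "".join(reference[start - 1:end] for start, end in fragments)
```

For a circular reference of 10 bases, a marker spanning the numbering origin could be written as `M, 9, 10, 1, 3`: the last two bases followed by the first three.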
      
      Background noise profiles:
      * Removed -C/--cross-tabular option from BGEstimate, BGPredict, and BGMerge and also removed the ability to read files in this format.
      * BGEstimate, BGHomStats, and BGPredict now add a column 'tool' with their name to the output.
  8. 04 Nov, 2015 1 commit
    • Implemented support for non-STR markers, improved file handling and more · 1083919c
      Hoogenboom, Jerry authored
      Additions and improvements to the FDSTools library file format:
      * New [genome_position] section in FDSTools-style library files allows
        for specifying the chromosome and position of each marker.
      * New [no_repeat] section in FDSTools-style library files allows for
        including non-STR markers.
      * Comma/semicolon/space-separated values in FDSTools-style library files
        can now also be separated by tab characters and multiple consecutive
        separators are no longer collapsed (with the exception of whitespace).
      * If no prefix and/or suffix has been specified for an alias, the
        prefix/suffix of the marker itself is used.
      * Implemented support for non-STR markers (e.g. SNP clusters) and mtDNA
        markers. Allele names of the latter follow mtDNA nomenclature.
      * Improved the logic of generating STR allele names for sequences that
        have a prefix or suffix sequence that was not included in the library
        file.
      * Updated and clarified various explanatory texts in generated FDSTools
        library files.
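One way to read the new separator rule (commas, semicolons, spaces and tabs all separate values; only runs of whitespace collapse) is the regular expression below. This is an interpretation for illustration, not the parser used by FDSTools, and `split_values` is a hypothetical name.

```python
import re

# A comma or semicolon with optional surrounding whitespace is one separator;
# otherwise a run of whitespace (spaces or tabs) is one separator.
_SEPARATOR = re.compile(r"\s*[,;]\s*|\s+")


def split_values(value):
    """Split a library value into its items, keeping empty items."""
    return _SEPARATOR.split(value.strip())
```

Under this reading, `a,,b` yields an empty middle item, whereas `a   b` yields just two items.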
      
      Fixed:
      * Fixed a bug that caused prefix/suffix variants in aliases to go
        missing in allele names.
      
      Improved file handling:
      * Library files are now closed immediately after parsing them.
      * Sample data input files are opened one at a time now.
      
      Visualisations:
      * Updated Vega to version 2.3.1.
      * Worked around a bug in Google Chrome that caused the 'Save image' link
        to stop working after having been used once.
  9. 03 Sep, 2015 1 commit
    • Introducing StuttermodelVis (not complete yet) · e0eef88d
      jhoogenboom authored
      * Added StuttermodelVis HTML file and JSON spec. The rendering
        works, but some of the options are not implemented yet. It is
        also not yet added to the Vis tool.
      * Changed the order of stuttermodel's coefficients: 'a' used to be
        the most significant coefficient, now it is the least significant
        coefficient (the shift). The benefit of this is that when moving
        to higher-order polynomials, the extra coefficients do not change
        the meaning of the others. So 'a' is now always the shift, 'b' is
        the linear component, 'c' the quadratic, etc.
      * Added some development notes (including todo list) that I had
        kept outside of the project until now.
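The new coefficient order (least significant first) means a fit can be evaluated as shown in this small sketch. The function name and the use of the repeat length as the variable are assumptions for the example; only the ordering of 'a', 'b', 'c' comes from the description above.

```python
def evaluate_fit(coefficients, length):
    """Evaluate a polynomial whose coefficients are ordered a, b, c, ...

    'a' is the shift (power 0), 'b' the linear term, 'c' the quadratic.
    """
    return sum(c * length ** power for power, c in enumerate(coefficients))
```

Appending a zero coefficient for a higher order leaves the result unchanged, which is the benefit of this ordering: existing coefficients keep their meaning when the polynomial degree grows.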
  10. 01 Sep, 2015 1 commit
    • Cleanup and minor enhancements · 03fc3d49
      jhoogenboom authored
      * BGCorrect and Stuttermark will now exit with an error message if
        more than one input file for the same sample is specified and no
        separate output files are given. Previously these tools would
        just overwrite the output file repeatedly, discarding the output
        of all but the last data file of the sample.
      * Removed the main() functions and related stubs from the tools
        because they are not actually runnable directly anyway.
      * Added some more help text to some of the tools.
      * Doubled the size of the marker name filter input element on the
        HTML visualisations.
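The new check in BGCorrect and Stuttermark can be thought of along these lines. The sketch is hypothetical: the function name, its arguments and the error text are invented here, and the sample tags are assumed to have been extracted from the file names already.

```python
from collections import Counter


def check_single_output(sample_tags, separate_outputs):
    """Raise if several input files share a sample but not an output.

    sample_tags: one tag per input file, in the order given.
    separate_outputs: True if each input file gets its own output file.
    """
    if separate_outputs:
        return
    duplicates = sorted(tag for tag, n in Counter(sample_tags).items() if n > 1)
    if duplicates:
        raise ValueError(
            "multiple input files for sample(s) %s but no separate "
            "output files were given" % ", ".join(duplicates)
        )
```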
  11. 12 Aug, 2015 1 commit
    • Introducing BGMerge · 6207d485
      jhoogenboom authored
      * New tool BGMerge can be used to merge background noise profiles
        (e.g., merge BGPredict output with a database previously
        obtained from BGEstimate).
      * Fixed two major bugs in BGPredict that resulted in incorrect fit
        functions being used.
      * BGEstimate, BGPredict, BGHomStats, Blame, and StutterModel no
        longer crash if a library file is specified.
      * Added reverse strand profile estimation to BGPredict.
  12. 11 Aug, 2015 1 commit
    • Introducing BGPredict · 276a0439
      jhoogenboom authored
      * New tool BGPredict predicts background noise profiles (containing
        only stutter products) for user-supplied alleles/sequences using
        a trained stutter model obtained from Stuttermodel. Currently
        only the amounts of the forward strand are predicted.
      * New option -L/--min-lengths for Stuttermodel allows setting a
        minimum required number of unique repeat lengths to base the
        fits on (default: 5).
      * Updated formatting of output of Stuttermodel: added '+' sign to
        positive stutter, limited r2 scores to 3 decimal places, and now
        all coefficients are written in scientific notation with 3
        decimal places.
      * The --output-column option of SeqConvert now defaults to using
        the value of --allele-column.
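The output formatting described for Stuttermodel maps directly onto Python's format specifiers. The sketch below only demonstrates those specifiers; the function name and the order of the returned fields are assumptions, not the tool's real column layout.

```python
def format_fit(stutter, r2, coefficients):
    """Format one fit: signed stutter, r2 to 3 places, coefficients as 1.234e-05."""
    fields = [
        "%+i" % stutter,  # explicit '+' sign for positive stutter
        "%.3f" % r2,      # r2 score limited to 3 decimal places
    ]
    # Scientific notation with 3 decimal places for every coefficient.
    fields.extend("%.3e" % coefficient for coefficient in coefficients)
    return fields
```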