1. 08 Mar, 2016 1 commit
    • Improved sequence alignment (variant calling) quality · e1bd3e41
      Hoogenboom, Jerry authored
      FDSTools would sometimes produce suboptimal alignments. Most notably,
      it would produce multiple smaller insertions/deletions when the
      difference between two sequences could be described by one larger
      insertion/deletion in combination with a base substitution. The latter
      description is often more biologically sound and also usually results in
      a shorter allele name.
      
      * Fixed a bug that sometimes caused FDSTools to choose an incorrect path
        through the alignment matrix, producing a suboptimal alignment.
      * Tweaked the alignment parameters to produce more meaningful results.
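      The commit does not describe FDSTools' scoring scheme or the tweaked
      parameters. As a hedged illustration only, the sketch below scores
      gapped alignments under a generic affine gap penalty, in which opening
      a gap costs more than extending one; the function name and all
      parameter values are hypothetical. With such a scheme, describing TGT
      against a reference TAACT as one 2-base deletion plus a substitution
      scores better than describing it as three separate small indels.

```python
def affine_score(ref_row: str, query_row: str, match: int = 1, mismatch: int = -1,
                 gap_open: int = -4, gap_extend: int = -1) -> int:
    """Score a pairwise alignment given as two equal-length gapped rows ('-' = gap).

    Hypothetical parameters: a run of consecutive gaps in the same row costs
    gap_open for its first column and gap_extend for every further column.
    """
    if len(ref_row) != len(query_row):
        raise ValueError("aligned rows must have equal length")
    score = 0
    prev_gap_row = None  # row ("ref" or "query") that had a gap in the previous column
    for r, q in zip(ref_row, query_row):
        if r == "-" and q == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if r == "-" or q == "-":
            gap_row = "ref" if r == "-" else "query"
            # Extending the current gap is cheap; starting a new one is expensive.
            score += gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            score += match if r == q else mismatch
            prev_gap_row = None
    return score


# Reference TAACT versus observed sequence TGT, aligned two ways:
one_deletion_plus_substitution = affine_score("TAACT", "TG--T")   # -4
three_small_indels = affine_score("TAA-CT", "T--G-T")             # -11
print(one_deletion_plus_substitution, three_small_indels)
```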
  2. 29 Feb, 2016 1 commit
    • Expected allele lengths, and more · 29fcc171
      Hoogenboom, Jerry authored
      Added:
      * Added a new section expected_allele_length to the FDSTools library
        format. In this section, the minimum and (optionally) maximum allele
        length of each marker can be specified.
      * Added -L/--check-length option to the TSSV tool. If specified, the
        tool will use the expected_allele_length values to filter the results.
      * Samplevis can now truncate long allele names to a given number of
        characters (defaulting to 70).
      * Added an option to Samplestats to keep negatives when filtering (abs
        filter).
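      A minimal sketch of the kind of filtering -L/--check-length enables,
      assuming the expected_allele_length section has been parsed into a
      dict of (minimum, maximum-or-None) per marker and that both bounds are
      inclusive. The data layout, function names, and marker names are
      hypothetical; the changelog does not specify how the TSSV tool applies
      the limits internally.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of an expected_allele_length section:
# marker name -> (minimum length, maximum length or None when not given).
ExpectedLengths = Dict[str, Tuple[int, Optional[int]]]


def length_ok(marker: str, sequence: str, expected: ExpectedLengths) -> bool:
    """True if the sequence length lies within the expected range for its marker."""
    if marker not in expected:
        return True  # assumption: markers without an entry are not filtered
    minimum, maximum = expected[marker]
    if len(sequence) < minimum:
        return False
    return maximum is None or len(sequence) <= maximum


def filter_by_length(rows: List[Tuple[str, str, int]],
                     expected: ExpectedLengths) -> List[Tuple[str, str, int]]:
    """Keep (marker, sequence, reads) rows whose sequence length is acceptable."""
    return [row for row in rows if length_ok(row[0], row[1], expected)]


expected = {"MarkerA": (8, 12), "MarkerB": (4, None)}
rows = [("MarkerA", "TCTA" * 2, 120),   # length 8: kept
        ("MarkerA", "TCTA" * 4, 15),    # length 16: above maximum, dropped
        ("MarkerB", "AG", 7),           # length 2: below minimum, dropped
        ("MarkerB", "AGAT" * 10, 90)]   # length 40, no maximum: kept
print(filter_by_length(rows, expected))
```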
      
      Changed:
      * Renamed the --aggregate-below-minimum option of the TSSV tool to
        --aggregate-filtered.
      
      Improved:
      * Added an option to read_sample_data_file such that other code can
        request or require that the X_corrected columns are used.
      * Samplestats will now round to 4 or 5 significant digits if a value is
        above 1000 or 10000, respectively.
      * BGHomRaw will no longer round the forward, reverse, and total columns.
      * When generating mtDNA allele names, FDSTools will now try to avoid
        creating gaps in the alignment of the sequences against the reference.
      * Grouped the filtering options of the TSSV tool in its help text.
      * Cleaned up some leftover code for special sequence value handling
        (more specifically: code that expected ensure_sequence_format to
        return False for special sequence values, which it no longer does).
      * Cleaned up some dead legacy code in reduce_read_counts.
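      A minimal sketch of the Samplestats rounding rule, assuming the
      three-significant-digit default introduced on 01 Dec 2015 still applies
      below 1000. The function name is hypothetical, and reading "above" as
      ">=" is an assumption (at exactly 1000 or 10000 both readings give the
      same result).

```python
import math


def round_significant(value: float) -> float:
    """Round to 3 significant digits, or 4/5 for magnitudes of at least 1000/10000."""
    if value == 0:
        return 0.0
    magnitude = abs(value)
    digits = 5 if magnitude >= 10000 else 4 if magnitude >= 1000 else 3
    # Number of decimals to keep so that `digits` significant digits remain;
    # a negative number of decimals rounds to tens, hundreds, and so on.
    decimals = digits - 1 - math.floor(math.log10(magnitude))
    return float(round(value, decimals))


print(round_significant(12.3456), round_significant(1234.567), round_significant(123456))
```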
  3. 25 Feb, 2016 1 commit
    • Introducing findnewalleles · 4148bb50
      Hoogenboom, Jerry authored
      New tool findnewalleles:
      * Given a list of known sequences, this tool can go through sample data
        files to mark all sequences that are not on the list.
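      The idea behind findnewalleles can be sketched as a set lookup. The
      row layout, the function name, and the decision never to flag the
      special 'Other sequences' and 'No data' values are assumptions, not
      taken from the tool itself.

```python
from typing import Iterable, List, Set, Tuple

# Special values that may appear in place of a sequence (per this changelog).
SPECIAL_VALUES = {"Other sequences", "No data"}


def mark_new_alleles(rows: Iterable[Tuple[str, str, int]],
                     known: Set[Tuple[str, str]]) -> List[Tuple[str, str, int, bool]]:
    """Append a flag that is True when (marker, sequence) is not on the known list.

    Skipping the special values is an assumption, in line with how the other
    tools are described as treating them.
    """
    return [(marker, seq, reads,
             seq not in SPECIAL_VALUES and (marker, seq) not in known)
            for marker, seq, reads in rows]


known = {("MarkerA", "TCTA" * 8)}
rows = [("MarkerA", "TCTA" * 8, 100), ("MarkerA", "TCTA" * 7 + "TCTG", 40),
        ("MarkerA", "Other sequences", 5)]
print([row[3] for row in mark_new_alleles(rows, known)])  # [False, True, False]
```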
      
      Fixed:
      * BGHomRaw, BGEstimate, BGHomStats, Stuttermodel, and Blame did not
        ignore the 'Other sequences' and 'No data' values that may occur in
        the place of a sequence as they were supposed to.
      
      Improved:
      * BGHomRaw will now include the sample tag in the "Missing allele X of
        marker Y" error message.
        
      Changed:
      * The -F/--sequence-format argument from BGHomRaw now defaults to "raw".
      
      Visualisations:
      * Updated Vega to version 2.5.0.
      * The new version of Vega allowed the sorting to be fixed in Samplevis,
        Profilevis, BGRawvis, and Stuttermodelvis.
      * Samplevis:
        * The 'Other sequences' bars are now drawn with an outline only.
        * STR alleles are now sorted by allele length by default (this can be
          toggled with a checkbox in HTML visualisations, and with an option
          in the Vis tool).
        * Fixed the clipping of the start of long allele names when printing
          SVGs from Google Chrome.
        * Added a note (as '?' help tooltip) to the Common axis range option
          in the HTML visualisation, to inform the user of the fact that the
          Split markers option needs to be off for it to work.
  4. 04 Feb, 2016 1 commit
    • Updated handling of 'No data' and 'Other sequences' · 9ed2f3d1
      Hoogenboom, Jerry authored
      Improved:
      * Added -A/--aggregate-below-minimum option to the TSSV tool. This will
        add a line with 'Other sequences' to the output summing all sequences
        that were not reported because they had fewer reads than specified
        with the -a/--minimum option.
      * Clarified the help text for the -D/--dir option of the TSSV tool.
      
      Fixed:
      * Updated all tools to consistently handle cases where 'No data' or
        'Other sequences' occurs in place of a sequence.
  5. 02 Feb, 2016 1 commit
    • Big update: Bumped version to v0.0.3 · ebf700a7
      Hoogenboom, Jerry authored
      Updated Stuttermark to v1.5. WARNING: This version of Stuttermark is
      INCOMPATIBLE with output from previous versions of FDSTools and TSSV.
      
      Introducing TSSV-Lite
      * New tool tssv acts as a wrapper around TSSV-Lite (tssvl). Its primary
        purpose is to allow running TSSV-Lite without having to convert the
        FDSTools library to TSSV format, and to offer allelename output. Like
        all other tools in FDSTools, it also works with TSSV library files but
        its allele name generation capabilities are limited in that case.
      
      Changed:
      * TSSV-Lite and the new TSSV tool in FDSTools have two columns renamed
        w.r.t. the original TSSV program: 'name' has been changed to 'marker',
        and 'allele' has been changed to 'sequence'. All tools in FDSTools
        have been updated to use the new column names. This change affects
        Allelefinder, BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict,
        Blame, Samplestats, Samplevis, Stuttermark, Stuttermodel, and
        Seqconvert. Note that this change will BREAK COMPATIBILITY of these
        tools with old data files.
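      For old data files, the header rename can be sketched as below. This
      is a hypothetical helper, not part of FDSTools; it assumes a
      tab-separated file with a single header row and only renames the two
      columns, so it does not address other incompatibilities such as the
      Stuttermark v1.5 change noted above.

```python
import csv
import io

# Column renames introduced with TSSV-Lite (taken from the changelog entry).
RENAMES = {"name": "marker", "allele": "sequence"}


def upgrade_header(text: str) -> str:
    """Rewrite the header of a tab-separated data file to the new column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if rows:
        rows[0] = [RENAMES.get(column, column) for column in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


old = "name\tallele\tforward\treverse\ttotal\nTH01\tAATG[7]\t10\t12\t22\n"
print(upgrade_header(old), end="")
```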
      
      Fixed:
      * In Samplevis HTML visualisations, the "percentage recovery" table
        filtering option used the absolute number of recovered reads instead.
      * Added PctRecovery to the tables in Samplevis HTML visualisations.
      * BGPredict will now print a nice error message if the -n/--min-pct
        option is set to zero or a negative number, to avoid division by zero.
      * Samplestats would crash if the input file contained the flags column.
      * FDSTools would crash when trying to convert sequences to allele names
        using a TSSV library.
      
      Improved:
      * Libconvert will no longer include duplicate sequences in the STR
        definition when converting to TSSV format and the reference sequence
        of one of the markers is the same as one of its aliases, or when
        aliases of one marker share one or more prefix or suffix sequences.
      * Updated add_input_output_args() such that the output file is a
        positional argument (instead of -o) for tools that have a single input
        file and no support for batches.
      * Updated add_sequence_format_args() such that the library file can be
        made a required argument.
      * Refined the FDSTools package description, since FDSTools does more
        than just noise filtering.
      * FDSTools will now do a marginally better job at producing allele names
        for sequences that do not exactly match the provided STR pattern. When
        seeking the longest matching portion of the sequence, it will now also
        test the reversed sequence with a reversed pattern, which sometimes
        yields a longer match. The result is still not always optimal, but
        some refactoring has been done to move away from regular expressions.
      * BGCorrect will now also fill in correction_flags for newly added
        sequences.
      * Adjusted the help text of Samplestats to include the fact that the -c
        and -y options have an OR relation instead of an AND relation.
      * BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict, and
        Stuttermodel will now ignore special values that may appear in the
        place of a sequence (currently: 'Other sequences' and 'No data').
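      The reversed-pattern idea can be illustrated with a simple greedy
      matcher. This is not FDSTools' implementation: the block
      representation (repeat unit, minimum count, maximum count) and all
      names are assumptions. Matching the reversed sequence against the
      reversed pattern amounts to anchoring the match at the end of the
      sequence instead of the start.

```python
from typing import List, Tuple

Block = Tuple[str, int, int]  # (repeat unit, minimum count, maximum count): hypothetical


def greedy_prefix_match(seq: str, pattern: List[Block]) -> int:
    """Length of the prefix of seq consumed by greedily matching each block in turn."""
    pos = 0
    for unit, min_count, max_count in pattern:
        count, end = 0, pos
        while count < max_count and seq.startswith(unit, end):
            end += len(unit)
            count += 1
        if count < min_count:
            break  # this block cannot be satisfied; stop before it
        pos = end
    return pos


def reverse_pattern(pattern: List[Block]) -> List[Block]:
    """Pattern that matches the reversed sequence: reverse block order and each unit."""
    return [(unit[::-1], lo, hi) for unit, lo, hi in reversed(pattern)]


def longest_anchored_match(seq: str, pattern: List[Block]) -> Tuple[str, int]:
    """Try matching from the start and from the end; report the longer match."""
    forward = greedy_prefix_match(seq, pattern)
    backward = greedy_prefix_match(seq[::-1], reverse_pattern(pattern))
    return ("end", backward) if backward > forward else ("start", forward)


pattern = [("TCTA", 1, 20), ("TCTG", 1, 20)]
# The first four bases break the pattern: matching from the start fails at
# once, while matching from the end covers the remaining 16 bases.
print(longest_anchored_match("TCAA" + "TCTA" * 2 + "TCTG" * 2, pattern))  # ('end', 16)
```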
      
      Removed:
      * The -m/--marker-column and -a/--allele-column arguments of BGPredict
        had no effect and have been removed.
      
      Visualisations:
      * Updated bundled D3 to v3.5.12.
      * In HTML visualisations, if the page is scrolled to the right edge when
        an option is changed that causes the graphs to become wider, the page
        now remains scrolled to the right.
      * Samplevis HTML visualisations:
        * Added 'Clear manually added/removed' link to the table filtering.
        * Reduced flicker of the mouse cursor in Internet Explorer.
        * Added 'Common axis range' checkbox (only available when 'Split
          markers' is off).
        * Added 'Save table' link to save the table of selected alleles to a
          tab-separated file.
        * Added 'PctRecovery' column to the tables of selected alleles.
        * An alert box is now shown when a data file is loaded that contains
          markers that have 'No data'.
        * Added 'Percentage of total reads' to the graph filtering options.
        * Added a note to the table filtering options to explain that the
          minimum percentage correction and recovery have an OR relation.
  6. 09 Dec, 2015 1 commit
    • Filtering and aggregation in Samplestats · a3e610e8
      Hoogenboom, Jerry authored
      Fixed:
      * When converting STR allele names to sequences, FDSTools would reject
        any prefix variants with a false message stating that the variant does
        not match the reference sequence.
      * The Samplestats tool would not allow the -b/--min-per-strand option to
        be set to zero.
      
      Improved:
      * Moved the flags generated by BGCorrect to a new column named
        correction_flags. Some of the values have been renamed for clarity,
        and this column now always contains a value.
        * The Samplestats tool will no longer add the not_corrected flag to
          each sequence, as it does not add the correction_flags column.
      * The Samplestats tool now supports filtering sequences. For filtering,
        the same set of options is available as those used for marking
        alleles. The filtering options use upper case letters and have '-filt'
        appended to their long name. The new -a/--filter-action option defines
        what should be done with filtered sequences. 'off', the default,
        disables filtering; 'combine' replaces filtered sequences with a new
        line containing aggregated data; 'delete' removes filtered sequences
        without leaving a trace.
        * The seqconvert tool is aware of the special 'Other sequences' value
          produced by Samplestats with -a/--filter-action set to 'combine'.
          Other tools will give an informative error message when the input
          contains this special value.
      * The Samplestats tool now accepts non-integer and negative numbers for
        -n/--min-reads and -b/--min-per-strand because after correction read
        counts are not necessarily nonnegative integers anymore.
      * The forward_correction and reverse_correction columns of Samplestats
        will now contain 0 if the sequence had exactly 0 reads both before and
        after correction (previously, this was -100).
      * Renamed the _mp columns of Samplestats to _mp_sum ("per-marker
        percentage of the sum") and introduced _mp_max columns ("per-marker
        percentage of the maximum").
      * Samplestats and Samplevis HTML visualisations will now mark a sequence
        as 'allele' if the minimum amount of correction OR the minimum number
        of recovered reads is reached (as opposed to AND). This allows alleles
        on stutter positions to be detected.
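      The three filter actions can be sketched as follows. The row layout,
      the per-marker grouping of the 'Other sequences' line, and the
      read-count predicate used in the example are assumptions; in
      Samplestats the filtering criteria come from the -filt options
      described above.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # hypothetical keys: 'marker', 'sequence', 'forward', 'reverse'
OTHER = "Other sequences"


def apply_filter_action(rows: List[Row], is_filtered: Callable[[Row], bool],
                        action: str = "off") -> List[Row]:
    """Sketch of the 'off', 'delete', and 'combine' filter actions."""
    if action == "off":
        return list(rows)
    kept = [row for row in rows if not is_filtered(row)]
    if action == "delete":
        return kept
    if action == "combine":
        # One aggregated 'Other sequences' line per marker (assumed grouping).
        combined: Dict[str, Row] = {}
        for row in rows:
            if is_filtered(row):
                agg = combined.setdefault(row["marker"], {
                    "marker": row["marker"], "sequence": OTHER, "forward": 0, "reverse": 0})
                agg["forward"] += row["forward"]
                agg["reverse"] += row["reverse"]
        return kept + list(combined.values())
    raise ValueError(f"unknown filter action: {action!r}")


def low_total(row: Row) -> bool:
    """Hypothetical criterion: fewer than 10 reads in total."""
    return row["forward"] + row["reverse"] < 10


rows = [
    {"marker": "MarkerA", "sequence": "TCTA" * 8, "forward": 60, "reverse": 55},
    {"marker": "MarkerA", "sequence": "TCTA" * 7, "forward": 3, "reverse": 2},
    {"marker": "MarkerA", "sequence": "TCTA" * 9, "forward": 1, "reverse": 4},
]
print(apply_filter_action(rows, low_total, "combine"))
```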
      
      Changed:
      * The -r/--min-recovery option of Samplestats has been renamed to
        -y/--min-recovery, analogous to the new -Y/--min-recovery-filt.
      
      Visualisations:
      * Updated Vega to version 2.4.1.
      * Replaced the regular expression-based filters in all visualisations
        with a much simpler syntax. The new syntax uses space-separated search
        terms, defaulting to a 'contains'-type search method. If any search
        term is preceded by an equals sign, that term must be matched exactly.
        (The search terms themselves are actually still matched as regexes!)
      * Added 'show negative alleles' option (default on) to Samplevis. When
        enabled, the graph filtering options work on abs(value) instead of the
        value itself.
      * When sorting alleles in Samplevis, the allele name is now used as the
        final tiebreaker instead of the primary sorting column.
      * HTML visualisations no longer re-render the entire graph when changing
        the width. The same holds true for the height setting of Allelevis.
      * The tables in Samplevis HTML visualisations will now contain the
        information from BGCorrect's correction_flags column in the Notes
        column.
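      The HTML visualisations implement this in JavaScript; the Python
      sketch below only illustrates the described syntax. Whether multiple
      terms are combined with OR or AND is not stated here, so the OR (any
      term matches) behaviour, the handling of an empty query, and the
      function names are assumptions.

```python
import re


def term_matches(term: str, value: str) -> bool:
    """'=term' must match the whole value; any other term may match anywhere."""
    if term.startswith("="):
        return re.fullmatch(term[1:], value) is not None
    return re.search(term, value) is not None


def passes_filter(query: str, value: str) -> bool:
    """Assumption: an empty query shows everything and ANY matching term suffices."""
    terms = query.split()
    return not terms or any(term_matches(term, value) for term in terms)


print(passes_filter("D8 TH01", "D8S1179"), passes_filter("=TH0", "TH01"))
```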
  7. 03 Dec, 2015 1 commit
    • Bug fixes and improvements in allele name gen and auto allele selection · 7820cad0
      Hoogenboom, Jerry authored
      Fixed:
      * In Samplevis HTML visualisations, the automatic allele selection was
        only checking the number of reverse reads for the 'minimum number of
        reads per orientation' setting.
      * In Samplevis HTML visualisations, automatic allele selection would
        fail to select alleles that had exactly the given minimum number of
        reads.
      * FDSTools would sometimes calculate incorrect and even negative repeat
        counts when producing TSSV-style sequences and allele names for
        sequences that did not exactly fit the STR structure given in the
        library.
      
      Improved:
      * The Samplestats tool now offers the same possibilities to mark alleles
        as Samplevis HTML visualisations do.
      * In Samplevis HTML visualisations, user-removed alleles now have a line
        through their table row.
      * Added a reference to https://docs.python.org/howto/regex in the sample
        tag parsing options section of the help text of many tools.
      * FDSTools will now do a better job of finding the longest possible
        match of the STR repeat definition to produce TSSV-style sequences and
        allele names for sequences that do not exactly fit the STR structure
        given in the library.
      
      Added:
      * New visualisation type 'allele'. With Allelevis, you can generate a
        graph of the alleles of the reference samples (output from
        Allelefinder). (Known bug: it has a 'funny' amount of padding.)
  8. 01 Dec, 2015 1 commit
    • Grand update to all visualisations, especially Samplevis · e7517bbd
      Hoogenboom, Jerry authored
      Fixed:
      * The Vis tool no longer crashes if you specify '-' as the input file
        without piping data in from another program. It will just produce a
        visualisation file with no embedded data instead.
      * FDSTools would crash when generating an allele name for a sequence of
        an STR marker that contained the prefix and suffix of the marker but
        not the actual STR (yes, this happened).
      * Stuttermodelvis would draw all 'All data' fits in the graphs of all
        repeat unit sequences, instead of just the 'All data' fit that was
        fitted to the data of a particular repeat sequence.
      
      Improved:
      * BGHomStats, BGHomRaw, and Samplestats now round their output to three
        significant digits.
      * BGCorrect now rounds its output to 3 decimal places.
      
      Various enhancements to Samplevis HTML visualisations:
      * Added a whole new set of options which are used to automatically
        select the true alleles in a sample.
      * Added an option to split the graphs and the table up per marker.
      * The selected alleles are no longer lost when the graphs are
        re-rendered due to changed options.
      * Added some more columns to the table of selected alleles and made the
        table prettier.
      * Added a dedicated stylesheet for printing, which transforms the web
        page into a nicely formatted report when printed.
      * Option groups can now be hidden separately.
      * Filtering options are now based on the read numbers after correction.
      * The mouse cursor now changes to a 'pointer' style cursor (usually a
        hand with stretched index finger) when hovered over the clickable
        portion of the graph.
      
      Visualisations:
      * Updated Vega to version 2.4.0 and d3 to version 3.5.10.
      * All visualisations now use signals to set the options. This allows
        them to be updated without re-parsing the entire graph spec in most
        cases, which is much faster.
      * Using new cross-and-filter capabilities in bgrawvis, profilevis,
        samplevis, and stuttermodelvis. This greatly reduces Vega's memory
        usage and speeds up rendering.
      * The name of the currently loaded data file is prepended to the page
        title in HTML visualisations.
      * If a file is loaded into an HTML visualisation by drag-and-drop, the
        name of the loaded file is displayed on the file input element.
      * A new -T/--title option for the Vis tool allows for specifying
        something that should be prepended to the page title of HTML
        visualisations. This is particularly useful when data is piped in,
        because no file name is available in that case.
      * Asynchronous rendering of visualisations is now cancelled if a new
        asynchronous rendering task has already been scheduled (HTML
        visualisations only).
  9. 23 Nov, 2015 1 commit
    • Introducing Samplestats · 559ee083
      Hoogenboom, Jerry authored
      * New tool Samplestats computes various sequence-centric statistics for
        sample data files. Most statistics relate to correction amounts and
        are thus only included if the input file contains BGCorrect columns.
      * The starting position can now be omitted from the [genome_position]
        in FDSTools library files. A default value of 1 will be used in this
        case.
      * The setup.py script can now also be run without explicitly specifying
        Python as the interpreter (it now has a shebang line).
  10. 16 Nov, 2015 1 commit
    • Various fixes and improvements · 313867bc
      Hoogenboom, Jerry authored
      Fixed:
      * The 'to' base in variants called on mtDNA was incorrect. This bug could also cause FDSTools to crash.
      * FDSTools would crash if you tried to generate an allele name for a primer dimer of an mtDNA marker. (Now, you get an insane but entirely accurate allele name instead.)
      * Fixed bug that caused some perfectly valid mtDNA allele names to be rejected when attempting to convert them back to raw sequences.
      
      Improved:
      * You can now also specify the ending position of the markers in the FDSTools library. If you do, you may additionally specify a second start position (and optionally a second end position, and so on). FDSTools will interpret this to mean that the marker is the concatenation of these fragments. This was primarily introduced to support mtDNA fragments that contain the origin of mtDNA base numbering somewhere in the middle.
      * More helpful error message when format violations are detected while parsing the library file.
      * More helpful error message when the -e/--tag-expr regular expression could not be compiled.
      * Added a paragraph about sequence alignment caching to the help text of Seqconvert.
      * Added a 'flags' column to BGCorrect output, which gives information about the data that was used to do the correction.
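      A minimal sketch of how a position inside such a concatenated marker
      could be mapped back to a genome position, assuming 1-based inclusive
      (start, end) pairs. The function name and the example coordinates are
      hypothetical; the library file syntax for these positions is not
      reproduced here.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # (start, end), 1-based and inclusive: assumed convention


def genome_position(offset: int, fragments: List[Fragment]) -> int:
    """Map a 0-based offset in the concatenated marker sequence to a genome position."""
    if offset < 0:
        raise IndexError("offset must be non-negative")
    for start, end in fragments:
        length = end - start + 1
        if offset < length:
            return start + offset
        offset -= length
    raise IndexError("offset lies beyond the last fragment")


# A hypothetical mtDNA marker that crosses the origin of the 16569-bp numbering:
fragments = [(16500, 16569), (1, 30)]
print([genome_position(i, fragments) for i in (0, 69, 70, 99)])  # [16500, 16569, 1, 30]
```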
      
      Background noise profiles:
      * Removed -C/--cross-tabular option from BGEstimate, BGPredict, and BGMerge and also removed the ability to read files in this format.
      * BGEstimate, BGHomStats, and BGPredict now add a column 'tool' with their name to the output.
  11. 04 Nov, 2015 1 commit
    • Implemented support for non-STR markers, improved file handling and more · 1083919c
      Hoogenboom, Jerry authored
      Additions and improvements to the FDSTools library file format:
      * New [genome_position] section in FDSTools-style library files allows
      for specifying the chromosome and position of each marker.
      * New [no_repeat] section in FDSTools-style library files allows for
      including non-STR markers.
      * Comma/semicolon/space-separated values in FDSTools-style library files
      can now also be separated by tab characters and multiple consecutive
      separators are no longer collapsed (with the exception of whitespace).
      * If no prefix and/or suffix has been specified for an alias, the
      prefix/suffix of the marker itself is used.
      * Implemented support for non-STR markers (e.g. SNP clusters) and mtDNA
      markers. Allele names of the latter follow mtDNA nomenclature.
      * Improved the logic of generating STR allele names for sequences that
      have a prefix or suffix sequence that was not included in the library
      file.
      * Updated and clarified various explanatory texts in generated FDSTools
      library files.
      
      Fixed:
      * Fixed a bug that caused prefix/suffix variants in aliases to go
      missing in allele names.
      
      Improved file handling:
      * Library files are now closed immediately after parsing them.
      * Sample data input files are opened one at a time now.
      
      Visualisations:
      * Updated Vega to version 2.3.1.
      * Worked around a bug in Google Chrome that caused the 'Save image' link
      to stop working after having been used once.
  12. 01 Sep, 2015 1 commit
    • Various bug fixes and additions · ce7f34fb
      jhoogenboom authored
      Fixed:
      * Fixed crash that would occur when an empty sequence (primer dimer) is converted from raw to TSSV-style (or allelename) format.
      * Fixed bug in BGHomRaw that caused incorrect sample tags in the output.
      * Fixed bug that caused allele names with negative CE numbers and names of primer dimers to be regarded as 'invalid allele names' even though FDSTools generated those names itself.
      * Fixed crash when reading sample data while looking for an annotation column.
      * Fixed bug in Allelefinder that caused it to produce no output at all when a column name with Stuttermark output was specified.
      
      Changed:
      * Restyled the Options box on HTML visualisations. It is now less transparent and oriented more vertically to reduce overlap with the visualisation. Options are now presented in groups.
      * Updated Vega to version 2.2.1.
      
      New:
      * Added *_corrected columns to BGCorrect output for convenience. E.g., the total_corrected column contains the value of total-total_noise+total_add.
      * Added -L/--log-scale option to the Vis tool.
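      The *_corrected arithmetic as a small sketch. Only total_corrected is
      spelled out above; applying the same formula to forward and reverse,
      and the exact names of their _noise and _add columns, are assumptions.

```python
def add_corrected_columns(row: dict) -> dict:
    """Add X_corrected = X - X_noise + X_add for each read-count column X.

    Only total_corrected is documented in the changelog; the forward and
    reverse column names used here are assumptions.
    """
    for column in ("forward", "reverse", "total"):
        row[column + "_corrected"] = row[column] - row[column + "_noise"] + row[column + "_add"]
    return row


row = {"forward": 100, "forward_noise": 12.5, "forward_add": 3.0,
       "reverse": 90, "reverse_noise": 10.0, "reverse_add": 2.0,
       "total": 190, "total_noise": 22.5, "total_add": 5.0}
print(add_corrected_columns(row)["total_corrected"])  # 172.5
```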
  13. 21 Aug, 2015 1 commit
    • Introducing Profilevis, and various bug fixes · b7d64a4f
      jhoogenboom authored
      * New visualisation Profilevis added to the package, but not yet to
        the Vis tool.
      * The Vis tool now prints a helpful error message if no output file
        was specified, instead of printing half a megabyte of HTML and
        minified JavaScript to the terminal.
      * Fixed crash that occurred when attempting to convert the sequence
        of an alias to its allele name.
      * Fixed various bugs in the functions that convert sequences to
        TSSV-style and allele names. Only the conversion of non-matching
        sequences was affected.
      * Added "max_expected_copies" section to the FDSTools library
        format. The default value is 2. Allelefinder will now use these
        as the maximum number of alleles per marker if the
        -a/--max-alleles option is not specified.
      * The section headers in the FDSTools library format are now case
        insensitive.
  14. 12 Aug, 2015 1 commit
    • Introducing BGMerge · 6207d485
      jhoogenboom authored
      * New tool BGMerge can be used to merge background noise profiles
        (e.g., merge BGPredict output with a database previously
        obtained from BGEstimate).
      * Fixed two major bugs in BGPredict that resulted in incorrect fit
        functions being used.
      * BGEstimate, BGPredict, BGHomStats, Blame, and StutterModel no
        longer crash if a library file is specified.
      * Added reverse strand profile estimation to BGPredict.
  15. 11 Aug, 2015 1 commit
    • Introducing BGPredict · 276a0439
      jhoogenboom authored
      * New tool BGPredict predicts background noise profiles (containing
        only stutter products) for user-supplied alleles/sequences using
        a trained stutter model obtained from Stuttermodel. Currently
        only the amounts of the forward strand are predicted.
      * New option -L/--min-lengths for Stuttermodel allows setting a
        minimum required number of unique repeat lengths to base the
        fits on (default: 5).
      * Updated formatting of output of Stuttermodel: added '+' sign to
        positive stutter, limited r2 scores to 3 decimal places, and now
        all coefficients are written in scientific notation with 3
        decimal places.
      * The --output-column option of SeqConvert now defaults to using
        the value of --allele-column.
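      The described Stuttermodel output formatting maps directly onto Python
      format specifications. The helper names are hypothetical, and the
      surrounding column layout of the output is not shown.

```python
def format_stutter(stutter: int) -> str:
    """Stutter amount in repeat units with an explicit sign, e.g. '+1' or '-1'."""
    return f"{stutter:+d}"


def format_r2(r2: float) -> str:
    """r2 score limited to three decimal places."""
    return f"{r2:.3f}"


def format_coefficient(value: float) -> str:
    """Fit coefficient in scientific notation with three decimal places."""
    return f"{value:.3e}"


print(format_stutter(1), format_stutter(-1), format_r2(0.98765), format_coefficient(0.000123456))
# +1 -1 0.988 1.235e-04
```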
  16. 10 Aug, 2015 1 commit
    • Introducing StutterModel · 818ddd2b
      jhoogenboom authored
      * New tool StutterModel fits polynomials to stutter ratio vs repeat
        length.
      * Changed -R to -Q (--limit-reads) so that I can reassign -R to an
        option that is used more often.
      * Changed -r to -R (--report) to make sure it will not collide with
        the -r option in Stuttermark, if I ever want to add report output
        to Stuttermark.
      * BGHomStats now checks whether all alleles are detected
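      The changelog does not say how the polynomials are fitted. As a
      hedged, self-contained sketch, the code below performs an ordinary
      least-squares fit via the normal equations and refuses to fit when
      there are too few unique repeat lengths, mirroring the
      -L/--min-lengths default of 5 described in the 11 Aug 2015 entry.
      Function and parameter names are hypothetical.

```python
from typing import List, Sequence


def fit_polynomial(lengths: Sequence[float], ratios: Sequence[float], degree: int = 2,
                   min_lengths: int = 5) -> List[float]:
    """Least-squares polynomial fit of stutter ratio versus repeat length.

    Returns coefficients c such that ratio ~ c[0] + c[1]*x + c[2]*x**2 + ...
    """
    if len(set(lengths)) < max(min_lengths, degree + 1):
        raise ValueError("too few unique repeat lengths to fit")
    n = degree + 1
    # Normal equations: a[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i).
    a = [[sum(x ** (i + j) for x in lengths) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(lengths, ratios)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))) / a[i][i]
    return coeffs


lengths = list(range(5, 16))
ratios = [0.01 + 0.002 * x + 0.0005 * x ** 2 for x in lengths]  # synthetic, noise-free
print(fit_polynomial(lengths, ratios))  # approximately [0.01, 0.002, 0.0005]
```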
  17. 07 Aug, 2015 1 commit
    • Reworked input/output file arguments · 7f23c2e0
      jhoogenboom authored
      * All tools now write to stdout by default. Tools that support
        writing report files write those to stderr by default. The
        -o/--output and -r/--report options can be used to override
        these.
      * Tools that operated on one sample at a time (bgcorrect,
        seqconvert, stuttermark) now support batch processing. The new
        -i/--input argument takes a list of files. In batch mode,
        the -o/--output argument can be used to specify a list of
        corresponding output files (which must be the same length). It
        is also possible to specify a format string to automatically
        generate file names. -o/--output defaults to "\1-\2.out" which is
        automatically expanded to "sampletag-toolname.out". The old
        positional arguments [IN] and [OUT] are maintained and allow for
        conveniently running the tools on a single sample file.
        [IN] is mutually exclusive with -i/--input and [OUT] is mutually
        exclusive with -o/--output. [OUT] now also accepts the filename
        format, but when not in batch mode, it still defaults to stdout.
        Note that by default, the sample tag is extracted from the input
        filenames by simply stripping the extension. This means a minimal
        batch processing command like "fdstools stuttermark -i *.csv"
        automatically creates a "...-stuttermark.out" file next to each
        CSV file in the current working directory.
      * Libconvert now also supports only specifying an output file.
        This makes it easier to write the default FDSTools library to a
        new file. E.g., "fdstools libconvert mynewfile.txt" now creates
        "mynewfile.txt" if it does not exist, and writes the default
        library to it. Most helpful.
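      A minimal sketch of the default naming described above: the sample tag
      is the input file name with its extension stripped, and \1 and \2 in
      the output format are replaced by the sample tag and the tool name.
      The function names are hypothetical, and handling of input directories
      is not covered.

```python
import os


def sample_tag(path: str) -> str:
    """Default sample tag: the file name with its extension stripped."""
    return os.path.splitext(os.path.basename(path))[0]


def output_name(fmt: str, tag: str, tool: str) -> str:
    r"""Expand \1 to the sample tag and \2 to the tool name."""
    return fmt.replace("\\1", tag).replace("\\2", tool)


inputs = ["sample01.csv", "sample02.csv"]
print([output_name(r"\1-\2.out", sample_tag(p), "stuttermark") for p in inputs])
# ['sample01-stuttermark.out', 'sample02-stuttermark.out']
```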
  18. 06 Aug, 2015 1 commit
    • Greatly increased argument help · b30bdbbc
      jhoogenboom authored
      * All tools now have a longer description in the tool-specific help
        page.
      * Arguments are now presented in groups and the order is the same
        across tools.
      
      Furthermore:
      * Fixed bug that rendered BGHomStats and BGEstimate with the -H
        option useless.
      * The report of Allelefinder and BGEstimate is now written to
        sys.stderr by default. This means the report is now always
        generated (but it may be sent directly to /dev/null explicitly by
        the user). The big plus is that the progress of the tools is
        visible in the terminal when the tools are run by hand.
  19. 05 Aug, 2015 1 commit
  20. 04 Aug, 2015 1 commit
    • Introducing Blame · 8685a304
      jhoogenboom authored
      * New tool Blame can be used to find particularly dirty samples and
        to construct a DNA profile of the contaminator.
      * Fixed bug in BGCorrect that resulted in incorrect values in the
        *_add columns.
      * BGEstimate and BGHomStats no longer crash if a library file is
        provided.
      * SeqConvert can now use a different library file for the output,
        thereby offering some possibilities to update allele names when a
        library file gets updated.
      * Replaced various uses of map() with generator expressions and
        list comprehensions for better readability and slightly better speed.
  21. 03 Aug, 2015 1 commit
    • Introducing BGHomStats · a09131d9
      jhoogenboom authored
      * New tool BGHomStats computes statistics (minimum, maximum, mean,
        and sample variance) of noise ratios in homozygous samples.
      * The default BGEstimate output format has been changed to be
        compatible with that of BGHomStats. The cross-tabular output
        format is still available as an option because it easily uses 90%
        less disk space. BGCorrect (and other future tools that use noise
        profiles) will work with both formats.
      * Fixed bug in the --min-samples option of BGEstimate that could
        cause some alleles with fewer than the specified number of samples
        to be included if --drop-samples is used at the same time.
      * The user now receives an error message if there are unknown
        arguments. The error message lists the usage string of the
        requested tool. (Argparse's default was to print the general
        FDSTools usage string, which is not helpful.)
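      The four BGHomStats statistics map directly onto Python's statistics
      module, where statistics.variance computes the sample variance
      (denominator n - 1). The function name and the example ratios are
      hypothetical; how the noise ratios themselves are computed is not
      described in this entry.

```python
import statistics
from typing import Dict, Sequence


def noise_ratio_stats(ratios: Sequence[float]) -> Dict[str, float]:
    """Minimum, maximum, mean and sample variance (n - 1 denominator) of noise ratios."""
    if len(ratios) < 2:
        raise ValueError("sample variance needs at least two observations")
    return {"min": min(ratios), "max": max(ratios),
            "mean": statistics.mean(ratios), "variance": statistics.variance(ratios)}


# Hypothetical noise ratios of one noise sequence across three homozygous samples.
print(noise_ratio_stats([1.0, 2.0, 4.0]))
```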
  22. 31 Jul, 2015 1 commit
    • Various FDSTools-wide enhancements · 7b12cccb
      jhoogenboom authored
      * Unknown arguments are now silently ignored. If this results in
        the tool not being able to run, the usage information of the tool
        is printed instead of the general fdstools usage.
      * Seqconvert no longer crashes on an empty line in the input.
      * Libconvert now maintains the order of prefix/suffix sequences.
      * Allele names with aliases other than 'X' or 'Y' are now correctly
        recognised. These were previously rejected as 'unknown format'.
      * Fixed a bug where a prefix/suffix other than the first listed in
        the library file was sometimes used as the canonical sequence.
      * Sequence format conversion from raw to TSSV-style sequences now
        attempts to match the prefix, suffix, and STR pattern to
        non-matching sequences on a best-effort basis. This is
        especially useful when converting to allele names (which is done
        via TSSV-style sequences), since it results in an allele name
        that more closely matches the names of other alleles.
      * Generating allele names for sequences that lack a prefix and/or
        suffix is now supported (by adding a variant description that
        deletes the entire prefix/suffix).
      7b12cccb
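Silently ignoring unknown arguments can be done with argparse's parse_known_args(). That method returns the leftover strings instead of raising an error. The sketch below is a minimal illustration with a hypothetical fdstools-like parser and a hypothetical --library option. It does not claim to reproduce the actual FDSTools command-line code.

```python
import argparse


def build_parser():
    """Build a hypothetical top-level parser with one tool subcommand."""
    parser = argparse.ArgumentParser(prog="fdstools")
    subparsers = parser.add_subparsers(dest="tool")
    # Hypothetical tool option, for illustration only.
    tool = subparsers.add_parser("seqconvert")
    tool.add_argument("--library")
    return parser


def parse(argv):
    """Parse argv, returning (namespace, unrecognised arguments)."""
    parser = build_parser()
    # parse_known_args() does not call parser.error() on unrecognised
    # arguments; it hands them back so the caller can ignore them.
    return parser.parse_known_args(argv)


args, unknown = parse(["seqconvert", "--library", "lib.ini", "--bogus"])
print(args.tool, args.library, unknown)  # seqconvert lib.ini ['--bogus']
```

Unrecognised arguments given after the subcommand are collected by the subparser and passed back up in the leftovers list. Printing the tool's own usage when it cannot run would be a separate step, which is not shown here.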
  23. 30 Jul, 2015 1 commit
  24. 29 Jul, 2015 1 commit
    • Introducing bgestimate · be745e64
      jhoogenboom authored
      I could write about all its features here, but instead I will point
      out some future plans to highlight the things that are possibly not
      optimal in their current implementation.
      
      There are a number of things I plan to change in the future:
      * The output format is currently JSON; perhaps a carefully designed
        tabular format is a better choice. The benefit of switching to a
        tabular format is that the data can then be loaded into, e.g.,
        Excel as well.
      * The profiles are currently produced separately for forward and
        reverse reads. I would prefer to integrate these into a single
        computation that estimates allele balance in the heterozygotes
        using both strands as well.
      * I would like to add information about strand bias of the alleles
        as well. The most straightforward way to do this is to set only
        the forward reads of the true allele to 100 and treat the reverse
        reads the same as all background products. You will then obtain
        the number of reverse reads observed for every 100 forward reads
        of the true allele.
      * I think it would be appropriate to make sure that the values in
        the allele balance matrices of each sample ('Ax' in the source
        code) add up to 1. For homozygotes, the matrix is currently a
        scalar 1, whereas for heterozygotes the sum of the elements tends
        to be more than 1. This means that a heterozygous sample has a
        stronger influence on the profiles than a homozygous sample.
      be745e64
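Two of the plans above reduce to simple arithmetic. The sketch below rests on two assumptions. It treats each sample's allele balance values as a flat list of numbers, whereas the real 'Ax' structure in the source may be shaped differently. It also treats strand bias as the ratio of reverse to forward reads of the true allele, scaled to 100 forward reads. Both function names are hypothetical.

```python
def normalise_allele_balance(values):
    """Scale one sample's allele balance values so that they sum to 1.

    After scaling, homozygous and heterozygous samples carry equal total
    weight.
    """
    total = sum(values)
    return [value / total for value in values]


def reverse_per_100_forward(forward_reads, reverse_reads):
    """Reverse reads of the true allele per 100 forward reads of it."""
    return 100.0 * reverse_reads / forward_reads


# Hypothetical values: a homozygote contributes a single 1, a heterozygote
# two values whose sum currently exceeds 1.
homozygote = normalise_allele_balance([1.0])
heterozygote = normalise_allele_balance([1.0, 0.8])
print(homozygote, heterozygote)

print(reverse_per_100_forward(250, 200))  # 80.0
```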
  25. 27 Jul, 2015 1 commit
    • Updates to allelefinder · f9543ed9
      jhoogenboom authored
      * Allelefinder can now combine data from multiple files into a
        single sample (this happens when the same sample tag is
        extracted from their file names).
      * Allelefinder can now automatically convert sequences to a given
        format (this is optional though). This is particularly useful
        when combining the knownalleles.csv and newalleles.csv files of
        a sample. (Note that allelefinder still assumes that the files
        contain different alleles; no attempt is made to check whether
        the same allele was represented in multiple files.)
      f9543ed9
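Combining files into one sample amounts to grouping file names by the sample tag extracted from each name. The sketch below uses a hypothetical tag pattern: everything before the first underscore or dot. It is not the pattern or code that allelefinder actually uses.

```python
import re
from collections import defaultdict

# Hypothetical tag pattern: everything before the first "_" or ".".
TAG_PATTERN = re.compile(r"^([^_.]+)")


def group_by_sample_tag(filenames, pattern=TAG_PATTERN):
    """Map each extracted sample tag to the list of files that share it."""
    groups = defaultdict(list)
    for name in filenames:
        match = pattern.search(name)
        # Fall back to the whole file name if the pattern does not match.
        tag = match.group(1) if match else name
        groups[tag].append(name)
    return dict(groups)


files = ["S01_knownalleles.csv", "S01_newalleles.csv", "S02_knownalleles.csv"]
print(group_by_sample_tag(files))
# {'S01': ['S01_knownalleles.csv', 'S01_newalleles.csv'], 'S02': ['S02_knownalleles.csv']}
```

Consistent with the note in the commit message, this grouping does not check whether the same allele appears in more than one file of a group.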
  26. 24 Jul, 2015 1 commit
  27. 23 Jul, 2015 1 commit
    • Laying foundations · 160594c5
      jhoogenboom authored
      * Introducing a new, extended library file format to support
        allele name generation. The new libconvert tool can convert
        TSSV libraries to the new format and vice versa.
      * Added functions for converting between raw sequences, TSSV-style
        sequences, and allele names.
      * Added global -d/--debug option.
      
      Stuttermark updates:
      * Stuttermark now automatically converts input sequences to
        TSSV-style if a library is provided.
      * Stuttermark will no longer crash if there is no 'name' column.
        Instead, all sequences are taken to belong to the same marker.
      
      New tools:
      * libconvert converts between FDSTools and TSSV library formats.
      * seqconvert converts between raw sequences, TSSV-style sequences,
        and allele names.
      * allelefinder detects the true alleles in reference samples.
      160594c5
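TSSV-style sequences describe a raw sequence as blocks of the form UNIT(count). The sketch below is a simplified, greedy left-to-right conversion, assuming the repeat units are supplied as a plain list. It is not the FDSTools algorithm, which works from the STR definitions and prefix/suffix sequences in the library file. The function name is hypothetical.

```python
def to_tssv_style(seq, units):
    """Greedily encode a raw sequence as TSSV-style "UNIT(count)" blocks.

    `units` lists the repeat units to look for. Consecutive copies of the
    same unit are merged into one block; bases that do not start any unit
    are collected into a single block with count 1.
    """
    blocks = []  # list of [unit, count] pairs
    other = ""   # run of bases not matched by any repeat unit
    pos = 0
    while pos < len(seq):
        for unit in units:
            if seq.startswith(unit, pos):
                if other:
                    blocks.append([other, 1])
                    other = ""
                if blocks and blocks[-1][0] == unit:
                    blocks[-1][1] += 1  # extend the current repeat block
                else:
                    blocks.append([unit, 1])
                pos += len(unit)
                break
        else:
            # No unit starts here: keep the base as non-repeat sequence.
            other += seq[pos]
            pos += 1
    if other:
        blocks.append([other, 1])
    return "".join(f"{unit}({count})" for unit, count in blocks)


print(to_tssv_style("TCTAAGATAGATAGATAGACAGATTA", ["AGAT", "AGAC"]))
# TCTA(1)AGAT(3)AGAC(1)AGAT(1)TA(1)
```

A greedy scan like this can split repeats differently from a library-driven match. The FDSTools commits above mention best-effort matching of prefix, suffix, and STR pattern for exactly that reason.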
  28. 02 Jul, 2015 1 commit
    • Initial commit · 668970ed
      jhoogenboom authored
      FDSTools v0.0.1 with Stuttermark v1.3.
      Other tools will come later.
      668970ed