- 08 Mar, 2017 1 commit
-
-
Hoogenboom, Jerry authored
* General changes in v1.1.0.dev3: * Allele name heuristics: don't produce insertions at the end of the prefix or at the beginning of the suffix; just include extra STR blocks. * FDSTools will no longer crash with a 'column not found' error when an input file is empty. This situation is now treated as if the expected columns existed, but no lines of actual data were present. This greatly helps in tracking down issues in pipelines involving multiple tools, as tools will now shut down gracefully if an upstream tool fails to write output. * Allelefinder v1.0.1: * Fixed crash that occurred when converting sequences to allele names format while no library file was provided. * Don't crash when output pipe is closed. * BGAnalyse v1.0.1: * Don't crash when output pipe is closed. * BGCorrect v1.0.2: * Don't crash on empty input files. * Don't crash when output pipe is closed. * BGEstimate v1.1.2: * Don't crash when output pipe is closed. * BGHomRaw v1.0.1: * Clarified the 'Allele x of marker y has 0 reads' error message with the sample tag. * Don't crash when output pipe is closed. * BGHomStats v1.0.1: * Error messages about the input data now contain the sample tag of the sample that triggered the error. * Don't crash when output pipe is closed. * BGMerge v1.0.3: * Don't crash when output pipe is closed. * BGPredict v1.0.2: * Don't crash on empty input files. * Don't crash when output pipe is closed. * FindNewAlleles v1.0.1: * Don't crash on empty input files. * Don't crash when output pipe is closed. * Libconvert v1.1.2: * Don't crash when output pipe is closed. * Library v1.0.3: * Don't crash when output pipe is closed. * Seqconvert v1.0.2: * Don't crash when output pipe is closed. * Samplestats v1.1.1: * Don't crash on empty input files. * Don't crash when output pipe is closed. * Stuttermark v1.5.1: * Don't crash on empty input files. * Don't crash when output pipe is closed. * Stuttermodel v1.1.2: * Don't crash when output pipe is closed.
* TSSV v1.1.0 (additionally): * When running analysis in parallel, make tasks of 1 million alignments. Previously, this was 10k reads, with the number of alignments per task depending on the size of the library file. This caused memory issues for huge libraries like whole mt interval libraries. * Don't crash when output pipe is closed. * Vis v1.0.4: * Don't crash when output pipe is closed.
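The two recurring fixes in this commit (closed output pipes and empty input files) can be illustrated with a minimal Python sketch. This is not the FDSTools implementation; the function names `write_lines` and `read_table` are hypothetical, and the choice to return the expected columns for an empty file is an assumption based on the description above.

```python
import sys


def write_lines(lines, out=None):
    """Write lines to `out`; return False if the reader closed the pipe early."""
    out = sys.stdout if out is None else out
    try:
        for line in lines:
            out.write(line + "\n")
        out.flush()
    except BrokenPipeError:
        # The downstream process (for example `head`) closed its end of the
        # pipe. Stop quietly instead of dying with a traceback.
        return False
    return True


def read_table(stream, expected_columns):
    """Read a tab-separated table; an empty stream yields zero data rows."""
    header = stream.readline().rstrip("\r\n")
    if not header:
        # Assumption: an empty file is treated as if the expected columns
        # were present but no data lines followed.
        return list(expected_columns), []
    columns = header.split("\t")
    missing = [name for name in expected_columns if name not in columns]
    if missing:
        raise ValueError("column not found: " + ", ".join(missing))
    rows = [line.rstrip("\r\n").split("\t") for line in stream if line.strip()]
    return columns, rows
```

With this shape, a tool downstream of a failed tool sees a valid but empty table and can exit normally, which is the behaviour the commit message describes.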
-
- 21 Dec, 2016 1 commit
-
-
Hoogenboom, Jerry authored
* General changes in v1.0.1: * Fixed crash that occurred when using the -i option to run the same command on multiple input files. * The usage string now always starts with 'fdstools', even if FDSTools was invoked using some other command (e.g. on Windows, FDSTools gets invoked through a file called 'fdstools-script.py'). * Fixed bug with the -d/--debug option being ignored if placed before the tool name on systems running Python 2.7.9 or later. * FDSTools library files may now contain IUPAC ambiguous bases in the prefix and suffix sequences of STR markers (except the first sequence, as it is used as the reference). Additionally, optional bases may be represented by lowercase letters. * If no explicit prefix/suffix is given for an alias, the prefix/suffix of the corresponding marker is assumed instead. This situation was not handled correctly when converting from raw sequences to TSSV or allelename format, which resulted in the alias remaining unused. * Stuttermodelvis v2.0.2: * Added filtering option for the stutter amount (-1, +1, -2, etc.). * Added filtering option for the coefficient of determination (r squared value) of the fit functions. * Libconvert v1.1.1: * Adjustments for supporting IUPAC notation in prefix and suffix sequences when converting from FDSTools to TSSV library format. * Library v1.0.2: * Added documentation for IUPAC support to the descriptive comment of the [prefix] section.
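As an illustration of the IUPAC support described in this commit, the sketch below turns a prefix or suffix sequence into a regular expression, treating lowercase letters as optional bases. It is only a sketch under those stated assumptions; the helper name `iupac_to_regex` is hypothetical and FDSTools may handle these sequences differently internally.

```python
import re

# Standard IUPAC nucleotide codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(flank):
    """Compile a flanking sequence with IUPAC codes into a regex.

    Lowercase letters are treated as optional bases (hypothetical helper).
    """
    parts = []
    for base in flank:
        choices = IUPAC[base.upper()]
        part = choices if len(choices) == 1 else "[" + choices + "]"
        if base.islower():
            part += "?"  # an optional base may be absent
        parts.append(part)
    return re.compile("".join(parts))
```

For example, `iupac_to_regex("ACRtN")` accepts `ACATG` (optional `T` present) as well as `ACGC` (optional `T` absent).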
-
- 03 Oct, 2016 1 commit
-
-
Hoogenboom, Jerry authored
* General changes in v1.0.0rc1: * Fixed bug that caused variant descriptions in allele names of non-STR markers to be prepended with plus signs similar to suffix variants in STR markers. When attempting to convert these allele names back to raw sequences, FDSTools would crash with an 'Invalid allele name' error. * Allelevis v2.0.1 (additionally): * In the tooltip in HTML visualisations, a line break may now only be inserted in allele names after an underscore character (_) or after a repeat block in STR allele names. If the input file contains raw sequences, line breaks may now be introduced anywhere in the sequence. * Samplevis v2.1.1: * Added tooltip support to HTML visualisations. Moving the mouse pointer over one of the alleles in the graph now displays a tooltip giving per-strand read counts of that allele. The tooltip may include a 'new allele' note if the input sample was analysed with FindNewAlleles. * The allele tables in HTML visualisations will now grow much wider than before if the screen (or window) is very narrow. * In the tables in HTML visualisations, a line break may now only be inserted in allele names after an underscore character (_) or after a repeat block in STR allele names. If the input file contains raw sequences, line breaks may now be introduced anywhere in the sequence. * Improved determination of column widths of the allele tables when printing an HTML visualisation. * When printing an HTML visualisation, the graph and the corresponding table of a marker will be kept on the same page in all browsers now. * Fixed glitch that caused 'Infinity%' or 'NaN%' to be written in some cells in the allele tables in HTML visualisations for sequences that had zero reads (before or after correction). These cells will remain empty now. * Pipeline v1.0.1 (additionally): * The Pipeline tool will now only write the command lines of the tools it runs if the -d/--debug option was specified. 
* Library v1.0.1 (additionally): * Added proper examples for non-STR markers and aliases. * Stuttermodel v1.1.1: * Minor change to internal variant representation.
-
- 20 Sep, 2016 1 commit
-
-
Hoogenboom, Jerry authored
* General changes in v0.0.6.dev1: * Tools that take a list of files as their argument (through the -i option or as positionals) now explicitly support glob patterns. This means they will interpret '*' and '?' characters as wildcards for 'zero or more characters' and 'any one character', respectively. On Unix-like systems this is generally done by the shell, but on Windows one had to specify every file name completely. * BGEstimate v1.1.1: * Added option -p/--profiles which can be used to provide a previously created background noise profiles file. BGEstimate will read starting values from this file instead of assuming zero noise. * BGMerge v1.0.2: * Small code changes to facilitate explicit glob pattern matching support. * Pipeline v1.0.1: * The Pipeline tool will no longer check the existence of the files specified for the -S/--in-samples option; instead, this is left to the downstream tools to find out, consistent with how this works with the other input file options. * Allelevis v2.0.1: * Added tooltip support to HTML visualisations. Moving the mouse pointer over a node or edge in the graph now displays a tooltip giving allele names and sample counts. * Stuttermodelvis v2.0.1: * Changed the unit in the horizontal axis title from 'bp' to 'nt'. * Library v1.0.1: * Updated some of the comments describing the sections.
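Explicit glob support of the kind described in this commit can be sketched in Python with the standard `glob` module. The function name `expand_file_args` is hypothetical, and keeping non-matching arguments unchanged is an assumption made here so that a later step can report the missing file.

```python
import glob


def expand_file_args(args):
    """Expand '*' and '?' wildcards in a list of file arguments."""
    files = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Assumption: an argument that matches nothing is kept as-is, so the
        # code that opens the file can produce the usual error message.
        files.extend(matches if matches else [arg])
    return files
```

On Unix-like systems the shell has usually expanded the patterns already, in which case each argument simply matches itself.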
-
- 06 Sep, 2016 1 commit
-
-
Hoogenboom, Jerry authored
* General changes in v0.0.5: * The TSSV tool now depends on version 0.4.0 of TSSV. * Added new Pipeline tool that runs one of three default analysis pipelines automatically given a configuration file with tool options and input/output file names. The three available pipeline options are 'reference-sample', analysing a single reference sample with TSSV and Stuttermark; 'reference-database', analysing a collection of reference samples with BGEstimate and Stuttermodel; and 'case-sample', analysing a single case sample with TSSV, BGPredict, BGMerge, BGCorrect, and Samplestats. * Added new Library tool that creates an empty FDSTools library file. Users may optionally specify the intended use of the library (STR markers, non-STR markers, or both). Only the sections that apply to the given types of markers will be included in the output. The [aliases] section is not included by default, but an option is available to add it. * Added new tool BGAnalyse which can be used to analyse the remaining amount of noise in reference samples after correction. This tool is a more sensitive successor of the 'Blame' tool. * Added new visualisation BGAnalysevis for visualising data obtained from BGAnalyse. This visualisation allows for identifying unclean or otherwise suboptimal samples by comparing the lowest, highest, and/or total remaining noise after correction for each marker in each sample. * The Blame tool was removed in favour of BGAnalyse. * Libconvert v1.1.0: * When converting to FDSTools format, Libconvert automatically creates an empty FDSTools library file with the same contents as what would be obtained from the new Library tool without arguments. * The -a/--aliases option was modified such that it has the same effect as the -a/--aliases option of the new Library tool. This means that without this option specified, the [aliases] section will not be present in the output anymore.
* The ability of the Libconvert tool to produce an empty FDSTools library file if no input file was given has been removed from the documentation (but not from the tool itself). * TSSV v1.0.2: * Added new option -n/--indel-score which can be used to increase the penalty given to insertions and deletions in the flanking sequences w.r.t. the penalty given to mismatches. * NOTE: Requires TSSV v0.4.0 or newer to be installed. * Vis v1.0.2: * Changed default value of -n/--min-abs from 15 to 5. * Added -I/--input2 option, which allows for specifying a file with raw data points for Stuttermodelvis and Profilevis. * Added support for creating BGAnalysevis visualisations. * Profilevis v2.0.0: * Replaced the simple Options overlay with responsive design options panels in HTML visualisations. * Alleles and sequences are now sorted by CE allele length when applicable. * Added option to plot BGHomRaw data on top of the profiles. * Added marker selection menu for easier filtering. * BGRawvis v2.0.0: * Replaced the simple Options overlay with responsive design options panels in HTML visualisations. * Sequences are now sorted by CE allele length when applicable. * Changed default minimum number of reads from 15 to 5. * Added marker selection menu for easier filtering. * Stuttermodelvis v2.0.0: * Replaced the simple Options overlay with responsive design options panels in HTML visualisations. * Fixed glitch that caused the graphs to be re-rendered twice when loading a file by drag-and-drop in HTML visualisations. * Fixed glitch that made it possible to replace the data that was embedded in an HTML visualisation through drag-and-drop. * Added repeat unit selection menu for easier filtering. * Allelevis v2.0.0: * Replaced the simple Options overlay with responsive design options panels in HTML visualisations. * Reduced Vega graph spec complexity by using the new Rank transform to position the subgraphs. * Fixed glitch that caused unnecessary padding around the graph. 
* Samplestats v1.1.0: * Changed default allele calling option thresholds: * Changed default value of -m/--min-pct-of-max from 5.0 to 2.0. * Changed default value of -p/--min-pct-of-sum from 3.0 to 1.5. * Mentioned allele calling in the tool descriptions. * Samplevis v2.1.0: * Changed default minimum number of reads for graph filtering from 15 to 5. * Changed default table filtering options: * Percentage of highest allele per marker changed from 5% to 2%. * Percentage of the marker's total reads changed from 3% to 1.5%. * Minimum number of reads in both orientations changed from 0 to 1.
-
- 26 Jul, 2016 1 commit
-
-
Hoogenboom, Jerry authored
* Global changes in v0.0.4: * FDSTools will now print profiling information to stdout when the -d/--debug option was specified. * Fixed bug where specifying '-' as the output filename would be taken literally, while it should have been interpreted as 'write to standard out' (Affected tools: BGCorrect, Samplestats, Seqconvert, Stuttermark). * Added more detailed license information to FDSTools. * BGEstimate v1.1.0: * Added a new option -g/--min-genotypes (default: 3). Only alleles that occur in at least this number of unique heterozygous genotypes will be considered. This is to avoid 'contamination' of the noise profile of one allele with the noise of another. If homozygous samples are available for an allele, this filter is not applied to that allele. Setting this option to 1 effectively disables it. This option has the same cascading effect as the -s/--min-samples option, that is, if one allele does not meet the threshold, the samples with this allele are excluded which may cause some of the other alleles of these samples to fall below the threshold as well. * Stuttermodel v1.1.0: * Stuttermodel will now only output a fit for one strand if it could also obtain a fit for the other strand (for the same marker, unit, and stutter depth). This new behaviour can be disabled with a new -O/--orphans option. * Fixed bug that caused Stuttermodel to output only the raw data points for -1 and +1 stutter when normal output was suppressed. * BGCorrect v1.0.1: * Added new column 'weight' to the output. The value in this column expresses the number of times that the noise profile of that allele fitted in the sample. * Samplestats v1.0.1: * Samplestats will now round to 4 or 5 significant digits if a value is above 1000 or 10000, respectively. Previously, this was only done for the combined 'Other sequences' values. * The 'Other sequences' lines will now also include values for total_recovery, forward_recovery, and reverse_recovery.
* The total_recovery, forward_recovery, and reverse_recovery columns are no longer placed to the left of all the other columns generated by Samplestats. * The help text for Samplestats erroneously listed the X_recovery_pct instead of X_recovery. * Added support for the new 'weight' column produced by BGCorrect when the -a/--filter-action option is set to 'combine'. * BGPredict v1.0.1: * Greatly reduced memory usage. * BGPredict will now output nonzero values below the threshold set by -n/--min-pct if the predicted noise ratio of the same stutter on the other strand is above the threshold. Previously, values below the threshold were clipped to zero, which may cause unnecessarily high strand bias in the predicted profile. * BGMerge v1.0.1: * Reduced memory usage. * TSSV v1.0.1: * Renamed the '--is_fastq' option to '--is-fastq'. It was the only option with an underscore instead of a hyphen in FDSTools. * Fixed crash that would occur if -F/--sequence-format was set to anything other than 'raw'. * Libconvert v1.0.1: * Specifying '-' as the first positional argument to libconvert will now correctly interpret this as "read from stdin" instead of throwing a "file not found" error (or reading from a file named "-" if it exists). * Seqconvert v1.0.1: * Internal naming of the first positional argument was changed from 'format' to 'sequence-format'. This was done for consistency with the -F/--sequence-format option in other tools, giving it the same name in Pipeline configuration files. * Vis v1.0.1: * Added -j/--jitter option for Stuttermodelvis (default: 0.25). * Vis would not allow the -n/--min-abs and the -s/--min-per-strand options to be set to 0. * Stuttermodelvis v1.0.0beta2: * HTML visualisations now support drawing raw data points on top of the fit functions. The points can be drawn with an adjustable jitter to reduce overlap. 
* Fixed a JavaScript crash that would occur in HTML visualisations if the Repeat unit or Marker name filter resulted in an invalid regular expression (e.g., when the entered value ends with a backslash). * Reduced Vega graph spec complexity by using the new Rank transform to position the subgraphs. * HTML visualisations made with the -O/--online option of the Vis tool will now contain https URLs instead of http. * Samplevis v1.0.1: * Fixed a JavaScript crash that would occur in HTML visualisations if the Marker name filter resulted in an invalid regular expression (e.g., when the entered value ends with a backslash). * Reduced Vega graph spec complexity by using the new Rank transform to position the subgraphs. * Fixed a glitch where clicking the 'Truncate sequences to' label would select the marker spacing input. * The 'Notes' table cells with 'BGPredict' in them now get a light orange background to warn the user that their background profile was computed. If a sequence was explicitly 'not corrected', 'not in ref db', or 'corrected as background only', the same colour is used. * The message bar at the bottom of Samplevis HTML visualisations will now grow no larger than 3 lines. A scroll bar will appear as needed. * HTML visualisations made with the -O/--online option of the Vis tool will now contain https URLs instead of http. * BGRawVis v1.0.1: * Fixed a JavaScript crash that would occur in HTML visualisations if the Marker name filter resulted in an invalid regular expression (e.g., when the entered value ends with a backslash). * Reduced Vega graph spec complexity by using the new Rank transform to position the subgraphs. * HTML visualisations made with the -O/--online option of the Vis tool will now contain https URLs instead of http. * Profilevis v1.0.1: * Fixed a JavaScript crash that would occur in HTML visualisations if the Marker name filter resulted in an invalid regular expression (e.g., when the entered value ends with a backslash). 
* Reduced Vega graph spec complexity by using the new Rank transform to position the subgraphs. * HTML visualisations made with the -O/--online option of the Vis tool will now contain https URLs instead of http. * Allelevis v1.0.0beta2: * Fixed potential crash/corruption that could occur with very unfortunate combinations of sample names and marker names. * HTML visualisations made with the -O/--online option of the Vis tool will now contain https URLs instead of http. * Added two more colours to the legend, such that a maximum of 22 markers is now supported without re-using colours. * Updated bundled D3 to v3.5.17. * Updated bundled Vega to v2.6.0.
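The Samplestats rounding rule mentioned in this commit (4 or 5 significant digits for values above 1000 or 10000) can be sketched as follows. The function names are hypothetical, and the baseline of 3 significant digits for smaller values is an assumption taken from the 01 Dec, 2015 commit message rather than from the FDSTools source.

```python
import math


def round_sig(value, digits):
    """Round `value` to the given number of significant digits."""
    if value == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - magnitude)


def samplestats_round(value):
    """Pick the number of significant digits from the size of the value."""
    size = abs(value)
    if size > 10000:
        digits = 5
    elif size > 1000:
        digits = 4
    else:
        digits = 3  # assumed baseline for smaller values
    return round_sig(value, digits)
```

The effect is that large read counts keep their integer part intact while small ratios are still trimmed to a few digits.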
-
- 10 Mar, 2016 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * The -c/--stuttermark-column option of Allelefinder was not filtering out the non-ALLELE sequences as it was supposed to. (This issue was introduced in ce7f34fb, in which a bug was fixed that caused this option to filter all sequences, including the ones marked with ALLELE. So it turns out this option has been broken since 732e83ba.) Improved: * Allelefinder will no longer reject a marker based on the number of reads of 'Other sequences'. * Adjusted sequence alignment parameters for mtDNA sequences to produce allele names that more closely follow historical mtDNA mutation nomenclature.
-
- 08 Mar, 2016 1 commit
-
-
Hoogenboom, Jerry authored
FDSTools would sometimes produce suboptimal alignments. Most notably, it would produce multiple smaller insertions/deletions when the difference between two sequences could be described by one larger insertion/deletion in combination with a base substitution. The latter description is often more biologically sound and also usually results in a shorter allele name. * Fixed a bug that sometimes caused FDSTools to choose an incorrect path through the alignment matrix, producing a suboptimal alignment. * Tweaked the alignment parameters to produce more meaningful results.
-
- 29 Feb, 2016 1 commit
-
-
Hoogenboom, Jerry authored
Added: * Added a new section expected_allele_length to the FDSTools library format. In this section, the minimum and (optionally) maximum allele length of each marker can be specified. * Added -L/--check-length option to the TSSV tool. If specified, the tool will use the expected_allele_length values to filter the results. * Samplevis can now truncate long allele names to a given number of characters (defaulting to 70). * Added an option to Samplestats to keep negatives when filtering (abs filter). Changed: * Renamed the --aggregate-below-minimum option of the TSSV tool to --aggregate-filtered. Improved: * Added an option to read_sample_data_file such that other code can request or require that the X_corrected columns are used. * Samplestats will now round to 4 or 5 significant digits if a value is above 1000 or 10000, respectively. * BGHomRaw will no longer round the forward, reverse, and total columns. * When generating mtDNA allele names, FDSTools will now try to avoid creating gaps in the alignment of the sequences against the reference. * Grouped the filtering options of the TSSV tool in its help text. * Cleaned up some leftover code for special sequence value handling (more specifically: code that expected ensure_sequence_format to return False for special sequence values, which it no longer does). * Cleaned up some dead legacy code in reduce_read_counts.
-
- 25 Feb, 2016 1 commit
-
-
Hoogenboom, Jerry authored
New tool findnewalleles: * Given a list of known sequences, this tool can go through sample data files to mark all sequences that are not on the list. Fixed: * BGHomRaw, BGEstimate, BGHomStats, Stuttermodel, and Blame did not ignore the 'Other sequences' and 'No data' values that may occur in the place of a sequence as they were supposed to. Improved: * BGHomRaw will now include the sample tag in the "Missing allele X of marker Y" error message. Changed: * The -F/--sequence-format argument from BGHomRaw now defaults to "raw". Visualisations: * Updated Vega to version 2.5.0. * The new version of Vega allowed the sorting to be fixed in Samplevis, Profilevis, BGRawvis, and Stuttermodelvis. * Samplevis: * The 'Other sequences' bars are now drawn with an outline only. * STR alleles are now sorted by allele length by default (this can be toggled with a checkbox in HTML visualisations, and with an option in the Vis tool). * Fixed the clipping of the start of long allele names when printing SVGs from Google Chrome. * Added a note (as '?' help tooltip) to the Common axis range option in the HTML visualisation, to inform the user of the fact that the Split markers option needs to be off for it to work.
-
- 04 Feb, 2016 1 commit
-
-
Hoogenboom, Jerry authored
Improved: * Added -A/--aggregate-below-minimum option to the TSSV tool. This will add a line with 'Other sequences' to the output summing all sequences that were not reported because they had fewer reads than was specified with the -a/--minimum option. * Clarified the help text for the -D/--dir option of the TSSV tool. Fixed: * Updated all tools to consistently handle cases where 'No data' or 'Other sequences' occurs in place of a sequence.
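The aggregation behaviour described in this commit can be sketched in Python. This is a simplified illustration, not the TSSV tool's code: the function name is hypothetical, rows carry only a total read count, and emitting one 'Other sequences' line per marker is an assumption.

```python
def aggregate_below_minimum(rows, minimum):
    """Sum sequences below `minimum` reads into 'Other sequences' rows.

    `rows` is a list of (marker, sequence, reads) tuples.
    """
    kept = []
    other = {}
    for marker, sequence, reads in rows:
        if reads >= minimum:
            kept.append((marker, sequence, reads))
        else:
            # Not reported individually; add its reads to the marker's total.
            other[marker] = other.get(marker, 0) + reads
    for marker, reads in other.items():
        kept.append((marker, "Other sequences", reads))
    return kept
```

Downstream tools then need to recognise 'Other sequences' as a special value rather than a real sequence, which is what the accompanying fix in this commit addresses.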
-
- 02 Feb, 2016 1 commit
-
-
Hoogenboom, Jerry authored
Updated Stuttermark to v1.5. WARNING: This version of Stuttermark is INCOMPATIBLE with output from previous versions of FDSTools and TSSV. Introducing TSSV-Lite * New tool tssv acts as a wrapper around TSSV-Lite (tssvl). Its primary purpose is to allow running TSSV-Lite without having to convert the FDSTools library to TSSV format, and to offer allelename output. Like all other tools in FDSTools, it also works with TSSV library files but its allele name generation capabilities are limited in that case. Changed: * TSSV-Lite and the new TSSV tool in FDSTools have two columns renamed w.r.t. the original TSSV program: 'name' has been changed to 'marker', and 'allele' has been changed to 'sequence'. All tools in FDSTools have been updated to use the new column names. This change affects Allelefinder, BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict, Blame, Samplestats, Samplevis, Stuttermark, Stuttermodel, and Seqconvert. Note that this change will BREAK COMPATIBILITY of these tools with old data files. Fixed: * In Samplevis HTML visualisations, the "percentage recovery" table filtering option used the absolute number of recovered reads instead. * Added PctRecovery to the tables in Samplevis HTML visualisations. * BGPredict will now print a nice error message if the -n/--min-pct option is set to zero or a negative number, to avoid division by zero. * Samplestats would crash if the input file contained the flags column. * FDSTools would crash when trying to convert sequences to allele names using a TSSV library. Improved: * Libconvert will no longer include duplicate sequences in the STR definition when converting to TSSV format and the reference sequence of one of the markers is the same as one of its aliases, or when aliases of one marker share one or more prefix or suffix sequences. * Updated add_input_output_args() such that the output file is a positional argument (instead of -o) for tools that have a single input file and no support for batches.
* Updated add_sequence_format_args() such that the library file can be made a required argument. * Refined the FDSTools package description, since FDSTools does more than just noise filtering. * FDSTools will now do a marginally better job at producing allele names for sequences that do not exactly match the provided STR pattern. When seeking the longest matching portion of the sequence, it will now also test the reversed sequence with a reversed pattern, which sometimes yields a longer match. It is still not optimal, though, but some refactoring has been done to move away from regular expressions. * BGCorrect will now also fill in correction_flags for newly added sequences. * Adjusted the help text of Samplestats to include the fact that the -c and -y options have an OR relation instead of an AND relation. * BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict, and Stuttermodel will now ignore special values that may appear in the place of a sequence (currently: 'Other sequences' and 'No data'). Removed: * The -m/--marker-column and -a/--allele-column arguments of BGPredict had no effect and have been removed. Visualisations: * Updated bundled D3 to v3.5.12. * In HTML visualisations, if the page is scrolled to the right edge when an option is changed that causes the graphs to become wider, the page now remains scrolled to the right. * Samplevis HTML visualisations: * Added 'Clear manually added/removed' link to the table filtering. * Reduced flicker of the mouse cursor in Internet Explorer. * Added 'Common axis range' checkbox (only available when 'Split markers' is off). * Added 'Save table' link to save the table of selected alleles to a tab-separated file. * Added 'PctRecovery' column to the tables of selected alleles. * An alert box is now shown when a data file is loaded that contains markers that have 'No data'. * Added 'Percentage of total reads' to the graph filtering options.
* Added a note to the table filtering options to explain that the minimum percentage correction and recovery have an OR relation.
-
- 09 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * When converting STR allele names to sequences, FDSTools would reject any prefix variants with a false message stating that the variant does not match the reference sequence. * The Samplestats tool would not allow the -b/--min-per-strand option to be set to zero. Improved: * Moved the flags generated by BGCorrect to a new column named correction_flags. Some of the values have been renamed for clarity, and this column now always contains a value. * The Samplestats tool will no longer add the not_corrected flag to each sequence, as it does not add the correction_flags column. * The Samplestats tool now supports filtering sequences. For filtering, the same set of options is available as those used for marking alleles. The filtering options use upper case letters and have '-filt' appended to their long name. The new -a/--filter-action option defines what should be done with filtered sequences. 'off', the default, disables filtering; 'combine' replaces filtered sequences with a new line containing aggregated data; 'delete' removes filtered sequences without leaving a trace. * The seqconvert tool is aware of the special 'Other sequences' value produced by Samplestats with -a/--filter-action set to 'combine'. Other tools will give an informative error message when the input contains this special value. * The Samplestats tool now accepts non-integer and negative numbers for -n/--min-reads and -b/--min-per-strand because after correction read counts are not necessarily nonnegative integers anymore. * The forward_correction and reverse_correction columns of Samplestats will now contain 0 if the sequence had exactly 0 reads both before and after correction (previously, this was -100). * Renamed the _mp columns of Samplestats to _mp_sum ("per-marker percentage of the sum") and introduced _mp_max columns ("per-marker percentage of the maximum").
* Samplestats and Samplevis HTML visualisations will now mark a sequence as 'allele' if the minimum amount of correction OR the minimum number of recovered reads is reached (as opposed to AND). This allows alleles on stutter positions to be detected. Changed: * The -r/--min-recovery option of Samplestats has been renamed to -y/--min-recovery, analogous to the new -Y/--min-recovery-filt. Visualisations: * Updated Vega to version 2.4.1. * Replaced the regular expression-based filters in all visualisations with a much simpler syntax. The new syntax uses space-separated search terms, defaulting to a 'contains'-type search method. If any search term is preceded by an equals sign, that term must be matched exactly. (The search terms themselves are actually still matched as regexes!) * Added 'show negative alleles' option (default on) to Samplevis. When enabled, the graph filtering options work on abs(value) instead of the value itself. * When sorting alleles in Samplevis, the allele name is now used as the final tiebreaker instead of the primary sorting column. * HTML visualisations no longer re-render the entire graph when changing the width. The same holds true for the height setting of Allelevis. * The tables in Samplevis HTML visualisations will now contain the information from BGCorrect's correction_flags column in the Notes column.
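The simplified filter syntax described under Visualisations can be sketched in Python (the visualisations themselves are HTML/JavaScript, so this is a re-expression rather than the actual code). The function name `build_filter` is hypothetical. How multiple terms are combined is not stated in the commit message; this sketch assumes a name passes if any term matches, and it assumes case-sensitive matching.

```python
import re


def build_filter(query):
    """Build a predicate from space-separated search terms.

    A plain term matches anywhere in the name ('contains'); a term preceded
    by '=' must match the whole name. Terms are still treated as regexes.
    """
    patterns = []
    for term in query.split():
        if term.startswith("=") and len(term) > 1:
            patterns.append(re.compile("^(?:" + term[1:] + ")$"))
        else:
            patterns.append(re.compile(term))

    def matches(name):
        if not patterns:
            return True  # an empty filter lets everything through
        # Assumption: terms are combined with OR.
        return any(p.search(name) for p in patterns)

    return matches
```

For instance, `build_filter("D3 =TH01")` would accept `D3S1358` and `TH01` but not `TH01x`.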
-
- 03 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * In Samplevis HTML visualisations, the automatic allele selection was only checking the number of reverse reads for the 'minimum number of reads per orientation' setting. * In Samplevis HTML visualisations, automatic allele selection would fail to select alleles that had exactly the given minimum number of reads. * FDSTools would sometimes calculate incorrect and even negative repeat counts when producing TSSV-style sequences and allele names for sequences that did not exactly fit the STR structure given in the library. Improved: * The Samplestats tool now offers the same possibilities to mark alleles as Samplevis HTML visualisations do. * In Samplevis HTML visualisations, user-removed alleles now have a line through their table row. * Added a reference to https://docs.python.org/howto/regex in the sample tag parsing options section of the help text of many tools. * FDSTools will now do a better job of finding the longest possible match of the STR repeat definition to produce TSSV-style sequences and allele names for sequences that do not exactly fit the STR structure given in the library. Added: * New visualisation type 'allele'. With Allelevis, you can generate a graph of the alleles of the reference samples (output from Allelefinder). (Known bug: it has a 'funny' amount of padding.)
-
- 01 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * The Vis tool no longer crashes if you specify '-' as the input file without piping data in from another program. It will just produce a visualisation file with no embedded data instead. * FDSTools would crash when generating an allele name for a sequence of an STR marker that contained the prefix and suffix of the marker but not the actual STR (yes, this happened). * Stuttermodelvis would draw all 'All data' fits in the graphs of all repeat unit sequences, instead of just the 'All data' fit that was fitted to the data of a particular repeat sequence. Improved: * BGHomStats, BGHomRaw, and Samplestats now round their output to three significant digits. * BGCorrect now rounds its output to 3 decimal positions. Various enhancements to Samplevis HTML visualisations: * Added a whole new set of options which are used to automatically select the true alleles in a sample. * Added an option to split the graphs and the table up per marker. * The selected alleles are no longer lost when the graphs are re-rendered due to changed options. * Added some more columns to the table of selected alleles and made the table prettier. * Added a dedicated stylesheet for printing, which transforms the web page into a nicely formatted report when printed. * Option groups can now be hidden separately. * Filtering options are now based on the read numbers after correction. * The mouse cursor now changes to a 'pointer' style cursor (usually a hand with stretched index finger) when hovered over the clickable portion of the graph. Visualisations: * Updated Vega to version 2.4.0 and d3 to version 3.5.10. * All visualisations now use signals to set the options. This allows them to be updated without re-parsing the entire graph spec in most cases, which is much faster. * Using new cross-and-filter capabilities in bgrawvis, profilevis, samplevis, and stuttermodelvis. This greatly reduces Vega's memory usage and speeds up rendering. 
* The name of the currently loaded data file is prepended to the page title in HTML visualisations. * If a file is loaded into an HTML visualisation by drag-and-drop, the name of the loaded file is displayed on the file input element. * A new -T/--title option for the Vis tool allows for specifying something that should be prepended to the page title of HTML visualisations. This is particularly useful when data is piped in, because no file name is available in that case. * Asynchronous rendering of visualisations is now cancelled if a new asynchronous rendering task has already been scheduled (HTML visualisations only).
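This commit distinguishes rounding to three significant digits (BGHomStats, BGHomRaw, Samplestats) from rounding to three decimal places (BGCorrect). The sketch below shows the difference in Python. The helper name round_sig is an assumption for illustration and is not necessarily how FDSTools implements it.

```python
from math import floor, log10


def round_sig(value, digits=3):
    """Round value to the given number of significant digits (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the most significant digit, e.g. 1 for 12.3 and -3 for 0.00123.
    exponent = floor(log10(abs(value)))
    return round(value, digits - 1 - exponent)


value = 12.3456
print(round_sig(value))   # three significant digits
print(round(value, 3))    # three decimal places
```

Significant-digit rounding keeps the relative precision constant across very small and very large values, whereas decimal-place rounding keeps the absolute precision constant.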
-
- 23 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
* New tool Samplestats computes various sequence-centric statistics for sample data files. Most statistics relate to correction amounts and are thus only included if the input file contains BGCorrect columns. * The starting position can now be omitted from the [genome_position] in FDSTools library files. A default value of 1 will be used in this case. * The setup.py script can now also be run without explicitly specifying Python as the interpreter (it now has a shebang line).
-
- 16 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * The 'to' base in variants called on mtDNA was incorrect. This bug could also cause FDSTools to crash. * FDSTools would crash if you tried to generate an allele name for a primer dimer of an mtDNA marker. (Now, you get an insane but entirely accurate allele name instead.) * Fixed bug that caused some perfectly valid mtDNA allele names to be rejected when attempting to convert them back to raw sequences. Improved: * You can now also specify the ending position of the markers in the FDSTools library. If you do, you may additionally specify a second start position (and optionally also a second end position, and so on). FDSTools will then treat the marker as the concatenation of these fragments. This was primarily introduced to support mtDNA fragments that contain (somewhere in the middle) the origin of mtDNA base numbering. * More helpful error message when format violations are detected while parsing the library file. * More helpful error message when the -e/--tag-expr regular expression could not be compiled. * Added a paragraph about sequence alignment caching to the help text of Seqconvert. * Added a 'flags' column to BGCorrect output, which gives information about the data that was used to do the correction. Background noise profiles: * Removed -C/--cross-tabular option from BGEstimate, BGPredict, and BGMerge and also removed the ability to read files in this format. * BGEstimate, BGHomStats, and BGPredict now add a column 'tool' with their name to the output.
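To illustrate the multi-fragment positions described in this commit, here is a minimal Python sketch of a marker that spans the origin of mtDNA base numbering. The function name, the tuple representation, and the assumption that end positions are inclusive are all hypothetical; the example uses 16569 as the last position of the mtDNA reference sequence.

```python
def marker_positions(fragments):
    """Return the genome position of every base in a marker made of fragments.

    `fragments` is a list of (start, end) tuples, assumed inclusive here;
    the marker is taken to be the concatenation of the fragments in order.
    """
    positions = []
    for start, end in fragments:
        positions.extend(range(start, end + 1))
    return positions


# A fragment that runs up to position 16569 and continues at position 1.
print(marker_positions([(16567, 16569), (1, 3)]))
```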
-
- 04 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Additions and improvements to the FDSTools library file format: * New [genome_position] section in FDSTools-style library files allows for specifying the chromosome and position of each marker. * New [no_repeat] section in FDSTools-style library files allows for including non-STR markers. * Comma/semicolon/space-separated values in FDSTools-style library files can now also be separated by tab characters and multiple consecutive separators are no longer collapsed (with the exception of whitespace). * If no prefix and/or suffix has been specified for an alias, the prefix/suffix of the marker itself is used. * Implemented support for non-STR markers (e.g. SNP clusters) and mtDNA markers. Allele names of the latter follow mtDNA nomenclature. * Improved the logic of generating STR allele names for sequences that have a prefix or suffix sequence that was not included in the library file. * Updated and clarified various explanatory texts in generated FDSTools library files. Fixed: * Fixed a bug that caused prefix/suffix variants in aliases to go missing in allele names. Improved file handling: * Library files are now closed immediately after parsing them. * Sample data input files are opened one at a time now. Visualisations: * Updated Vega to version 2.3.1. * Worked around a bug in Google Chrome that caused the 'Save image' link to stop working after having been used once.
-
- 01 Sep, 2015 1 commit
-
-
jhoogenboom authored
Fixed: * Fixed crash that would occur when an empty sequence (primer dimer) is converted from raw to TSSV-style (or allelename) format. * Fixed bug in BGHomRaw that caused incorrect sample tags in the output. * Fixed bug that caused allele names with negative CE numbers and names of primer dimers to be regarded as 'invalid allele names' even though FDSTools generated those names itself. * Fixed crash when reading sample data while looking for an annotation column. * Fixed bug in Allelefinder resulting in the complete absence of output that occurred when a column name with Stuttermark output was specified. Changed: * Restyled the Options box on HTML visualisations. It is now less transparent and oriented more vertically to reduce overlap with the visualisation. Options are now presented in groups. * Updated Vega to version 2.2.1. New: * Added *_corrected columns to BGCorrect output for convenience. E.g., the total_corrected column contains the value of total-total_noise+total_add. * Added -L/--log-scale option to the Vis tool.
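The relation behind the new *_corrected columns can be sketched in a few lines of Python. Only the total, total_noise, total_add, and total_corrected names come from the commit message; the function name and the dictionary-based row are assumptions for illustration.

```python
def add_corrected_columns(row, prefixes=("total",)):
    """Return a copy of row with <prefix>_corrected added for each prefix.

    Implements <prefix>_corrected = <prefix> - <prefix>_noise + <prefix>_add.
    """
    out = dict(row)
    for prefix in prefixes:
        out[prefix + "_corrected"] = (
            row[prefix] - row[prefix + "_noise"] + row[prefix + "_add"]
        )
    return out


row = {"total": 100, "total_noise": 12, "total_add": 5}
print(add_corrected_columns(row)["total_corrected"])
```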
-
- 21 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New visualisation Profilevis added to the package, but not yet to the Vis tool. * The Vis tool now prints a helpful error message if no output file was specified, instead of printing half a megabyte of HTML and minified JavaScript to the terminal. * Fixed crash that occurred when attempting to convert the sequence of an alias to its allele name. * Fixed various bugs in the functions that convert sequences to TSSV-style and allele names. Only the conversion of non-matching sequences was affected. * Added "max_expected_copies" section to the FDSTools library format. The default value is 2. Allelefinder will now use these as the maximum number of alleles per marker if the -a/--max-alleles option is not specified. * The section headers in the FDSTools library format are now case insensitive.
-
- 12 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGMerge can be used to merge background noise profiles (e.g., merge BGPredict output with a database previously obtained from BGEstimate). * Fixed two major bugs in BGPredict that resulted in incorrect fit functions being used. * BGEstimate, BGPredict, BGHomStats, Blame, and StutterModel no longer crash if a library file is specified. * Added reverse strand profile estimation to BGPredict.
-
- 11 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGPredict predicts background noise profiles (containing only stutter products) for user-supplied alleles/sequences using a trained stutter model obtained from Stuttermodel. Currently only the amounts of the forward strand are predicted. * New option -L/--min-lengths for Stuttermodel allows setting a minimum required number of unique repeat lengths to base the fits on (default: 5). * Updated formatting of output of Stuttermodel: added '+' sign to positive stutter, limited r2 scores to 3 decimal places, and now all coefficients are written in scientific notation with 3 decimal places. * The --output-column option of SeqConvert now defaults to using the value of --allele-column.
-
- 10 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool StutterModel fits polynomials to stutter ratio vs repeat length. * Changed -R to -Q (--limit-reads) so that I can reassign -R to an option that is used more often. * Changed -r to -R (--report) to make sure it will not collide with the -r option in Stuttermark, if I ever want to add report output to Stuttermark. * BGHomStats now checks whether all alleles are detected
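As a rough illustration of fitting stutter ratio against repeat length, the sketch below performs an ordinary least-squares fit of a degree-1 polynomial in plain Python. It is not StutterModel's implementation: the function name, the restriction to a straight line, and the example data are all assumptions.

```python
def fit_line(lengths, ratios):
    """Least-squares fit of ratio = intercept + slope * length.

    Returns (intercept, slope). Requires at least two distinct lengths.
    """
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data in which the stutter ratio grows linearly with repeat length.
intercept, slope = fit_line([1, 2, 3, 4], [1.5, 2.0, 2.5, 3.0])
print(intercept, slope)
```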
-
- 07 Aug, 2015 1 commit
-
-
jhoogenboom authored
* All tools now write to stdout by default. Tools that support writing report files write those to stderr by default. The -o/--output and -r/--report options can be used to override these. * Tools that operated on one sample at a time (bgcorrect, seqconvert, stuttermark) now support batch processing. The new -i/--input argument takes a list of files. In batch mode, the -o/--output argument can be used to specify a list of corresponding output files (which must be the same length). It is also possible to specify a format string to automatically generate file names. -o/--output defaults to "\1-\2.out" which is automatically expanded to "sampletag-toolname.out". The old positional arguments [IN] and [OUT] are maintained and allow for conveniently running the tools on a single sample file. [IN] is mutually exclusive with -i/--input and [OUT] is mutually exclusive with -o/--output. [OUT] now also accepts the filename format, but when not in batch mode, it still defaults to stdout. Note that by default, the sample tag is extracted from the input filenames by simply stripping the extension. This means a minimal batch processing command like "fdstools stuttermark -i *.csv" automatically creates a "...-stuttermark.out" file next to each CSV file in the current working directory. * Libconvert now also supports only specifying an output file. This makes it easier to write the default FDSTools library to a new file. E.g., "fdstools libconvert mynewfile.txt" now creates "mynewfile.txt" if it does not exist, and writes the default library to it. Most helpful.
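The default output naming in batch mode can be sketched as follows. This is a simplified illustration in Python, assuming the sample tag is obtained by stripping the extension (the default described above) and that "\1" and "\2" are replaced literally; the function name is hypothetical and the real tools may expand the format differently.

```python
import os


def output_name(input_path, tool, fmt=r"\1-\2.out"):
    """Expand an output filename format for one input file.

    "\\1" is replaced by the sample tag (the input path without its
    extension) and "\\2" by the tool name.
    """
    tag = os.path.splitext(input_path)[0]
    return fmt.replace(r"\1", tag).replace(r"\2", tool)


for path in ["sample1.csv", "sample2.csv"]:
    print(output_name(path, "stuttermark"))
```

Because the tag keeps any directory part of the input path, the output file ends up next to its input file.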
-
- 06 Aug, 2015 1 commit
-
-
jhoogenboom authored
* All tools now have a longer description in the tool-specific help page. * Arguments are now presented in groups and the order is the same across tools. Furthermore: * Fixed bug that rendered BGHomStats and BGEstimate with the -H option useless. * The report of Allelefinder and BGEstimate is now written to sys.stderr by default. This means the report is now always generated (but it may be sent directly to /dev/null explicitly by the user). The big plus is that the progress of the tools is visible in the terminal when the tools are run by hand.
-
- 05 Aug, 2015 1 commit
-
-
jhoogenboom authored
-
- 04 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool Blame can be used to find particularly dirty samples and to construct a DNA profile of the contaminator. * Fixed bug in BGCorrect that resulted in incorrect values in the *_add columns. * BGEstimate and BGHomStats no longer crash if a library file is provided. * SeqConvert can now use a different library file for the output, thereby offering some possibilities to update allele names when a library file gets updated. * Replaced various uses of map() by generator expressions and listcomps for increased readability and speed (although only slightly).
-
- 03 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGHomStats computes statistics (minimum, maximum, mean, and sample variance) of noise ratios in homozygous samples. * The default BGEstimate output format has been changed to be compatible with that of BGHomStats. The cross-tabular output format is still available as an option because it easily uses 90% less disk space. BGCorrect (and other future tools that use noise profiles) will work with both formats. * Fixed bug in the --min-samples option of BGEstimate that could cause some alleles with less than the specified number of samples to be included if --drop-samples is used at the same time. * The user now receives an error message if there are unknown arguments. The error message lists the usage string of the requested tool. (Argparse's default was to print the general FDSTools usage string, which is not helpful.)
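The four statistics that BGHomStats reports can be computed with Python's standard library, as in the sketch below. The function name and the example ratios are made up for illustration; note that statistics.variance computes the sample variance (dividing by n - 1) and needs at least two values.

```python
import statistics


def noise_stats(ratios):
    """Return minimum, maximum, mean, and sample variance of noise ratios."""
    return {
        "min": min(ratios),
        "max": max(ratios),
        "mean": statistics.mean(ratios),
        "variance": statistics.variance(ratios),  # sample variance (n - 1)
    }


print(noise_stats([2.0, 4.0, 6.0]))
```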
-
- 31 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Unknown arguments are now silently ignored. If this results in the tool not being able to run, the usage information of the tool is printed instead of the general fdstools usage. * Seqconvert no longer crashes on an empty line in the input. * Libconvert now maintains the order of prefix/suffix sequences. * Allele names with aliases other than 'X' or 'Y' are now correctly recognised. These were previously rejected as 'unknown format'. * Fixed bug where a prefix/suffix other than the first listed in the library file was sometimes used as the canonical sequence. * Sequence format conversion from raw to TSSV-style sequences now attempts to match the prefix, suffix, and STR pattern to non-matching sequences on a best effort basis. This is especially useful when converting to allelenames (which is done via TSSV-style sequences), since it results in an allele name that matches more closely the names of other alleles. * Generating allele names for sequences that lack a prefix and/or suffix is now supported (by adding a variant description that deletes the entire prefix/suffix).
-
- 30 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Added BGCorrect tool for filtering noise in case samples. * BGEstimate now writes its output in tab-separated format, instead of JSON. * Small changes to help output formatting.
-
- 29 Jul, 2015 1 commit
-
-
jhoogenboom authored
I could write about all its features here, but instead I will point out some future plans to highlight the things that are possibly not optimal in their current implementation. There are a number of things I plan to change in the future: * The output format is currently JSON; perhaps a carefully designed tabular format is a better choice. The benefit of switching to a tabular format is that the data can be loaded into e.g. Excel as well. * The profiles are currently produced separately for forward and reverse reads. I would prefer to integrate these into a single computation that estimates allele balance in the heterozygotes using both strands as well. * I would like to add information about strand bias of the alleles as well. The most straightforward way to do this is to set only the forward reads of the true allele to 100 and treat the reverse reads the same as all background products. You will then obtain the number of reverse reads observed for every 100 forward reads of the true allele. * I think it would be appropriate to make sure that the values in the allele balance matrices of each sample ('Ax' in the source code) add up to 1. For homozygotes it is currently a scalar 1, but for heterozygotes the sum of the elements tends to be more than 1. This means that a heterozygous sample has a stronger influence on the profiles than a homozygous sample.
-
- 27 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Allelefinder can now combine data from multiple files into a single sample (this happens when the same sample tag was extracted from their names). * Allelefinder can now automatically convert sequences to a given format (this is optional though). This is particularly useful when combining the knownalleles.csv and newalleles.csv files of a sample. (Note that allelefinder still assumes that the files contain different alleles; no attempt is made to check whether the same allele was represented in multiple files.)
-
- 24 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Fixed crash when attempting to read a TSSV library from sys.stdin. * Various large updates to allelefinder. * libconvert now gives a useful default FDSTools library when given no input.
-
- 23 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Introducing a new, extended library file format to support allele name generation. The new libconvert tool can convert TSSV libraries to the new format and vice versa. * Added functions for converting between raw sequences, TSSV-style sequences, and allele names. * Added global -d/--debug option. Stuttermark updates: * Stuttermark now automatically converts input sequences to TSSV-style if a library is provided. * Stuttermark will no longer crash if there is no 'name' column. Instead, all sequences are taken to belong to the same marker. New tools: * libconvert converts between FDSTools and TSSV library formats. * seqconvert converts between raw sequences, TSSV-style sequences, and allele names. * allelefinder detects the true alleles in reference samples.
-
- 02 Jul, 2015 1 commit
-
-
jhoogenboom authored
FDSTools v0.0.1 with Stuttermark v1.3. Other tools will come later.
-