- 03 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed:
* In Samplevis HTML visualisations, the automatic allele selection was only checking the number of reverse reads for the 'minimum number of reads per orientation' setting.
* In Samplevis HTML visualisations, automatic allele selection would fail to select alleles that had exactly the given minimum number of reads.
* FDSTools would sometimes calculate incorrect and even negative repeat counts when producing TSSV-style sequences and allele names for sequences that did not exactly fit the STR structure given in the library.

Improved:
* The Samplestats tool now offers the same possibilities to mark alleles as Samplevis HTML visualisations do.
* In Samplevis HTML visualisations, user-removed alleles now have a line through their table row.
* Added a reference to https://docs.python.org/howto/regex in the sample tag parsing options section of the help text of many tools.
* FDSTools will now do a better job of finding the longest possible match of the STR repeat definition to produce TSSV-style sequences and allele names for sequences that do not exactly fit the STR structure given in the library.

Added:
* New visualisation type 'allele'. With Allelevis, you can generate a graph of the alleles of the reference samples (output from Allelefinder). (Known bug: it has a 'funny' amount of padding.)
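The two allele selection fixes above come down to checking both orientations with an inclusive comparison. A minimal Python sketch of that check follows; the function name and arguments are hypothetical and this is not the actual Samplevis or Samplestats code.

```python
def passes_min_reads_per_orientation(forward, reverse, min_reads):
    """Return True if BOTH orientations have at least min_reads reads.

    Hypothetical helper for illustration only. The comparison is
    inclusive (>=), so an allele with exactly min_reads reads passes.
    """
    return forward >= min_reads and reverse >= min_reads


# Exactly the minimum on both strands: selected.
print(passes_min_reads_per_orientation(10, 10, 10))
# Plenty of reverse reads but too few forward reads: not selected.
print(passes_min_reads_per_orientation(3, 50, 10))
```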
-
- 01 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed:
* The Vis tool no longer crashes if you specify '-' as the input file without piping data in from another program. It will just produce a visualisation file with no embedded data instead.
* FDSTools would crash when generating an allele name for a sequence of an STR marker that contained the prefix and suffix of the marker but not the actual STR (yes, this happened).
* Stuttermodelvis would draw all 'All data' fits in the graphs of all repeat unit sequences, instead of just the 'All data' fit that was fitted to the data of a particular repeat sequence.

Improved:
* BGHomStats, BGHomRaw, and Samplestats now round their output to three significant digits.
* BGCorrect now rounds its output to 3 decimal positions.

Various enhancements to Samplevis HTML visualisations:
* Added a whole new set of options which are used to automatically select the true alleles in a sample.
* Added an option to split the graphs and the table up per marker.
* The selected alleles are no longer lost when the graphs are re-rendered due to changed options.
* Added some more columns to the table of selected alleles and made the table prettier.
* Added a dedicated stylesheet for printing, which transforms the web page into a nicely formatted report when printed.
* Option groups can now be hidden separately.
* Filtering options are now based on the read numbers after correction.
* The mouse cursor now changes to a 'pointer' style cursor (usually a hand with stretched index finger) when hovered over the clickable portion of the graph.

Visualisations:
* Updated Vega to version 2.4.0 and d3 to version 3.5.10.
* All visualisations now use signals to set the options. This allows them to be updated without re-parsing the entire graph spec in most cases, which is much faster.
* Using new cross-and-filter capabilities in bgrawvis, profilevis, samplevis, and stuttermodelvis. This greatly reduces Vega's memory usage and speeds up rendering.
* The name of the currently loaded data file is prepended to the page title in HTML visualisations.
* If a file is loaded into an HTML visualisation by drag-and-drop, the name of the loaded file is displayed on the file input element.
* A new -T/--title option for the Vis tool allows for specifying something that should be prepended to the page title of HTML visualisations. This is particularly useful when data is piped in, because no file name is available in that case.
* Asynchronous rendering of visualisations is now cancelled if a new asynchronous rendering task has already been scheduled (HTML visualisations only).
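The two rounding modes mentioned under 'Improved' behave differently for very small and very large values. The Python sketch below illustrates the difference; `round_significant` is a hypothetical helper and not necessarily how FDSTools implements its rounding.

```python
from math import floor, log10


def round_significant(value, digits=3):
    """Round value to the given number of significant digits.

    Hypothetical helper for illustration; zero is returned unchanged
    because log10(0) is undefined.
    """
    if value == 0:
        return 0.0
    magnitude = floor(log10(abs(value)))
    return round(value, digits - 1 - magnitude)


# Three significant digits keeps precision for small values ...
print(round_significant(0.00123456))  # 0.00123
# ... whereas three decimal positions does not.
print(round(0.00123456, 3))           # 0.001
# For large values, significant-digit rounding drops trailing detail.
print(round_significant(1234.5678))   # 1230.0
```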
-
- 23 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
* New tool Samplestats computes various sequence-centric statistics for sample data files. Most statistics relate to correction amounts and are thus only included if the input file contains BGCorrect columns.
* The starting position can now be omitted from the [genome_position] in FDSTools library files. A default value of 1 will be used in this case.
* The setup.py script can now also be run without explicitly specifying Python as the interpreter (it now has a shebang line).
-
- 16 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed:
* The 'to' base in variants called on mtDNA was incorrect. This bug could also cause FDSTools to crash.
* FDSTools would crash if you tried to generate an allele name for a primer dimer of an mtDNA marker. (Now, you get an insane but entirely accurate allele name instead.)
* Fixed bug that caused some perfectly valid mtDNA allele names to be rejected when attempting to convert them back to raw sequences.

Improved:
* You can now also specify the ending position of the markers in the FDSTools library. If you do, you may additionally specify a second start position (and optionally a second end position, and so on). FDSTools will interpret this to mean that the marker is the concatenation of these fragments. This was primarily introduced to support mtDNA fragments that contain (somewhere in the middle) the origin of mtDNA base numbering.
* More helpful error message when format violations are detected while parsing the library file.
* More helpful error message when the -e/--tag-expr regular expression could not be compiled.
* Added a paragraph about sequence alignment caching to the help text of Seqconvert.
* Added a 'flags' column to BGCorrect output, which gives information about the data that was used to do the correction.

Background noise profiles:
* Removed -C/--cross-tabular option from BGEstimate, BGPredict, and BGMerge and also removed the ability to read files in this format.
* BGEstimate, BGHomStats, and BGPredict now add a column 'tool' with their name to the output.
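The fragment concatenation described under 'Improved' can be pictured as follows. This Python sketch is a simplification under stated assumptions: it takes a flat list of start/end pairs with inclusive ends, requires every fragment to have an explicit end position, and uses the hypothetical name `expand_fragments`. It does not reflect the actual FDSTools library parser.

```python
def expand_fragments(positions):
    """Expand [start1, end1, start2, end2, ...] into base positions.

    Hypothetical helper for illustration. Ends are taken as inclusive
    and every fragment must have an explicit end in this sketch.
    """
    if len(positions) % 2:
        raise ValueError("this sketch needs an end position for every start")
    coordinates = []
    for start, end in zip(positions[::2], positions[1::2]):
        coordinates.extend(range(start, end + 1))
    return coordinates


# A fragment spanning the origin of mtDNA base numbering: the last
# three bases of the 16569-base reference followed by the first three.
print(expand_fragments([16567, 16569, 1, 3]))
# [16567, 16568, 16569, 1, 2, 3]
```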
-
- 04 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Additions and improvements to the FDSTools library file format:
* New [genome_position] section in FDSTools-style library files allows for specifying the chromosome and position of each marker.
* New [no_repeat] section in FDSTools-style library files allows for including non-STR markers.
* Comma/semicolon/space-separated values in FDSTools-style library files can now also be separated by tab characters and multiple consecutive separators are no longer collapsed (with the exception of whitespace).
* If no prefix and/or suffix has been specified for an alias, the prefix/suffix of the marker itself is used.
* Implemented support for non-STR markers (e.g. SNP clusters) and mtDNA markers. Allele names of the latter follow mtDNA nomenclature.
* Improved the logic of generating STR allele names for sequences that have a prefix or suffix sequence that was not included in the library file.
* Updated and clarified various explanatory texts in generated FDSTools library files.

Fixed:
* Fixed a bug that caused prefix/suffix variants in aliases to go missing in allele names.

Improved file handling:
* Library files are now closed immediately after parsing them.
* Sample data input files are opened one at a time now.

Visualisations:
* Updated Vega to version 2.3.1.
* Worked around a bug in Google Chrome that caused the 'Save image' link to stop working after having been used once.
-
- 01 Sep, 2015 1 commit
-
-
jhoogenboom authored
Fixed:
* Fixed crash that would occur when an empty sequence (primer dimer) is converted from raw to TSSV-style (or allelename) format.
* Fixed bug in BGHomRaw that caused incorrect sample tags in the output.
* Fixed bug that caused allele names with negative CE numbers and names of primer dimers to be regarded as 'invalid allele names' even though FDSTools generated those names itself.
* Fixed crash when reading sample data while looking for an annotation column.
* Fixed bug in Allelefinder resulting in the complete absence of output that occurred when a column name with Stuttermark output was specified.

Changed:
* Restyled the Options box on HTML visualisations. It is now less transparent and oriented more vertically to reduce overlap with the visualisation. Options are now presented in groups.
* Updated Vega to version 2.2.1.

New:
* Added *_corrected columns to BGCorrect output for convenience. E.g., the total_corrected column contains the value of total-total_noise+total_add.
* Added -L/--log-scale option to the Vis tool.
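The *_corrected arithmetic from the 'New' list can be sketched in Python as below. The column names total, total_noise, and total_add come from the commit message; the function name, the dict-based row, and the `prefixes` parameter are assumptions made for illustration only.

```python
def add_corrected_columns(row, prefixes=("total",)):
    """Return a copy of row with <prefix>_corrected columns added.

    Hypothetical helper: corrected = observed - noise + add, following
    the total-total_noise+total_add example in the commit message.
    """
    out = dict(row)
    for prefix in prefixes:
        out[prefix + "_corrected"] = (
            row[prefix] - row[prefix + "_noise"] + row[prefix + "_add"]
        )
    return out


row = {"total": 100, "total_noise": 12, "total_add": 5}
print(add_corrected_columns(row)["total_corrected"])  # 93
```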
-
- 21 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New visualisation Profilevis added to the package, but not yet to the Vis tool.
* The Vis tool now prints a helpful error message if no output file was specified, instead of printing half a megabyte of HTML and minified JavaScript to the terminal.
* Fixed crash that occurred when attempting to convert the sequence of an alias to its allele name.
* Fixed various bugs in the functions that convert sequences to TSSV-style and allele names. Only the conversion of non-matching sequences was affected.
* Added "max_expected_copies" section to the FDSTools library format. The default value is 2. Allelefinder will now use these as the maximum number of alleles per marker if the -a/--max-alleles option is not specified.
* The section headers in the FDSTools library format are now case insensitive.
-
- 12 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGMerge can be used to merge background noise profiles (e.g., merge BGPredict output with a database previously obtained from BGEstimate).
* Fixed two major bugs in BGPredict that resulted in incorrect fit functions being used.
* BGEstimate, BGPredict, BGHomStats, Blame, and StutterModel no longer crash if a library file is specified.
* Added reverse strand profile estimation to BGPredict.
-
- 11 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGPredict predicts background noise profiles (containing only stutter products) for user-supplied alleles/sequences using a trained stutter model obtained from Stuttermodel. Currently only the amounts of the forward strand are predicted.
* New option -L/--min-lengths for Stuttermodel sets the minimum required number of unique repeat lengths to base the fits on (default: 5).
* Updated formatting of output of Stuttermodel: added '+' sign to positive stutter, limited r2 scores to 3 decimal places, and now all coefficients are written in scientific notation with 3 decimal places.
* The --output-column option of SeqConvert now defaults to using the value of --allele-column.
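The Stuttermodel output formatting described above maps onto standard Python format specifiers. The sketch below is illustrative only: the function names are hypothetical, it assumes the stutter amount is an integer, and the exact layout of Stuttermodel's output is not taken from the source.

```python
def format_stutter(amount):
    """Write an explicit sign, so positive stutter reads '+1'."""
    return "{:+d}".format(amount)


def format_r2(score):
    """Limit an r2 score to 3 decimal places."""
    return "{:.3f}".format(score)


def format_coefficient(value):
    """Write a coefficient in scientific notation with 3 decimal places."""
    return "{:.3e}".format(value)


print(format_stutter(1), format_stutter(-1))  # +1 -1
print(format_r2(0.9876))                      # 0.988
print(format_coefficient(0.000123456))        # 1.235e-04
```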
-
- 10 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool StutterModel fits polynomials to stutter ratio vs repeat length.
* Changed -R to -Q (--limit-reads) so that I can reassign -R to an option that is used more often.
* Changed -r to -R (--report) to make sure it will not collide with the -r option in Stuttermark, if I ever want to add report output to Stuttermark.
* BGHomStats now checks whether all alleles are detected.
-
- 07 Aug, 2015 1 commit
-
-
jhoogenboom authored
* All tools now write to stdout by default. Tools that support writing report files write those to stderr by default. The -o/--output and -r/--report options can be used to override these.
* Tools that operated on one sample at a time (bgcorrect, seqconvert, stuttermark) now support batch processing. The new -i/--input argument takes a list of files. In batch mode, the -o/--output argument can be used to specify a list of corresponding output files (which must be the same length as the list of input files). It is also possible to specify a format string to automatically generate file names. -o/--output defaults to "\1-\2.out" which is automatically expanded to "sampletag-toolname.out". The old positional arguments [IN] and [OUT] are maintained and allow for conveniently running the tools on a single sample file. [IN] is mutually exclusive with -i/--input and [OUT] is mutually exclusive with -o/--output. [OUT] now also accepts the filename format, but when not in batch mode, it still defaults to stdout. Note that by default, the sample tag is extracted from the input filenames by simply stripping the extension. This means a minimal batch processing command like "fdstools stuttermark -i *.csv" automatically creates a "...-stuttermark.out" file next to each CSV file in the current working directory.
* Libconvert now also supports only specifying an output file. This makes it easier to write the default FDSTools library to a new file. E.g., "fdstools libconvert mynewfile.txt" now creates "mynewfile.txt" if it does not exist, and writes the default library to it. Most helpful.
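The default output name expansion for batch mode can be sketched as follows. This is a rough Python illustration of the "\1-\2.out" format and of extracting the sample tag by stripping the extension; the function names are hypothetical and the real FDSTools implementation may differ.

```python
import os
import re


def sample_tag(input_path):
    """Default sample tag: the input file name with its extension stripped."""
    return os.path.splitext(input_path)[0]


def output_name(input_path, tool_name, fmt=r"\1-\2.out"):
    """Expand \\1 to the sample tag and \\2 to the tool name.

    Hypothetical helper for illustration. A replacement function is
    used so that backslashes inside the tag are inserted literally.
    """
    values = {"1": sample_tag(input_path), "2": tool_name}
    return re.sub(r"\\([12])", lambda match: values[match.group(1)], fmt)


print(output_name("sample1.csv", "stuttermark"))
# sample1-stuttermark.out
```

Because the directory part of the path is kept in the tag, the generated file ends up next to its input file, which matches the behaviour described for "fdstools stuttermark -i *.csv".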
-
- 06 Aug, 2015 1 commit
-
-
jhoogenboom authored
* All tools now have a longer description in the tool-specific help page.
* Arguments are now presented in groups and the order is the same across tools.

Furthermore:
* Fixed bug that rendered BGHomStats and BGEstimate with the -H option useless.
* The report of Allelefinder and BGEstimate is now written to sys.stderr by default. This means the report is now always generated (but it may be sent directly to /dev/null explicitly by the user). The big plus is that the progress of the tools is visible in the terminal when the tools are run by hand.
-
- 05 Aug, 2015 1 commit
-
-
jhoogenboom authored
-
- 04 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool Blame can be used to find particularly dirty samples and to construct a DNA profile of the contaminator.
* Fixed bug in BGCorrect that resulted in incorrect values in the *_add columns.
* BGEstimate and BGHomStats no longer crash if a library file is provided.
* SeqConvert can now use a different library file for the output, thereby offering some possibilities to update allele names when a library file gets updated.
* Replaced various uses of map() by generator expressions and listcomps for increased readability and speed (although slightly).
-
- 03 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGHomStats computes statistics (minimum, maximum, mean, and sample variance) of noise ratios in homozygous samples.
* The default BGEstimate output format has been changed to be compatible with that of BGHomStats. The cross-tabular output format is still available as an option because it easily uses 90% less disk space. BGCorrect (and other future tools that use noise profiles) will work with both formats.
* Fixed bug in the --min-samples option of BGEstimate that could cause some alleles with less than the specified number of samples to be included if --drop-samples is used at the same time.
* The user now receives an error message if there are unknown arguments. The error message lists the usage string of the requested tool. (Argparse's default was to print the general FDSTools usage string, which is not helpful.)
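The four statistics listed for BGHomStats can be computed with the Python standard library as sketched below. The function name and the returned dict layout are hypothetical; this is not the BGHomStats code, and its output columns are not taken from the source.

```python
from statistics import mean, variance


def noise_ratio_stats(ratios):
    """Summarise noise ratios observed across homozygous samples.

    Hypothetical helper for illustration. statistics.variance() is the
    sample variance (n - 1 in the denominator), so at least two
    observations are required.
    """
    return {
        "minimum": min(ratios),
        "maximum": max(ratios),
        "mean": mean(ratios),
        "variance": variance(ratios),
    }


print(noise_ratio_stats([2.0, 4.0, 6.0]))
# {'minimum': 2.0, 'maximum': 6.0, 'mean': 4.0, 'variance': 4.0}
```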
-
- 31 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Unknown arguments are now silently ignored. If this results in the tool not being able to run, the usage information of the tool is printed instead of the general fdstools usage.
* Seqconvert no longer crashes on an empty line in the input.
* Libconvert now maintains the order of prefix/suffix sequences.
* Allele names with aliases other than 'X' or 'Y' are now correctly recognised. These were previously rejected as 'unknown format'.
* Fixed bug where a prefix/suffix other than the first listed in the library file was sometimes used as the canonical sequence.
* Sequence format conversion from raw to TSSV-style sequences now attempts to match the prefix, suffix, and STR pattern to non-matching sequences on a best effort basis. This is especially useful when converting to allelenames (which is done via TSSV-style sequences), since it results in an allele name that matches more closely the names of other alleles.
* Generating allele names for sequences that lack a prefix and/or suffix is now supported (by adding a variant description that deletes the entire prefix/suffix).
-
- 30 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Added BGCorrect tool for filtering noise in case samples.
* BGEstimate now writes its output in tab-separated format, instead of JSON.
* Small changes to help output formatting.
-
- 29 Jul, 2015 1 commit
-
-
jhoogenboom authored
I could write about all its features here, but instead I will point out some future plans to highlight the things that are possibly not optimal in their current implementation. There are a number of things I plan to change in the future:
* The output format is currently JSON; perhaps a carefully designed tabular format is a better choice. The benefit of switching to a tabular format is that the data can be loaded into e.g. Excel as well.
* The profiles are currently produced separately for forward and reverse reads. I would prefer to integrate these into a single computation that estimates allele balance in the heterozygotes using both strands as well.
* I would like to add information about strand bias of the alleles as well. The most straightforward way to do this is to set only the forward reads of the true allele to 100 and treat the reverse reads the same as all background products. You will then obtain the number of reverse reads observed for every 100 forward reads of the true allele.
* I think it would be appropriate to make sure the values in the allele balance matrices of each sample ('Ax' in the source code) add up to 1. For homozygotes, it is currently a scalar 1; for heterozygotes, the sum of the elements tends to be more than 1. This means that a heterozygous sample has a stronger influence on the profiles than a homozygous sample.
-
- 27 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Allelefinder can now combine data from multiple files into a single sample (this happens when the same sample tag was extracted from their names).
* Allelefinder can now automatically convert sequences to a given format (this is optional though). This is particularly useful when combining the knownalleles.csv and newalleles.csv files of a sample. (Note that allelefinder still assumes that the files contain different alleles; no attempt is made to check whether the same allele was represented in multiple files.)
-
- 24 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Fixed crash when attempting to read a TSSV library from sys.stdin.
* Various large updates to allelefinder.
* libconvert now gives a useful default FDSTools library when given no input.
-
- 23 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Introducing a new, extended library file format to support allele name generation. The new libconvert tool can convert TSSV libraries to the new format and vice versa.
* Added functions for converting between raw sequences, TSSV-style sequences, and allele names.
* Added global -d/--debug option.

Stuttermark updates:
* Stuttermark now automatically converts input sequences to TSSV-style if a library is provided.
* Stuttermark will no longer crash if there is no 'name' column. Instead, all sequences are taken to belong to the same marker.

New tools:
* libconvert converts between FDSTools and TSSV library formats.
* seqconvert converts between raw sequences, TSSV-style sequences, and allele names.
* allelefinder detects the true alleles in reference samples.
-
- 02 Jul, 2015 1 commit
-
-
jhoogenboom authored
FDSTools v0.0.1 with Stuttermark v1.3. Other tools will come later.
-