- 06 Aug, 2015 1 commit
-
-
jhoogenboom authored
* All tools now have a longer description in the tool-specific help page.
* Arguments are now presented in groups and the order is the same across tools.

Furthermore:
* Fixed a bug that rendered BGHomStats and BGEstimate useless when run with the -H option.
* The report of Allelefinder and BGEstimate is now written to sys.stderr by default. This means the report is now always generated (but the user may explicitly send it to /dev/null). The big plus is that the progress of the tools is visible in the terminal when the tools are run by hand.
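The report behaviour described above can be sketched as follows. This is a minimal illustration, not the FDSTools implementation: the program name `exampletool`, the `-R/--report` option, and the `open_report` helper are assumptions made for the example.

```python
import argparse
import os
import sys


def build_parser():
    # Hypothetical parser; the -R/--report option name is an assumption.
    parser = argparse.ArgumentParser(prog="exampletool")
    parser.add_argument(
        "-R", "--report", metavar="FILE", default=None,
        help="file to write the report to (default: sys.stderr)")
    return parser


def open_report(path):
    # Without an explicit path the report goes to sys.stderr, so progress
    # is visible in the terminal when the tool is run by hand.
    if path is None:
        return sys.stderr
    return open(path, "w")


if __name__ == "__main__":
    args = build_parser().parse_args()
    report = open_report(args.report)
    report.write("processing sample 1 of 1\n")
    if report is not sys.stderr:
        report.close()
```

With this layout, running the script with `--report /dev/null` discards the report, while running it with no options prints the report to the terminal.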
-
- 05 Aug, 2015 1 commit
-
-
jhoogenboom authored
-
- 04 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool Blame can be used to find particularly dirty samples and to construct a DNA profile of the contaminator.
* Fixed bug in BGCorrect that resulted in incorrect values in the *_add columns.
* BGEstimate and BGHomStats no longer crash if a library file is provided.
* SeqConvert can now use a different library file for the output, thereby offering some possibilities to update allele names when a library file gets updated.
* Replaced various uses of map() by generator expressions and listcomps for increased readability and speed (although only slightly).
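As a generic illustration of the last point, the snippet below contrasts map() with a list comprehension and a generator expression. The `rows` data is invented for the example and does not come from FDSTools.

```python
# Invented tab-separated rows: forward reads, reverse reads.
rows = ["12\t3", "40\t10", "7\t0"]

# Older style: map() with a lambda.
forward_old = list(map(lambda row: int(row.split("\t")[0]), rows))

# List comprehension: same result, arguably easier to read.
forward_new = [int(row.split("\t")[0]) for row in rows]

# Generator expression: feeds sum() directly without building a list.
total_reverse = sum(int(row.split("\t")[1]) for row in rows)
```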
-
- 03 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGHomStats computes statistics (minimum, maximum, mean, and sample variance) of noise ratios in homozygous samples.
* The default BGEstimate output format has been changed to be compatible with that of BGHomStats. The cross-tabular output format is still available as an option because it easily uses 90% less disk space. BGCorrect (and other future tools that use noise profiles) will work with both formats.
* Fixed bug in the --min-samples option of BGEstimate that could cause some alleles with fewer than the specified number of samples to be included if --drop-samples is used at the same time.
* The user now receives an error message if there are unknown arguments. The error message lists the usage string of the requested tool. (Argparse's default was to print the general FDSTools usage string, which is not helpful.)
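The four statistics named for BGHomStats can be computed with Python's standard `statistics` module, as in the sketch below. The function name `noise_stats` and the input (a plain list of noise ratios for one allele, one value per homozygous sample) are assumptions; the actual tool may organise its data differently.

```python
import statistics


def noise_stats(ratios):
    """Summarise the noise ratios observed across homozygous samples.

    `ratios` is assumed to be a non-empty list of numbers.
    """
    n = len(ratios)
    return {
        "min": min(ratios),
        "max": max(ratios),
        "mean": statistics.mean(ratios),
        # statistics.variance() is the sample variance (divides by n - 1);
        # it is undefined for a single observation.
        "variance": statistics.variance(ratios) if n > 1 else None,
        "n": n,
    }


stats = noise_stats([2.0, 4.0, 6.0])
```

For the three example values the mean is 4.0 and the sample variance is (4 + 0 + 4) / 2 = 4.0.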
-
- 31 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Unknown arguments are now silently ignored. If this results in the tool not being able to run, the usage information of the tool is printed instead of the general fdstools usage.
* Seqconvert no longer crashes on an empty line in the input.
* Libconvert now maintains the order of prefix/suffix sequences.
* Allele names with aliases other than 'X' or 'Y' are now correctly recognised. These were previously rejected as 'unknown format'.
* Fixed bug where a prefix/suffix other than the first listed in the library file was sometimes used as the canonical sequence.
* Sequence format conversion from raw to TSSV-style sequences now attempts to match the prefix, suffix, and STR pattern to non-matching sequences on a best effort basis. This is especially useful when converting to allele names (which is done via TSSV-style sequences), since it results in an allele name that matches more closely the names of other alleles.
* Generating allele names for sequences that lack a prefix and/or suffix is now supported (by adding a variant description that deletes the entire prefix/suffix).
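One way to obtain the unknown-argument behaviour in the first point is argparse's `parse_known_args()`, sketched below. Whether FDSTools does it this way is not stated in the commit message; the parser, its program name, and its options are hypothetical.

```python
import argparse


def build_tool_parser():
    # Hypothetical tool-specific parser; names and options are assumptions.
    parser = argparse.ArgumentParser(prog="exampletools exampletool")
    parser.add_argument("-i", "--input", required=True,
                        help="input file")
    parser.add_argument("-m", "--min-samples", type=int, default=2,
                        help="minimum number of samples (default: 2)")
    return parser


def parse_tool_args(argv):
    # parse_known_args() returns unrecognised arguments separately instead
    # of aborting, so they can simply be ignored. If a required argument is
    # missing, argparse prints this parser's own usage string and exits.
    args, unknown = build_tool_parser().parse_known_args(argv)
    return args, unknown


args, unknown = parse_tool_args(["--bogus", "-i", "data.txt", "-m", "3"])
```

Because the tool-specific parser does the parsing, any usage message that argparse prints on failure is that of the tool rather than of the top-level program.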
-
- 30 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Added BGCorrect tool for filtering noise in case samples.
* BGEstimate now writes its output in tab-separated format, instead of JSON.
* Small changes to help output formatting.
-
- 29 Jul, 2015 1 commit
-
-
jhoogenboom authored
I could write about all its features here, but instead I will point out some future plans to highlight the things that are possibly not optimal in their current implementation. There are a number of things I plan to change in the future:
* The output format is currently JSON; perhaps a carefully designed tabular format is a better choice. The benefit of switching to a tabular format is that the data can be loaded into e.g. Excel as well.
* The profiles are currently produced separately for forward and reverse reads. I would prefer to integrate these into a single computation that estimates allele balance in the heterozygotes using both strands as well.
* I would like to add information about strand bias of the alleles as well. The most straightforward way to do this is to set only the forward reads of the true allele to 100 and treat the reverse reads the same as all background products. You will then obtain the number of reverse reads observed for every 100 forward reads of the true allele.
* I think it would be appropriate to make sure the values in the allele balance matrix of each sample ('Ax' in the source code) add up to 1. For homozygotes, it is currently a scalar 1; for heterozygotes, the sum of the elements tends to be more than 1. This means that a heterozygous sample has a stronger influence on the profiles than a homozygous sample.
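The strand bias idea in the third point can be illustrated for a single homozygous sample. This is only a sketch of the scaling step, not of the profile estimation itself; the function name, the dictionary layout, and the read counts are invented for the example.

```python
def scale_to_true_allele_forward(counts, true_allele):
    """Scale read counts so the true allele's forward reads equal 100.

    `counts` maps each sequence to a (forward, reverse) tuple of read
    counts. This layout is an assumption made for the example.
    """
    reference = counts[true_allele][0]
    if reference == 0:
        raise ValueError("true allele has no forward reads")
    factor = 100.0 / reference
    # The true allele's reverse reads are scaled like any other product,
    # so they come out as 'reverse reads per 100 forward reads'.
    return {seq: (fwd * factor, rev * factor)
            for seq, (fwd, rev) in counts.items()}


# Invented counts: one true allele and one background product.
sample = {"allele": (200, 150), "stutter": (10, 8)}
scaled = scale_to_true_allele_forward(sample, "allele")
```

In this example the true allele shows 75 reverse reads for every 100 forward reads, and the background product is expressed on the same scale.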
-
- 27 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Allelefinder can now combine data from multiple files into a single sample (this happens when the same sample tag was extracted from their names).
* Allelefinder can now automatically convert sequences to a given format (this is optional though). This is particularly useful when combining the knownalleles.csv and newalleles.csv files of a sample. (Note that allelefinder still assumes that the files contain different alleles; no attempt is made to check whether the same allele was represented in multiple files.)
-
- 24 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Fixed crash when attempting to read a TSSV library from sys.stdin.
* Various large updates to allelefinder.
* libconvert now gives a useful default FDSTools library when given no input.
-
- 23 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Introducing a new, extended library file format to support allele name generation. The new libconvert tool can convert TSSV libraries to the new format and vice versa.
* Added functions for converting between raw sequences, TSSV-style sequences, and allele names.
* Added global -d/--debug option.

Stuttermark updates:
* Stuttermark now automatically converts input sequences to TSSV-style if a library is provided.
* Stuttermark will no longer crash if there is no 'name' column. Instead, all sequences are taken to belong to the same marker.

New tools:
* libconvert converts between FDSTools and TSSV library formats.
* seqconvert converts between raw sequences, TSSV-style sequences, and allele names.
* allelefinder detects the true alleles in reference samples.
-
- 02 Jul, 2015 2 commits
-
-
jhoogenboom authored
-
jhoogenboom authored
FDSTools v0.0.1 with Stuttermark v1.3. Other tools will come later.
-