- 04 Feb, 2016 2 commits
-
-
Hoogenboom, Jerry authored
Fixed: * Fixed a JavaScript crash in Samplevis HTML visualisations. * Converted two unexpected tab characters to spaces in README.rst. Improved: * Samplestats will now sort the output by marker name.
-
Hoogenboom, Jerry authored
Improved: * Added -A/--aggregate-below-minimum option to the TSSV tool. This will add a line with 'Other sequences' to the output summing all sequences that were not reported because they had fewer reads than was specified with the -a/--minimum option. * Clarified the help text for the -D/--dir option of the TSSV tool. Fixed: * Updated all tools to consistently handle cases where 'No data' or 'Other sequences' occurs in place of a sequence.
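The aggregation described for -A/--aggregate-below-minimum can be pictured with a minimal sketch. It is an illustration only, not the TSSV tool's code: the function name, the (sequence, reads) row layout, and the example data are assumptions, and a real output file has more columns and more than one marker.

```python
def aggregate_below_minimum(rows, minimum):
    """Keep rows with at least `minimum` reads and sum the rest.

    rows: list of (sequence, total_reads) tuples for a single marker
    (hypothetical, simplified layout).
    """
    kept = [(seq, reads) for seq, reads in rows if reads >= minimum]
    other = sum(reads for seq, reads in rows if reads < minimum)
    if other:
        # One extra line that sums everything that was not reported.
        kept.append(("Other sequences", other))
    return kept


rows = [("AGAT(10)", 950), ("AGAT(9)", 80), ("AGAT(8)", 3), ("AGAT(11)", 2)]
result = aggregate_below_minimum(rows, 5)
```

With a minimum of 5, the two low-count sequences end up in a single 'Other sequences' line carrying their combined read count.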
-
- 02 Feb, 2016 1 commit
-
-
Hoogenboom, Jerry authored
Updated Stuttermark to v1.5. WARNING: This version of Stuttermark is INCOMPATIBLE with output from previous versions of FDSTools and TSSV. Introducing TSSV-Lite * New tool tssv acts as a wrapper around TSSV-Lite (tssvl). Its primary purpose is to allow running TSSV-Lite without having to convert the FDSTools library to TSSV format, and to offer allelename output. Like all other tools in FDSTools, it also works with TSSV library files but its allele name generation capabilities are limited in that case. Changed: * TSSV-Lite and the new TSSV tool in FDSTools have two columns renamed w.r.t. the original TSSV program: 'name' has been changed to 'marker', and 'allele' has been changed to 'sequence'. All tools in FDSTools have been updated to use the new column names. This change affects Allelefinder, BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict, Blame, Samplestats, Samplevis, Stuttermark, Stuttermodel, and Seqconvert. Note that this change will BREAK COMPATIBILITY of these tools with old data files. Fixed: * In Samplevis HTML visualisations, the "percentage recovery" table filtering option used the absolute number of recovered reads instead. * Added PctRecovery to the tables in Samplevis HTML visualisations. * BGPredict will now print a nice error message if the -n/--min-pct option is set to zero or a negative number, to avoid division by zero. * Samplestats would crash if the input file contained the flags column. * FDSTools would crash when trying to convert sequences to allele names using a TSSV library. Improved: * Libconvert will no longer include duplicate sequences in the STR definition when converting to TSSV format and the reference sequence of one of the markers is the same as one of its aliases, or when aliases of one marker share one or more prefix or suffix sequences. * Updated add_input_output_args() such that the output file is a positional argument (instead of -o) for tools that have a single input file and no support for batches.
* Updated add_sequence_format_args() such that the library file can be made a required argument. * Refined the FDSTools package description, since FDSTools does more than just noise filtering. * FDSTools will now do a marginally better job at producing allele names for sequences that do not exactly match the provided STR pattern. When seeking the longest matching portion of the sequence, it will now also test the reversed sequence with a reversed pattern, which sometimes yields a longer match. It is still not optimal, but some refactoring has been done to move away from regular expressions. * BGCorrect will now also fill in correction_flags for newly added sequences. * Adjusted the help text of Samplestats to include the fact that the -c and -y options have an OR relation instead of an AND relation. * BGCorrect, BGEstimate, BGHomRaw, BGHomStats, BGPredict, and Stuttermodel will now ignore special values that may appear in the place of a sequence (currently: 'Other sequences' and 'No data'). Removed: * The -m/--marker-column and -a/--allele-column arguments of BGPredict had no effect and have been removed. Visualisations: * Updated bundled D3 to v3.5.12. * In HTML visualisations, if the page is scrolled to the right edge when an option is changed that causes the graphs to become wider, the page now remains scrolled to the right. * Samplevis HTML visualisations: * Added 'Clear manually added/removed' link to the table filtering. * Reduced flicker of the mouse cursor in Internet Explorer. * Added 'Common axis range' checkbox (only available when 'Split markers' is off). * Added 'Save table' link to save the table of selected alleles to a tab-separated file. * Added 'PctRecovery' column to the tables of selected alleles. * An alert box is now shown when a data file is loaded that contains markers that have 'No data'. * Added 'Percentage of total reads' to the graph filtering options.
* Added a note to the table filtering options to explain that the minimum percentage correction and recovery have an OR relation.
-
- 18 Jan, 2016 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * Fixed a crash in BGMerge. * Fixed bug in BGCorrect that resulted in incorrect values in the *_add and *_corrected columns (yes, you, 8685a304). * Fixed a glitch in BGCorrect that prevented it from ever writing corrected_bgestimate in the correction_flags column. Improved: * BGEstimate will now include the sample tag in the error messages for missing alleles and alleles with 0 reads. * Strand bias lines in Samplevis are now clamped to the 0-100% range. BGCorrect may cause forward read percentages outside this range. Visualisations: * Updated Vega to version 2.4.2. * Fixed drag-'n-drop behaviour for HTML visualisations in Internet Explorer and Firefox. * Fixed the Save Image link when viewing HTML visualisations in Internet Explorer 10 and above. * Added http-equiv="X-UA-Compatible" content="IE=edge" meta-tag to all visualisations to prevent Internet Explorer from entering quirks mode. * Samplevis: * Fixed glitch that would sometimes cause a second horizontal scroll bar to appear. * Graphs now render much more quickly when 'Split markers' is on, and Chrome no longer crashes on large sample files with this option set.
-
- 09 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * When converting STR allele names to sequences, FDSTools would reject any prefix variants with a false message stating that the variant does not match the reference sequence. * The Samplestats tool would not allow the -b/--min-per-strand option to be set to zero. Improved: * Moved the flags generated by BGCorrect to a new column named correction_flags. Some of the values have been renamed for clarity, and this column now always contains a value. * The Samplestats tool will no longer add the not_corrected flag to each sequence, as it does not add the correction_flags column. * The Samplestats tool now supports filtering sequences. For filtering, the same set of options is available as those used for marking alleles. The filtering options use upper case letters and have '-filt' appended to their long name. The new -a/--filter-action option defines what should be done with filtered sequences. 'off', the default, disables filtering; 'combine' replaces filtered sequences with a new line containing aggregated data; 'delete' removes filtered sequences without leaving a trace. * The Seqconvert tool is aware of the special 'Other sequences' value produced by Samplestats with -a/--filter-action set to 'combine'. Other tools will give an informative error message when the input contains this special value. * The Samplestats tool now accepts non-integer and negative numbers for -n/--min-reads and -b/--min-per-strand because after correction read counts are not necessarily nonnegative integers anymore. * The forward_correction and reverse_correction columns of Samplestats will now contain 0 if the sequence had exactly 0 reads both before and after correction (previously, this was -100). * Renamed the _mp columns of Samplestats to _mp_sum ("per-marker percentage of the sum") and introduced _mp_max columns ("per-marker percentage of the maximum").
* Samplestats and Samplevis HTML visualisations will now mark a sequence as 'allele' if the minimum amount of correction OR the minimum number of recovered reads is reached (as opposed to AND). This allows alleles on stutter positions to be detected. Changed: * The -r/--min-recovery option of Samplestats has been renamed to -y/--min-recovery, analogous to the new -Y/--min-recovery-filt. Visualisations: * Updated Vega to version 2.4.1. * Replaced the regular expression-based filters in all visualisations with a much simpler syntax. The new syntax uses space-separated search terms, defaulting to a 'contains'-type search method. If any search term is preceded by an equals sign, that term must be matched exactly. (The search terms themselves are actually still matched as regexes!) * Added 'show negative alleles' option (default on) to Samplevis. When enabled, the graph filtering options work on abs(value) instead of the value itself. * When sorting alleles in Samplevis, the allele name is now used as the final tiebreaker instead of the primary sorting column. * HTML visualisations no longer re-render the entire graph when changing the width. The same holds true for the height setting of Allelevis. * The tables in Samplevis HTML visualisations will now contain the information from BGCorrect's correction_flags column in the Notes column.
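The simplified filter syntax described above can be sketched as follows. The real filters live in the JavaScript of the HTML visualisations; this Python version only illustrates the stated rules (space-separated terms, 'contains' by default, '=' for an exact match, terms still treated as regexes). The function name is hypothetical, and two behaviours are assumptions the commit message does not state: that matching any one term is enough, and that an empty filter matches everything.

```python
import re


def matches_filter(name, filter_text):
    """Return True if `name` passes the space-separated filter (sketch)."""
    terms = filter_text.split()
    if not terms:
        return True  # assumption: an empty filter hides nothing
    for term in terms:
        if term.startswith("="):
            # '=' prefix: the term must match the whole name.
            pattern = "^(?:" + term[1:] + ")$"
        else:
            # Default: 'contains'-type search.
            pattern = term
        if re.search(pattern, name):
            return True  # assumption: terms are combined with OR
    return False
```

For example, `matches_filter("D3S1358", "D3")` is true while `matches_filter("D3S1358", "=D3")` is false, and because terms remain regexes, `matches_filter("D21S11", "D2[0-9]")` is true as well.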
-
- 03 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * In Samplevis HTML visualisations, the automatic allele selection was only checking the number of reverse reads for the 'minimum number of reads per orientation' setting. * In Samplevis HTML visualisations, automatic allele selection would fail to select alleles that had exactly the given minimum number of reads. * FDSTools would sometimes calculate incorrect and even negative repeat counts when producing TSSV-style sequences and allele names for sequences that did not exactly fit the STR structure given in the library. Improved: * The Samplestats tool now offers the same possibilities to mark alleles as Samplevis HTML visualisations do. * In Samplevis HTML visualisations, user-removed alleles now have a line through their table row. * Added a reference to https://docs.python.org/howto/regex in the sample tag parsing options section of the help text of many tools. * FDSTools will now do a better job of finding the longest possible match of the STR repeat definition to produce TSSV-style sequences and allele names for sequences that do not exactly fit the STR structure given in the library. Added: * New visualisation type 'allele'. With Allelevis, you can generate a graph of the alleles of the reference samples (output from Allelefinder). (Known bug: it has a 'funny' amount of padding.)
-
- 01 Dec, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Fixed: * The Vis tool no longer crashes if you specify '-' as the input file without piping data in from another program. It will just produce a visualisation file with no embedded data instead. * FDSTools would crash when generating an allele name for a sequence of an STR marker that contained the prefix and suffix of the marker but not the actual STR (yes, this happened). * Stuttermodelvis would draw all 'All data' fits in the graphs of all repeat unit sequences, instead of just the 'All data' fit that was fitted to the data of a particular repeat sequence. Improved: * BGHomStats, BGHomRaw, and Samplestats now round their output to three significant digits. * BGCorrect now rounds its output to 3 decimal positions. Various enhancements to Samplevis HTML visualisations: * Added a whole new set of options which are used to automatically select the true alleles in a sample. * Added an option to split the graphs and the table up per marker. * The selected alleles are no longer lost when the graphs are re-rendered due to changed options. * Added some more columns to the table of selected alleles and made the table prettier. * Added a dedicated stylesheet for printing, which transforms the web page into a nicely formatted report when printed. * Option groups can now be hidden separately. * Filtering options are now based on the read numbers after correction. * The mouse cursor now changes to a 'pointer' style cursor (usually a hand with stretched index finger) when hovered over the clickable portion of the graph. Visualisations: * Updated Vega to version 2.4.0 and d3 to version 3.5.10. * All visualisations now use signals to set the options. This allows them to be updated without re-parsing the entire graph spec in most cases, which is much faster. * Using new cross-and-filter capabilities in bgrawvis, profilevis, samplevis, and stuttermodelvis. This greatly reduces Vega's memory usage and speeds up rendering. 
* The name of the currently loaded data file is prepended to the page title in HTML visualisations. * If a file is loaded into an HTML visualisation by drag-and-drop, the name of the loaded file is displayed on the file input element. * A new -T/--title option for the Vis tool allows for specifying something that should be prepended to the page title of HTML visualisations. This is particularly useful when data is piped in, because no file name is available in that case. * Asynchronous rendering of visualisations is now cancelled if a new asynchronous rendering task has already been scheduled (HTML visualisations only).
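The difference between rounding to three significant digits (BGHomStats, BGHomRaw, Samplestats) and rounding to three decimal places (BGCorrect) is easy to mix up. The sketch below illustrates the two notions; it is not taken from FDSTools, and the helper name round_significant is hypothetical.

```python
from math import floor, log10


def round_significant(value, digits=3):
    """Round `value` to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 3 for 1234.5 and -2 for 0.0123.
    magnitude = floor(log10(abs(value)))
    return round(value, digits - 1 - magnitude)


large_sig = round_significant(1234.567)   # three significant digits
small_sig = round_significant(0.0123456)  # three significant digits
large_dec = round(1234.567, 3)            # three decimal places
small_dec = round(0.0123456, 3)           # three decimal places
```

Significant-digit rounding keeps the same relative precision for large and small values, whereas a fixed number of decimal places keeps the same absolute precision.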
-
- 23 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
* New tool Samplestats computes various sequence-centric statistics for sample data files. Most statistics relate to correction amounts and are thus only included if the input file contains BGCorrect columns. * The starting position can now be omitted from the [genome_position] in FDSTools library files. A default value of 1 will be used in this case. * The setup.py script can now also be run without explicitly specifying Python as the interpreter (it now has a shebang line).
-
- 16 Nov, 2015 2 commits
-
-
Hoogenboom, Jerry authored
-
Hoogenboom, Jerry authored
Fixed: * The 'to' base in variants called on mtDNA was incorrect. This bug could also cause FDSTools to crash. * FDSTools would crash if you tried to generate an allele name for a primer dimer of an mtDNA marker. (Now, you get an insane but entirely accurate allele name instead.) * Fixed bug that caused some perfectly valid mtDNA allele names to be rejected when attempting to convert them back to raw sequences. Improved: * You can now also specify the ending position of the markers in the FDSTools library. If you do, you may additionally specify a second start position (and optionally also a second end position, and so on). FDSTools will interpret this to mean that the marker is the concatenation of these fragments. This was primarily introduced to support mtDNA fragments that contain (somewhere in the middle) the origin of mtDNA base numbering. * More helpful error message when format violations are detected while parsing the library file. * More helpful error message when the -e/--tag-expr regular expression could not be compiled. * Added a paragraph about sequence alignment caching to the help text of Seqconvert. * Added a 'flags' column to BGCorrect output, which gives information about the data that was used to do the correction. Background noise profiles: * Removed -C/--cross-tabular option from BGEstimate, BGPredict, and BGMerge and also removed the ability to read files in this format. * BGEstimate, BGHomStats, and BGPredict now add a column 'tool' with their name to the output.
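The idea of a marker that is the concatenation of several genomic fragments can be pictured with a small sketch. It is not FDSTools code: the function name and the (start, end) tuple layout are hypothetical, end positions are assumed to be inclusive, and the example coordinates are made up around the 16569-base length of the revised Cambridge Reference Sequence.

```python
def fragment_positions(fragments):
    """Return the genome position of every base of the marker, in order.

    fragments: list of (start, end) pairs, with `end` assumed inclusive.
    """
    positions = []
    for start, end in fragments:
        positions.extend(range(start, end + 1))
    return positions


# Hypothetical marker spanning the origin of mtDNA base numbering:
# it runs up to position 16569 and continues at position 1.
positions = fragment_positions([(16560, 16569), (1, 5)])
```

Here the tenth base of the marker sits at position 16569 and the eleventh at position 1, which is the situation that a single start position cannot describe.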
-
- 05 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
* The legend items now have more descriptive names and also include the strand balance line. * Added 5% to the X axis scale, to reassure people that they are really seeing the entire bar. * The X axis scale limits are now rounded to a 'nice' number and include a label for the ending tick mark. * Added filter for a minimum number of reads per orientation. * Strand bias line now becomes red if less than a certain percentage (default 25) of reads is on one strand, and a star is placed at the end of the allele's bar. * Alleles can be selected (and deselected) by clicking them (HTML only). * Selected alleles appear with a green italic allele name. * Data for selected alleles is summarised in a table at the bottom of the page. This is WIP and currently only includes the read total.
-
- 04 Nov, 2015 1 commit
-
-
Hoogenboom, Jerry authored
Additions and improvements to the FDSTools library file format: * New [genome_position] section in FDSTools-style library files allows for specifying the chromosome and position of each marker. * New [no_repeat] section in FDSTools-style library files allows for including non-STR markers. * Comma/semicolon/space-separated values in FDSTools-style library files can now also be separated by tab characters and multiple consecutive separators are no longer collapsed (with the exception of whitespace). * If no prefix and/or suffix has been specified for an alias, the prefix/suffix of the marker itself is used. * Implemented support for non-STR markers (e.g. SNP clusters) and mtDNA markers. Allele names of the latter follow mtDNA nomenclature. * Improved the logic of generating STR allele names for sequences that have a prefix or suffix sequence that was not included in the library file. * Updated and clarified various explanatory texts in generated FDSTools library files. Fixed: * Fixed a bug that caused prefix/suffix variants in aliases to go missing in allele names. Improved file handling: * Library files are now closed immediately after parsing them. * Sample data input files are opened one at a time now. Visualisations: * Updated Vega to version 2.3.1. * Worked around a bug in Google Chrome that caused the 'Save image' link to stop working after having been used once.
-
- 22 Sep, 2015 1 commit
-
-
jhoogenboom authored
Will complete this when updating to Vega 2.2.5 or newer, which contains a new feature that I contributed specifically for this.
-
- 10 Sep, 2015 1 commit
-
-
jhoogenboom authored
* Properly implemented the options on the StuttermodelVis HTML visualisation. * Added filtering options for marker and repeat unit to StuttermodelVis. * Added StuttermodelVis to the Vis tool. General visualisation changes: * Updated Vega to v2.2.4. * Fixed glitch that caused mouseover events in HTML visualisations to stop working after the renderer was switched. * The file name suggested by the Save Image link in HTML visualisations is now derived from the name of the loaded data file.
-
- 04 Sep, 2015 1 commit
-
-
jhoogenboom authored
And thereby removed a dirty workaround for a bug I found.
-
- 03 Sep, 2015 1 commit
-
-
jhoogenboom authored
* Added StuttermodelVis HTML file and JSON spec. The rendering works, but some of the options are not implemented yet. It is also not yet added to the Vis tool. * Changed the order of stuttermodel's coefficients: 'a' used to be the most significant coefficient, now it is the least significant coefficient (the shift). The benefit of this is that when moving to higher-order polynomials, the extra coefficients do not change the meaning of the others. So 'a' is now always the shift, 'b' is the linear component, 'c' the quadratic, etc. * Added some development notes (including todo list) that I had kept outside of the project until now.
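The new coefficient order can be illustrated with a short sketch. It only demonstrates the convention described above ('a' is the shift, 'b' the linear component, 'c' the quadratic one, and so on); the function name is hypothetical and the coefficient values are made up.

```python
def evaluate_fit(coefficients, repeat_length):
    """Evaluate a polynomial whose coefficients are ordered a, b, c, ...

    coefficients[0] is 'a' (the shift), coefficients[1] is 'b' (linear),
    coefficients[2] is 'c' (quadratic), and so on.
    """
    return sum(c * repeat_length ** power
               for power, c in enumerate(coefficients))


linear = evaluate_fit([0.5, 2.0], 3)          # a + b*x
quadratic = evaluate_fit([0.5, 2.0, 0.0], 3)  # a + b*x + c*x^2 with c = 0
```

Appending a higher-order coefficient leaves the meaning of 'a' and 'b' untouched, which is the benefit named in the commit message.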
-
- 01 Sep, 2015 2 commits
-
-
jhoogenboom authored
* BGCorrect and Stuttermark will now exit with an error message if more than one input file for the same sample is specified and no separate output files are given. Previously these tools would just overwrite the output file repeatedly, discarding the output of all but the last data file of the sample. * Removed the main() functions and related stubs from the tools because they are not actually runnable directly anyway. * Added some more help text to some of the tools. * Doubled the size of the marker name filter input element on the HTML visualisations.
-
jhoogenboom authored
Fixed: * Fixed crash that would occur when an empty sequence (primer dimer) is converted from raw to TSSV-style (or allelename) format. * Fixed bug in BGHomRaw that caused incorrect sample tags in the output. * Fixed bug that caused allele names with negative CE numbers and names of primer dimers to be regarded as 'invalid allele names' even though FDSTools generated those names itself. * Fixed crash when reading sample data while looking for an annotation column. * Fixed bug in Allelefinder resulting in the complete absence of output that occurred when a column name with Stuttermark output was specified. Changed: * Restyled the Options box on HTML visualisations. It is now less transparent and oriented more vertically to reduce overlap with the visualisation. Options are now presented in groups. * Updated Vega to version 2.2.1. New: * Added *_corrected columns to BGCorrect output for convenience. E.g., the total_corrected column contains the value of total-total_noise+total_add. * Added -L/--log-scale option to the Vis tool.
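The arithmetic behind the *_corrected columns can be sketched as below. Only the total_corrected formula is given in the commit message; applying the same formula to 'forward' and 'reverse' columns, the function name, and the dictionary row layout are assumptions made for illustration.

```python
def add_corrected_columns(row, prefixes=("forward", "reverse", "total")):
    """Return a copy of `row` with <prefix>_corrected values added.

    Each corrected value is <prefix> - <prefix>_noise + <prefix>_add.
    The 'forward' and 'reverse' prefixes are assumed, not confirmed.
    """
    out = dict(row)
    for prefix in prefixes:
        out[prefix + "_corrected"] = (
            row[prefix] - row[prefix + "_noise"] + row[prefix + "_add"])
    return out


row = {
    "forward": 60, "forward_noise": 7.5, "forward_add": 1.5,
    "reverse": 40, "reverse_noise": 5.0, "reverse_add": 1.0,
    "total": 100, "total_noise": 12.5, "total_add": 2.5,
}
corrected = add_corrected_columns(row)
```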
-
- 26 Aug, 2015 1 commit
-
-
jhoogenboom authored
* Added new visualisation BGRawVis to the Vis tool. It visualises BGHomRaw output data. * Now using more reliable linear X axis label formatting in Profilevis. * Changed filtering operands in Profilevis and Samplevis from > to >=.
-
- 25 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGHomRaw computes noise ratios for all detected noise in all homozygous reference samples. The idea is to plot this data in a visualisation that will be added later.
-
- 24 Aug, 2015 1 commit
-
-
jhoogenboom authored
* Added options for the graph width and filtering on marker name to Samplevis and Profilevis. * The text fields in the HTML versions of Samplevis and Profilevis now update the graph OnChange instead of OnKeyUp. This is done because rendering the graph takes a while with large data files. * Fixed glitch in Profilevis that caused useless horizontal axis labels when the logarithmic scale is used. * Fixed glitch in Profilevis that caused Vega to render the graph even before data was loaded. * Changed -R option of SeqConvert to -r to avoid a potential collision with the -R/--report option if SeqConvert ever gets report output support in the future.
-
- 21 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New visualisation Profilevis added to the package, but not yet to the Vis tool. * The Vis tool now prints a helpful error message if no output file was specified, instead of printing half a megabyte of HTML and minified JavaScript to the terminal. * Fixed crash that occurred when attempting to convert the sequence of an alias to its allele name. * Fixed various bugs in the functions that convert sequences to TSSV-style and allele names. Only the conversion of non-matching sequences was affected. * Added "max_expected_copies" section to the FDSTools library format. The default value is 2. Allelefinder will now use these as the maximum number of alleles per marker if the -a/--max-alleles option is not specified. * The section headers in the FDSTools library format are now case insensitive.
-
- 18 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool Vis creates an interactive visualisation in HTML format, or a bare Vega graph spec (JSON format). The user can choose to supply a data file that will be embedded in the visualisation file. If no data file is given, the HTML visualisation will offer a file selection element, or the bare JSON output will refer to a file called 'data.csv'. * Changes to Samplevis: * The Options box can now be opened/closed. * Added options to change the width of the bars and the space between subgraphs (markers). * Added options to filter by read count or percentage vs the highest allele of the marker. * Replaced deprecated 'zip' data transforms in the Vega spec with the new 'lookup' transform. * Updated bundled Vega to v2.1.1.
-
- 14 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New visualisation Samplevis visualises sample data files. (Note: visualisations are currently stored in the package, but are not available via FDSTools commands yet. A new tool is going to be introduced later, which will copy the visualisation files to a user-selected folder.) * Including the current versions of Vega and D3 for completeness. * Fixed missing numpy dependency in setup.py. * Clarified some option help texts in Allelefinder based on feedback by Rick and Kris.
-
- 12 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGMerge can be used to merge background noise profiles (e.g., merge BGPredict output with a database previously obtained from BGEstimate). * Fixed two major bugs in BGPredict that resulted in incorrect fit functions being used. * BGEstimate, BGPredict, BGHomStats, Blame, and StutterModel no longer crash if a library file is specified. * Added reverse strand profile estimation to BGPredict.
-
- 11 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGPredict predicts background noise profiles (containing only stutter products) for user-supplied alleles/sequences using a trained stutter model obtained from Stuttermodel. Currently only the amounts of the forward strand are predicted. * New option -L/--min-lengths for Stuttermodel allows to set a minimum required number of unique repeat lengths to base the fits on (default: 5). * Updated formatting of output of Stuttermodel: added '+' sign to positive stutter, limited r2 scores to 3 decimal places, and now all coefficients are written in scientific notation with 3 decimal places. * The --output-column option of SeqConvert now defaults to using the value of --allele-column.
-
- 10 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool StutterModel fits polynomials to stutter ratio vs repeat length. * Changed -R to -Q (--limit-reads) so that I can reassign -R to an option that is used more often. * Changed -r to -R (--report) to make sure it will not collide with the -r option in Stuttermark, if I ever want to add report output to Stuttermark. * BGHomStats now checks whether all alleles are detected.
-
- 07 Aug, 2015 1 commit
-
-
jhoogenboom authored
* All tools now write to stdout by default. Tools that support writing report files write those to stderr by default. The -o/--output and -r/--report options can be used to override these. * Tools that operated on one sample at a time (bgcorrect, seqconvert, stuttermark) now support batch processing. The new -i/--input argument takes a list of files. In batch mode, the -o/--output argument can be used to specify a list of corresponding output files (which must be the same length). It is also possible to specify a format string to automatically generate file names. -o/--output defaults to "\1-\2.out" which is automatically expanded to "sampletag-toolname.out". The old positional arguments [IN] and [OUT] are maintained and allow for conveniently running the tools on a single sample file. [IN] is mutually exclusive with -i/--input and [OUT] is mutually exclusive with -o/--output. [OUT] now also accepts the filename format, but when not in batch mode, it still defaults to stdout. Note that by default, the sample tag is extracted from the input filenames by simply stripping the extension. This means a minimal batch processing command like "fdstools stuttermark -i *.csv" automatically creates a "...-stuttermark.out" file next to each CSV file in the current working directory. * Libconvert now also supports only specifying an output file. This makes it easier to write the default FDSTools library to a new file. E.g., "fdstools libconvert mynewfile.txt" now creates "mynewfile.txt" if it does not exist, and writes the default library to it. Most helpful.
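The expansion of the default output format "\1-\2.out" can be sketched as follows. This is an illustration of the behaviour described above, not the FDSTools implementation; the function name is hypothetical, and keeping any directory part in the sample tag is an assumption based on the remark that the tag is obtained by simply stripping the extension.

```python
import os
import re


def output_name(input_path, tool_name, fmt=r"\1-\2.out"):
    """Expand \\1 to the sample tag and \\2 to the tool name (sketch)."""
    # Default sample tag: the input file name without its extension.
    tag = os.path.splitext(input_path)[0]
    values = {"1": tag, "2": tool_name}
    # Replace both placeholders in a single pass over the format string.
    return re.sub(r"\\([12])", lambda match: values[match.group(1)], fmt)


name = output_name("sample1.csv", "stuttermark")
```

Under these assumptions, "sample1.csv" processed by stuttermark yields "sample1-stuttermark.out", matching the "sampletag-toolname.out" pattern from the commit message.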
-
- 06 Aug, 2015 1 commit
-
-
jhoogenboom authored
* All tools now have a longer description in the tool-specific help page. * Arguments are now presented in groups and the order is the same across tools. Furthermore: * Fixed bug that rendered BGHomStats and BGEstimate with the -H option useless. * The report of Allelefinder and BGEstimate is now written to sys.stderr by default. This means the report is now always generated (but it may be sent directly to /dev/null explicitly by the user). The big plus is that the progress of the tools is visible in the terminal when the tools are run by hand.
-
- 05 Aug, 2015 1 commit
-
-
jhoogenboom authored
-
- 04 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool Blame can be used to find particularly dirty samples and to construct a DNA profile of the contaminator. * Fixed bug in BGCorrect that resulted in incorrect values in the *_add columns. * BGEstimate and BGHomStats no longer crash if a library file is provided. * SeqConvert can now use a different library file for the output, thereby offering some possibilities to update allele names when a library file gets updated. * Replaced various uses of map() by generator expressions and listcomps for increased readability and speed (although slightly).
-
- 03 Aug, 2015 1 commit
-
-
jhoogenboom authored
* New tool BGHomStats computes statistics (minimum, maximum, mean, and sample variance) of noise ratios in homozygous samples. * The default BGEstimate output format has been changed to be compatible with that of BGHomStats. The cross-tabular output format is still available as an option because it easily uses 90% less disk space. BGCorrect (and other future tools that use noise profiles) will work with both formats. * Fixed bug in the --min-samples option of BGEstimate that could cause some alleles with less than the specified number of samples to be included if --drop-samples is used at the same time. * The user now receives an error message if there are unknown arguments. The error message lists the usage string of the requested tool. (Argparse's default was to print the general FDSTools usage string, which is not helpful.)
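The four statistics named for BGHomStats can be reproduced on a list of noise ratios with Python's standard library. This is a sketch with made-up numbers; the function name and dictionary keys are hypothetical, and it says nothing about how BGHomStats itself computes or reports these values.

```python
import statistics


def noise_stats(ratios):
    """Minimum, maximum, mean, and sample variance of noise ratios.

    Needs at least two values, because the sample variance divides
    by n - 1.
    """
    return {
        "min": min(ratios),
        "max": max(ratios),
        "mean": statistics.mean(ratios),
        "variance": statistics.variance(ratios),  # sample variance
    }


stats = noise_stats([1.0, 2.0, 3.0])
```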
-
- 31 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Unknown arguments are now silently ignored. If this results in the tool not being able to run, the usage information of the tool is printed instead of the general fdstools usage. * Seqconvert no longer crashes on an empty line in the input. * Libconvert now maintains the order of prefix/suffix sequences. * Allele names with aliases other than 'X' or 'Y' are now correctly recognised. These were previously rejected as 'unknown format'. * Fixed bug where a prefix/suffix other than the first listed in the library file was sometimes used as the canonical sequence. * Sequence format conversion from raw to TSSV-style sequences now attempts to match the prefix, suffix, and STR pattern to non-matching sequences on a best effort basis. This is especially useful when converting to allelenames (which is done via TSSV-style sequences), since it results in an allele name that matches more closely the names of other alleles. * Generating allele names for sequences that lack a prefix and/or suffix is now supported (by adding a variant description that deletes the entire prefix/suffix).
-
- 30 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Added BGCorrect tool for filtering noise in case samples. * BGEstimate now writes its output in tab-separated format, instead of JSON. * Small changes to help output formatting.
-
- 29 Jul, 2015 1 commit
-
-
jhoogenboom authored
I could write about all its features here, but instead I will point out some future plans to highlight the things that are possibly not optimal in their current implementation. There are a number of things I plan to change in the future: * The output format is currently JSON; perhaps a carefully designed tabular format is a better choice. The benefit of switching to a tabular format is that the data can be loaded into e.g. Excel as well. * The profiles are currently produced separately for forward and reverse reads. I would prefer to integrate these into a single computation that estimates allele balance in the heterozygotes using both strands as well. * I would like to add information about strand bias of the alleles as well. The most straightforward way to do this is to set only the forward reads of the true allele to 100 and treat the reverse reads the same as all background products. You will then obtain a number of reverse reads observed for every 100 forward reads of the true allele. * I think it would be appropriate to make sure the values in the allele balance matrices of each sample ('Ax' in the source code) add up to 1. For homozygotes, it is currently a scalar 1, but for heterozygotes the sum of the elements tends to be more than 1. This means that a heterozygous sample has a stronger influence on the profiles than a homozygous sample.
-
- 27 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Allelefinder can now combine data from multiple files into a single sample (this happens when the same sample tag was extracted from their names). * Allelefinder can now automatically convert sequences to a given format (this is optional though). This is particularly useful when combining the knownalleles.csv and newalleles.csv files of a sample. (Note that allelefinder still assumes that the files contain different alleles; no attempt is made to check whether the same allele was represented in multiple files.)
-
- 24 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Fixed crash when attempting to read a TSSV library from sys.stdin. * Various large updates to allelefinder. * libconvert now gives a useful default FDSTools library when given no input.
-
- 23 Jul, 2015 1 commit
-
-
jhoogenboom authored
* Introducing a new, extended library file format to support allele name generation. The new libconvert tool can convert TSSV libraries to the new format and vice versa. * Added functions for converting between raw sequences, TSSV-style sequences, and allele names. * Added global -d/--debug option. Stuttermark updates: * Stuttermark now automatically converts input sequences to TSSV-style if a library is provided. * Stuttermark will no longer crash if there is no 'name' column. Instead, all sequences are taken to belong to the same marker. New tools: * libconvert converts between FDSTools and TSSV library formats. * seqconvert converts between raw sequences, TSSV-style sequences, and allele names. * allelefinder detects the true alleles in reference samples.
-
- 02 Jul, 2015 2 commits
-
-
jhoogenboom authored
-
jhoogenboom authored
FDSTools v0.0.1 with Stuttermark v1.3. Other tools will come later.
-