Overlapped paired-end data support
Added support for mixed-case sequences. TSSV now requires at least one upper-case base in each of the two flanking sequences. The remainder of the read is case insensitive, including the STR region (it is regex-matched in upper case). This is useful with paired-end read overlapping tools that write the overlapped/merged reads in mixed case, where the overlapped region is in upper case and the overhang is in lower case. Merging overlapping STR reads can cause repeats to go missing; with this change, TSSV filters out such truncated sequences.
Since TSSV was originally completely case sensitive, this change will not affect the output if the input was all upper case. TSSV output for mixed-case and lower case input may have changed due to the new interpretation, however.
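The case rule can be illustrated with a minimal Python sketch. It is not TSSV's actual code: the function names `find_flank` and `read_is_accepted` are hypothetical, and exact, case-insensitive string matching is assumed here purely for brevity, in place of whatever flank matching TSSV performs internally.

```python
def find_flank(read, flank, start=0):
    """Return the index of the first case-insensitive occurrence of `flank`
    in `read` (at or after `start`) that contains at least one upper-case
    base, or -1 if there is none. Hypothetical helper, not part of TSSV."""
    target = flank.upper()
    upper_read = read.upper()
    pos = upper_read.find(target, start)
    while pos >= 0:
        matched = read[pos:pos + len(target)]
        # The flank only counts if at least one matched base is upper case.
        if any(base.isupper() for base in matched):
            return pos
        pos = upper_read.find(target, pos + 1)
    return -1


def read_is_accepted(read, left_flank, right_flank):
    """Accept a read only if both flanks are found, in order, and each
    contains at least one upper-case base (assumed simplification)."""
    left = find_flank(read, left_flank)
    if left < 0:
        return False
    right = find_flank(read, right_flank, left + len(left_flank))
    return right >= 0


# All upper-case input behaves as before.
print(read_is_accepted("CCGGAGATAGATTTAA", "CCGG", "TTAA"))
# The right flank lies entirely in the lower-case overhang: filtered out.
print(read_is_accepted("CCGGAGATAGATttaa", "CCGG", "TTAA"))
```

In this sketch the second read is rejected because its right flank was not covered by the overlapped (upper-case) region, which is the situation in which repeats may be missing from the merged read.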
Other minor changes included:
- Add -q option that enables reading FASTQ files
- Collapse repeated elements like AGAT(7)AGAT(6) to AGAT(13)
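
The collapsing of repeated elements can be sketched as follows. This is an illustrative Python example, not TSSV's implementation: the function name `collapse_repeats` is hypothetical, and the input is assumed to consist only of UNIT(count) elements such as those shown above.

```python
import re


def collapse_repeats(allele):
    """Merge adjacent elements that share the same repeat unit.
    Hypothetical helper; assumes `allele` is a string of UNIT(count) parts."""
    parts = re.findall(r"([A-Za-z]+)\((\d+)\)", allele)
    merged = []
    for unit, count in parts:
        if merged and merged[-1][0] == unit:
            # Same unit as the previous element: add the counts together.
            merged[-1][1] += int(count)
        else:
            merged.append([unit, int(count)])
    return "".join("%s(%d)" % (unit, count) for unit, count in merged)


print(collapse_repeats("AGAT(7)AGAT(6)"))
```

Only directly adjacent elements with an identical unit are merged, so an allele such as AGAT(7)AGAC(1)AGAT(6) is left unchanged by this sketch.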