Commit 53880f20 authored by Jeroen F.J. Laros

More documentation.

parent bbb84ae2
@@ -114,3 +114,37 @@ The trie data structure can be accessed via the `root` member variable.
>>> trie.root.keys()
['a', 'b']
```
The distance functions `all_hamming` and `all_levenshtein` also have
counterparts, `all_hamming_` and `all_levenshtein_`, that give the developer
more information by returning tuples containing not only the matched word, but
also its distance to the query string and a
[CIGAR](https://samtools.github.io/hts-specs/SAMv1.pdf)-like string.
The following encoding is used in the CIGAR-like string:
character | meaning
--: | :--
= | match
X | mismatch
I | insertion
D | deletion
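
To make the encoding concrete, here is a minimal sketch that translates a CIGAR-like string into operation names and counts the operations that are not matches. The helpers `describe` and `edit_count` are hypothetical and not part of the library. The sketch assumes one character per operation, as in the examples in this section.

```python
# Hypothetical helpers for illustration only; not part of the library.
OPERATIONS = {
    '=': 'match',
    'X': 'mismatch',
    'I': 'insertion',
    'D': 'deletion',
}


def describe(cigar):
    """Translate a CIGAR-like string into a list of operation names."""
    return [OPERATIONS[symbol] for symbol in cigar]


def edit_count(cigar):
    """Count the operations that are not matches."""
    return sum(1 for symbol in cigar if symbol != '=')
```

For the string `'=X='`, `describe` gives `['match', 'mismatch', 'match']` and `edit_count` gives 1.
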
In the following example, we search for all words with Hamming distance 1 to
the word 'acc'. In the results we see a match with the word 'abc' having
distance 1 and a mismatch at position 2.
```python
>>> trie = Trie(['abc'])
>>> list(trie.all_hamming_('acc', 1))
[('abc', 1, '=X=')]
```
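
For the Hamming case, the CIGAR-like string can be reproduced by comparing two words of equal length position by position. The sketch below does this in plain Python. The function `hamming_cigar` is a hypothetical helper used for illustration; it does not show how the trie computes its results.

```python
def hamming_cigar(word, query):
    """Return (distance, cigar) for two words of equal length.

    Hypothetical helper for illustration; not part of the library.
    """
    if len(word) != len(query):
        raise ValueError('Hamming distance needs words of equal length.')

    # '=' where the characters agree, 'X' where they differ.
    cigar = ''.join('=' if a == b else 'X' for a, b in zip(word, query))

    # The Hamming distance is the number of mismatches.
    return cigar.count('X'), cigar
```

Calling `hamming_cigar('abc', 'acc')` returns `(1, '=X=')`, the distance and string reported for 'abc' in the example above.
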
Similarly, we can search for all words having Levenshtein distance 2 to the
word 'acb'. The word 'abc' matches three times: once by deleting the 'b' at
position 2 and inserting a 'b' after position 3, once by inserting a 'c' after
position 1 and deleting the last character, and once by introducing two
mismatches.
```python
>>> list(trie.all_levenshtein_('acb', 2))
[('abc', 2, '=D=I'), ('abc', 2, '=XX'), ('abc', 2, '=I=D')]
```
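
Because one word can be reported several times with different alignments, it can be convenient to collect the results per word. The following is a minimal sketch of that idea. The helper `group_by_word` is hypothetical, and the `results` list is copied from the output above rather than produced by the library.

```python
from collections import defaultdict

# Copied from the example output above.
results = [('abc', 2, '=D=I'), ('abc', 2, '=XX'), ('abc', 2, '=I=D')]


def group_by_word(results):
    """Collect all (distance, cigar) pairs per matched word.

    Hypothetical helper for illustration; not part of the library.
    """
    grouped = defaultdict(list)
    for word, distance, cigar in results:
        grouped[word].append((distance, cigar))
    return dict(grouped)


alignments = group_by_word(results)
```

Here `alignments['abc']` holds all three alignments, each with distance 2.
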