---
name: Bug report
about: Create a report to help us improve
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behaviour:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. Ubuntu Desktop 18.04]
- Version [e.g. 0.0.14]
**Additional context**
Add any other context about the problem here.
---
name: Feature request
about: Suggest an idea for this project
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always
frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've
considered.
**Additional context**
Add any other context or screenshots about the feature request here.
# Submit a pull request
Thank you for submitting a pull request. To speed up the review process, please
ensure that everything below is true:
1. This is not a duplicate of an [existing pull request][1].
2. No existing features have been broken without good reason.
3. Your commit messages are detailed.
4. The code style [guidelines][2] have been followed.
5. Documentation has been updated to reflect your changes.
6. Tests have been added or updated to reflect your changes.
7. All tests pass.
Any questions should be directed to @jfjlaros.
---
Replace any ":question:" below with information about your pull request.
## Pull Request Details
Provide details about your pull request and what it adds, fixes, or changes.
:question:
## Breaking Changes
Describe what features are broken by this pull request and why, if any.
:question:
## Issues Fixed
Enter the issue numbers resolved by this pull request below, if any.
1. :question:
## Other Relevant Information
Provide any other important details below.
:question:
[1]: https://github.com/jfjlaros/dict-trie/pulls
[2]: https://github.com/jfjlaros/dict-trie/blob/master/docs/CONTRIBUTING.md#code-style
language: python
python:
- "2.7"
- "3.5"
install: pip install tox-travis
script: tox
Trie implementation using nested dictionaries
=============================================
.. image:: https://travis-ci.org/jfjlaros/dict-trie.svg?branch=master
:target: https://travis-ci.org/jfjlaros/dict-trie
.. image:: https://readthedocs.org/projects/dict-trie/badge/?version=latest
:target: https://dict-trie.readthedocs.io/en/latest
.. image:: https://img.shields.io/pypi/v/dict-trie.svg
:target: https://pypi.org/project/dict-trie/
.. image:: https://img.shields.io/pypi/l/dict-trie.svg
:target: https://raw.githubusercontent.com/jfjlaros/dict-trie/master/LICENSE.md
----
This library provides a trie_ implementation using nested dictionaries. Apart
from the basic operations, a number of functions for *approximate matching* are
implemented.
Please see ReadTheDocs_ for the latest documentation.
.. _trie: https://en.wikipedia.org/wiki/Trie
.. _ReadTheDocs: https://dict-trie.readthedocs.io/en/latest/index.html
......@@ -13,7 +13,7 @@ __version_info__ = ('0', '0', '3')
__version__ = '.'.join(__version_info__)
__author__ = 'Jeroen F.J. Laros'
__contact__ = 'J.F.J.Laros@lumc.nl'
__homepage__ = 'https://github.com/jfjlaros/dict-trie.git'
__homepage__ = 'http://dict-trie.readthedocs.io/en/latest/'
usage = __doc__.split('\n\n\n')
......
from itertools import imap
class iMap(imap):
def __next__(cls):
return cls.next()
map = iMap
......@@ -2,7 +2,7 @@ import sys
if sys.version_info.major < 3:
from .compatibility import map
from itertools import imap as map
def _add(root, word, count):
......@@ -240,7 +240,7 @@ class Trie(object):
def hamming(self, word, distance):
try:
return self.all_hamming(word, distance).__next__()
return next(self.all_hamming(word, distance))
except StopIteration:
return ''
......@@ -273,7 +273,7 @@ class Trie(object):
def levenshtein(self, word, distance):
try:
return self.all_levenshtein(word, distance).__next__()
return next(self.all_levenshtein(word, distance))
except StopIteration:
return ''
......
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone.
## Our Standards
Examples of behaviour that contributes to creating a positive environment
include:
- Using welcoming and inclusive language.
- Being respectful of differing viewpoints and experiences.
- Gracefully accepting constructive criticism.
- Focusing on what is best for the community.
- Showing empathy towards other community members.
Examples of unacceptable behaviour by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or
advances.
- Trolling, insulting/derogatory comments, and personal or political attacks.
- Public or private harassment.
- Publishing others' private information, such as a physical or electronic
address, without explicit permission.
- Other conduct which could reasonably be considered inappropriate in a
professional setting.
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behaviour and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behaviour.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviour that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an
appointed representative at an online or offline event. Representation of a
project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behaviour may be
reported by contacting the project team at mailto:j.f.j.laros@lumc.nl. The
project team will review and investigate all complaints, and will respond in a
way that it deems appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an
incident. Further details of specific enforcement policies may be posted
separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 1.4, available at
[http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
# Contributing
Please follow these guidelines if you would like to contribute to the project.
---
## Table of Contents
Please read through these guidelines before you get started:
1. [Questions & Concerns](#questions--concerns)
2. [Issues & Bugs](#issues--bugs)
3. [Feature Requests](#feature-requests)
4. [Submitting Pull Requests](#submitting-pull-requests)
5. [Code Style](#code-style)
## Questions & Concerns
If you have any questions about using or developing for this project, reach out
to @jfjlaros or send an [email][email].
## Issues & Bugs
Submit an [issue][issues] or [pull request][compare] with a fix if you find any
bugs in the project. See [below](#submitting-pull-requests) for instructions on
sending in pull requests, and be sure to reference the [code style
guide](#code-style) first!
When submitting an issue or pull request, make sure you are as detailed as
possible and fill in all answers to questions asked in the templates. For
example, an issue that simply states "X/Y/Z is not working!" will be closed.
## Feature Requests
Submit an [issue][issues] to request a new feature. Features fall into one of
two categories:
1. **Major**: Major changes should be discussed with me via [email][email]. I am
always open to suggestions and will get back to you as soon as I can!
2. **Minor**: A minor feature can simply be added via a [pull request][compare].
## Submitting Pull Requests
Before you do anything, make sure you check the current list of [pull
requests][pull] to ensure you are not duplicating anyone's work. Then, do the
following:
1. Fork the repository and make your changes in a git branch: `git checkout -b
my-branch base-branch`
2. Read and follow the [code style guidelines](#code-style).
3. Make sure your feature or fix does not break the project! Test thoroughly.
4. Commit your changes, and be sure to leave a detailed commit message.
5. Push your branch to your forked repo on GitHub: `git push origin my-branch`
6. [Submit a pull request][compare] and hold tight!
7. If any changes are requested by the project maintainers, make them and
follow this process again until the changes are merged in.
## Code Style
Please follow the coding style conventions detailed below:
- For Python: [PEP 8 - Style Guide for Python Code][pep8].
[email]: mailto:j.f.j.laros@lumc.nl
[issues]: https://github.com/jfjlaros/dict-trie/issues/new
[compare]: https://github.com/jfjlaros/dict-trie/compare
[pull]: https://github.com/jfjlaros/dict-trie/pulls
[pep8]: https://www.python.org/dev/peps/pep-0008/
Contributors
============
- Jeroen F.J. Laros <J.F.J.Laros@lumc.nl> (Original author, maintainer)
Find out who contributed:
::
git shortlog -s -e
.. dict-trie documentation.
.. include:: ../README.rst
.. toctree::
:maxdepth: 2
:caption: Contents:
installation
usage
credits
Installation
============
The software is distributed via PyPI_ and can be installed with ``pip``:
::
pip install dict-trie
From source
-----------
The source is hosted on GitHub_. To install the latest development version, use
the following commands.
::
git clone https://github.com/jfjlaros/dict-trie.git
cd dict-trie
pip install .
.. _PyPI: https://pypi.org/project/dict-trie
.. _GitHub: https://github.com/jfjlaros/dict-trie.git
# Trie implementation using nested dictionaries
This library provides a [trie](https://en.wikipedia.org/wiki/Trie)
implementation using nested dictionaries. Apart from the basic operations, a
number of functions for *approximate matching* are implemented.
Usage
=====
The library provides the ``Trie`` class.
## Installation
Via [pypi](https://pypi.python.org/pypi/dict-trie):
.. code:: python
pip install dict-trie
>>> from dict_trie import Trie
From source:
git clone https://github.com/jfjlaros/dict-trie.git
cd dict-trie
pip install .
Basic operations
----------------
## Usage
The library provides the `Trie` class.
### Basic operations
Initialisation of the trie is done via the constructor by providing a list of
words.
```python
>>> from dict_trie import Trie
>>>
>>> trie = Trie(['abc', 'te', 'test'])
```
.. code:: python
>>> trie = Trie(['abc', 'te', 'test'])
Alternatively, an empty trie can be made to which words can be added with the
`add` function.
```python
>>> trie = Trie()
>>> trie.add('abc')
>>> trie.add('te')
>>> trie.add('test')
```
Membership can be tested with the `in` operator.
```python
>>> 'abc' in trie
True
```
Test whether a prefix is present by using the `has_prefix` function.
```python
>>> trie.has_prefix('ab')
True
```
Remove a word from the trie with the `remove` function. This function returns
`False` if the word was not in the trie.
```python
>>> trie.remove('abc')
True
>>> 'abc' in trie
False
>>> trie.remove('abc')
False
```
``add`` function.
.. code:: python
>>> trie = Trie()
>>> trie.add('abc')
>>> trie.add('te')
>>> trie.add('test')
Membership can be tested with the ``in`` operator.
.. code:: python
>>> 'abc' in trie
True
Test whether a prefix is present by using the ``has_prefix`` function.
.. code:: python
>>> trie.has_prefix('ab')
True
Remove a word from the trie with the ``remove`` function. This function returns
``False`` if the word was not in the trie.
.. code:: python
>>> trie.remove('abc')
True
>>> 'abc' in trie
False
>>> trie.remove('abc')
False
Iterate over all words in a trie.
```python
>>> list(trie)
['abc', 'te', 'test']
```
### Approximate matching
.. code:: python
>>> list(trie)
['abc', 'te', 'test']
Approximate matching
--------------------
A trie can be used to efficiently find a word that is similar to a query word.
This is implemented via a number of functions that search for a word, allowing
a given number of mismatches. These functions are divided into two families, one
......@@ -74,78 +72,95 @@ using the Hamming distance which only allows substitutions, the other using the
Levenshtein distance which allows substitutions, insertions and deletions.
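To make the difference between the two families concrete, the following minimal sketch computes both distances for a pair of plain strings. The helper names `hamming_distance` and `levenshtein_distance` are hypothetical and are not part of the library; the Levenshtein version is the standard dynamic programming formulation, not necessarily how the trie search is implemented.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError('Hamming distance needs strings of equal length')
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a, b):
    """Minimum number of substitutions, insertions and deletions."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # drop a character of `a`
                current[j - 1] + 1,            # add a character of `b`
                previous[j - 1] + (x != y)))   # match or substitution
        previous = current
    return previous[-1]
```

Only substitutions are counted by the first function, which is why it refuses strings of different lengths, while the second one also handles length differences.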
To find a word that has at most Hamming distance 2 to the word 'abe', the
`hamming` function is used.
```python
>>> trie = Trie(['abc', 'aaa', 'ccc'])
>>> trie.hamming('abe', 2)
'aaa'
```
``hamming`` function is used.
.. code:: python
>>> trie = Trie(['abc', 'aaa', 'ccc'])
>>> trie.hamming('abe', 2)
'aaa'
To get all words that have at most Hamming distance 2 to the word 'abe', the
`all_hamming` function is used. This function returns a generator.
```python
>>> list(trie.all_hamming('abe', 2))
['aaa', 'abc']
```
``all_hamming`` function is used. This function returns a generator.
.. code:: python
>>> list(trie.all_hamming('abe', 2))
['aaa', 'abc']
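As an illustration of why a trie suits this kind of search, here is a minimal, self-contained sketch of a Hamming search over a nested dictionary. It assumes the layout shown by the `root` member (an empty-string key marks the end of a word); the functions `build` and `all_hamming` below are hypothetical stand-ins and not the library's own code.

```python
def build(words):
    """Build a nested dictionary trie; the '' key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[''] = node.get('', 0) + 1
    return root


def all_hamming(node, word, distance, prefix=''):
    """Yield every stored word within `distance` substitutions of `word`."""
    if distance < 0:
        return  # Too many mismatches, abandon this whole branch.
    if not word:
        if '' in node:
            yield prefix
        return
    for char in sorted(node):
        if char == '':
            continue  # End-of-word marker, not a child node.
        cost = int(char != word[0])
        for match in all_hamming(
                node[char], word[1:], distance - cost, prefix + char):
            yield match


root = build(['abc', 'aaa', 'ccc'])
print(list(all_hamming(root, 'abe', 2)))
```

A whole branch is dropped as soon as the mismatch budget is used up, so words sharing a bad prefix are never visited individually. A single result with an empty-string fallback can be obtained with `next(all_hamming(root, 'abe', 2), '')`.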
In order to find a word that is closest to the query word, the `best_hamming`
In order to find a word that is closest to the query word, the ``best_hamming``
function is used. In this case a word with distance 1 is returned.
```python
>>> trie.best_hamming('abe', 2)
'abc'
```
The functions `levenshtein`, `all_levenshtein` and `best_levenshtein` are used
in a similar way.
.. code:: python
>>> trie.best_hamming('abe', 2)
'abc'
The functions ``levenshtein``, ``all_levenshtein`` and ``best_levenshtein`` are
used in a similar way.
Other functionalities
---------------------
### Other functionalities
A trie can be populated with all words of a fixed length over an alphabet by
using the `fill` function.
```python
>>> trie = Trie()
>>> trie.fill(('a', 'b'), 2)
>>> list(trie)
['aa', 'ab', 'ba', 'bb']
```
The trie data structure can be accessed via the `root` member variable.
```python
>>> trie.root
{'a': {'a': {'': 1}, 'b': {'': 1}}, 'b': {'a': {'': 1}, 'b': {'': 1}}}
>>> trie.root.keys()
['a', 'b']
```
The distance functions `all_hamming` and `all_levenshtein` also have
using the ``fill`` function.
.. code:: python
>>> trie = Trie()
>>> trie.fill(('a', 'b'), 2)
>>> list(trie)
['aa', 'ab', 'ba', 'bb']
The trie data structure can be accessed via the ``root`` member variable.
.. code:: python
>>> trie.root
{'a': {'a': {'': 1}, 'b': {'': 1}}, 'b': {'a': {'': 1}, 'b': {'': 1}}}
>>> sorted(trie.root.keys())
['a', 'b']
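The structure above can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming that the empty-string key stores a word count as the output suggests; the functions `add`, `contains` and `words` are hypothetical helpers written for this illustration, not the library's internals.

```python
def add(root, word, count=1):
    """Insert a word, creating one nested dictionary per character."""
    node = root
    for char in word:
        node = node.setdefault(char, {})
    node[''] = node.get('', 0) + count  # The '' key marks the end of a word.


def contains(root, word):
    """Follow the characters down the trie and look for the end marker."""
    node = root
    for char in word:
        if char not in node:
            return False
        node = node[char]
    return '' in node


def words(node, prefix=''):
    """Yield all stored words in sorted order."""
    for char in sorted(node):
        if char == '':
            yield prefix
        else:
            for word in words(node[char], prefix + char):
                yield word


root = {}
for word in ('aa', 'ab', 'ba', 'bb'):
    add(root, word)
print(root)
```

Words that share a prefix share the dictionaries along that prefix, which is why a prefix such as 'a' is present as a path without being a member itself.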
The distance functions ``all_hamming`` and ``all_levenshtein`` also have
counterparts that give the developer more information by returning a list of
tuples containing not only the matched word, but also its distance to the query
string and a [CIGAR](https://samtools.github.io/hts-specs/SAMv1.pdf)-like
string.
string and a CIGAR_-like string.
The following encoding is used in the CIGAR-like string:
character | meaning
--: | :--
= | match
X | mismatch
I | insertion
D | deletion
+-------------+---------------+
| character | description |
+-------------+---------------+
| = | match |
+-------------+---------------+
| X | mismatch |
+-------------+---------------+
| I | insertion |
+-------------+---------------+
| D | deletion |
+-------------+---------------+
In the following example, we search for all words with Hamming distance 1 to
the word 'acc'. In the results we see a match with the word 'abc' having
distance 1 and a mismatch at position 2.
```python
>>> trie = Trie(['abc'])
>>> list(trie.all_hamming_('acc', 1))
[('abc', 1, '=X=')]
```
.. code:: python
>>> trie = Trie(['abc'])
>>> list(trie.all_hamming_('acc', 1))
[('abc', 1, '=X=')]
Similarly, we can search for all words having Levenshtein distance 2 to the
word 'acb'. The word 'abc' matches three times, once by deleting the 'b' at
position 2 and inserting a 'b' after position 3, once by inserting a 'c' after
position 1 and deleting the last character and once by introducing two
mismatches.
```python
>>> list(trie.all_levenshtein_('acb', 2))
[('abc', 2, '=D=I'), ('abc', 2, '=XX'), ('abc', 2, '=I=D')]
```
.. code:: python
>>> list(trie.all_levenshtein_('acb', 2))
[('abc', 2, '=D=I'), ('abc', 2, '=XX'), ('abc', 2, '=I=D')]
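To read such a result, it can help to replay the CIGAR-like string against the two words. The sketch below does this under an assumption inferred from the examples above: `D` consumes a character of the matched word and `I` consumes a character of the query. The function `cigar_distance` is a hypothetical helper, not part of the library.

```python
def cigar_distance(word, query, cigar):
    """Replay a CIGAR-like string and return the distance it encodes.

    Assumption: 'D' skips a character of `word` (the word in the trie),
    'I' skips a character of `query`.
    """
    i = j = distance = 0  # i indexes `word`, j indexes `query`.
    for op in cigar:
        if op in ('=', 'X'):
            if i >= len(word) or j >= len(query):
                raise ValueError('CIGAR string runs past the end of a word')
            if (word[i] == query[j]) != (op == '='):
                raise ValueError('operation %r does not fit here' % op)
            i += 1
            j += 1
            distance += op == 'X'
        elif op == 'D':
            i += 1
            distance += 1
        elif op == 'I':
            j += 1
            distance += 1
        else:
            raise ValueError('unknown operation %r' % op)
    if i != len(word) or j != len(query):
        raise ValueError('CIGAR string does not cover both words')
    return distance


for word, distance, cigar in [
        ('abc', 2, '=D=I'), ('abc', 2, '=XX'), ('abc', 2, '=I=D')]:
    assert cigar_distance(word, 'acb', cigar) == distance
```

Every operation other than `=` adds one to the distance, so the three alignments above all account for the same distance of 2 in different ways.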
.. _CIGAR: https://samtools.github.io/hts-specs/SAMv1.pdf
[tox]
envlist = py27,py35
[testenv]
deps = pytest
commands = py.test