Commit 1b3009aa authored by Jeroen F.J. Laros, committed by GitHub

Merge pull request #2 from mutalyzer/py3

Py3
parents 81a1fbad 86c1eb18
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone.
## Our Standards
Examples of behaviour that contributes to creating a positive environment
include:
- Using welcoming and inclusive language.
- Being respectful of differing viewpoints and experiences.
- Gracefully accepting constructive criticism.
- Focusing on what is best for the community.
- Showing empathy towards other community members.
Examples of unacceptable behaviour by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or
advances.
- Trolling, insulting/derogatory comments, and personal or political attacks.
- Public or private harassment.
- Publishing others' private information, such as a physical or electronic
address, without explicit permission.
- Other conduct which could reasonably be considered inappropriate in a
professional setting.
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behaviour and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behaviour.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviour that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an
appointed representative at an online or offline event. Representation of a
project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behaviour may be
reported by contacting the project team at mailto:info@mutalyzer.nl. The
project team will review and investigate all complaints, and will respond in a
way that it deems appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an
incident. Further details of specific enforcement policies may be posted
separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 1.4, available at
[http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
# Contributing
Please follow these guidelines if you would like to contribute to the project.
---
## Table of Contents
Please read through these guidelines before you get started:
1. [Questions & Concerns](#questions--concerns)
2. [Issues & Bugs](#issues--bugs)
3. [Feature Requests](#feature-requests)
4. [Submitting Pull Requests](#submitting-pull-requests)
5. [Code Style](#code-style)
## Questions & Concerns
If you have any questions about using or developing for this project, reach out
to @mutalyzer or send an [email][email].
## Issues & Bugs
Submit an [issue][issues] or [pull request][compare] with a fix if you find any
bugs in the project. See [below](#submitting-pull-requests) for instructions on
sending in pull requests, and be sure to reference the [code style
guide](#code-style) first!
When submitting an issue or pull request, make sure you are as detailed as
possible and fill in all answers to questions asked in the templates. For
example, an issue that simply states "X/Y/Z is not working!" will be closed.
## Feature Requests
Submit an [issue][issues] to request a new feature. Features fall into one of
two categories:
1. **Major**: Major changes should be discussed with me via [email][email]. I am
always open to suggestions and will get back to you as soon as I can!
2. **Minor**: A minor feature can simply be added via a [pull request][compare].
## Submitting Pull Requests
Before you do anything, make sure you check the current list of [pull
requests][pull] to ensure you are not duplicating anyone's work. Then, do the
following:
1. Fork the repository and make your changes in a git branch: `git checkout -b
my-branch base-branch`
2. Read and follow the [code style guidelines](#code-style).
3. Make sure your feature or fix does not break the project! Test thoroughly.
4. Commit your changes, and be sure to leave a detailed commit message.
5. Push your branch to your forked repo on GitHub: `git push origin my-branch`
6. [Submit a pull request][compare] and hold tight!
7. If any changes are requested by the project maintainers, make them and
follow this process again until the changes are merged in.
## Code Style
Please follow the coding style conventions detailed below:
- For Python: [PEP 8 - Style Guide for Python Code][pep8].
[email]: mailto:info@mutalyzer.nl
[issues]: https://github.com/mutalyzer/backtranslate/issues/new
[compare]: https://github.com/mutalyzer/backtranslate/compare
[pull]: https://github.com/mutalyzer/backtranslate/pulls
[pep8]: https://www.python.org/dev/peps/pep-0008/
---
name: Bug report
about: Create a report to help us improve
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behaviour:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behaviour**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. Ubuntu Desktop 18.04]
- Version: [e.g. 0.0.14]
**Additional context**
Add any other context about the problem here.
---
name: Feature request
about: Suggest an idea for this project
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. For example: I'm
always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've
considered.
**Additional context**
Add any other context or screenshots about the feature request here.
# Submit a pull request
Thank you for submitting a pull request. To speed up the review process, please
ensure that everything below is true:
1. This is not a duplicate of an [existing pull request][1].
2. No existing features have been broken without good reason.
3. Your commit messages are detailed.
4. The code style [guidelines][2] have been followed.
5. Documentation has been updated to reflect your changes.
6. Tests have been added or updated to reflect your changes.
7. All tests pass.
Any questions should be directed to @mutalyzer.
---
Replace any ":question:" below with information about your pull request.
## Pull Request Details
Provide details about your pull request and what it adds, fixes, or changes.
:question:
## Breaking Changes
Describe what features are broken by this pull request and why, if any.
:question:
## Issues Fixed
Enter the issue numbers resolved by this pull request below, if any.
1. :question:
## Other Relevant Information
Provide any other important details below.
:question:
[1]: https://github.com/mutalyzer/backtranslate/pulls
[2]: https://github.com/mutalyzer/backtranslate/blob/master/docs/CONTRIBUTING.md#code-style
*.pyc
*.egg-info
build
dist
.ipynb_checkpoints
*.pyc
.cache/
.tox/
build/
dist/
examples/.ipynb_checkpoints/
# Validate this file using http://lint.travis-ci.org/
language: python
sudo: false
python:
- "2.7"
- "3.3"
- "3.4"
- "3.5"
- "nightly"
- "pypy"
- "pypy3"
install: python setup.py install
script: py.test
- 3.5
- 3.6
install: pip install . fake-open tox-travis
script: tox
Copyright (c) 2015-2019 by LUMC, Jeroen F.J. Laros.
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Back translation
This library provides functions for back translation from amino acids to
nucleotides.
```python
>>> from backtranslate.backtranslate import BackTranslate
>>>
>>> # Create a class instance, optionally passing a translation table id
>>> # (the default is 1, the standard code).
>>> bt = BackTranslate()
>>>
>>> # Find all substitutions that transform the codon 'TGG' into a stop codon.
>>> bt.with_dna('TGG', '*')
{1: {('G', 'A')}, 2: {('G', 'A')}}
```
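To make the `with_dna` call above concrete, here is a minimal, self-contained sketch of what it computes. It is not the library's code: it hard-codes the standard genetic code (NCBI table 1) instead of reading it from Biopython, and `substitutions_with_dna` is a hypothetical name. Judging by the `hamming` import in `backtranslate.py`, the library likely compares the reference codon with every codon of the observed amino acid and keeps those that differ in exactly one position; the sketch follows that assumption.

```python
from collections import defaultdict
from itertools import product

BASES = 'TCAG'
# Standard genetic code (NCBI table 1), codons in TTT, TTC, TTA, ... order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Reverse translation table: amino acid -> set of codons ('*' marks a stop).
BACK_TABLE = defaultdict(set)
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    BACK_TABLE[amino_acid].add(''.join(codon))


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def substitutions_with_dna(reference_codon, amino_acid):
    """Single nucleotide substitutions that turn `reference_codon` into a
    codon for `amino_acid`, as {0-based position: {(ref, alt), ...}}."""
    substitutions = defaultdict(set)
    for codon in sorted(BACK_TABLE[amino_acid]):
        if hamming(reference_codon, codon) == 1:
            # Locate the one position at which the two codons differ.
            position = next(
                i for i in range(3) if reference_codon[i] != codon[i])
            substitutions[position].add(
                (reference_codon[position], codon[position]))
    return dict(substitutions)


print(substitutions_with_dna('TGG', '*'))
# {1: {('G', 'A')}, 2: {('G', 'A')}}
```

Positions are 0-based within the codon, which is why the change from `TGG` to `TAG` is reported under key 1.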
Sometimes we do not have access to the DNA sequence, so we have to find
possible substitutions from the amino acids directly.
```python
>>> # Find all substitutions that transform a tryptophan into a stop codon.
>>> bt.without_dna('W', '*')
{1: {('G', 'A')}, 2: {('G', 'A')}}
```
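Without a codon, a natural generalisation is to pool the single nucleotide substitutions over every codon of the reference amino acid. The sketch below assumes that this is what `without_dna` does; it is self-contained, hard-codes the standard genetic code (NCBI table 1), and `substitutions_without_dna` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product

BASES = 'TCAG'
# Standard genetic code (NCBI table 1), codons in TTT, TTC, TTA, ... order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Reverse translation table: amino acid -> set of codons ('*' marks a stop).
BACK_TABLE = defaultdict(set)
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    BACK_TABLE[amino_acid].add(''.join(codon))


def substitutions_without_dna(reference_amino_acid, amino_acid):
    """Pool single nucleotide substitutions over every codon of
    `reference_amino_acid`, as {0-based position: {(ref, alt), ...}}."""
    substitutions = defaultdict(set)
    for reference_codon in BACK_TABLE[reference_amino_acid]:
        for codon in BACK_TABLE[amino_acid]:
            differences = [
                i for i in range(3) if reference_codon[i] != codon[i]]
            if len(differences) == 1:
                position = differences[0]
                substitutions[position].add(
                    (reference_codon[position], codon[position]))
    return dict(substitutions)


print(substitutions_without_dna('D', 'Y'))  # {0: {('G', 'T')}}
```

Tryptophan has a single codon (`TGG`), so for `W` the pooled result is the same as the codon-specific one.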
To find out which amino acid substitution predictions can be improved by
adding codon information, use `improvable()`:
```python
>>> bt.improvable()
{('I', 'L'), ('R', 'W'), ('Q', 'H'), ('C', '*'), ('*', 'W'), ('K', 'N'),
 ('C', 'W'), ('S', 'R'), ('L', 'I'), ('*', 'S'), ('S', '*'), ('L', '*'),
 ('L', 'M'), ('L', 'F'), ('*', 'L'), ('D', 'E'), ('R', 'G'), ('S', 'C'),
 ('E', 'D'), ('R', 'S'), ('N', 'K'), ('H', 'Q'), ('S', 'T'), ('T', 'S'),
 ('G', 'R'), ('L', 'V'), ('I', 'M'), ('F', 'L'), ('*', 'Y'), ('Y', '*'),
 ('V', 'L'), ('R', '*')}
```
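The README does not spell out when a pair counts as improvable. One reading that fits the `improvable` docstring ("can be improved by looking at the underlying codon") is: a pair (a, b) is improvable when some codon of a has a non-empty set of single nucleotide substitutions to b that differs from the set pooled over all codons of a. The sketch below implements that assumed rule with a hard-coded standard genetic code and hypothetical names; it is an illustration and may not match the library's rule in every case.

```python
from collections import defaultdict
from itertools import product

BASES = 'TCAG'
# Standard genetic code (NCBI table 1), codons in TTT, TTC, TTA, ... order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Reverse translation table: amino acid -> set of codons ('*' marks a stop).
BACK_TABLE = defaultdict(set)
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    BACK_TABLE[amino_acid].add(''.join(codon))


def pooled_substitutions(reference_codons, amino_acid):
    """Single nucleotide substitutions from any of `reference_codons` to a
    codon for `amino_acid`, as {0-based position: {(ref, alt), ...}}."""
    substitutions = defaultdict(set)
    for reference_codon in reference_codons:
        for codon in BACK_TABLE[amino_acid]:
            differences = [
                i for i in range(3) if reference_codon[i] != codon[i]]
            if len(differences) == 1:
                position = differences[0]
                substitutions[position].add(
                    (reference_codon[position], codon[position]))
    return dict(substitutions)


def improvable_pairs():
    """Pairs (a, b) for which knowing the codon of `a` narrows the result."""
    pairs = set()
    for reference_amino_acid, codons in BACK_TABLE.items():
        for amino_acid in BACK_TABLE:
            if amino_acid == reference_amino_acid:
                continue
            pooled = pooled_substitutions(codons, amino_acid)
            for codon in codons:
                specific = pooled_substitutions([codon], amino_acid)
                # Codons with no route to `amino_acid` are ignored.
                if specific and specific != pooled:
                    pairs.add((reference_amino_acid, amino_acid))
                    break
    return pairs


pairs = improvable_pairs()
print(('L', 'I') in pairs, ('V', 'M') in pairs)  # True False
```

Under this rule, valine to methionine is not improvable: only `GTG` can reach `ATG`, so knowing the codon adds nothing.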
To convert substitutions to CDS coordinates, use `subst_to_cds`. Its second
argument is the zero-based CDS position of the first nucleotide of the codon
(12 for the fifth codon):
```python
>>> from backtranslate.util import subst_to_cds
>>>
>>> substitutions = bt.without_dna('W', '*')
>>>
>>> # Transform the substitutions to CDS coordinates.
>>> subst_to_cds(substitutions, 12)
{(14, 'G', 'A'), (15, 'G', 'A')}
```
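Judging from this example (offset 12 maps codon positions 1 and 2 to CDS positions 14 and 15), `subst_to_cds` adds `offset + 1` to each 0-based codon position. A minimal re-implementation under that assumption follows; `to_cds_coordinates` is a hypothetical name, not part of the library.

```python
def to_cds_coordinates(substitutions, offset):
    """Turn {0-based codon position: {(ref, alt), ...}} into a set of
    (1-based CDS position, ref, alt) tuples.

    `offset` is the number of CDS nucleotides before the codon, i.e. the
    0-based CDS position of its first base.
    """
    return {
        (offset + position + 1, reference_base, new_base)
        for position, changes in substitutions.items()
        for reference_base, new_base in changes}


print(sorted(to_cds_coordinates({1: {('G', 'A')}, 2: {('G', 'A')}}, 12)))
# [(14, 'G', 'A'), (15, 'G', 'A')]
```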
## Command line interface
Use the command `backtranslate` to find substitutions that explain an amino
acid change:
```bash
$ backtranslate with_dna -o 210 data/mhv.fa - 1 Leu
1 A C
1 A T
```
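Here `-o 210` is the position of the CDS start in `data/mhv.fa` and `1 Leu` asks which changes to codon 1 yield leucine. `cli.with_dna` locates that codon with `codon_pos = offset - 1 + (position - 1) * 3`, turning the 1-based offset and codon number into a 0-based string index. The helper below isolates that arithmetic; its name and the toy sequence are made up for illustration.

```python
def codon_at(reference, offset, position):
    """Codon number `position` (1-based) of a CDS that starts at 1-based
    `offset` in `reference`, using the arithmetic from cli.with_dna."""
    codon_pos = offset - 1 + (position - 1) * 3
    return reference[codon_pos:codon_pos + 3]


# Made-up reference: three nucleotides before the CDS, then ATG AAA TGG.
reference = 'GGCATGAAATGG'
print(codon_at(reference, 4, 1), codon_at(reference, 4, 3))  # ATG TGG
```

The reported positions are relative to the CDS, because the CLI passes `(position - 1) * 3` to `subst_to_cds`; the changes to the first codon are therefore printed at position 1.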
If no reference is available, use the `without_dna` subcommand:
```bash
$ backtranslate without_dna - Asp 92 Tyr
274 G T
```
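This single line can be checked by hand. Aspartate is encoded by GAT and GAC, and tyrosine by TAT and TAC, so the only single nucleotide route is G to T at the first codon position, which lies at CDS position (92 - 1) * 3 + 1 = 274. The snippet below repeats that reasoning with just the four codons involved; `explain` is a hypothetical helper, whereas the CLI itself maps three-letter codes through `protein_letters_3to1`.

```python
# Codons of the two amino acids in the example (standard genetic code).
CODONS = {'Asp': ['GAC', 'GAT'], 'Tyr': ['TAC', 'TAT']}


def explain(reference_amino_acid, position, amino_acid):
    """Sorted (CDS position, ref, alt) tuples for single nucleotide changes
    from `reference_amino_acid` at codon `position` (1-based) to
    `amino_acid`."""
    found = set()
    for reference_codon in CODONS[reference_amino_acid]:
        for codon in CODONS[amino_acid]:
            differences = [
                i for i in range(3) if reference_codon[i] != codon[i]]
            if len(differences) == 1:
                i = differences[0]
                # (position - 1) * 3 nucleotides precede the codon in the CDS.
                found.add(((position - 1) * 3 + i + 1,
                           reference_codon[i], codon[i]))
    return sorted(found)


print(explain('Asp', 92, 'Tyr'))  # [(274, 'G', 'T')]
```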
The command `find_stops` lists the positions and substitutions that lead to
stop codons; positions are given in reference coordinates. These destructive
substitutions are useful when analysing a pool of viral transcripts: counting
the relevant nucleotides at the given positions gives insight into how many
transcripts are active.
```bash
$ backtranslate find_stops -o 210 data/mhv.fa -
216 A T
225 A T
230 C A
230 C G
243 A T
...
```
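The loop in `cli.find_stops` splits the CDS into codons with `findall('...', ...)`, asks `with_dna(codon, '*')` for changes that create a stop codon, and reports each one at reference position `offset + index * 3 + position`; the new `compact` argument prints one line per position with the alternatives joined by commas. The sketch below mirrors that loop as a generator of output lines. It is self-contained and therefore hard-codes the standard genetic code; `stop_substitutions` is a hypothetical stand-in for `BackTranslate.with_dna`, and the input is assumed to be an uppercase DNA string.

```python
from collections import defaultdict
from itertools import product
from re import findall

BASES = 'TCAG'
# Standard genetic code (NCBI table 1), codons in TTT, TTC, TTA, ... order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = dict(
    zip((''.join(codon) for codon in product(BASES, repeat=3)), AMINO_ACIDS))


def stop_substitutions(codon):
    """Changes that turn `codon` into a stop codon, as
    {0-based position: {(ref, alt), ...}}."""
    substitutions = defaultdict(set)
    for position, reference_base in enumerate(codon):
        for base in BASES:
            mutant = codon[:position] + base + codon[position + 1:]
            if base != reference_base and CODON_TABLE.get(mutant) == '*':
                substitutions[position].add((reference_base, base))
    return substitutions


def find_stops(sequence, offset, compact=False):
    """Yield tab-separated lines; `offset` is the 1-based CDS start and the
    reported positions are 1-based reference coordinates."""
    for index, codon in enumerate(findall('...', sequence[offset - 1:])):
        stops = stop_substitutions(codon)
        for position in sorted(stops):
            location = offset + index * 3 + position
            changes = sorted(stops[position])
            if compact:
                # One line per position: reference base, then all alternatives.
                yield '{}\t{}\t{}'.format(
                    location, changes[0][0], ','.join(alt for _, alt in changes))
            else:
                for reference_base, alt in changes:
                    yield '{}\t{}\t{}'.format(location, reference_base, alt)


for line in find_stops('ATGAAATGGTAA', 1):
    print(line)
```

In compact mode, a codon such as `TCA` yields a single line for position 2 listing both `A` and `G`, since either change produces a stop codon.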
Back translation
================
.. image:: https://img.shields.io/github/last-commit/mutalyzer/backtranslate.svg
   :target: https://github.com/mutalyzer/backtranslate/graphs/commit-activity
.. image:: https://travis-ci.org/mutalyzer/backtranslate.svg?branch=master
   :target: https://travis-ci.org/mutalyzer/backtranslate
.. image:: https://readthedocs.org/projects/backtranslate/badge/?version=latest
   :target: https://backtranslate.readthedocs.io/en/latest
.. image:: https://img.shields.io/github/release-date/mutalyzer/backtranslate.svg
   :target: https://github.com/mutalyzer/backtranslate/releases
.. image:: https://img.shields.io/github/release/mutalyzer/backtranslate.svg
   :target: https://github.com/mutalyzer/backtranslate/releases
.. image:: https://img.shields.io/pypi/v/backtranslate.svg
   :target: https://pypi.org/project/backtranslate/
.. image:: https://img.shields.io/github/languages/code-size/mutalyzer/backtranslate.svg
   :target: https://github.com/mutalyzer/backtranslate
.. image:: https://img.shields.io/github/languages/count/mutalyzer/backtranslate.svg
   :target: https://github.com/mutalyzer/backtranslate
.. image:: https://img.shields.io/github/languages/top/mutalyzer/backtranslate.svg
   :target: https://github.com/mutalyzer/backtranslate
.. image:: https://img.shields.io/github/license/mutalyzer/backtranslate.svg
   :target: https://raw.githubusercontent.com/mutalyzer/backtranslate/master/LICENSE.md
----
This library provides functions for back translation from amino acids to
nucleotides.
Please see ReadTheDocs_ for the latest documentation.
.. _ReadTheDocs: https://backtranslate.readthedocs.io
"""
backtranslate: Functions for reverse translation.
from os.path import dirname, abspath
from configparser import ConfigParser
Copyright (c) 2015 Leiden University Medical Center <humgen@lumc.nl>
Copyright (c) 2015 Jeroen F.J. Laros <j.f.j.laros@lumc.nl>
from .backtranslate import BackTranslate
Licensed under the MIT license, see the LICENSE file.
"""
config = ConfigParser()
with open('{}/setup.cfg'.format(dirname(abspath(__file__)))) as handle:
config.read_file(handle)
from __future__ import (
absolute_import, division, print_function, unicode_literals)
from future.builtins import str, zip
_copyright_notice = 'Copyright (c) {} {} <{}>'.format(
config.get('metadata', 'copyright'),
config.get('metadata', 'author'),
config.get('metadata', 'author_email'))
__version_info__ = ('0', '0', '5')
__version__ = '.'.join(__version_info__)
__author__ = 'LUMC, Jeroen F.J. Laros'
__contact__ = 'J.F.J.Laros@lumc.nl'
__homepage__ = 'https://github.com/mutalyzer/backtranslate'
usage = __doc__.split("\n\n\n")
usage = [config.get('metadata', 'description'), _copyright_notice]
def doc_split(func):
return func.__doc__.split("\n\n")[0]
return func.__doc__.split('\n\n')[0]
def version(name):
return "%s version %s\n\nAuthor : %s <%s>\nHomepage : %s" % (name,
__version__, __author__, __contact__, __homepage__)
return '{} version {}\n\n{}\nHomepage: {}'.format(
config.get('metadata', 'name'),
config.get('metadata', 'version'),
_copyright_notice,
config.get('metadata', 'url'))
from __future__ import (
absolute_import, division, print_function, unicode_literals)
from future.builtins import str, zip
from collections import defaultdict
from Bio.Data import CodonTable
@@ -9,8 +5,7 @@ from Levenshtein import hamming
def cmp_subst(subst_1, subst_2):
"""
Compare two substitution sets.
"""Compare two substitution sets.
:arg dict subst_1: Substitution set.
:arg dict subst_2: Substitution set.
@@ -28,8 +23,7 @@ def cmp_subst(subst_1, subst_2):
def reverse_translation_table(table_id=1):
"""
Calculate a reverse translation table.
"""Calculate a reverse translation table.
:arg int table_id: Translation table id.
@@ -47,20 +41,16 @@ def reverse_translation_table(table_id=1):
class BackTranslate(object):
"""
Back translation.
"""
"""Back translation."""
def __init__(self, table_id=1):
"""
Initialise the class.
"""Initialise the class.
:arg int table_id: Translation table id.
"""
self._back_table = reverse_translation_table(table_id)
def _one_subst(self, substitutions, reference_codon, amino_acid):
"""
Find single nucleotide substitutions that given a reference codon
"""Find single nucleotide substitutions that given a reference codon
explains an observed amino acid.
:arg defaultdict(set) substitutions: Set of single nucleotide
@@ -76,8 +66,7 @@ class BackTranslate(object):
(reference_codon[position], codon[position]))
def with_dna(self, reference_codon, amino_acid):
"""
Find single nucleotide substitutions that given a reference codon
"""Find single nucleotide substitutions that given a reference codon
explains an observed amino acid.
:arg str reference_codon: Original codon.
@@ -93,9 +82,8 @@ class BackTranslate(object):
return dict(substitutions)
def without_dna(self, reference_amino_acid, amino_acid):
"""
Find single nucleotide substitutions that given a reference amino acid
explains an observed amino acid.
"""Find single nucleotide substitutions that given a reference amino
acid explains an observed amino acid.
:arg str reference_amino_acid: Original amino acid.
:arg str amino_acid: Observed amino acid.
@@ -111,9 +99,8 @@ class BackTranslate(object):
return dict(substitutions)
def improvable(self):
"""
Calculate all pairs of amino acid substitutions that can be improved by
looking at the underlying codon.
"""Calculate all pairs of amino acid substitutions that can be improved
by looking at the underlying codon.
:returns list: List of improvable substitutions.
"""
#!/usr/bin/env python
from __future__ import (
absolute_import, division, print_function, unicode_literals)
from future.builtins import str, zip
import argparse
import re
from argparse import ArgumentParser, FileType, RawDescriptionHelpFormatter
from re import findall
from Bio import SeqIO
from . import usage, version, doc_split
from .backtranslate import BackTranslate
from . import BackTranslate, usage, version, doc_split
from .util import protein_letters_3to1, subst_to_cds
def with_dna(input_handle, output_handle, offset, position, amino_acid):
"""
Get all variants that result in the observed amino acid change.
"""Get all variants that result in the observed amino acid change.
:arg stream input_handle: Open readable handle to a FASTA file.
:arg stream output_handle: Open writable handle to a file.
@@ -28,18 +19,17 @@ def with_dna(input_handle, output_handle, offset, position, amino_acid):
:returns set: Variants that lead to the observed amino acid change.
"""
bt = BackTranslate()
reference = str(SeqIO.parse(input_handle, 'fasta').next().seq)
reference = str(next(SeqIO.parse(input_handle, 'fasta')).seq)
codon_pos = offset - 1 + (position - 1) * 3
codon = reference[codon_pos:codon_pos + 3]
substitutions = bt.with_dna(codon, protein_letters_3to1[amino_acid])
for subst in subst_to_cds(substitutions, (position - 1) * 3):
for subst in sorted(subst_to_cds(substitutions, (position - 1) * 3)):
output_handle.write('{}\t{}\t{}\n'.format(*subst))
def without_dna(output_handle, position, reference_amino_acid, amino_acid):
"""
Get all variants that result in the observed amino acid change without
"""Get all variants that result in the observed amino acid change without
making use of the transcript.
:arg stream output_handle: Open writable handle to a file.
@@ -61,76 +51,87 @@ def without_dna(output_handle, position, reference_amino_acid, amino_acid):
output_handle.write(
'This substitution can be improved by using `with_dna`.\n')
for subst in subst_to_cds(substitutions, (position - 1) * 3):
for subst in sorted(subst_to_cds(substitutions, (position - 1) * 3)):
output_handle.write('{}\t{}\t{}\n'.format(*subst))
def find_stops(input_handle, output_handle, offset):
"""
Almost stop codon finder.
def find_stops(input_handle, output_handle, offset, compact):
"""Almost stop codon finder.
:arg stream input_handle: Open readable handle to a FASTA file.
:arg stream output_handle: Open writable handle to a file.
:arg int offset: Position of the CDS start in the reference sequence.
:arg bool compact: Output one line per position.
"""
bt = BackTranslate()
sequence = str(SeqIO.parse(input_handle, 'fasta').next().seq)
sequence = str(next(SeqIO.parse(input_handle, 'fasta')).seq)
for index, codon in enumerate(re.findall('...', sequence[offset - 1:])):
for index, codon in enumerate(findall('...', sequence[offset - 1:])):
stop_positions = bt.with_dna(codon, '*')
for position in stop_positions:
for subst in stop_positions[position]:
for position in sorted(stop_positions):
if not compact:
for subst in sorted(stop_positions[position]):
output_handle.write('{}\t{}\t{}\n'.format(
offset + (index * 3) + position, *subst))
else:
output_handle.write('{}\t{}\t{}\n'.format(
offset + (index * 3) + position, *subst))
offset + (index * 3) + position,
list(stop_positions[position])[0][0],
','.join(map(lambda x: x[1],
sorted(stop_positions[position])))))
def main():
"""
Main entry point.
"""
input_parser = argparse.ArgumentParser(add_help=False)
input_parser.add_argument('input_handle', metavar='INPUT',
type=argparse.FileType('r'), help='input file in FASTA format')
input_parser.add_argument('-o', dest='offset', type=int, default=1,
"""Main entry point."""
input_parser = ArgumentParser(add_help=False)
input_parser.add_argument(
'input_handle', metavar='INPUT', type=FileType('r'),