Commit 2f33e62c authored by Vermaat

Move to Sphinx for developer documentation

This is quite a large commit, touching many things related to developer
documentation. It is all focussed on getting as much of this as possible
into the new Sphinx-based documentation.

Some highlights:

- Start Sphinx-based developer documentation, including fairly complete
  instructions for installation and configuration.
- Remove epydoc API docs.
- Rework some docstrings to conform to reStructuredText, so they can be
  used in the API docs generated by Sphinx.
- Move all of the top-level text files to reStructuredText so they can be
  linked from the Sphinx-based docs and for consistency.
- Remove many obsolete things from the extras/ directory, including old
  installation scripts and migrations.

Much of the installation-related documentation and many of the scripts are
removed or adapted in light of the new automated deployment using Ansible.
parent 47d5633b
Showing
with 707 additions and 632 deletions
Copyright
=========
.. include:: ../LICENSE.rst
Authors
-------
.. include:: ../AUTHORS.rst
.. highlight:: none
.. _deploy:
Deploying Mutalyzer in production
=================================
The previous sections discussed managing a Mutalyzer installation with a focus
on a development environment. There are a number of additional things you will
want to consider when deploying Mutalyzer in a production environment, mainly
concerning security and performance.
Usually you'll at least want to use a well-performing WSGI application server
for the website and SOAP and HTTP/RPC+JSON webservices. There are many options
here, ranging from Apache's `mod_wsgi`_ to `uWSGI`_ to standalone WSGI
containers such as `Gunicorn`_.
Below we briefly describe our recommended setup for a production environment
using Gunicorn, nginx and Supervisor.
Configuration settings
----------------------
Todo: Link to the description of these configuration settings.
It is recommended to at least set the following configuration settings:
- DEBUG
- EMAIL
- CACHE_DIR
- SOAP_WSDL_URL
- JSON_ROOT_URL
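
As a rough illustration, a settings file covering these five settings might
look like the sketch below. The file is plain Python, like the ``settings.py``
example in the installation section. All values are placeholders chosen for
illustration (the paths, host name and e-mail address are assumptions, not
defaults shipped with Mutalyzer); see :ref:`config` for what each setting
means.

.. code-block:: python

    # Hypothetical production settings; every value below is a placeholder.
    DEBUG = False
    EMAIL = 'mutalyzer@example.com'
    CACHE_DIR = '/opt/mutalyzer/cache'
    SOAP_WSDL_URL = 'https://mutalyzer.example.com/services/?wsdl'
    JSON_ROOT_URL = 'https://mutalyzer.example.com/json'
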
WSGI application server: Gunicorn
---------------------------------
`Gunicorn`_ is a well-performing Python WSGI HTTP server. Being a Python
application, it can be installed in the Mutalyzer virtual environment with
``pip install gunicorn``.
Many configuration settings are available for Gunicorn, and we recommend using
a configuration file per WSGI application. For example, the following
configuration can be stored in ``website.conf``:
.. code-block:: ini
workers = 4
max_requests = 1000
timeout = 600
bind = 'unix:/opt/mutalyzer/run/website.sock'
This will bind the Gunicorn server to a unix socket (which we can later use
from nginx) and run with 4 worker processes. To serve the Mutalyzer website
with this configuration, run the following::
$ gunicorn -c website.conf mutalyzer.entrypoints.website
This uses the WSGI application object exported by the
`mutalyzer.entrypoints.website` module. Likewise, the SOAP and HTTP/RPC+JSON
webservices have WSGI application objects exported by the
`mutalyzer.entrypoints.service_soap` and `mutalyzer.entrypoints.service_json`
modules.
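
Because these are ordinary WSGI application objects, they can be exercised
without running any server. The following sketch calls a WSGI application once
using only the standard library and returns the response status line. The
names ``wsgi_status`` and ``placeholder_app`` are hypothetical and introduced
here for illustration only; in practice you would pass the application object
exported by one of the entry point modules (when only a module is given,
Gunicorn looks for an attribute named ``application``).

.. code-block:: python

    from wsgiref.util import setup_testing_defaults


    def wsgi_status(application, path='/'):
        """Call a WSGI application once and return the response status line."""
        environ = {'PATH_INFO': path, 'REQUEST_METHOD': 'GET'}
        # Fill in the remaining keys a WSGI environ needs (server name, etc.).
        setup_testing_defaults(environ)
        captured = {}

        def start_response(status, headers, exc_info=None):
            captured['status'] = status

        body = application(environ, start_response)
        try:
            # Consume the response body, as a real server would.
            for _ in body:
                pass
        finally:
            if hasattr(body, 'close'):
                body.close()
        return captured.get('status')


    def placeholder_app(environ, start_response):
        """Stand-in for a real application object (assumption)."""
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'ok']


    print(wsgi_status(placeholder_app))
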
Web server: nginx
-----------------
It is usually a good idea to use a separate web server in front of the WSGI
application servers. We use `nginx`_ for this purpose and configure it to
serve static files directly and act as a reverse proxy for the WSGI
applications.
For example, to serve the website from the root path and the HTTP/RPC+JSON
webservice from the ``/json`` path, an nginx configuration similar to the
following can be used:
.. code-block:: nginx
server {
listen 80;
server_name _;
client_max_body_size 2G;
keepalive_timeout 5;
location /static/ {
alias /opt/mutalyzer/static/;
expires 30d;
add_header Pragma public;
add_header Cache-Control "public";
}
location / {
root /usr/share/nginx/html;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_read_timeout 600;
proxy_pass http://website;
}
location /json {
root /usr/share/nginx/html;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_read_timeout 600;
proxy_pass http://service-json;
}
}
upstream website {
server unix:/opt/mutalyzer/run/website.sock fail_timeout=0;
}
upstream service-json {
server unix:/opt/mutalyzer/run/service-json.sock fail_timeout=0;
}
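
The ``service-json`` upstream above expects a Gunicorn server listening on
``/opt/mutalyzer/run/service-json.sock``. A matching Gunicorn configuration
could look like the following sketch, which simply mirrors ``website.conf``.
The file name ``service-json.conf`` and the worker settings are assumptions;
only the socket path is taken from the nginx configuration.

.. code-block:: python

    # service-json.conf (hypothetical file name); Gunicorn configuration
    # files are plain Python.
    workers = 4
    max_requests = 1000
    timeout = 600
    # Must match the socket in the nginx ``upstream service-json`` block.
    bind = 'unix:/opt/mutalyzer/run/service-json.sock'

It would be started in the same way as the website, for example with
``gunicorn -c service-json.conf mutalyzer.entrypoints.service_json``.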
Process control: Supervisor
---------------------------
For managing the different WSGI application servers and Mutalyzer batch
processor, Supervisor can be used. Supervisor is usually started from the init
system and controls programs and program groups. For example, it can
automatically restart a program if it crashes for some reason.
The following is an example Supervisor configuration defining a Mutalyzer
group consisting of the batch processor and a Gunicorn process for the
website:
.. code-block:: ini
[group:mutalyzer]
programs=batch-processor,website
[program:batch-processor]
command=mutalyzer-batch-processor
autorestart=true
environment=MUTALYZER_SETTINGS="/opt/mutalyzer/conf/settings.py"
[program:website]
command=gunicorn -c /opt/mutalyzer/conf/website.conf mutalyzer.entrypoints.website
autorestart=true
environment=MUTALYZER_SETTINGS="/opt/mutalyzer/conf/settings.py"
Automated deployment with Ansible
---------------------------------
Deployments of complete production environments are often complex and
repetitive. Therefore, manual deployments are inefficient and
error-prone. Several systems exist to automate this, such as `Puppet`_,
`Chef`_, and `Ansible`_.
An automated deployment of Mutalyzer with Ansible is `available from the LUMC
GitLab <https://git.lumc.nl/mutalyzer/mutalyzer-deployment>`_. This includes
installation of the website, SOAP and HTTP/RPC+JSON webservices, and the batch
processor, similar to the setup described above.
.. _Ansible: http://www.ansible.com/
.. _Chef: http://www.getchef.com/
.. _Gunicorn: http://gunicorn.org/
.. _mod_wsgi: https://code.google.com/p/modwsgi/
.. _nginx: http://nginx.org/
.. _Puppet: http://puppetlabs.com/
.. _uWSGI: http://uwsgi-docs.readthedocs.org/
Application design
==================
Todo: Write this section and integrate the LaTeX technical reference.
.. highlight:: none
.. _development:
Development
===========
Development of Mutalyzer happens on GitLab:
https://git.lumc.nl/mutalyzer/mutalyzer
Contributing
------------
Contributions to Mutalyzer are very welcome! They can be feature requests, bug
reports, bug fixes, unit tests, documentation updates, or anything else you may
come up with.
Coding style
------------
In general, try to follow the `PEP 8`_ guidelines for Python code and `PEP
257`_ for docstrings.
Unit tests
----------
To run the unit tests with `nose`_, just run ``nosetests -v``.
Working with feature branches
-----------------------------
New features are best implemented in their own branches, isolating the work
from unrelated developments. In fact, it's good practice to *never work
directly on the master branch* but always in a separate branch. For this
reason, the master branch on the GitLab server is locked. Feature branches can
be merged back into master via a *merge request* in GitLab.
Before starting work on your feature, create a branch for it::
git checkout -b your-feature
Commit changes on this branch. If you're happy with it, push to GitLab::
git push origin your-feature -u
Now create a merge request to discuss the implementation with your
colleagues. This might involve adding additional commits which are included in
the merge request by pushing your branch again::
git commit
git push
You may also be asked to rebase your branch on the master branch if it has
changed since you started your work. This will require a forced push::
git fetch
git rebase origin/master
git push -f
Once the work is done, a developer can merge your branch and close the merge
request. After the branch has been merged, you can safely delete it::
git branch -d your-feature
Versioning
----------
All version numbers for recent Mutalyzer releases take the form 2.0.beta-X
where X is incremented on release. Pre-release (or development) version
numbers take the form 2.0.beta-X.dev where 2.0.beta-X is the closest future
release version.
Note that we are planning a switch to `SemVer`_.
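
To make the current scheme concrete, the sketch below parses version strings
of the form 2.0.beta-X and 2.0.beta-X.dev. It is not part of Mutalyzer; the
names ``VERSION_PATTERN`` and ``parse_version`` are hypothetical, and it
assumes X is a plain decimal number.

.. code-block:: python

    import re

    # Matches e.g. '2.0.beta-32' and '2.0.beta-33.dev'.
    VERSION_PATTERN = re.compile(r'^2\.0\.beta-(?P<number>\d+)(?P<dev>\.dev)?$')


    def parse_version(version):
        """Return (release_number, is_development) for a 2.0.beta-X version."""
        match = VERSION_PATTERN.match(version)
        if match is None:
            raise ValueError('Not a 2.0.beta-X version: %r' % version)
        return int(match.group('number')), match.group('dev') is not None


    print(parse_version('2.0.beta-32'))      # a release version
    print(parse_version('2.0.beta-33.dev'))  # development towards 2.0.beta-33
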
.. A normal version number takes the form X.Y.Z where X is the major version, Y
is the minor version, and Z is the patch version. Development versions take
the form X.Y.Z.dev where X.Y.Z is the closest future release version.
Note that this scheme is not 100% compatible with `SemVer`_ which would
require X.Y.Z-dev instead of X.Y.Z.dev but `compatibility with setuptools
<http://peak.telecommunity.com/DevCenter/setuptools#specifying-your-project-s-version>`_
is more important for us. Other than that, version semantics are as described
by SemVer.
Releases are `published at PyPI <https://pypi.python.org/pypi/wiggelen>`_ and
available from the GitHub git repository as tags.
Release procedure
^^^^^^^^^^^^^^^^^
Releasing a new version is done as follows:
1. Make sure the section in the ``CHANGES`` file for this release is
complete and there are no uncommitted changes.
.. note::
Commits since release 2.0.beta-X can be listed with ``git log
mutalyzer-2.0.beta-X..`` for quick inspection.
2. Update the ``CHANGES`` file to state the current date for this release
and edit ``mutalyzer/__init__.py`` by updating `__date__`, removing the
``dev`` value from `__version_info__` and setting `RELEASE` to `True`.
Commit and tag the version update::
git commit -am 'Bump version to 2.0.beta-X'
git tag -a 'mutalyzer-2.0.beta-X'
git push --tags
3. Add a new entry at the top of the ``CHANGES`` file like this::
Version 2.0.beta-Y
------------------
Release date to be decided.
Set `__version_info__` to a new version ending with ``dev`` and set
`RELEASE` to `False` in ``mutalyzer/__init__.py``. Commit these changes::
git commit -am 'Open development for 2.0.beta-Y'
.. _nose: https://nose.readthedocs.org/
.. _PEP 8: http://www.python.org/dev/peps/pep-0008/
.. _PEP 257: http://www.python.org/dev/peps/pep-0257/
.. _SemVer: http://semver.org/
Mutalyzer developer documentation
=================================
Mutalyzer is an HGVS variant nomenclature checker. The canonical Mutalyzer
installation can be found at `mutalyzer.nl <https://mutalyzer.nl>`_.
This is the developer documentation for Mutalyzer. User documentation can be
found on the `wiki <https://humgenprojects.lumc.nl/trac/mutalyzer>`_.
Managing Mutalyzer
------------------
Information for getting Mutalyzer running on a system.
.. toctree::
:maxdepth: 2
install
config
run
upgrade
admin
deploy
Technical reference
-------------------
Application design and complete API reference.
.. toctree::
:maxdepth: 2
design
api
Additional notes
----------------
.. toctree::
:maxdepth: 2
development
todo
changelog
copyright
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. highlight:: none
.. _install:
Installation
============
Mutalyzer depends on a database server, `Python`_ 2.7, and several Python
packages. `Redis`_ is a soft dependency. This section walks you through
installing Mutalyzer with Redis and using `PostgreSQL`_ as database server,
which is the recommended setup.
.. note:: All operating system specific instructions assume installation on a
`Debian`_ 7 *wheezy* system. You'll have to figure out the necessary
adjustments yourself if you're on another system.
The following steps will get Mutalyzer running on your system with the
recommended setup:
* :ref:`install-postgresql`
* :ref:`install-redis`
* :ref:`install-virtualenv`
* :ref:`install-setup`
At the bottom of this page some :ref:`alternative setups
<install-alternatives>` are documented.
.. _install-quick:
If you're in a hurry
--------------------
The impatient can run Mutalyzer without a database server and more such
nonsense with the following steps::
$ pip install -r requirements.txt
$ MUTALYZER_SETTINGS=/dev/null python -m mutalyzer.entrypoints.website
This starts the website frontend on the reported port using an in-memory
SQLite database.
.. _install-postgresql:
Database server: PostgreSQL
---------------------------
Install `PostgreSQL`_ and add a user for Mutalyzer. Create a database
(e.g. ``mutalyzer``) owned by the new user. For example::
$ sudo apt-get install postgresql
$ sudo -u postgres createuser --superuser $USER
$ createuser --pwprompt --encrypted --no-adduser --no-createdb --no-createrole mutalyzer
$ createdb --encoding=UNICODE --owner=mutalyzer mutalyzer
Also install some development libraries needed for building the ``psycopg2``
Python package later and add the package to the list of requirements::
$ sudo apt-get install libpq-dev
$ echo psycopg2 >> requirements.txt
This will make sure the Python PostgreSQL database adapter gets installed in
the :ref:`install-virtualenv` section.
.. seealso::
:ref:`install-mysql`
Alternatively, MySQL can be used as database server.
:ref:`install-sqlite`
Alternatively, SQLite can be used as database server.
`Dialects -- SQLAlchemy documentation <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`_
In theory, any database supported by SQLAlchemy could work.
.. _install-redis:
Redis
-----
Mutalyzer uses Redis for non-critical fast storage such as statistics::
$ sudo apt-get install redis-server
.. note:: Redis is a soft dependency, meaning that Mutalyzer will run without
it (but may lack some non-essential features).
.. _install-virtualenv:
Python virtual environment
--------------------------
It is recommended to run Mutalyzer from a Python virtual environment, using
`virtualenv`_. Installing virtualenv and creating virtual environments is not
covered here.
Assuming you created and activated a virtual environment for Mutalyzer,
install all required Python packages::
$ sudo apt-get install python-dev libmysqlclient-dev
$ pip install -r requirements.txt
Now might be a good time to run the unit tests::
$ nosetests -v
If everything's okay, install Mutalyzer::
$ python setup.py install
.. seealso::
`virtualenv`_
``virtualenv`` is a tool to create isolated Python environments.
`virtualenvwrapper`_
``virtualenvwrapper`` is a set of extensions to the ``virtualenv``
tool. The extensions include wrappers for creating and deleting virtual
environments and otherwise managing your development workflow.
.. _install-setup:
Mutalyzer setup
---------------
Mutalyzer looks for its configuration in the file specified by the
``MUTALYZER_SETTINGS`` environment variable. First create the file with your
configuration settings, for example::
$ export MUTALYZER_SETTINGS=~/mutalyzer/settings.py
$ cat > $MUTALYZER_SETTINGS
REDIS_URI = 'redis://localhost'
DATABASE_URI = 'postgresql://mutalyzer:*****@localhost/mutalyzer'
A script is included to set up the database::
$ mutalyzer-admin setup-database --alembic-config migrations/alembic.ini
You can now proceed to :ref:`run`.
.. seealso::
:ref:`config`
For more information on the available configuration settings.
.. _install-alternatives:
Alternative setups
------------------
The remainder of this page documents some alternatives to the recommended
setup documented above.
.. _install-mysql:
Database server: MySQL
^^^^^^^^^^^^^^^^^^^^^^
Install `MySQL`_ and create a database (e.g. ``mutalyzer``) with all privileges
for the Mutalyzer user. For example::
$ sudo apt-get install mysql-server
$ mysql -h localhost -u root -p
> create database mutalyzer;
> grant all privileges on mutalyzer.* to mutalyzer@localhost identified by '*****';
The Python MySQL database adapter is a hard dependency regardless of your
choice of database server, so it'll get installed in the
:ref:`install-virtualenv` section.
In the :ref:`install-setup` section, make sure to use a MySQL database URI in
the Mutalyzer settings file, e.g.:
.. code-block:: python
DATABASE_URI = 'mysql://mutalyzer:*****@localhost/mutalyzer'
.. seealso::
:ref:`install-postgresql`
The recommended setup uses PostgreSQL as database server.
.. _install-sqlite:
Database server: SQLite
^^^^^^^^^^^^^^^^^^^^^^^
You probably already have all you need for using `SQLite`_, so this section
consists of zero steps.
Just note that in the :ref:`install-setup` section, you should use an SQLite
database URI in the Mutalyzer settings file, e.g.:
.. code-block:: python
DATABASE_URI = 'sqlite:////tmp/mutalyzer.db'
.. seealso::
:ref:`install-postgresql`
The recommended setup uses PostgreSQL as database server.
.. _Debian: http://www.debian.org/
.. _MySQL: http://www.mysql.com/
.. _PostgreSQL: http://www.postgresql.org/
.. _Python: http://python.org/
.. _Redis: http://redis.io/
.. _SQLite: http://www.sqlite.org/
.. _virtualenv: http://www.virtualenv.org/
.. _virtualenvwrapper: http://www.doughellmann.com/docs/virtualenvwrapper/
.. highlight:: none
.. _run:
Running Mutalyzer
=================
Mutalyzer comes with a number of different interfaces, of which the website is
perhaps the main one. It can be started using a built-in test server that's
useful for development and debugging purposes like this::
$ mutalyzer-website
* Running on http://127.0.0.1:5000/
You can now point your web browser to the URL that is printed and see the
welcoming Mutalyzer homepage.
Likewise, the SOAP and HTTP/RPC+JSON webservices can be started with the
``mutalyzer-service-soap`` and ``mutalyzer-service-json`` commands,
respectively.
For processing batch jobs, the batch processor must be running. This process
can be started from the command line and will keep running until it is stopped
by pressing Ctrl+C::
$ mutalyzer-batch-processor
^Cmutalyzer-batch-processor: Hitting Ctrl+C again will terminate any running job!
mutalyzer-batch-processor: Graceful shutdown
The built-in test servers won't get you far in production, though, and there
are many other possibilities for deploying Mutalyzer using WSGI. This topic is
discussed in :ref:`deploy`.
Todo list
=========
These are some general todo notes. More specific notes can be found by
grepping the source code for ``Todo``.
.. seealso::
`Mutalyzer Trac -- Active tickets <https://humgenprojects.lumc.nl/trac/mutalyzer/report/2>`_
Users can file tickets on the Mutalyzer Trac website.
`Mutalyzer GitLab -- Open issues <https://git.lumc.nl/mutalyzer/mutalyzer/issues>`_
Some issues are recorded in the Mutalyzer GitLab project.
- Improve the web interface design :)
- Test all uses of mkstemp().
- Use naming conventions for modules Crossmap, Db, File, GenRecord, Retriever
and Scheduler.
- Use standard logging module, with rotating functionality. Race conditions
on the log file are probably a problem in the current setup.
Instead of built-in rotation, we could also use logrotate:
http://serverfault.com/questions/55610/logrotate-and-open-files
- Setup continuous integration. Currently, I'm most impressed with Hudson.
- http://hudson-ci.org/
- http://www.rhonabwy.com/wp/2009/11/04/setting-up-a-python-ci-server-with-hudson/
Or perhaps Jenkins.
- http://jenkins-ci.org/
- Migrate JavaScript to jQuery.
- I think in the long run, the Output object is not really the way to go. It
obscures the control flow. The logging part should use the standard logging
module. The data gathering by the Output object is probably better handled
by explicitly returning data objects from functions.
- Migrate from TAL to a more modern and maintained Python template library,
for example Jinja.
- Develop a large test suite.
- Create a web interface url to watch the progress of a batch job.
- Create web services for the batch jobs (steal ideas from Jeroen's DVD
web service).
- Use virtualenv?
- Use SQLAlchemy?
- Password for MySQL user.
- In deployment, remove old versions of Mutalyzer package?
- Check for os.path.join vulnerabilities.
- Use a standard solution for the database migrations in extras/migrations.
- Use something like Sphinx to generate development documentation from code.
- There are some problems with the batch architecture, especially that there
cannot be multiple workers without synchronisation problems.
Good read: http://news.ycombinator.com/item?id=3002861
Suggestion: http://celeryproject.org/
- Have a normal 404 page.
- Maintenance (and/or read-only) mode.
- Cleanup this document.
- Be more explicit about all the types of descriptions we don't currently support.
.. highlight:: none
.. _upgrade:
Upgrading
=========
Before upgrading Mutalyzer, stop any currently running instances. Then, update
your copy of the source code (using for example ``git pull`` on an existing
git clone).
Make sure to install any new requirements::
$ pip install -r requirements.txt
Now install the new version::
$ python setup.py install
Managing database migrations is done using `Alembic`_. This command will move
your database to the latest schema::
$ alembic -c migrations/alembic.ini upgrade head
.. _Alembic: http://alembic.readthedocs.org/
# Static files
Alias /mutalyzer/static /var/www/mutalyzer/static
<Directory /var/www/mutalyzer/static>
Order deny,allow
Allow from all
Options -Indexes
AllowOverride None
</Directory>
# Use daemon mode of mod_wsgi
WSGIDaemonProcess mutalyzer processes=2 threads=15 maximum-requests=10000
WSGIProcessGroup mutalyzer
# SOAP/1.1 web service
WSGIScriptAlias /mutalyzer/services <MUTALYZER_BIN_SOAP_SERVICE>
<Directory /mutalyzer/services>
Order deny,allow
Allow from all
Options -Indexes
</Directory>
# HTTP/RPC+JSON web service
WSGIScriptAlias /mutalyzer/json <MUTALYZER_BIN_JSON_SERVICE>
<Directory /mutalyzer/json>
Order deny,allow
Allow from all
Options -Indexes
</Directory>
# Website
WSGIScriptAlias /mutalyzer <MUTALYZER_BIN_WEBSITE>
<Directory /mutalyzer>
Order deny,allow
Allow from all
Options -Indexes
</Directory>
#
# Mutalyzer config file.
#
# Specify the location of this file in the MUTALYZER_SETTINGS environment
# variable.
#
# These settings are used by the Retriever module.
#
# Use this email address for retrieval of records at the NCBI.
email = "mutalyzer@humgen.nl"
# The cache directory.
cache = "/var/cache/mutalyzer"
# The maximum size of the cache in megabytes.
cachesize = 50
# The maximum size of a downloaded GenBank file in megabytes.
maxDldSize = 10
# The minimum size of a downloaded GenBank file in bytes.
minDldSize = 512
# The URL from where LRG files are fetched
lrgurl = "ftp://ftp.ebi.ac.uk/pub/databases/lrgex/"
#
# These settings are used by the Db module.
#
# Internal database.
internalDb = "mutalyzer"
# MySQL mapping database names.
dbNames = "hg18", "hg19", "mm10"
# Default mapping database.
defaultDb = "hg19"
# MySQL username for the local databases (internalDb and dbNames).
LocalMySQLuser = "mutalyzer"
# Host name for the local databases.
LocalMySQLhost = "localhost"
# Automatically reconnect to MySQL server using the MySQLdb reconnect option.
# Note that this may not always be available and if not will result in an
# error if used. Mutalyzer also implements its own reconnecting mechanism now
# which is always on. See Trac issue #91.
autoReconnect = no
# Number of days a cached transcript->protein link from the NCBI is valid.
proteinLinkLifetime = 30
# Number of days a cached nonexisting transcript->protein link from the NCBI
# is valid.
proteinLinkNoneLifetime = 5
#
# These settings are used by the Output module.
#
# Name and location of the log file.
log = "/var/log/mutalyzer.log"
# Prefix for each log message.
datestring = "%Y-%m-%d %H:%M:%S"
# Message levels:
#
# 0 : Debug ; Show all messages.
# 1 : Info ; Show all messages except debug messages.
# 2 : Warning ; Show warning, error and fatal messages.
# 3 : Error ; Show error and fatal messages.
# 4 : Fatal ; Only show fatal messages.
# 5 : Off ; Show nothing.
# Level of logged messages.
loglevel = 3
# Level of output messages.
outputlevel = 1
# Show debug info in the web interface.
debug = yes
#
# These settings are used by the Mutator module.
#
# Length of the flanking sequences (used in the visualisation of mutations).
flanksize = 25
# Maximum length of visualised mutations.
maxvissize = 25
# Length of the flanking sequences of the clipped mutations (see maxvissize).
flankclipsize = 6
#
# These settings are used by the Scheduler module.
#
# Return e-mail address.
mailFrom = "noreply@humgen.nl"
# Subject of the message.
mailSubject = "Result of Mutalyzer batch check."
# Location of the results.
resultsDir = "/var/cache/mutalyzer"
# Location of the PID file.
PIDfile = "/var/run/mutalyzer/mutalyzer-batchd.pid"
# Maximum size for uploaded batch input files in megabytes.
batchInputMaxSize = 5
# The output header for NameChecking
nameCheckOutHeader = "Input", "Errors | Messages", "AccNo", "Genesymbol", "Variant", "Reference Sequence Start Descr.", "Coding DNA Descr.", "Protein Descr.", "GeneSymbol Coding DNA Descr.", "GeneSymbol Protein Descr.", "Genomic Reference", "Coding Reference", "Protein Reference", "Affected Transcripts", "Affected Proteins", "Restriction Sites Created", "Restriction Sites Deleted"
# The output header for SyntaxChecking
syntaxCheckOutHeader = "Input", "Status"
# The output header for PositionConverter
positionConverterOutHeader = "Input Variant", "Errors", "Chromosomal Variant", "Coding Variant(s)"
# The output header for SnpConverter
snpConverterOutHeader = "Input Variant", "HGVS description(s)", "Errors | Messages"
#
# These settings are used by the File module.
#
# Amount of bytes to be read for determining the file type.
bufSize = 32768
# The obligatory header in batch request files.
header = "AccNo", "Genesymbol", "Mutation"
# Threshold for Batch Jobs
threshold = 0.05
#
# These settings are used by the GenRecord module.
#
spliceAlarm = 2
spliceWarn = 5
#
# These settings are for Piwik analytics.
#
# Enable Piwik analytics.
piwik = no
# Piwik server base URL (include protocol, no trailing slash).
piwikBase = https://piwik.example.com
# Piwik site ID.
piwikSite = 1
#
# Mutalyzer config file.
#
# Specify the location of this file in the MUTALYZER_SETTINGS environment
# variable.
# Use this email address for retrieval of records at the NCBI.
email = "mutalyzer@humgen.nl"
# The cache directory.
cache = "/home/<USERNAME>/.cache/mutalyzer"
# Name and location of the log file.
log = "/tmp/mutalyzer-<USERNAME>.log"
# Synchronize the local cache with the live server every morning at 05:25
#25 5 * * * www-data <MUTALYZER_BIN_CACHE_SYNC> 'https://mutalyzer.nl/services/?wsdl' 'https://mutalyzer.nl/Reference/{file}' 3
# Update the mapping database every sunday morning at 03:25 and 04:25
#25 3 * * 7 www-data wget "ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/ARCHIVE/BUILD.36.3/mapview/seq_gene.md.gz" -O - | zcat > /tmp/seq_gene.md; <MUTALYZER_BIN_MAPPING_UPDATE> hg18 /tmp/seq_gene.md reference
#24 4 * * 7 www-data wget "ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/ARCHIVE/BUILD.37.1/mapview/seq_gene.md.gz" -O - | zcat > /tmp/seq_gene.md; <MUTALYZER_BIN_MAPPING_UPDATE> hg19 /tmp/seq_gene.md 'Primary Assembly'
#25 4 * * 7 www-data wget "ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/ARCHIVE/BUILD.37.2/mapview/seq_gene.md.gz" -O - | zcat > /tmp/seq_gene.md; <MUTALYZER_BIN_MAPPING_UPDATE> hg19 /tmp/seq_gene.md 'GRCh37.p2-Primary Assembly'
#25 4 * * 7 www-data wget "ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/ARCHIVE/BUILD.37.3/mapview/seq_gene.md.gz" -O - | zcat > /tmp/seq_gene.md; <MUTALYZER_BIN_MAPPING_UPDATE> hg19 /tmp/seq_gene.md 'GRCh37.p5-Primary Assembly'
#25 4 * * 7 www-data wget "ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/mapview/seq_gene.md.gz" -O - | zcat > /tmp/seq_gene.md; <MUTALYZER_BIN_MAPPING_UPDATE> hg19 /tmp/seq_gene.md 'GRCh37.p10-Primary Assembly'
#! /bin/sh
### BEGIN INIT INFO
# Provides: mutalyzer-batchd
# Required-Start: $local_fs $remote_fs $network $syslog $mysql
# Required-Stop: $local_fs $remote_fs $network $syslog $mysql
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start and stop the Mutalyzer batch daemon
# Description: Controls the Mutalyzer batch job processing daemon.
### END INIT INFO
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="Mutalyzer batch daemon"
NAME=mutalyzer-batchd
DAEMON=<MUTALYZER_BIN_BATCHD>
DIR=/
PIDDIR=/var/run/mutalyzer
PIDFILE=$PIDDIR/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
USER=www-data
# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0
# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME
# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh
# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions
#
# Function that starts the daemon/service
#
do_start()
{
# Return
# 0 if daemon has been started
# 1 if daemon was already running
# 2 if daemon could not be started
mkdir -p $PIDDIR
chown -R $USER $PIDDIR
start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $USER --chdir $DIR --test > /dev/null \
|| return 1
start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $USER --chdir $DIR -- \
$DAEMON_ARGS \
|| return 2
# Add code here, if necessary, that waits for the process to be ready
# to handle requests from services started subsequently which depend
# on this one. As a last resort, sleep for some time.
}
#
# Function that stops the daemon/service
#
do_stop()
{
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
start-stop-daemon --stop --quiet --oknodo --pidfile $PIDFILE
RETVAL="$?"
[ "$RETVAL" = 2 ] && return 2
# Many daemons don't delete their pidfiles when they exit.
rm -f $PIDFILE
return "$RETVAL"
}
case "$1" in
start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
do_start
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
do_stop
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
status)
status_of_proc "$DAEMON" "$NAME" && exit 0 || exit $?
;;
#reload|force-reload)
#
# If do_reload() is not implemented then leave this commented out
# and leave 'force-reload' as an alias for 'restart'.
#
#log_daemon_msg "Reloading $DESC" "$NAME"
#do_reload
#log_end_msg $?
#;;
restart|force-reload)
#
# If the "reload" option is implemented then remove the
# 'force-reload' alias
#
log_daemon_msg "Restarting $DESC" "$NAME"
do_stop
case "$?" in
0|1)
do_start
case "$?" in
0) log_end_msg 0 ;;
1) log_end_msg 1 ;; # Old process is still running
*) log_end_msg 1 ;; # Failed to start
esac
;;
*)
# Failed to stop
log_end_msg 1
;;
esac
;;
*)
#echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
echo "Usage: $SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
exit 3
;;
esac
:
#!/usr/bin/env python
"""
Add a column and index 'created' to the 'GBInfo' table.

Usage:
  ./001-db-gbinfo-add-created.migration [migrate]
"""


import migration


def check():
    """
    Check if migration is needed.
    """
    connection = migration.db_connect('mutalyzer')
    cursor = connection.cursor()
    cursor.execute('SHOW COLUMNS FROM GBInfo WHERE field = "created";')
    has_column = len(cursor.fetchall()) > 0
    cursor.execute('SHOW INDEX FROM GBInfo WHERE Key_name = "created";')
    has_index = len(cursor.fetchall()) > 0
    connection.close()
    if has_column != has_index:
        migration.fatal('Installation is not in a recognizable state. Fix manually.')
    return not has_column


def migrate():
    """
    Perform migration.
    """
    connection = migration.db_connect('mutalyzer')
    cursor = connection.cursor()
    cursor.execute("""
    ALTER TABLE GBInfo
      ADD COLUMN created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      ADD INDEX created (created);""")
    cursor.execute('UPDATE GBInfo SET created = CURRENT_TIMESTAMP;')
    connection.commit()
    connection.close()
    migration.info('Added column mutalyzer.GBInfo.created')
    migration.info('Added index on mutalyzer.GBInfo.created')


if __name__ == '__main__':
    migration.main(check, migrate)
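
The shared ``migration`` helper module is not part of this listing, so how ``migration.main(check, migrate)`` behaves is not shown here. The sketch below is an assumption based on the usage line (``[migrate]``) and on the messages printed by the Bash migrations in this commit: report whether the migration is needed, and only perform it when ``migrate`` is given as the first argument. The name ``run`` and its return values are hypothetical, and the sketch is written for Python 3 rather than the Python 2 of the scripts themselves.

```python
import sys


def run(check, migrate, argv=None):
    """Hypothetical stand-in for migration.main(check, migrate).

    check() should return True when the migration is still needed;
    migrate() performs it. Returns a short status string for testing.
    """
    argv = sys.argv[1:] if argv is None else argv
    if not check():
        print('This migration is not needed.')
        return 'not needed'
    print('This migration is needed.')
    # Without the explicit 'migrate' argument this is a dry run.
    if argv[:1] != ['migrate']:
        return 'needed'
    print('Performing migration.')
    migrate()
    print('Performed migration.')
    return 'migrated'
```

Called without arguments such a driver would only report state, which matches how the scripts are documented: running ``./001-db-gbinfo-add-created.migration`` checks, and adding ``migrate`` applies the change.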
#!/usr/bin/env python
"""
Convert the old 'map' tables to the new 'Mapping' tables.

Usage:
  ./002-db-map-to-mapping.migration [migrate]

This is basically just a renaming of columns and
- use NULL for missing values
- add 1 to all chromosomal start positions.

The following tables on hg18 and hg19 are dropped:
- gbStatus
- map_cdsBackup
- refGene
- refLink

The map tables are renamed to map_backup.
"""


import MySQLdb

import migration


def _exon_starts(starts):
    # Add 1 to every start position. The input is expected to end with a
    # trailing comma, so the last (empty) field is dropped.
    updated = []
    for start in starts.split(',')[:-1]:
        updated.append(str(int(start) + 1))
    return ','.join(updated)


def _exon_stops(stops):
    # Strip the trailing comma, if present. Return the value unchanged
    # otherwise instead of implicitly returning None.
    if stops[-1] == ',':
        return stops[:-1]
    return stops


def _check(db):
    # Todo: Also check if 'map' is gone.
    connection = migration.db_connect(db)
    cursor = connection.cursor()
    cursor.execute('SHOW TABLES LIKE "Mapping";')
    ok = len(cursor.fetchall()) > 0
    connection.close()
    return ok


def _migrate(db):
    connection = migration.db_connect(db)
    cursor = connection.cursor()
    cursor.execute("""
    CREATE TABLE Mapping (
        gene varchar(255) DEFAULT NULL,
        transcript varchar(20) NOT NULL DEFAULT '',
        version smallint(6) DEFAULT NULL,
        chromosome varchar(40) DEFAULT NULL,
        orientation char(1) DEFAULT NULL,
        start int(11) unsigned DEFAULT NULL,
        stop int(11) unsigned DEFAULT NULL,
        cds_start int(11) unsigned DEFAULT NULL,
        cds_stop int(11) unsigned DEFAULT NULL,
        exon_starts longblob NOT NULL,
        exon_stops longblob NOT NULL,
        protein varchar(20) DEFAULT NULL,
        source varchar(20) DEFAULT NULL,
        INDEX (transcript)
    );""")
    select_cursor = connection.cursor(MySQLdb.cursors.DictCursor)
    select_cursor.execute("""
    SELECT
        geneName as gene,
        acc as transcript,
        version as version,
        chrom as chromosome,
        strand as orientation,
        txStart + 1 as start,
        txEnd as stop,
        NULLIF(cdsStart + 1, cdsEnd + 1) as cds_start,
        NULLIF(cdsEnd, cdsStart) as cds_stop,
        exonStarts as exon_starts,
        exonEnds as exon_stops,
        NULLIF(protAcc, '') as protein,
        'UCSC' as source
    FROM
        map;""")
    count = 0
    while True:
        r = select_cursor.fetchone()
        if r is None:
            break
        count += 1
        cursor.execute("""
        INSERT INTO Mapping
            (gene, transcript, version, chromosome, orientation, start, stop,
             cds_start, cds_stop, exon_starts, exon_stops, protein, source)
        VALUES
            (%s, %s, %s, %s, %s, %s, %s,
             %s, %s, %s, %s, %s, %s);""",
            (r['gene'], r['transcript'], r['version'], r['chromosome'],
             r['orientation'], r['start'], r['stop'], r['cds_start'],
             r['cds_stop'], _exon_starts(r['exon_starts']), _exon_stops(r['exon_stops']),
             r['protein'], r['source']))
    migration.info('Converted table map to table Mapping on %s (%d entries)' % (db, count))
    cursor.execute('DROP TABLE IF EXISTS gbStatus, map_cdsBackup, refGene, refLink')
    cursor.execute('RENAME TABLE map TO map_backup')
    migration.info('Dropped tables gbStatus, map_cdsBackup, refGene, refLink on %s' % db)
    migration.info('Renamed table map to map_backup on %s' % db)
    select_cursor.close()
    cursor.close()
    connection.commit()
    connection.close()


def check():
    """
    Check if migration is needed.
    """
    hg18_ok = _check('hg18')
    hg19_ok = _check('hg19')
    if hg18_ok != hg19_ok:
        migration.fatal('Installation is not in a recognizable state. Fix manually.')
    return not hg18_ok


def migrate():
    """
    Perform migration.
    """
    _migrate('hg18')
    _migrate('hg19')


if __name__ == '__main__':
    migration.main(check, migrate)
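
The coordinate handling in this migration (add 1 to start positions, keep stop positions, use NULL where the CDS is empty) can be illustrated in isolation. The following is a minimal sketch in Python 3 with hypothetical function names. It is not the migration's own code, and unlike ``_exon_starts`` above it tolerates input without a trailing comma.

```python
def to_one_based_starts(starts):
    """Add 1 to each position in a comma-separated list of starts."""
    # Empty fields (such as the one after a trailing comma) are skipped.
    return ','.join(str(int(s) + 1) for s in starts.split(',') if s)


def strip_trailing_comma(stops):
    """Return the stop positions unchanged, minus any trailing commas."""
    return stops.rstrip(',')


def cds_or_none(cds_start, cds_end):
    """Mirror the NULLIF expressions in the SELECT statement.

    When start and end are equal there is no coding sequence, so both
    values become None (NULL); otherwise only the start is shifted by 1.
    """
    if cds_start == cds_end:
        return None, None
    return cds_start + 1, cds_end


print(to_one_based_starts('100,200,'))
print(strip_trailing_comma('150,250,'))
print(cds_or_none(5, 5))
print(cds_or_none(5, 10))
```

Only the start positions change because the source columns describe zero-based, half-open intervals: shifting the start by one while keeping the end yields the same interval in one-based, closed form.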
#!/bin/bash
# Remove UCSC database values from the configuration file.
#
# Usage:
# ./003-config-remove-ucsc.migration [migrate]
COLOR_INFO='\033[32m'
COLOR_WARNING='\033[33m'
COLOR_ERROR='\033[31m'
COLOR_END='\033[0m'
if [ -e /etc/mutalyzer/config ] && grep -q 'MySQL username for the UCSC database' /etc/mutalyzer/config; then
echo -e "${COLOR_WARNING}This migration is needed.${COLOR_END}"
if [ "$1" = 'migrate' ]; then
echo 'Performing migration.'
echo -e "${COLOR_INFO}Copying /etc/mutalyzer/config to /etc/mutalyzer/config.backup${COLOR_END}"
cp /etc/mutalyzer/config /etc/mutalyzer/config.backup
sed -i '/MySQL username for the UCSC database/d' /etc/mutalyzer/config
sed -i '/Host name for the UCSC database/d' /etc/mutalyzer/config
sed -i '/Retrieve all entries modified within a certain number of days/d' /etc/mutalyzer/config
sed -i '/RemoteMySQLuser =/d' /etc/mutalyzer/config
sed -i '/^RemoteMySQLhost =/d' /etc/mutalyzer/config
sed -i '/^UpdateInterval =/d' /etc/mutalyzer/config
echo -e "${COLOR_INFO}Removed all UCSC database configuration values from /etc/mutalyzer/config${COLOR_END}"
echo 'Performed migration.'
fi
else
echo -e "${COLOR_INFO}This migration is not needed.${COLOR_END}"
fi
#!/bin/bash
# Remove UCSC update from cron and install NCBI update.
#
# Usage:
# ./004-cron-ucsc-to-ncbi.migration [migrate]
COLOR_INFO='\033[32m'
COLOR_WARNING='\033[33m'
COLOR_ERROR='\033[31m'
COLOR_END='\033[0m'
if [ -e /etc/cron.d/mutalyzer-ucsc-update ] && grep -v -q '^#' /etc/cron.d/mutalyzer-ucsc-update; then
echo -e "${COLOR_WARNING}This migration is needed.${COLOR_END}"
if [ "$1" = 'migrate' ]; then
echo 'Performing migration.'
sed -i 's/^/#/' /etc/cron.d/mutalyzer-ucsc-update
echo -e "${COLOR_INFO}Commented all lines in /etc/cron.d/mutalyzer-ucsc-update${COLOR_END}"
if [ ! -e /etc/cron.d/mutalyzer-mapping-update ]; then
BIN_MAPPING_UPDATE=$(which mutalyzer-mapping-update)
cp extras/cron.d/mutalyzer-mapping-update /etc/cron.d/mutalyzer-mapping-update
sed -i -e "s@<MUTALYZER_BIN_MAPPING_UPDATE>@${BIN_MAPPING_UPDATE}@g" /etc/cron.d/mutalyzer-mapping-update
echo -e "${COLOR_INFO}Installed /etc/cron.d/mutalyzer-mapping-update${COLOR_END}"
fi
echo 'Performed migration.'
fi
else
echo -e "${COLOR_INFO}This migration is not needed.${COLOR_END}"
fi
#!/bin/bash
# Add batch checker restriction sites headers to the configuration file.
#
# Usage:
# ./005-config-batch-restriction-sites.migration [migrate]
COLOR_INFO='\033[32m'
COLOR_WARNING='\033[33m'
COLOR_ERROR='\033[31m'
COLOR_END='\033[0m'
if [ -e /etc/mutalyzer/config ] && grep -q '^nameCheckOutHeader = "Input", "Errors | Messages", "AccNo", "Genesymbol", "Variant", "Reference Sequence Start Descr.", "Coding DNA Descr.", "Protein Descr.", "GeneSymbol Coding DNA Descr.", "GeneSymbol Protein Descr.", "Genomic Reference", "Coding Reference", "Protein Reference", "Affected Transcripts", "Affected Proteins"$' /etc/mutalyzer/config; then
echo -e "${COLOR_WARNING}This migration is needed.${COLOR_END}"
if [ "$1" = 'migrate' ]; then
echo 'Performing migration.'
echo -e "${COLOR_INFO}Copying /etc/mutalyzer/config to /etc/mutalyzer/config.backup${COLOR_END}"
cp /etc/mutalyzer/config /etc/mutalyzer/config.backup
sed -i 's/nameCheckOutHeader = "Input", "Errors | Messages", "AccNo", "Genesymbol", "Variant", "Reference Sequence Start Descr.", "Coding DNA Descr.", "Protein Descr.", "GeneSymbol Coding DNA Descr.", "GeneSymbol Protein Descr.", "Genomic Reference", "Coding Reference", "Protein Reference", "Affected Transcripts", "Affected Proteins"/nameCheckOutHeader = "Input", "Errors | Messages", "AccNo", "Genesymbol", "Variant", "Reference Sequence Start Descr.", "Coding DNA Descr.", "Protein Descr.", "GeneSymbol Coding DNA Descr.", "GeneSymbol Protein Descr.", "Genomic Reference", "Coding Reference", "Protein Reference", "Affected Transcripts", "Affected Proteins", "Restriction Sites Created", "Restriction Sites Deleted"/' /etc/mutalyzer/config
echo -e "${COLOR_INFO}Added batch checker restriction sites headers to /etc/mutalyzer/config${COLOR_END}"
echo 'Performed migration.'
fi
else
echo -e "${COLOR_INFO}This migration is not needed.${COLOR_END}"
fi