From 126bd63246440b747e0508e06d42c6b25c3002e8 Mon Sep 17 00:00:00 2001
From: "J.F.J. Laros" <j.f.j.laros@lumc.nl>
Date: Thu, 1 May 2014 19:09:37 +0200
Subject: [PATCH] Document the effort needed to add a new organism

---
 doc/index.rst        |  1 +
 doc/new-organism.rst | 78 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 doc/new-organism.rst

diff --git a/doc/index.rst b/doc/index.rst
index c332722e..1c641b04 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -44,6 +44,7 @@ Additional notes
 
    development
    todo
+   new-organism
    changelog
    copyright
 
diff --git a/doc/new-organism.rst b/doc/new-organism.rst
new file mode 100644
index 00000000..ab1e262a
--- /dev/null
+++ b/doc/new-organism.rst
@@ -0,0 +1,78 @@
+Adding a new organism to Mutalyzer
+==================================
+
+
+Introduction
+------------
+
+In this document, we describe what is needed for Mutalyzer to support new
+organisms. For each feature, we give a list of requirements and an
+estimate of the amount of time needed for implementation.
+
+
+Position converter
+------------------
+
+To use the position converter, we need a mapping database for the genomic
+reference sequence that is used. The database should contain the following
+information per gene:
+
+- Genomic location (chromosome, transcription start, transcription end).
+- Gene model (CDS start, CDS stop, all splice sites).
+
+The transcription start and end may be missing, but this is not recommended.
+
+
+Implementation time
+^^^^^^^^^^^^^^^^^^^
+
+Making this functionality available is relatively straightforward, although
+the effort depends on the format of the database. If the database is stored
+in a structured format (CSV or something that can be converted to CSV
+automatically), importing should take no more than two working days.
+
+
+Name checker
+------------
+
+To use the name checker, a fully annotated GenBank record should be available
+for every genomic location.
+
+
+Public genome build
+^^^^^^^^^^^^^^^^^^^
+
+If the genome build of the organism in question is public, as is the case for
+some model organisms, the GenBank records can be retrieved from the NCBI. In
+this case, no special effort is required.
+
+If the data is public, but is not yet available as a genome build at the NCBI,
+the mapping database can be converted to the format required by the NCBI and
+uploaded. The conversion should take no more than two working days, but the
+time it will take before the build is available depends on the NCBI.
+
+Non-public genome build
+^^^^^^^^^^^^^^^^^^^^^^^
+
+If it is not possible to submit the generated GenBank reference files to the
+NCBI, for example for commercial reasons, there is the option to offer a
+stand-alone version of Mutalyzer containing the reference files. This version
+should be hosted by the client. Preparing such an installation will require
+no more than three working days.
+
+
+Additional notes
+----------------
+
+There is currently no program available for the generation of the GenBank
+reference files. Although this is a one-time effort, we estimate that the
+development of such a program requires three weeks.
+
+There is also no program available for the conversion of a mapping database to
+the format the NCBI expects, and the specifications of that format are not yet
+known to us. Once we have the specifications, we estimate that the development
+of a conversion tool will take three weeks.
+
+A stand-alone version of Mutalyzer will require some expertise to set up and
+maintain on the client side. Although we currently provide no support or
+maintenance, we are considering a support model.
-- 
GitLab