Commit c3f6b9a6 authored by bow

Finish user and developer documentation

parent 88ce46a6
src/sphinx/changes.rst
\ No newline at end of file
# Sentinel
Sentinel is a JSON-based database for next-generation sequencing statistics.
Sentinel is a JSON-based database for next-generation sequencing statistics. Queries and submissions are all done via a
RESTful HTTP API which is specified based on [Swagger](http://swagger.io).
## Requirements
- Java 8 (must be set as the default `java`)
- Scala 2.11.6
- MongoDB 3.0
- MongoDB 3.0 (running on localhost port 27017 for development)
- Python 2.7 and Sphinx (only when building the documentation)
## Build & Run
## Quick Start
```sh
$ git clone {this-repository}
$ cd sentinel
$ ./scripts/bootstrap_dev.sh
$ ./sbt
> container:start
> browse
......@@ -22,12 +25,9 @@ If `browse` doesn't launch your browser, manually open [http://localhost:8080/](
## Support
Please report issues to [the issue page](https://git.lumc.nl/sasc/sentinel/issues). Feature suggestions are also welcome.
Report issues to [the issue page](https://git.lumc.nl/sasc/sentinel/issues). Fixes and feature suggestions are also
welcome.
## Contributing
## More
```sh
$ grep -r 'TODO' src/
```
You can also check for unclosed issues in the issue page.
Please see the documentation for a complete guide on the project.
......@@ -26,6 +26,7 @@ import nl.lumc.sasc.sentinel._
import nl.lumc.sasc.sentinel.api.auth.AuthenticationSupport
import nl.lumc.sasc.sentinel.db._
import nl.lumc.sasc.sentinel.models._
import nl.lumc.sasc.sentinel.processors.GenericRunsProcessor
import nl.lumc.sasc.sentinel.processors.gentrap._
import nl.lumc.sasc.sentinel.utils.implicits._
......@@ -45,13 +46,10 @@ class StatsController(implicit val swagger: Swagger, mongo: MongodbAccessObject)
protected val applicationDescription: String = "Statistics of deposited run summaries"
/** Adapter for connecting to the run collections */
protected val runs = new RunsAdapter {
val mongo = self.mongo
def processRun(fi: FileItem, user: User, pipeline: Pipeline.Value) = Try(throw new NotImplementedError)
}
protected val runs = new GenericRunsProcessor(mongo)
/** Adapter for connecting to the gentrap collection */
protected val gentrap = new GentrapOutputProcessor(mongo)
protected val gentrap = new GentrapStatsProcessor(mongo)
/** Adapter for connecting to the users collection */
protected val users = new UsersAdapter { val mongo = self.mongo }
......@@ -98,6 +96,53 @@ class StatsController(implicit val swagger: Swagger, mongo: MongodbAccessObject)
// format: OFF
val statsGentrapAlignmentsGetOperation = (apiOperation[Seq[GentrapAlignmentStats]]("statsGentrapAlignmentsGet")
summary "Retrieves the alignment statistics of Gentrap pipeline runs."
notes
"""This endpoint returns a list containing alignment-level metrics of Gentrap pipeline runs.
|
|By default:
|
| * Each data point represents metrics from an alignment file of a sample. To return alignment metrics from
| single libraries, use the `accLevel` parameter.
|
| * When the `accLevel` parameter is set to `lib`, the returned data points represent either single-end or
| paired-end sequencing files. To return data points from only one library type, use the `libType` parameter.
| When the `accLevel` is set to `sample`, the `libType` parameter is ignored as a single sample alignment
| may be a mix of single-end and paired-end libraries.
|
| * Data points are returned in random order which changes in every query. To maintain a sorted order
| (most-recently created first), use the `sorted` parameter.
|
| * All data points of all Gentrap runs are returned. To filter for specific data points, use the `runIds`,
| `refIds`, and/or `annotIds` parameters.
|
| * All data points are unlabeled. To label the data points with their respective IDs and names, you must be
| the data points' uploader and authenticate yourself using your API key. If the returned data points contain
| data points you did not upload, they will remain unlabeled.
|
|Each returned data point has the following metrics:
|
| * `maxInsertSize`: Maximum insert size (only for paired-end libraries).
| * `median3PrimeBias`: Median value of 3' coverage biases from the top 1000 expressed transcripts (3'-most 100 bp).
| * `median5PrimeBias`: Median value of 5' coverage biases from the top 1000 expressed transcripts (5'-most 100 bp).
| * `median5PrimeTo3PrimeBias`: Median value of 5' to 3' coverage biases.
| * `medianInsertSize`: Median insert size (only for paired-end libraries).
| * `nBasesAligned`: Number of bases aligned.
| * `nBasesCoding`: Number of bases aligned in the coding regions.
| * `nBasesIntergenic`: Number of bases aligned in the intergenic regions.
| * `nBasesIntron`: Number of bases aligned in the intronic regions.
| * `nBasesRibosomal`: Number of bases aligned to ribosomal gene regions.
| * `nBasesUtr`: Number of bases aligned in the UTR regions.
| * `normalizedTranscriptCoverage`: Array representing normalized coverage along transcripts. The transcripts
| come from the top 1000 expressed genes and each item in the array represents 1% of the transcript length.
| * `nReadsAligned`: Number of reads aligned.
| * `nReadsSingleton`: Number of paired-end reads aligned as singletons.
| * `nReadsTotal`: Number of reads.
| * `nReadsProperPair`: Number of paired-end reads aligned as proper pairs.
| * `pctChimeras`: Percentage of reads aligned as chimeras (only for paired-end libraries).
| * `rateIndel`: Rate at which indels are present.
| * `rateReadsMismatch`: Mismatch rate of aligned reads.
| * `stdevInsertSize`: Insert size standard deviation (only for paired-end libraries).
""".stripMargin
parameters (
queryParam[Seq[String]]("runIds")
.description("Include only Gentrap runs with the given run ID(s).")
......@@ -183,6 +228,31 @@ class StatsController(implicit val swagger: Swagger, mongo: MongodbAccessObject)
val statsGentrapAlignmentsAggregateGetOperation = (
apiOperation[GentrapAlignmentStatsAggr]("statsGentrapAlignmentsAggregatesGet")
summary "Retrieves the aggregate alignment statistics of Gentrap pipeline runs."
notes
"""This endpoint returns aggregate values of various alignment-level metrics. The default settings are the same
| as the corresponding data points endpoint. The aggregated metrics are also similar to the data points'
| metrics, with the following additions:
|
| * `nBasesMrna`: Number of bases aligned in the UTR and coding region.
| * `pctBasesCoding`: Percentage of bases aligned in the coding regions.
| * `pctBasesIntergenic`: Percentage of bases aligned in the intergenic regions.
| * `pctBasesIntron`: Percentage of bases aligned in the intronic regions.
| * `pctBasesMrna`: Percentage of bases aligned in the UTR and coding region.
| * `pctBasesRibosomal`: Percentage of bases aligned to ribosomal gene regions.
| * `pctBasesUtr`: Percentage of bases aligned in the UTR regions.
| * `pctReadsAlignedTotal`: Percentage of reads aligned (per total reads).
| * `pctReadsAligned`: Percentage of reads aligned (per aligned reads).
| * `pctReadsSingleton`: Percentage of paired-end reads aligned as singletons (per aligned reads).
| * `pctReadsProperPair`: Percentage of paired-end reads aligned as proper pairs (per aligned reads).
|
|The following data point attribute is not aggregated:
|
| * `normalizedTranscriptCoverage`
|
|Each aggregated metric contains the attributes `avg` (average), `max` (maximum), `min` (minimum), `median`
| (median), and `stdev` (standard deviation). It also contains the `nDataPoints` attribute, showing the number
| of data points aggregated for the metrics.
""".stripMargin
parameters (
queryParam[Seq[String]]("runIds")
.description("Include only Gentrap runs with the given run ID(s).")
......@@ -254,6 +324,44 @@ class StatsController(implicit val swagger: Swagger, mongo: MongodbAccessObject)
// format: OFF
val statsGentrapSequencesGetOperation = (apiOperation[Seq[SeqStats]]("statsGentrapSequencesGet")
summary "Retrieves the sequencing statistics of Gentrap pipeline runs."
notes
"""This endpoint returns a list containing sequence-level metrics of Gentrap pipeline runs.
|
|By default:
|
| * Each data point represents an input set, which may consist of a single sequence (for single-end sequencing)
| or two sequences (for paired-end sequencing). Selection on library type (single, paired, or both) can be
| done via the `libType` parameter. When `libType` is set to `paired`, the data point will contain a `readAll`
| attribute denoting the combined metrics of both `read1` and `read2`.
|
| * The returned data points are computed from the raw sequence files. To return data points of the processed
| sequence files (possibly adapter-clipped and/or trimmed), use the `qcPhase` parameter.
|
| * Data points are returned in random order which changes in every query. To maintain a sorted order
| (most-recently created first), use the `sorted` parameter.
|
| * All data points of all Gentrap runs are returned. To filter for specific data points, use the `runIds`,
| `refIds`, and/or `annotIds` parameters.
|
| * All data points are unlabeled. To label the data points with their respective IDs and names, you must be
| the data points' uploader and authenticate yourself using your API key. If the returned data points contain
| data points you did not upload, they will remain unlabeled.
|
|Each `read*` attribute contains the following metrics:
|
| * `nBases`: Total number of bases across all reads.
| * `nBasesA`: Total number of adenine bases across all reads.
| * `nBasesT`: Total number of thymines across all reads.
| * `nBasesG`: Total number of guanines across all reads.
| * `nBasesC`: Total number of cytosines across all reads.
| * `nBasesN`: Total number of unknown bases across all reads.
| * `nReads`: Total number of reads.
| * `nBasesByQual`: Array indicating how many bases have a given quality. The quality value corresponds to the
| array index (e.g. array(10) shows how many bases have quality value 10 as quality values start from 0).
| * `medianQualByPosition`: Array indicating the median quality value for a given read position. The position
| corresponds to the array index (e.g. array(20) shows the median quality value of read position 21 since
| positions start from 1).
""".stripMargin
parameters (
queryParam[Seq[String]]("runIds")
.description("Include only Gentrap runs with the given run ID(s).")
......@@ -334,6 +442,28 @@ class StatsController(implicit val swagger: Swagger, mongo: MongodbAccessObject)
val statsGentrapSequencesAggregateGetOperation =
(apiOperation[SeqStatsAggr[ReadStatsAggr]]("statsGentrapSequencesAggregationsGet")
summary "Retrieves the aggregate sequencing statistics of Gentrap pipeline runs."
notes
"""This endpoint returns aggregate values of various sequence-level metrics. The default settings are the same
| as the corresponding data points endpoint. The aggregated metrics are also similar to the data points'
| metrics, with the following additions:
|
| * `pctBases`: Percentage of bases across all reads.
| * `pctBasesA`: Percentage of adenine bases across all reads.
| * `pctBasesT`: Percentage of thymines across all reads.
| * `pctBasesG`: Percentage of guanines across all reads.
| * `pctBasesC`: Percentage of cytosines across all reads.
| * `pctBasesN`: Percentage of unknown bases across all reads.
| * `pctBasesGC`: Percentage of guanine and cytosine bases across all reads.
|
|The following data point attributes are not aggregated:
|
| * `nBasesByQual`
| * `medianQualByPosition`
|
|Each aggregated metric contains the attributes `avg` (average), `max` (maximum), `min` (minimum), `median`
| (median), and `stdev` (standard deviation). It also contains the `nDataPoints` attribute, showing the number
| of data points aggregated for the metrics.
""".stripMargin
parameters (
queryParam[Seq[String]]("runIds")
.description("Include only Gentrap runs with the given run ID(s).")
......
......@@ -82,8 +82,11 @@ case class SeqStatsAggr[T <: AnyRef](read1: T, read2: Option[T] = None, readAll:
* @param nBasesC Total number of cytosines across all reads.
* @param nBasesN Total number of unknown bases across all reads.
* @param nReads Total number of reads.
* @param nBasesByQual Values indicating how many bases have a given quality.
* @param medianQualByPosition Values indicating the median base quality of each position.
* @param nBasesByQual Sequence indicating how many bases have a given quality. The quality values correspond to the
* array index (e.g. Seq(10) shows how many bases have quality value 10 as quality values start from 0).
* @param medianQualByPosition Sequence indicating the median quality value for a given read position. The position
* corresponds to the array index (e.g. Seq(20) shows the median quality value of read
* position 21 since positions start from 1).
*/
case class ReadStats(
nBases: Long,
......
......@@ -36,7 +36,9 @@ import nl.lumc.sasc.sentinel.utils.pctOf
* @param nBasesIntron Number of bases aligned in the intronic regions.
* @param nBasesRibosomal Number of bases aligned to ribosomal gene regions.
* @param nBasesUtr Number of bases aligned in the UTR regions.
* @param normalizedTranscriptCoverage Values representing normalized transcript coverage.
* @param normalizedTranscriptCoverage Sequence representing normalized coverage along transcripts. The
* transcripts come from the top 1000 expressed genes and each item in the
* sequence represents 1% of the transcript length.
* @param nReadsAligned Number of reads aligned.
* @param nReadsSingleton Number of paired-end reads aligned as singletons.
* @param nReadsTotal Number of reads.
......
......@@ -25,7 +25,7 @@ import org.json4s.mongo.ObjectIdSerializer
* Custom ObjectId serializer for serialization between MongoDB and plain strings.
*
* This serializer is required so that `ObjectId`s can be serialized directly to strings instead of JSON objects, i.e.
* `MongoDBObject("myId" -> ObjectId("1234"))` becomes `{"myId": "1234"}` instead of `{"myId": {"$oid": "1234"}}`.
* `MongoDBObject("myId" -> ObjectId("1234"))` becomes `{"myId": "1234"}` instead of `{"myId": {"\$oid": "1234"}}`.
*
* The (de)serializers are meant to be used by the controllers when sending JSON payloads.
*/
......
API Reference
=============
Changelog
=========
Release 0.1.0
-------------
`release date: July 6 2015`
* First release of Sentinel, with support of the Gentrap pipeline.
......@@ -16,7 +16,8 @@ import sys
import os
import shlex
import sphinx_bootstrap_theme
import alabaster
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
......@@ -34,6 +35,8 @@ import sphinx_bootstrap_theme
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.todo',
'sphinx.ext.extlinks',
'alabaster',
]
# Add any paths that contain templates here, relative to this directory.
......@@ -62,7 +65,7 @@ author = u'Wibowo Arindrarto'
# The short X.Y version.
version = '0.1'
# The full version, including alpha/beta/rc tags.
release = '0.1.0'
release = '0.1.0-SNAPSHOT'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
......@@ -75,7 +78,7 @@ language = None
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
......@@ -99,6 +102,9 @@ exclude_patterns = ['_build']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# Default language to highlight
highlight_language = 'scala'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
......@@ -108,6 +114,9 @@ pygments_style = 'sphinx'
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
# URL for sentinel and the live api docs
sentinel_url = "../"
apidoc_url = "../api-docs"
# -- Options for HTML output ----------------------------------------------
......@@ -115,15 +124,38 @@ todo_include_todos = True
# a list of builtin themes.
html_theme = 'alabaster'
_scaladoc_url = "scaladoc/%s"
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {
# 'bootswatch_theme': 'flatly',
#}
html_theme_options = {
'show_related': True,
'github_button': False,
'extra_nav_links': {
"API Documentation": apidoc_url,
"Source Code": "https://git.lumc.nl/sasc/sentinel",
"Scala Documentation": _scaladoc_url % release,
}
}
# Aliases for external links
extlinks = {
"scaladoc": (_scaladoc_url, "ScalaDoc version "),
"apidoc": (apidoc_url + "%s", None),
"endpoint": (apidoc_url + "/#!/%s", ""),
}
rst_prolog = """
.. |sentinel_url| replace:: {sentinel_url}
.. |apidoc_url| replace:: {apidoc_url}
""".format(sentinel_url=sentinel_url, apidoc_url=apidoc_url)
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = sphinx_bootstrap_theme.get_html_theme_path()
html_theme_path = [alabaster.get_path()]
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
......@@ -153,14 +185,21 @@ html_static_path = ['_static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
html_sidebars = {
'**': [
'about.html',
'navigation.html',
'relations.html',
'searchbox.html',
]
}
# Additional templates that should be rendered to pages, maps page names to
# template names.
......
Contributing
============
Any type of contribution is very welcome and appreciated :)! From bug reports to new features, there is always room
to help out.
Quick Links
-----------
* Issue tracker: `https://git.lumc.nl/sasc/sentinel/issues <https://git.lumc.nl/sasc/sentinel/issues>`_
* Source code: `https://git.lumc.nl/sasc/sentinel <https://git.lumc.nl/sasc/sentinel>`_
* Git: `git@git.lumc.nl:sasc/sentinel.git <git@git.lumc.nl:sasc/sentinel.git>`_
Bug Reports & Feature Suggestions
---------------------------------
Feel free to report bugs and/or suggest new features about our local LUMC deployment or Sentinel in general to our
`issue tracker <https://git.lumc.nl/sasc/sentinel/issues>`_. We do request that you be as descriptive as possible.
Particularly for bugs, please describe in as much detail as possible what you expected to see and what you saw instead.
Documentation
-------------
Documentation updates and/or fixes are much appreciated! We welcome everything from one-letter typo fixes to new
documentation sections, be it in the internal ScalaDoc or our user guide (the one you're reading now). You are free to
submit a pull request for documentation fixes. If you don't feel like cloning the entire code, we are also happy if you
open an issue on our issue tracker.
Bug Fixes
---------
Bug fix contributions require that you have a local development environment up and running. Head over to the
:doc:`devs_setup` section for a short guide on how to do so.
To find bugs to fix, you can start by browsing our issue tracker for issues labeled with ``bug``. You can also search
through the source code for ``FIXME`` notes. Having found an issue you would like to fix, the next steps would be:
1. Create a new local branch, based on the latest version of ``master``.
2. Implement the fix.
3. Make sure all tests pass. If the bug has not been covered by any of our tests, we request that new tests be added
to protect against regressions in the future.
4. Commit your changes.
5. Submit a pull request.
We will then review your changes. If all is well, they will be rebased onto ``master`` and we will list your name in our
contributors list :).
And yes, we did say rebase up there, not merge. We prefer to keep our git history linear, which means changes will be
integrated to ``master`` via ``git rebase`` and not ``git merge``.
New Features
------------
Feature implementations follow almost the same procedure as `Bug Fixes`_. The difference is that you are not limited
to the feature requests we list on the issue tracker. If you have an idea for a new feature that has not been listed
anywhere, you are free to go ahead and implement it. We only ask that, if you wish to have the feature merged into
the ``master`` branch, you communicate with us first, mainly to prevent duplicate work.
Design Notes
============
The Codebase
============
Before delving deeper into the code, it is useful to see how the source code is organized.
Starting from the root, we see three directories:
* ``project``, where the build definition files are located.
* ``scripts``, where helper scripts are located.
* ``src``, where the actual source code files are located.
Inside ``src``, we see four more directories. This may look unusual if you come from a Java background, less so if
you are already used to Scala. They are:
* ``main``, where the main Sentinel source files are located.
* ``test``, where unit tests are defined.
* ``it``, where integration tests are defined.
* ``sphinx``, where the raw documentation source files are located.
From here on, you should be able to get a good grasp of the contents of the deeper level directories. Some are worth noting,
for reasons of clarity:
* ``test/resources`` contains all test files and example run summaries used for testing. It is symlinked to
``it/resources`` to avoid having duplicate testing resources.
* ``main/resources`` contains run-time resource files that are loaded into the deployment JAR. In most cases, these
are pipeline schema files.
* ``main/webapp/api-docs`` contains a distribution copy of the `swagger-ui <https://github.com/swagger-api/swagger-ui>`_
package. The package is also bundled into the deployment JAR, to help users explore the Sentinel APIs
interactively.
Internal Design Notes
=====================
General Aims
------------
The goal of Sentinel is to enable the storage and retrieval of next-generation sequencing metrics in as general a way as
possible. It should not be constrained to a specific data analysis pipeline, a specific reference sequence, or a specific
sequencing technology. The challenge here is to have a framework that can be adapted to the needs of a lab or institution
processing large quantities of such data, when the data analysis pipelines can be so diverse with so many moving parts.
This is why we decided to implement Sentinel as a service which communicates via the HTTP protocol using JSON files.
JSON is essentially free-form, yet it still enforces a useful structure and useful data types which can store the
sequencing metrics. Communicating via HTTP also means that we are not constrained to a specific language. A huge number
of tools and programming languages that can communicate via HTTP exist today.
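As a minimal sketch of what this language independence looks like in practice, the Python snippet below builds a query URL and decodes a JSON payload using only the standard library. The host and port follow the development address used in the README quick start; the ``/stats/gentrap/alignments`` path and the sample response body are assumptions made for illustration, so consult the live API documentation for the actual routes. The ``accLevel`` and ``libType`` parameter names come from the endpoint descriptions.

```python
import json
from urllib.parse import urlencode

# Development address from the README; the endpoint path is hypothetical.
BASE_URL = "http://localhost:8080"
ENDPOINT = "/stats/gentrap/alignments"


def build_query_url(params):
    """Return the full URL for a GET request with the given query parameters."""
    return BASE_URL + ENDPOINT + "?" + urlencode(params)


def parse_payload(body):
    """Decode a JSON response body into plain Python objects."""
    return json.loads(body)


url = build_query_url({"accLevel": "lib", "libType": "paired"})

# Stand-in for a response body; a real client would fetch `url` over HTTP.
example_body = '[{"nReadsTotal": 1000, "nReadsAligned": 950}]'
data_points = parse_payload(example_body)
```

Any language with an HTTP client and a JSON parser could do the same with equivalent calls.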
The current implementation still has a noticeable drawback, however. At the moment, anyone who wishes to add support for
their own data analysis pipeline must clone the entire source code and implement the JSON file parsing
functions and also the pipeline's HTTP endpoints there. Ideally, there would be a single core module (e.g.
a ``sentinel-core`` package) and other modules which implement specific pipeline support separately. This is not yet
implemented since in order to do so, we need to be able to combine not only parsing logic, but also the HTTP endpoints
that expose the pipeline's metrics. While this seems possible, we have not found a way to do so cleanly.
Framework
---------
Sentinel is written in `Scala <http://www.scala-lang.org/>`_ using the `Scalatra <http://www.scalatra.org/>`_ web
framework. Scalatra was chosen since it has a minimal core allowing us to add / remove parts as we see fit. Other
frameworks, such as the Play Framework, may come with features that we probably will never use (e.g. a full-blown
templating engine).
The API specification is written based on the `Swagger specification <http://swagger.io>`_. It is not the only API
specification available out there nor is it an official specification endorsed by the W3C. It seems, however,
to enjoy noticeable support from the programming community in general, with various third-party tools and
libraries available (at the time of writing). The spec itself is also accompanied by useful tools such as the
`automatic interactive documentation generator <https://github.com/swagger-api/swagger-ui>`_. Finally, Scalatra can
generate the specification directly from the code, allowing the spec to live side-by-side with the code.
Persistence Layer
-----------------
For the underlying database, Sentinel uses `MongoDB <https://www.mongodb.org/>`_. This is in line with what Sentinel is
trying to achieve: to be as general as possible. MongoDB helps by not imposing any schema on its own. However, we would
like to stress that this does not mean there is no underlying schema of any sort. While MongoDB allows JSON
document of any structure, Sentinel does expect a certain structure from all incoming JSON summary files. They must
represent a single pipeline run, which contains at least one sample, which in turn contains at least one library. Internally,
Sentinel also breaks down an uploaded run summary file into single samples. It is these single samples that are stored
and queried in the database. One can consider that MongoDB allows us to define the 'schema' on our own, in our own code.
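The run, sample, and library nesting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Sentinel's actual parsing code: the field names ``runId``, ``samples``, ``libraries``, and ``sampleName`` are hypothetical, and the real structure is defined by each pipeline's schema.

```python
def split_run_summary(run):
    """Check the run/sample/library nesting and return one document per sample."""
    samples = run.get("samples", {})
    if not samples:
        raise ValueError("a run must contain at least one sample")

    documents = []
    for sample_name, sample in sorted(samples.items()):
        libraries = sample.get("libraries", {})
        if not libraries:
            raise ValueError("sample %r must contain at least one library" % sample_name)
        # Each sample becomes its own document, keeping a link back to its run.
        documents.append({
            "runId": run["runId"],
            "sampleName": sample_name,
            "libraries": libraries,
        })
    return documents


# Hypothetical run summary with two samples.
run_summary = {
    "runId": "run-1",
    "samples": {
        "sampleA": {"libraries": {"lib1": {"nReads": 100}}},
        "sampleB": {"libraries": {"lib1": {"nReads": 80}, "lib2": {"nReads": 120}}},
    },
}
sample_docs = split_run_summary(run_summary)
```

Storing one document per sample keeps queries at the sample level simple, at the cost of repeating the run identifier in every document.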
Considering this, we strongly recommend that JSON summary files be validated against a schema. Sentinel uses
`JSON schema <http://json-schema.org/>`_, which itself is JSON, for the pipeline schemas.
Data Modeling
-------------
The following list denotes some commonly-used objects inside Sentinel. Other objects exist, so this is not an
exhaustive list.
Controllers
^^^^^^^^^^^
HTTP endpoints are represented as ``Controller`` objects which subclass from the ``SentinelServlet`` class. The
exception to this rule is the ``RootController``, since it implements only a few endpoints and is the only controller
that returns HTML for browser display. API specifications are defined inside the controllers and are tied to a specific
route matcher of an HTTP method.
Processors
^^^^^^^^^^
Pipeline support is achieved using ``Processor`` objects, currently implemented in the ``nl.lumc.sasc.sentinel.processors``
package. For a given pipeline, two processors must be implemented: a runs processor, responsible for processing
incoming run summary files, and a stats processor, responsible for querying and aggregating metrics of the pipeline.
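To make the aggregation half of a stats processor more concrete, here is a small Python sketch that computes the attributes the aggregate endpoints document for each metric (``avg``, ``max``, ``min``, ``median``, ``stdev``, and ``nDataPoints``). It is an illustration rather than Sentinel's implementation: the function names are hypothetical, the sample standard deviation is an assumption (the source does not say whether sample or population is used), and reporting ``None`` for a single data point is likewise a choice made here.

```python
import statistics


def aggregate(values):
    """Summarize one metric across data points."""
    values = list(values)
    if not values:
        raise ValueError("cannot aggregate an empty set of data points")
    return {
        "avg": statistics.mean(values),
        "max": max(values),
        "min": min(values),
        "median": statistics.median(values),
        # Assumption: sample standard deviation, undefined for a single value.
        "stdev": statistics.stdev(values) if len(values) > 1 else None,
        "nDataPoints": len(values),
    }


def aggregate_metrics(data_points, metric_names):
    """Aggregate each named metric over a list of data point dictionaries."""
    return {name: aggregate(dp[name] for dp in data_points) for name in metric_names}


# Hypothetical data points carrying a single metric.
data_points = [
    {"nReadsAligned": 900},
    {"nReadsAligned": 950},
    {"nReadsAligned": 1000},
    {"nReadsAligned": 1150},
]
summary = aggregate_metrics(data_points, ["nReadsAligned"])
```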
Adapters
^^^^^^^^
Adapters are traits that are mixed into processors to add processing capabilities. For example, the
``ReferencesAdapter`` can be mixed into a processor so it also processes reference sequence information. Most of the
adapters involve connection to the MongoDB database, although not all do so.
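The mix-in idea can be approximated in Python with multiple inheritance. The sketch below is only an analogy to the Scala traits: apart from the ``ReferencesAdapter`` name, every class, method, and field is hypothetical, and a plain dictionary stands in for the MongoDB connection.

```python
class ReferencesAdapter:
    """Mix-in that adds reference sequence bookkeeping to a processor."""

    def store_reference(self, ref_id, contigs):
        # Relies on the host class providing a `db` attribute.
        self.db.setdefault("references", {})[ref_id] = contigs
        return ref_id


class RunsProcessor:
    """Base processor that only stores the run itself."""

    def __init__(self, db):
        self.db = db

    def process_run(self, run):
        self.db.setdefault("runs", []).append(run)
        return run


class ExampleRunsProcessor(ReferencesAdapter, RunsProcessor):
    """Processor that gains reference handling by mixing in the adapter."""

    def process_run(self, run):
        self.store_reference(run["refId"], run["contigs"])
        return super().process_run(run)


# A dictionary stands in for the database in this sketch.
database = {}
processor = ExampleRunsProcessor(database)
processor.process_run({"refId": "ref-1", "contigs": ["chr1", "chr2"]})
```

The processor itself stays small, and the capability can be reused by any other processor that exposes the same ``db`` attribute.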
Records
^^^^^^^
These objects are more loosely defined, but most of the time they are case classes that represent a MongoDB object
stored in the database. While it is possible to interact with raw MongoDB objects, we prefer to have these objects
contained within case classes to minimize run time errors.
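A rough Python counterpart of such a record is a frozen dataclass. The sketch below mirrors the documented ``ReadStats`` fields, but it is an analogy rather than the Scala definition: the ``from_document`` helper, the element types of the two sequences, and the sample values are assumptions. Note that construction only fails on missing or unexpected keys; Python does not enforce the type hints at run time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReadStats:
    """Immutable record mirroring the documented per-read metrics."""
    nBases: int
    nBasesA: int
    nBasesT: int
    nBasesG: int
    nBasesC: int
    nBasesN: int
    nReads: int
    nBasesByQual: List[int]
    medianQualByPosition: List[float]

    @classmethod
    def from_document(cls, doc):
        """Build a record from a raw document; raises TypeError on missing or extra keys."""
        return cls(**doc)


# Hypothetical raw document, as it might come back from the database.
raw_document = {
    "nBases": 400, "nBasesA": 100, "nBasesT": 100,
    "nBasesG": 90, "nBasesC": 90, "nBasesN": 20,
    "nReads": 4,
    "nBasesByQual": [0, 10, 390],
    "medianQualByPosition": [2.0, 2.0, 1.0],
}
record = ReadStats.from_document(raw_document)

# Index equals quality value, since quality values start from 0.
bases_with_quality_2 = record.nBasesByQual[2]
# Index 0 holds read position 1, since positions start from 1.
median_quality_at_position_1 = record.medianQualByPosition[0]
```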
Local Development Setup
=======================
Dependencies
------------
The minimum requirements for a local development environment are:
* `git <https://git-scm.com/>`_ (version >= 1.9)
* `Java <https://www.java.com/en/>`_ (version >= 1.8)
* `MongoDB <https://www.mongodb.org/>`_ (version >= 3.0)
Note that for testing, Sentinel relies on an embedded MongoDB server which it downloads and runs automatically. If you
are only interested in running tests or are confident enough not to use any development servers, you can skip MongoDB
installation.
For building documentation, you will also need `Python <https://www.python.org/>`_ (version 2.7.x), since we use
the `Sphinx <http://sphinx-doc.org/>`_ documentation generator. The required Python libraries are listed in the
``dev-requirements.txt`` file in the root of the project.
And finally, while the following packages are not required per se, they can make your development much easier:
* `IntelliJ <https://www.jetbrains.com/idea/>`_ IDE, with the
`Scala plugin <https://plugins.jetbrains.com/plugin/?id=1347>`_.
* `httpie <https://github.com/jkbrzt/httpie>`_, a command-line HTTP client for issuing HTTP requests.