Commit a2d5d897 authored by bow's avatar bow

Update documentation

Thanks to input from Sander van der Zeeuw.
parent 76e6baf4
@@ -39,7 +39,7 @@ through the source code for ``FIXME`` notes. Having found an issue you would lik
1. Create a new local branch, based on the last version of `master`.
2. Implement the fix.
3. Make sure all tests pass. If the bug has not been covered by any of our tests, we request that new tests be added
to protect against regressions in the future.
4. Commit your changes.
5. Submit a pull request.
@@ -19,7 +19,7 @@ his / her own data analysis pipeline, he/she must clone the entire source code a
functions and also the pipeline's HTTP endpoints there. It would be more ideal to have a single core module (e.g.
a ``sentinel-core`` package) and other modules which implement specific pipeline support separately. This is not yet
implemented, since in order to do so we need to be able to combine not only the parsing logic, but also the HTTP endpoints
that expose the pipeline's metrics. While this seems possible, we have not found a way to do so cleanly yet.
Framework
---------
@@ -16,7 +16,7 @@ When you're set, let's start with the ``RunsController`` first.
RunsController
--------------
The updates that we need to make to the ``RunsController`` are quite minimal. First, we need to instantiate a copy of our
``MapleRunsProcessor`` in it, and then make sure the ``POST /runs`` endpoint recognizes ``Maple``.
To instantiate ``MapleRunsProcessor``, you must first import the processor at the top of the file:
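As a rough sketch of that wiring, the idea looks like this (the class bodies below are simplified stand-ins for illustration, not Sentinel's actual controller code):

```scala
// Hypothetical, simplified sketch -- the real import would be something like:
// import nl.lumc.sasc.sentinel.processors.maple.MapleRunsProcessor

// Stand-in for the processor defined earlier in the tutorial.
class MapleRunsProcessor {
  val pipelineName: String = "maple"
}

// Stand-in for the controller: it holds a copy of the processor and
// uses it to decide which pipeline names `POST /runs` accepts.
class RunsController {
  val maple = new MapleRunsProcessor

  val processors: Map[String, MapleRunsProcessor] =
    Map(maple.pipelineName -> maple)

  def supportsPipeline(name: String): Boolean = processors.contains(name)
}
```

The key point is simply that the controller owns a processor instance and consults it when deciding which pipelines it recognizes.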
@@ -155,14 +155,14 @@ anymore. This makes sense, since aggregated data points do not have any name lab
}
Having a shorter API description now means there are fewer parameters to parse, as you can see in the
route matcher above. There, we only capture the ``runIds`` filter parameter. The rest of the code deals with the actual
querying and aggregating of the data.
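To illustrate the single remaining parameter, here is a minimal, self-contained sketch of capturing a comma-separated ``runIds`` filter from a query-parameter map (Sentinel's real controllers rely on the framework's parameter handling; the plain map here is a stand-in):

```scala
// Parse the optional, comma-separated `runIds` filter parameter.
// `params` stands in for the framework's query-parameter map.
def captureRunIds(params: Map[String, String]): Seq[String] =
  params.get("runIds")
    .map(_.split(",").toSeq.map(_.trim).filter(_.nonEmpty))
    .getOrElse(Seq.empty)
```

When the parameter is absent, the filter is simply empty and no run filtering takes place.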
Epilogue
--------
The ``MapleStatsController`` implementation marks the end of our tutorial. You have just added support for a new
pipeline to Sentinel! Feel free to play around with uploading and querying the endpoints you just created. When you're more
familiar with the code base, you can experiment with adding support for more complex pipelines. If that's not enough,
head over to the :doc:`contribute` page and see how you can contribute to Sentinel development.
@@ -17,7 +17,7 @@ for the `Maple` pipeline support.
Internal Models
---------------
To start off, let's consider the types of objects we need to define:
* For the run itself, we'll define a ``MapleRunRecord`` that subclasses ``nl.lumc.sasc.sentinel.models.BaseRunRecord``.
* For the samples, we'll define a ``MapleSampleRecord`` that subclasses ``nl.lumc.sasc.sentinel.models.BaseSampleRecord``.
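As a sketch of what these subclasses might look like, consider the following (the field lists are illustrative; the actual base classes in ``nl.lumc.sasc.sentinel.models`` define the required members):

```scala
// Simplified stand-ins for the Sentinel base classes.
trait BaseRunRecord { def runId: String; def pipeline: String }
trait BaseSampleRecord { def runId: String; def sampleName: String }

// Hypothetical field lists -- the real records carry more attributes.
case class MapleRunRecord(runId: String, pipeline: String,
                          sampleIds: Seq[String]) extends BaseRunRecord

case class MapleSampleRecord(runId: String,
                             sampleName: String) extends BaseSampleRecord
```

Using case classes keeps the records immutable and gives us structural equality for free, which is convenient when testing.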
@@ -262,7 +262,7 @@ We can already see some new classes and objects being used there:
Now we're ready to take a stab at defining the ``extractUnits`` function. Generally, there is at least one function to
extract the samples and libraries defined in a runs processor. This is completely up to you (you can even define it inside
the ``processRun`` function if you wish). Here, we define it as a separate function so the structure is clearer.
Here's our definition of ``extractUnits``:
@@ -310,7 +310,8 @@ filled with the library records.
Inside, you'll notice that we also have defined three helper functions: ``makeStats`` for creating the ``MapleStats``
object, ``makeLib`` for the library record, and ``makeSample`` for the sample record. All three functions are used
in the last part, where we work directly on the supplied run JSON object. There, you'll see that this allows us to
deconstruct the nested sample-library structure into two ``Seq``s: a ``Seq`` of samples and a ``Seq`` of libraries.
Again, although in theory you may not need the helper functions, we prefer to have them defined separately for
readability.
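The shape of that deconstruction can be sketched as follows (the record types and helpers here are simplified stand-ins for the ones in the tutorial, and the plain ``Map`` stands in for the parsed run JSON):

```scala
// Simplified stand-ins for the sample and library records.
case class LibRecord(sampleName: String, libName: String)
case class SampleRecord(sampleName: String, libNames: Seq[String])

// Counterparts of the `makeSample` / `makeLib` helpers described above.
def makeSample(name: String, libs: Seq[String]): SampleRecord =
  SampleRecord(name, libs)

def makeLib(sample: String, lib: String): LibRecord =
  LibRecord(sample, lib)

// Deconstruct the nested sample -> libraries structure into two Seqs:
// one of sample records and one of library records.
def extractUnits(run: Map[String, Seq[String]]): (Seq[SampleRecord], Seq[LibRecord]) = {
  val samples = run.toSeq.map { case (name, libs) => makeSample(name, libs) }
  val libs = run.toSeq.flatMap { case (name, ls) => ls.map(makeLib(name, _)) }
  (samples, libs)
}
```

Each nested library entry becomes its own flat record, while its parent sample keeps a reference to the library names.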
@@ -360,8 +361,8 @@ ones and the general structure of the for-comprehension:
possibility of failure in the type itself. Notice also that the code looks much cleaner, without any nested
``try-catch`` blocks.
2. Some of the function calls' return values are simply an underscore. This means we are not using whatever
the function is returning. Instead we are only interested in its side-effect. Indeed, all the functions whose
result we discard are database storage functions.
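The pattern can be sketched with a toy example (the parse and store functions below are hypothetical placeholders for Sentinel's actual parsing and database calls):

```scala
import scala.util.Try

// Each step returns a Try, so the possibility of failure lives in the type.
def parseRun(raw: String): Try[Int] = Try(raw.toInt)
def storeRun(n: Int): Try[Unit] = Try(())      // stand-in for a DB write
def storeStats(n: Int): Try[Unit] = Try(())    // another side-effecting step

// The storage steps' results are bound to `_`: we only want their
// side effects, not their return values. Any failure short-circuits
// the whole comprehension.
def processRun(raw: String): Try[Int] =
  for {
    n <- parseRun(raw)
    _ <- storeRun(n)
    _ <- storeStats(n)
  } yield n
```

If any step fails, the resulting ``Try`` is a ``Failure`` carrying the first error, with no nested ``try-catch`` blocks needed.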
And that's it! You now have a fully-functioning runs processor.
@@ -370,7 +371,7 @@ The Stats Processor
-------------------
The final step is defining the stats processor. This step is simpler than the runs processor, since
Sentinel now has a better idea of what to expect from the database records:
.. code-block:: scala
:linenos:
@@ -128,5 +128,5 @@ For our `Maple` pipeline, we'll use the schema already defined below. Save this
If the above code looks daunting, don't worry. You can copy-paste the code as-is and try to understand the JSON schema
specifications later on. If you want to play around with the schema itself, there is an online validator available
`here <http://jsonschemalint.com/draft4/>`_. You can copy-paste both the JSON summary and the JSON schema examples
above and try tinkering with them.
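For a feel of how a draft-4 schema constrains a document, here is a tiny illustrative example, much smaller than the `Maple` schema above:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "samples": {
      "type": "object",
      "minProperties": 1
    }
  },
  "required": ["samples"]
}
```

A document such as ``{"samples": {"sampleA": {}}}`` validates against this schema, while an empty object ``{}`` fails the ``required`` check.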
@@ -14,7 +14,7 @@ queried using one of the predefined HTTP endpoints.
At the moment, Sentinel is meant for internal `LUMC <http://www.lumc.nl>`_ use only. URLs mentioned in this
documentation may not work outside LUMC.
Please use the navigation bar on the left to explore this site.
.. toctree::
:hidden:
@@ -17,8 +17,8 @@ anecdotal evidence.
To address this issue, we developed Sentinel. Sentinel is a database designed to store metrics from various
sequencing analysis pipeline runs. It provides a systematic way of storing and querying these metrics, with various
filter and selection capabilities. We believe that gathering sufficient data points is the first step toward making
informed decisions about a sequencing experiment.
At a Glance
@@ -35,6 +35,6 @@ important is that Sentinel knows how to parse and store the particular JSON file
the parsing code inside Sentinel but we are working to make the setup more modular.
All uploaded JSON files are only accessible to the uploader and site administrators. The data points contained in the
JSON file, however, are available to anybody with access to the HTTP endpoints. These data points are anonymized by
default. Only after (optional) authentication, can a user see the names of the data points.
@@ -2,13 +2,13 @@ Terminologies
=============
In the context of next-generation sequencing, the same words are often used to refer to multiple things. Here we list
terms that are used repeatedly in the Sentinel documentation.
Library
-------
A library denotes a single execution / run of an NGS machine. It may consist of a single sequence file (in the case of
single-end sequencing) or two sequence files (paired-end sequencing). Libraries are often used when
a single sample needs to be sequenced more than once (e.g. because its sequencing depth is less than desired) or when
one sample is sequenced in different lanes.
@@ -68,7 +68,7 @@ Depending on the pipeline, you may also see additional attributes such as:
Large Run Summaries
^^^^^^^^^^^^^^^^^^^
Some pipeline runs may contain hundreds of samples, which in turn increases the run summary file size as well. Sentinel
has a default upload limit of 16MB. While this may seem small, there are several things you can do to reduce the size
of your uploaded summary:
......