Skip to content
Snippets Groups Projects
Commit 6e1dd442 authored by Martin Larralde's avatar Martin Larralde
Browse files

Add Jupyter notebook with library usage in `lightmotif-py` docs [ci skip]

parent 5068087c
No related branches found
No related tags found
No related merge requests found
Pipeline #17396 skipped
%% Cell type:markdown id: tags:
# Example
This Jupyter notebook shows how to use the library with common examples.
%% Cell type:code id: tags:
```
import lightmotif
lightmotif.__version__
```
%% Cell type:code id: tags:
```
from urllib.request import urlopen
```
%% Cell type:markdown id: tags:
## Creating a motif
A `Motif` can be created from several sequences of the same length using the
`lightmotif.create` function. This first builds a `CountMatrix` from each
sequence position, and then creates a `WeightMatrix` and a `ScoringMatrix`.
%% Cell type:code id: tags:
```
motif = lightmotif.create(["AATTGTGGTTA", "ATCTGTGGTTA", "TTCTGCGGTTA"])
```
%% Cell type:markdown id: tags:
## Loading a motif
The `lightmotif.load` function can be used to load the motifs found in a given
file. Because it supports any file-like object, we can immediately download a
motif from the [JASPAR](https://jaspar.elixir.no/) database and parse it on
the fly:
%% Cell type:code id: tags:
```
url = "https://jaspar.elixir.no/api/v1/matrix/MA0002.1.jaspar"
with urlopen(url) as response:
motif = next(lightmotif.load(response, format="jaspar16"))
print(f"Loaded motif {motif.name} of length {len(motif.counts)}")
```
%% Cell type:markdown id: tags:
## Adding pseudo-counts
By default, the loaded scoring matrix is built with zero pseudo-counts and
a uniform background, which may not be ideal. Using the `CountMatrix.normalize`
and `WeightMatrix.log_odds` methods, we can build a new `ScoringMatrix` with
pseudo-counts of 0.1:
%% Cell type:code id: tags:
```
pssm = motif.counts.normalize(0.1).log_odds()
```
%% Cell type:markdown id: tags:
## Preparing a sequence
Since the motif we loaded is a human transcription factor binding site,
it makes sense to use a human sequence. As an example, we can load a
contig from the human chromosome 22 ([NT_167212.2](https://www.ncbi.nlm.nih.gov/nuccore/NT_167212.2)).
%% Cell type:code id: tags:
```
url = "https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?tool=portal&save=file&report=fasta&id=568801992"
with urlopen(url) as response:
response.readline()
sequence = ''.join(line.strip().decode() for line in response)
```
%% Cell type:markdown id: tags:
To score a sequence with `lightmotif`, if must be first encoded and stored with
a particular memory layout. This is taken care of by the `lightmotif.stripe`
function.
%% Cell type:code id: tags:
```
striped = lightmotif.stripe(sequence)
```
%% Cell type:markdown id: tags:
## Calculate scores
Once the sequence has been prepared, it can be used with the different functions
and methods of `lightmotif` to compute scores for each position. The most most
basic functionality is to compute the PSSM scores for every position of the
sequence. This can be done with the `ScoringMatrix.calculate` method:
%% Cell type:code id: tags:
```
scores = pssm.calculate(striped)
```
%% Cell type:markdown id: tags:
The scores are computed in an efficient column-major matrix which can be used
to further extract high scoring positions:
- The `argmax` method returns the smallest index with the highest score
- The `max` method returns the highest score
- The `threshold` method returns a list of positions with a score above the given score
%% Cell type:code id: tags:
```
print(f"Highest score: {scores.max():.3f}")
print(f"Position with highest score: {scores.argmax()}")
print(f"Position with score above 14: {scores.threshold(13.0)}")
```
%% Cell type:markdown id: tags:
Otherwise, the resulting array can be accessed by index, and flattened into
a list (or an `array`):
%% Cell type:code id: tags:
```
print("Score at position 90517:", scores[156007])
```
%% Cell type:markdown id: tags:
## Using p-value thresholds
LightMotif features a re-implementation of the TFP-PVALUE algorithm which
can convert between a bitscore and a p-value for a given scoring matrix. Use
the `ScoringMatrix.score` method to compute the score threshold for a *p-value*:
%% Cell type:code id: tags:
```
print(f"Score threshold for p=1e-5: {pssm.score(1e-5):.3f}")
```
%% Cell type:markdown id: tags:
The `ScoringMatrix.pvalue` method can compute the *p-value* for a score, allowing
to compute them for scores obtained by the scoring pipeline:
%% Cell type:code id: tags:
```
for index in scores.threshold(13.0):
print(f"Hit at position {index:6}: score={scores[index]:.3f} p={pssm.pvalue(scores[index]):.3g}")
```
%% Cell type:markdown id: tags:
## Scanning algorithm
For cases where a long sequence is being processed, and only a handful of
significative hits is expected, using a scanner will be much more efficient.
A `Scanner` can be created with the `lightmotif.scan` function, and yields
`Hit` objects for every position above the threshold parameter:
%% Cell type:code id: tags:
```
scanner = lightmotif.scan(pssm, striped, threshold=13.0)
for hit in scanner:
print(f"Hit at position {hit.position:6}: score={hit.score:.3f}")
```
%% Cell type:markdown id: tags:
Although it gives equivalent results to the `calculate` example above, the
`scan` implementation uses less memory and is generally faster for higher
threshold values.
%% Cell type:markdown id: tags:
# Reverse-complement
All the examples above are showing how to calculate the hits for the direct
strand. To process the reverse-strand, one could reverse-complement the sequence;
however, it is much more efficient to reverse-complement the `ScoringMatrix`,
as it is usually much smaller in memory.
%% Cell type:code id: tags:
```
psmm_rc = pssm.reverse_complement()
scanner_rc = lightmotif.scan(psmm_rc, striped, threshold=13.0)
```
......@@ -8,6 +8,7 @@ This section contains guides and documents about LightMotif usage.
:caption: Getting Started
Installation <install>
Example <example>
.. toctree::
:maxdepth: 1
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment