... | ... | @@ -10,13 +10,13 @@ Leiden University Medical Center <br> |
|
|
1. [Defining Unique Features](#defining-unique-features-)
|
|
|
1. [Statistical Analysis](#statistical-analysis-)
|
|
|
1. [Polyadenylation Sites Sequence Motif Analysis](#polyadenylation-sites-sequence-motif-analysis-)
|
|
|
1. [Tandem 3' UTR Analysis]()
|
|
|
1. [Sequence Motif Analysis Relative to Acceptor and Donor Sites]()
|
|
|
1. [RNA Binding Motif Analysis]()
|
|
|
1. [Tandem 3' UTR Analysis](#tandem-3-utr-analysis-)
|
|
|
1. [Sequence Motif Analysis Relative to Acceptor and Donor Sites](#sequence-motif-analysis-relative-to-acceptor-and-donor-sites-)
|
|
|
1. [RNA Binding Motif Analysis](#rna-binding-motif-analysis-)
|
|
|
1. [Scripts]()
|
|
|
1. [Reported Bugs and Fixes]()
|
|
|
1. [Reported Bugs and Fixes](#reported-bugs-and-fixes-)
|
|
|
1. [Citation]()
|
|
|
1. [Authors Affiliation]()
|
|
|
1. [Authors Affiliation](#authors-affiliation-)
|
|
|
<br>
|
|
|
<br>
|
|
|
|
... | ... | @@ -108,3 +108,89 @@ dreme -o output -png -eps -v 1 -t 18000 -p input.targets.fasta -n input.backgrou |
|
|
---
|
|
|
## **Tandem 3' UTR Analysis** <br>
|
|
|
This analysis identifies loci that contain tandem 3' UTRs, i.e. loci with multiple PASs located in the same last exon. Custom scripts were used to identify loci with at least two PASs that share the same start coordinate of the last exon. The number of loci with tandem 3' UTRs was calculated separately for loci in which the PAS was significantly coupled to alternative exons and for loci that did not show any significant interdependencies between alternative exons and PAS usage. <br>
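
The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration and not the custom scripts used in the study: the function name `find_tandem_utr_loci`, the `(locus_id, last_exon_start, pas_position)` record layout, and the toy coordinates are all assumptions made for the example.

```python
from collections import defaultdict


def find_tandem_utr_loci(pas_records):
    """Group PASs by (locus, last-exon start) and keep groups with >= 2 distinct PASs.

    pas_records: iterable of (locus_id, last_exon_start, pas_position) tuples.
    This record layout is hypothetical; adapt it to the actual input format.
    """
    groups = defaultdict(set)
    for locus_id, last_exon_start, pas_position in pas_records:
        groups[(locus_id, last_exon_start)].add(pas_position)
    # A tandem 3' UTR requires at least two PASs within the same last exon.
    return {key: sorted(sites) for key, sites in groups.items() if len(sites) >= 2}


# Toy records (made-up coordinates, for illustration only).
records = [
    ("geneA", 1000, 1500),
    ("geneA", 1000, 1800),
    ("geneA", 900, 1200),
    ("geneB", 5000, 5400),
]
tandem = find_tandem_utr_loci(records)
print(tandem)
```

Counting the resulting keys separately for coupled and non-coupled loci would then give the two totals mentioned above.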
|
|
|
|
|
|
<br>
|
|
|
|
|
|
---
|
|
|
## **Sequence Motif Analysis Relative to Acceptor and Donor Sites** <br>
|
|
|
For each detected gene, we report the first and last nucleotide of each exon as the acceptor and donor splice sites, respectively. Each unique genomic position was converted into BED format, and the strand-specific sequences of 2 nucleotides were extracted using the UCSC Table Browser (GRCh37/hg19) for both acceptor and donor splice sites.
|
|
|
|
|
|
> **`script:`** The Python script for extracting dinucleotide sequences of the splice sites at R1 and R3 domains (i.e., canonical GT and AG motifs) can be found [**here**](https://git.lumc.nl/s.y.anvar/mRNA-Coupling/ipython_notebook/master/scripts/Rdomain_splice_junction_motif.ipynb). <br>
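
As a rough illustration of the coordinate conversion (not a copy of the notebook linked above), the sketch below derives strand-aware BED intervals for the two dinucleotides. It assumes that the dinucleotides of interest are the two intronic bases flanking the exon (AG upstream of the acceptor, GT downstream of the donor) and that exon coordinates are 0-based, half-open; the function name `splice_site_bed_intervals` is hypothetical.

```python
def splice_site_bed_intervals(chrom, exon_start, exon_end, strand):
    """Return BED-style (chrom, start, end, strand) intervals for the
    dinucleotides flanking an exon.

    Assumptions: exon_start/exon_end are 0-based, half-open, and the
    dinucleotides are the intronic bases adjacent to the exon.
    """
    left = (exon_start - 2, exon_start)   # two bases before the exon (genomic order)
    right = (exon_end, exon_end + 2)      # two bases after the exon (genomic order)
    if strand == "+":
        acceptor, donor = left, right
    elif strand == "-":
        # On the minus strand the exon is transcribed from its genomic end.
        acceptor, donor = right, left
    else:
        raise ValueError("strand must be '+' or '-'")
    return {
        "acceptor": (chrom, acceptor[0], acceptor[1], strand),
        "donor": (chrom, donor[0], donor[1], strand),
    }


plus = splice_site_bed_intervals("chr1", 100, 200, "+")
minus = splice_site_bed_intervals("chr1", 100, 200, "-")
print(plus)
print(minus)
```

The sequences returned for minus-strand intervals still need to be reverse-complemented, which is what a strand-specific extraction does.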
|
|
|
|
|
|
<br>
|
|
|
|
|
|
---
|
|
|
## **RNA Binding Motif Analysis** <br>
|
|
|
We used the MEME Suite to identify enriched sequence motifs present in exons significantly coupled with alternative TSSs, PASs or other alternative exons. For each unique exon, three regions were considered: the R1 domain (containing up to 35 bp upstream of the acceptor splice site), the R2 domain (containing 32 bp downstream of the acceptor splice site and 32 bp upstream of the donor splice site) and the R3 domain (containing up to 40 bp downstream of the donor splice site). The R1, R2 and R3 domains were obtained by extracting strand-specific FASTA sequences using the UCSC Table Browser (GRCh37/hg19). <br>
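
To make the three domains concrete, the following sketch computes their genomic intervals for a single exon. It is an assumption-laden illustration: coordinates are taken to be 0-based and half-open, the flanks use the full 35 bp and 40 bp lengths (the text says "up to", so the real procedure may truncate them), and the name `exon_domains` and the `R2_acceptor`/`R2_donor` split are hypothetical.

```python
def exon_domains(exon_start, exon_end, strand, r1_len=35, r2_len=32, r3_len=40):
    """Return genomic intervals (0-based, half-open) for the R1, R2 and R3 domains.

    R1: upstream of the acceptor site; R2: the exonic ends next to each
    splice site; R3: downstream of the donor site.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand the acceptor lies at the genomic end of the exon,
    # so the flank lengths swap sides.
    left_len, right_len = (r1_len, r3_len) if strand == "+" else (r3_len, r1_len)
    left_flank = (max(0, exon_start - left_len), exon_start)
    right_flank = (exon_end, exon_end + right_len)
    # Exonic windows are clipped so they never extend past a short exon.
    left_inner = (exon_start, min(exon_end, exon_start + r2_len))
    right_inner = (max(exon_start, exon_end - r2_len), exon_end)
    if strand == "+":
        return {"R1": left_flank, "R2_acceptor": left_inner,
                "R2_donor": right_inner, "R3": right_flank}
    return {"R1": right_flank, "R2_acceptor": right_inner,
            "R2_donor": left_inner, "R3": left_flank}


plus_domains = exon_domains(1000, 1100, "+")
minus_domains = exon_domains(1000, 1100, "-")
print(plus_domains)
print(minus_domains)
```

Each interval can then be written out as a BED line and passed to a strand-aware sequence extraction step.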
|
|
|
|
|
|
We ran DREME (version 4.11.4) locally for each region separately and performed a motif search against a negative background (the R1, R2 and R3 domains of exons that were not significantly coupled). We ran DREME without restricting motif length (as in the poly(A) site motif analysis). In each case, a maximum of 10 motifs with an E-value below 0.05 were reported. The remaining parameters were kept at their defaults. We then compared each motif found by DREME against the human RNA-binding motif database CISBP-RNA using the TOMTOM Motif Comparison tool. We used the Pearson correlation coefficient as the comparison function and considered only matches with a false discovery rate (q-value) below 0.05. <br>
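
The final q-value filter can be illustrated with a short Python sketch. It assumes a tab-separated TOMTOM result table with columns named `Query_ID`, `Target_ID` and `q-value`; column names differ between MEME Suite versions, so check the header of your own output file. The function name `significant_matches` and the toy rows are made up for the example.

```python
import csv
import io


def significant_matches(tsv_text, q_threshold=0.05):
    """Return (query, target, q-value) tuples with q-value below the threshold.

    Assumes a tab-separated table with 'Query_ID', 'Target_ID' and 'q-value'
    columns (an assumption; verify against the actual TOMTOM output).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        q_value = row.get("q-value")
        if not q_value:
            # Skip trailing comment lines, which have no q-value field.
            continue
        if float(q_value) < q_threshold:
            hits.append((row["Query_ID"], row["Target_ID"], float(q_value)))
    return hits


# Toy table (made-up motif and protein names, for illustration only).
toy_table = (
    "Query_ID\tTarget_ID\tq-value\n"
    "motif1\tRBP_A\t0.01\n"
    "motif1\tRBP_B\t0.20\n"
    "motif2\tRBP_C\t0.049\n"
    "# trailing comment line\n"
)
hits = significant_matches(toy_table)
print(hits)
```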
|
|
|
|
|
|
<br>
|
|
|
|
|
|
---
|
|
|
## **Reported Bugs and Fixes** <br>
|
|
|
So far, we have not received any bug reports! In this section, we will report any future changes to the procedure or the accompanying scripts. Feel free to send in your suggestions and comments for improvements or additional features. <br>
|
|
|
|
|
|
<br>
|
|
|
|
|
|
---
|
|
|
## **Authors Affiliation** <br>
|
|
|
**SY Anvar** <br>
|
|
|
Leiden University Medical Center <br>
|
|
|
Department of Human Genetics <br>
|
|
|
Leiden, 2300 RC, The Netherlands <br>
|
|
|
|
|
|
<br>
|
|
|
**Guy Allard** <br>
|
|
|
Leiden University Medical Center <br>
|
|
|
Department of Human Genetics <br>
|
|
|
Leiden, 2300 RC, The Netherlands <br>
|
|
|
|
|
|
<br>
|
|
|
**Elizabeth Tseng** <br>
|
|
|
Pacific Biosciences <br>
|
|
|
Menlo Park, CA 94025, USA <br>
|
|
|
|
|
|
<br>
|
|
|
**Gloria Sheynkman** <br>
|
|
|
Dana-Farber Cancer Institute, USA <br>
|
|
|
Department of Cancer Biology <br>
|
|
|
Boston, MA 02215, USA <br>
|
|
|
|
|
|
<br>
|
|
|
**Eleonora de Klerk** <br>
|
|
|
University of California, San Francisco, USA <br>
|
|
|
|
|
|
<br>
|
|
|
**Martijn Vermaat** <br>
|
|
|
Leiden University Medical Center <br>
|
|
|
Department of Human Genetics <br>
|
|
|
Leiden, 2300 RC, The Netherlands <br>
|
|
|
|
|
|
<br>
|
|
|
**Hans Johansson** <br>
|
|
|
LGC Bioresearch Technologies <br>
|
|
|
Petaluma, CA 94954-6904, USA <br>
|
|
|
|
|
|
<br>
|
|
|
**Yavuz Ariyurek** <br>
|
|
|
Leiden University Medical Center <br>
|
|
|
Department of Human Genetics <br>
|
|
|
Leiden, 2300 RC, The Netherlands <br>
|
|
|
|
|
|
<br>
|
|
|
**Johan den Dunnen** <br>
|
|
|
Leiden University Medical Center <br>
|
|
|
Department of Human Genetics <br>
|
|
|
Leiden, 2300 RC, The Netherlands <br>
|
|
|
|
|
|
<br>
|
|
|
**Stephen Turner** <br>
|
|
|
Pacific Biosciences <br>
|
|
|
Menlo Park, CA 94025, USA <br>
|
|
|
|
|
|
<br>
|
|
|
**Peter AC 't Hoen** <br>
|
|
|
Leiden University Medical Center <br>
|
|
|
Department of Human Genetics <br>
|
|
|
Leiden, 2300 RC, The Netherlands <br> |
|
|
\ No newline at end of file |