Skip to content
Snippets Groups Projects
Commit 607d95b1 authored by Michiel van Galen's avatar Michiel van Galen
Browse files

Solutions ported from other repo

parent 0f6e79f1
No related branches found
No related tags found
No related merge requests found
{
"metadata": {
"name": "",
"signature": "sha256:e94271a07f14be86cf54e39526ee9cc758acb8b4bebac46472bd7f94a34a1747"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Palindromic sequence analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notebook by: [M. van Galen](mailto:m.van_galen@lumc.nl)\n",
"\n",
"License: [Creative Commons Attribution 3.0 License (CC-by)](http://creativecommons.org/licenses/by/3.0)\"\n",
"\n",
"A palindromic sequence is a nucleic acid sequence (DNA or RNA) that is the same whether read 5' (five-prime) to 3' (three prime) on one strand or 5' to 3' on the complementary strand with which it forms a double helix. Palindromic sequences play an important role in molecular biology [1].\n",
"\n",
"[1] http://en.wikipedia.org/wiki/Palindromic_sequence\n",
"\n",
"This notebook will demonstrate how we developed code to identify palindromic sequences.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<figure><img src=\"https://git.lumc.nl/humgen/programming-course/raw/master/images/1590px-DNA_palindrome.svg.png\" align=\"center\" width=\"500\" ><imgcaption>Figure: Example of an hairppin formed by palidromic repeats.</imgcaption></figure>\n",
"\n"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Translate a given sequence based on a dictionary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To find palindromic sequences we need to be able to create a sequence complement."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def complement(seq):\n",
" complements = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}\n",
" c_seq = ''\n",
" for n in seq:\n",
" c_seq = c_seq + complements[n]\n",
" return c_seq"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"A simple string reversal function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get the reverse complement, we need to reverse the sequence also."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def reverse(seq):\n",
" rev = seq[::-1]\n",
" return rev"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Function to test if a sequence is palindromic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The functions above can be combined to test a sequence being palindromic."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def palindromic(seq):\n",
" if seq == reverse(complement(seq)):\n",
" return True\n",
" return False"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Testing a larger sequence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we implement the palindromic function in another function. We create a loop which checks for palindromic sequences in a longer piece of DNA. We start to test subsequences with a length of 2. This can be increased by setting the window size parameter. If we find a palindromic sequence, we need to expand our test into both directions. \n",
"\n",
"A palindromic sequence has to have a length of an even number. It's impossible to have a palindrome sequence of length 3, 5, etc. \n",
"\n",
"Below we test te sequence and we find a pattern on position 5-6. We expand our window both directions and find the pattern to span position 4-7. If we expand it further our palindrome will break.\n",
"\n",
"CCCTTAACCC\n",
"\n",
"CCCT*__TA__*ACCC\n",
"\n",
"CCC*__TTAA__*CCC\n",
"\n",
"CC~~CTTAAC~~CC\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def find_palindromic_seq(seq, window = 2):\n",
" for pos in range(0, len(seq)):\n",
" m = 0 # The modifier to incres the window in both directions\n",
" while palindromic( seq[ pos - m : pos + window + m ] ) and pos - m >= 0:\n",
" print 'Palindrome found at {0} {1}'.format(pos - m, seq[ pos - m : pos + window + m ])\n",
" m = m + 1 # We increase our window modifier as long as the palindrome grows"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Apply some input to the code"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"reverse('AACGT')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"'TGCAA'"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"complement(_)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"'ACGTT'"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"palindromic('AACGTT')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"True"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"find_palindromic_seq('GGGAGACATGTCTAACCGTTGTAAAA', window=6)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Palindrome found at 5 ACATGT\n",
"Palindrome found at 4 GACATGTC\n",
"Palindrome found at 3 AGACATGTCT\n"
]
}
],
"prompt_number": 9
}
],
"metadata": {}
}
]
}
\ No newline at end of file
This diff is collapsed.
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment