"A palindromic sequence is a nucleic acid sequence (DNA or RNA) that is the same whether read 5' (five-prime) to 3' (three prime) on one strand or 5' to 3' on the complementary strand with which it forms a double helix. Palindromic sequences play an important role in molecular biology [1].\n",
"This notebook will demonstrate how we developed code to identify palindromic sequences.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<figure><img src=\"https://git.lumc.nl/humgen/programming-course/raw/master/images/1590px-DNA_palindrome.svg.png\" align=\"center\" width=\"500\" ><imgcaption>Figure: Example of an hairppin formed by palidromic repeats.</imgcaption></figure>\n",
"\n"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Translate a given sequence based on a dictionary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To find palindromic sequences we need to be able to create a sequence complement."
"To get the reverse complement, we need to reverse the sequence also."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def reverse(seq):\n",
" rev = seq[::-1]\n",
" return rev"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Function to test if a sequence is palindromic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The functions above can be combined to test a sequence being palindromic."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def palindromic(seq):\n",
" if seq == reverse(complement(seq)):\n",
" return True\n",
" return False"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Testing a larger sequence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we implement the palindromic function in another function. We create a loop which checks for palindromic sequences in a longer piece of DNA. We start to test subsequences with a length of 2. This can be increased by setting the window size parameter. If we find a palindromic sequence, we need to expand our test into both directions. \n",
"\n",
"A palindromic sequence has to have a length of an even number. It's impossible to have a palindrome sequence of length 3, 5, etc. \n",
"\n",
"Below we test te sequence and we find a pattern on position 5-6. We expand our window both directions and find the pattern to span position 4-7. If we expand it further our palindrome will break.\n",
"\n",
"CCCTTAACCC\n",
"\n",
"CCCT*__TA__*ACCC\n",
"\n",
"CCC*__TTAA__*CCC\n",
"\n",
"CC~~CTTAAC~~CC\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def find_palindromic_seq(seq, window = 2):\n",
" for pos in range(0, len(seq)):\n",
" m = 0 # The modifier to incres the window in both directions\n",
" while palindromic( seq[ pos - m : pos + window + m ] ) and pos - m >= 0:\n",
" print 'Palindrome found at {0} {1}'.format(pos - m, seq[ pos - m : pos + window + m ])\n",
" m = m + 1 # We increase our window modifier as long as the palindrome grows"