{
 "metadata": {
  "name": "",
  "signature": "sha256:e94271a07f14be86cf54e39526ee9cc758acb8b4bebac46472bd7f94a34a1747"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Palindromic sequence analysis"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Notebook by: [M. van Galen](mailto:m.van_galen@lumc.nl)\n",
      "\n",
      "License: [Creative Commons Attribution 3.0 License (CC-by)](http://creativecommons.org/licenses/by/3.0)\"\n",
      "\n",
      "A palindromic sequence is a nucleic acid sequence (DNA or RNA) that is the same whether read 5' (five-prime) to 3' (three prime) on one strand or 5' to 3' on the complementary strand with which it forms a double helix. Palindromic sequences play an important role in molecular biology [1].\n",
      "\n",
      "[1] http://en.wikipedia.org/wiki/Palindromic_sequence\n",
      "\n",
      "This notebook will demonstrate how we developed code to identify palindromic sequences.\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "<figure><img src=\"https://git.lumc.nl/humgen/programming-course/raw/master/images/1590px-DNA_palindrome.svg.png\" align=\"center\" width=\"500\" ><imgcaption>Figure: Example of an hairppin formed by palidromic repeats.</imgcaption></figure>\n",
      "\n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Translate a given sequence based on a dictionary"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To find palindromic sequences we need to be able to create a sequence complement."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def complement(seq):\n",
      "    complements = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}\n",
      "    c_seq  = ''\n",
      "    for n in seq:\n",
      "        c_seq  = c_seq + complements[n]\n",
      "    return c_seq"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "A simple string reversal function"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To get the reverse complement, we need to reverse the sequence also."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def reverse(seq):\n",
      "    rev = seq[::-1]\n",
      "    return rev"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Function to test if a sequence is palindromic"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The functions above can be combined to test a sequence being palindromic."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def palindromic(seq):\n",
      "    if seq == reverse(complement(seq)):\n",
      "        return True\n",
      "    return False"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Testing a larger sequence"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here we implement the palindromic function in another function. We create a loop which checks for palindromic sequences in a longer piece of DNA. We start to test subsequences with a length of 2. This can be increased by setting the window size parameter. If we find a palindromic sequence, we need to expand our test into both directions. \n",
      "\n",
      "A palindromic sequence has to have a length of an even number. It's impossible to have a palindrome sequence of length 3, 5, etc. \n",
      "\n",
      "Below we test te sequence and we find a pattern on position 5-6. We expand our window both directions and find the pattern to span position 4-7. If we expand it further our palindrome will break.\n",
      "\n",
      "CCCTTAACCC\n",
      "\n",
      "CCCT*__TA__*ACCC\n",
      "\n",
      "CCC*__TTAA__*CCC\n",
      "\n",
      "CC~~CTTAAC~~CC\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def find_palindromic_seq(seq, window = 2):\n",
      "    for pos in range(0, len(seq)):\n",
      "        m = 0 # The modifier to incres the window in both directions\n",
      "        while palindromic( seq[ pos - m : pos + window + m ] ) and pos - m >= 0:\n",
      "            print 'Palindrome found at {0} {1}'.format(pos - m, seq[ pos - m : pos + window + m ])\n",
      "            m = m + 1 # We increase our window modifier as long as the palindrome grows"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Apply some input to the code"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "reverse('AACGT')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "'TGCAA'"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "complement(_)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "'ACGTT'"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "palindromic('AACGTT')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "find_palindromic_seq('GGGAGACATGTCTAACCGTTGTAAAA', window=6)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Palindrome found at 5 ACATGT\n",
        "Palindrome found at 4 GACATGTC\n",
        "Palindrome found at 3 AGACATGTCT\n"
       ]
      }
     ],
     "prompt_number": 9
    }
   ],
   "metadata": {}
  }
 ]
}