{ "metadata": { "name": "", "signature": "sha256:e94271a07f14be86cf54e39526ee9cc758acb8b4bebac46472bd7f94a34a1747" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Palindromic sequence analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notebook by: [M. van Galen](mailto:m.van_galen@lumc.nl)\n", "\n", "License: [Creative Commons Attribution 3.0 License (CC-by)](http://creativecommons.org/licenses/by/3.0)\"\n", "\n", "A palindromic sequence is a nucleic acid sequence (DNA or RNA) that is the same whether read 5' (five-prime) to 3' (three prime) on one strand or 5' to 3' on the complementary strand with which it forms a double helix. Palindromic sequences play an important role in molecular biology [1].\n", "\n", "[1] http://en.wikipedia.org/wiki/Palindromic_sequence\n", "\n", "This notebook will demonstrate how we developed code to identify palindromic sequences.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "<figure><img src=\"https://git.lumc.nl/humgen/programming-course/raw/master/images/1590px-DNA_palindrome.svg.png\" align=\"center\" width=\"500\" ><imgcaption>Figure: Example of an hairppin formed by palidromic repeats.</imgcaption></figure>\n", "\n" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Translate a given sequence based on a dictionary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find palindromic sequences we need to be able to create a sequence complement." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def complement(seq):\n", " complements = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}\n", " c_seq = ''\n", " for n in seq:\n", " c_seq = c_seq + complements[n]\n", " return c_seq" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A simple string reversal function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the reverse complement, we need to reverse the sequence also." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def reverse(seq):\n", " rev = seq[::-1]\n", " return rev" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Function to test if a sequence is palindromic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The functions above can be combined to test a sequence being palindromic." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def palindromic(seq):\n", " if seq == reverse(complement(seq)):\n", " return True\n", " return False" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Testing a larger sequence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we implement the palindromic function in another function. We create a loop which checks for palindromic sequences in a longer piece of DNA. We start to test subsequences with a length of 2. This can be increased by setting the window size parameter. If we find a palindromic sequence, we need to expand our test into both directions. \n", "\n", "A palindromic sequence has to have a length of an even number. It's impossible to have a palindrome sequence of length 3, 5, etc. \n", "\n", "Below we test te sequence and we find a pattern on position 5-6. We expand our window both directions and find the pattern to span position 4-7. If we expand it further our palindrome will break.\n", "\n", "CCCTTAACCC\n", "\n", "CCCT*__TA__*ACCC\n", "\n", "CCC*__TTAA__*CCC\n", "\n", "CC~~CTTAAC~~CC\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def find_palindromic_seq(seq, window = 2):\n", " for pos in range(0, len(seq)):\n", " m = 0 # The modifier to incres the window in both directions\n", " while palindromic( seq[ pos - m : pos + window + m ] ) and pos - m >= 0:\n", " print 'Palindrome found at {0} {1}'.format(pos - m, seq[ pos - m : pos + window + m ])\n", " m = m + 1 # We increase our window modifier as long as the palindrome grows" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Apply some input to the code" ] }, { "cell_type": "code", "collapsed": false, "input": [ "reverse('AACGT')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "'TGCAA'" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "complement(_)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "'ACGTT'" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "palindromic('AACGTT')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "True" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "find_palindromic_seq('GGGAGACATGTCTAACCGTTGTAAAA', window=6)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Palindrome found at 5 ACATGT\n", "Palindrome found at 4 GACATGTC\n", "Palindrome found at 3 AGACATGTCT\n" ] } ], "prompt_number": 9 } ], "metadata": {} } ] }