\documentclass{article}
\usepackage{fullpage}
\usepackage{listings}
\usepackage{graphicx}
\frenchspacing
\setlength{\parindent}{0pt}
\pagestyle{empty}
\renewcommand\_{\textunderscore\,}
\let\tempitemize=\itemize
\renewcommand\itemize{
  \vspace{-5pt}
  \tempitemize
  \setlength{\itemsep}{0pt}
}
\let\tempenditemize=\enditemize
\renewcommand\enditemize{
  \tempenditemize
}

\title{NGS Introduction Course\\
  {\Large Practical two, Pipelines}}
\date{Friday, April 4, 2014}
\author{Jeroen Laros, Michiel van Galen}

\begin{document}
\maketitle
\thispagestyle{empty}
\begin{figure}[h!]
  \centering
  \includegraphics[width=0.55\textwidth]{pipelines-font}
\end{figure}

% Pipeline
\section{A simple analysis pipeline}
Log in to the virtual machine as described in the first practical. If you have
not finished the exercises of the first practical yet, complete those first and
then continue with this one.
\medskip

Analysis pipelines range from a few simple lines of code to complex frameworks
running on multiple computing nodes. The idea behind them, however, is always
the same: make your analysis easier, while at the same time making it
reproducible, documented and easy to share.
Today we will use bash to create a basic pipeline. Without realising it, you
have already worked with bash: everything you have typed into the terminal so
far was interpreted by bash. To create a pipeline, all we need to do is create a
file with the ``.sh'' extension and list the steps we want to include, one
command per line.
\medskip

First, let's make the script executable by changing its permissions. This way
you can run the file directly, without having to specify the interpreter. In
Linux we use chmod for this:
\medskip

\begin{lstlisting}
  $ chmod +x file.sh
\end{lstlisting}
\medskip

One last piece of information your system still needs is which interpreter it
should use. This is stored in the very first line of the script (the so-called
``shebang'' line), which for bash is written as ``\#!/bin/bash''.
\medskip
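
A minimal sketch of the steps so far, written as a bash script of its own: it
creates a small example script, file.sh (a hypothetical name), using a
so-called here document, gives it execute permission and runs it directly. You
can just as well create file.sh in a text editor. The point is that the shebang
line and chmod together let ``./file.sh'' run without naming the interpreter.
\medskip

\begin{lstlisting}
#!/bin/bash
# Create file.sh. The quoted 'EOF' stops the shell from expanding
# anything between this line and the closing EOF line.
cat > file.sh << 'EOF'
#!/bin/bash
echo "Hello from file.sh"
EOF

# Add execute permission, then run the script directly.
chmod +x file.sh
./file.sh
\end{lstlisting}
\medskip

Running this prints ``Hello from file.sh''.
\medskip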

This is all you need to know to write your first pipeline. Bash offers many
more possibilities than just using Linux power tools, navigating and starting
other software. For example, to make the pipeline easier to re-use with other
data, we can give a bash script parameters, just like Bowtie, to which you
supply your fastq file. This is done with the positional parameter \$1 inside
the script: everywhere in your script, this variable will be replaced with the
first argument you give on the command line.
\medskip
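
As an illustration of positional parameters, the sketch below writes a
hypothetical script, args.sh, that reports the arguments it receives, and then
calls it with two made-up file names. Besides \$1 and \$2 it uses \$\#, the
number of arguments, which bash also provides.
\medskip

\begin{lstlisting}
#!/bin/bash
# Write a hypothetical demo script, args.sh, that reports its arguments.
cat > args.sh << 'EOF'
#!/bin/bash
echo "First argument:  $1"
echo "Second argument: $2"
echo "Number of arguments: $#"
EOF
chmod +x args.sh

# Call it like a tool such as Bowtie; both file names are made up.
./args.sh reads.fq mtDNA.fa
\end{lstlisting}
\medskip

The three lines of output show reads.fq, mtDNA.fa and 2, respectively.
\medskip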

The next exercise will give you an idea of how to structure a simple bash
pipeline. Complete the exercise and try to understand what is going on in the
script:
\medskip

\begin{itemize}
  \item Write the lines you see below to a file ``Hello.sh''
  \item Make the file executable: ``chmod +x Hello.sh''
  \item Run it with an argument, ``./Hello.sh yourname'', and see what it does
  \item Comments start with \# and are used to document your work
\end{itemize}
\begin{lstlisting}
#!/bin/bash
echo "Hello $1, the date is:"
# The next line will print the date.
date
\end{lstlisting}
\medskip

With this knowledge you should be able to write your own short bash pipeline.
Try to write a script that combines the tools from the first practical into a
pipeline:
\medskip

\begin{itemize}
  \item Include these steps:
  \begin{itemize}
    \item Align a fastq file to the mtDNA
    \item Convert the SAM to a sorted BAM
    \item Call variants
    \item Annotate the variants
  \end{itemize}
  \item Keep these things in mind when writing the script:
  \begin{itemize}
    \item The fastq file should be the first argument
    \item Use comments to explain what happens
    \item Make it executable
    \item Bonus: can you think of more arguments to add, for example for the annotation?
  \end{itemize}
\end{itemize}
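
If you get stuck on the structure, the outline below may help. It is only a
skeleton under our own assumptions: the file name pipeline.sh and the output
names are hypothetical, the fastq file is assumed to end in ``.fq'', and each
TODO comment stands for the command(s) you used in the first practical, which
are deliberately not filled in. As written, each step only prints a message.
The basename command strips the directory and the given suffix from a file
name.
\medskip

\begin{lstlisting}
#!/bin/bash
# pipeline.sh: a skeleton only. The TODO comments are placeholders
# for the commands you used in the first practical.
# Usage: ./pipeline.sh reads.fq

# The fastq file is the first argument.
fastq=$1

# Base name for all output files, e.g. data/reads.fq -> reads
# (assumes the file name ends in .fq).
name=$(basename "$fastq" .fq)

# Step 1: align the fastq file to the mtDNA, writing $name.sam
echo "Aligning $fastq to the mtDNA"
# TODO: alignment command

# Step 2: convert the SAM file to a sorted BAM file, $name.sorted.bam
echo "Converting $name.sam to a sorted BAM"
# TODO: conversion and sorting commands

# Step 3: call variants from the sorted BAM file
echo "Calling variants from $name.sorted.bam"
# TODO: variant calling command

# Step 4: annotate the variants
echo "Annotating the variants"
# TODO: annotation command
\end{lstlisting}
\medskip

For the bonus question, a second argument is available as \$2, in exactly the
same way as \$1.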
\medskip

Hints and summary:
\begin{itemize}
  \item A comment line starts with a \#
  \item \$1 is the first argument, \$2 the second, etc.
  \item Make your pipeline executable with ``chmod +x file''
\end{itemize}
\medskip

Now you know how to write a basic pipeline. In bash you can also work with
conditions, loops and error handling, but these are beyond the scope of this
course. If you would like to know more, there is plenty to read on the web, or
you can visit one of our other courses. This is the end of the practical.
\medskip

Thank you for participating!

\end{document}