\documentclass[slidestop]{beamer}

\title{Analysis projects skeleton}
\providecommand{\myConference}{Git course}
\providecommand{\myDate}{Monday, June 23, 2014}
\author{Jeroen F. J. Laros}
\providecommand{\myGroup}{Leiden Genome Technology Center}
\providecommand{\myDepartment}{Department of Human Genetics}
\providecommand{\myCenter}{Center for Human and Clinical Genetics}
\providecommand{\lastCenterLogo}{
  \raisebox{-0.1cm}{
    \includegraphics[height=1cm]{lgtc_logo}
    %\includegraphics[height=0.7cm]{ngi_logo}
  }
}
\providecommand{\lastRightLogo}{
  %\includegraphics[height=0.7cm]{nbic_logo}
  %\includegraphics[height=0.8cm]{nwo_logo_en}
  %\hspace{1.5cm}\includegraphics[height=0.7cm]{gen2phen_logo}
}

\usetheme{lumc}

\begin{document}

% This disables the \pause command, handy in the editing phase.
%\renewcommand{\pause}{}

% Make the title page.
\bodytemplate

% First page of the presentation.
\section{Introduction}
\subsection{Shared projects}
\begin{pframe}
  Most of us work on multiple projects with multiple people.
  \bigskip

  That is why it is convenient to:
  \begin{itemize}
    \item Have everything in one place.
    \begin{itemize}
      \item Data.
      \item Code.
      \item Documentation.
    \end{itemize}
    \pause
    \item Have the same structure for all projects.
  \end{itemize}
\end{pframe}

\section{Starting a project}
\subsection{Project skeleton}
\begin{pframe}
  Usage:
  \begin{itemize}
    \item Make a \emph{fork} (copy) of the skeleton project.
    \item Rename the project.
    \item Clone your project.
    \item Start working with it.
  \end{itemize}
  \bigskip
  \pause

  Configure your project.
  \begin{itemize}
    \item Choose to make your project public or not.
    \begin{itemize}
      \item Public by default.
      \item Public really means public.
    \end{itemize}
    \item Add the people that work on this project.
  \end{itemize}

  \vfill
  \permfoot{https://git.lumc.nl/lgtc-bioinformatics/project-skeleton}
\end{pframe}

%\subsection{Forking}
%\begin{pframe}
%  Make a new analysis project.
%  \begin{itemize}
%    \item Go to the ``Project skeleton'' project page on our GitLab server.
%    \item Click ``Fork'' to fork it to a new project.
%    \item Go to ``Settings'' to rename the new project.
%    \begin{itemize}
%      \item Change both the project as well as the repository path.
%    \end{itemize}
%  \end{itemize}
%
%  \vfill
%  \permfoot{https://git.lumc.nl/lgtc-bioinformatics/project-skeleton}
%\end{pframe}

%\subsection{Configuration}
%\begin{pframe}
%  Configure your project.
%  \begin{itemize}
%    \item Choose to make your project public or not.
%    \begin{itemize}
%      \item Public by default.
%      \item Public really means public.
%    \end{itemize}
%    \item Add the people that work on this project.
%  \end{itemize}
%\end{pframe}

\section{Project structure}
\subsection{Global overview}
\begin{pframe}
  Project layout:
  \begin{itemize}
    \item analysis
    \item data
    \item doc
    \item src
  \end{itemize}
  \bigskip

  Ideally, every directory in the project has a \bt{README} file.
\end{pframe}

\subsection{The top-level ``README'' file}
\begin{pframe}
  This file contains general information about the project, for example:
  \begin{itemize}
    \item Who leads the project.
    \item Who participates in the project.
    \item The number of hours people have spent on this project.
  \end{itemize}
\end{pframe}

\subsection{The ``doc'' directory}
\begin{pframe}
  Documentation on the project:
  \begin{itemize}
    \item Annotation of the data.
    \item Goal of the project.
    \item Related work and literature.
    \begin{itemize}
      \item You may want to note who provided the documentation.
    \end{itemize}
  \end{itemize}
\end{pframe}

\subsection{The ``data'' directory}
\begin{pframe}
  Used to store all raw data.
  \bigskip

  The \bt{README} contains:
  \begin{itemize}
    \item Description of the delivered data.
    \begin{itemize}
      \item Sequencing centre.
      \item Platform.
      \item Molecular type.
      \item Owner.
      \item Gatherer.
    \end{itemize}
    \item Description of other data.
    \begin{itemize}
      \item Perhaps you already received BAM files.
      \begin{itemize}
        \item Who aligned them?
        \item Which aligner?
      \end{itemize}
    \end{itemize}
  \end{itemize}
\end{pframe}

\subsection{The ``analysis'' directory}
\begin{pframe}
  All analysis-related files are stored here:
  \begin{itemize}
    \item Run scripts.
    \item Makefiles.
    \item Result files.
  \end{itemize}
  \bigskip

  Try to separate self-contained parts of the analysis into their own
  subdirectories and document dependencies in a \bt{README} file.
  \begin{itemize}
    \item Normal data analysis.
    \item $k$-mer analysis.
  \end{itemize}
\end{pframe}

\subsection{The ``src'' directory}
\begin{pframe}
  Any custom scripts and specific software versions for this project.
  \bigskip

  When these scripts are useful for other projects, move them to their own
  repository.
\end{pframe}

\section{Working with large files}
\subsection{Git is not designed for massive files}
\begin{pframe}
  Some problems with large files:
  \begin{itemize}
    \item Limited storage on the server.
    \item Checking out a repository would take a long time.
  \end{itemize}
  \bigskip

  It also does not make much sense:
  \begin{itemize}
    \item These files are usually \emph{static}.
    \item And probably \emph{binary}.
  \end{itemize}
  \bigskip
  \pause

  We do want to have some way to track our input and output data. This can be
  done with \bt{git-annex}.

  \vfill
  \permfoot{http://git-annex.branchable.com/}
\end{pframe}

\subsection{Git annex}
\begin{pframe}
  Manage files with git, without checking their contents in.
  \begin{itemize}
    \item Manage large files without storing them.
    \item Store file checksums.
    \item Prevent files from being deleted accidentally.
  \end{itemize}
  \bigskip
  \pause

  You first have to enable this for your repository.
  \bigskip

  \begin{lstlisting}[language=none, caption=Enable git-annex.]
$ git annex init ""
  \end{lstlisting}
\end{pframe}

\subsection{Adding big files}
\begin{pframe}
  In our master repository, we annex a file.
  \bigskip

  \begin{lstlisting}[language=none, caption=Adding files.]
$ git annex add
$ git commit
  \end{lstlisting}
  \bigskip
  \pause

  In a clone, this file will be visible, but not really present.
  \bigskip

  \begin{lstlisting}[language=none, caption=Make a file available.]
$ file : broken symbolic link to ...
$ git annex get
  \end{lstlisting}
\end{pframe}

\subsection{Removing files}
\begin{pframe}
  As long as there are enough copies available, you can remove files.
  \bigskip

  \begin{lstlisting}[language=none, caption=A failing drop command.]
$ git annex drop
drop bigfile (unsafe)
git-annex: drop: 1 failed
  \end{lstlisting}
  \bigskip
  \pause

  It is actually quite well protected.
  \bigskip

  \begin{lstlisting}[language=none, caption=rm fails too.]
$ rm -rf
rm: cannot remove /.git/annex/objects/...
  \end{lstlisting}
\end{pframe}

\subsection{Synchronise your results}
\begin{pframe}
  Let the other repositories know what you have done.
  \bigskip

  \begin{lstlisting}[language=none, caption=Synchronise with all repositories.]
$ git annex sync
  \end{lstlisting}
  \bigskip
  \pause

  You can choose to sync with a selection of repositories.
  \bigskip

  \begin{lstlisting}[language=none, caption=Synchronise with a selection.]
$ git annex sync origin
  \end{lstlisting}
\end{pframe}

\subsection{Working together on the same clone}
\begin{pframe}
  Sometimes you need to work with other people on the same repository clone.
  \begin{itemize}
    \item Where the large files are stored.
  \end{itemize}
  \bigskip

  Use the following commands to give group access:
  \bigskip

  \begin{lstlisting}[language=none, caption=Make everything group writable.]
$ find -type d -exec chmod 775 {} \;
$ find -type f -exec chmod 664 {} \;
  \end{lstlisting}
\end{pframe}

\section{Questions?}
\lastpagetemplate
\begin{pframe}
  \begin{center}
    Acknowledgements:
    \bigskip
    \bigskip

    Martijn Vermaat

    Zuotian Tatum
  \end{center}

  \vfill
  \permfoot{http://git-annex.branchable.com/}

  \permfoot{https://git.lumc.nl/lgtc-bioinformatics/project-skeleton}
\end{pframe}
\end{document}