Commit 9760cd58, authored 7 years ago by Sander Bollen

installation and requirements

Parent: b853933e
README.md: 72 additions, 3 deletions
# Hutspot
This is a multisample DNA variant calling pipeline based on Snakemake, bwa and the
GATK HaplotypeCaller.
...
...
@@ -18,9 +18,78 @@ GATK HaplotypeCaller.
* Separate conda environments for **every** step. No more dependency hell!
  Every job can potentially use different versions of the same package.
* Optionally sub-sample inputs when number of bases exceeds a user-defined
  threshold.
# Installation
We recommend the use of [conda](https://conda.io/docs/) for installing all
dependencies. All rules have a separate conda environment, which guarantees
every tool can use its own dependencies.

To install the base environment containing snakemake itself, activate conda
and run the following in your terminal: `conda env create -f environment.yml`
Subsequently running the pipeline with `--use-conda` will make sure
the correct conda environments get created. This requires a working
internet connection. If you do not want conda environments to be created for
each pipeline run, use the `--conda-prefix` argument. See the
[snakemake documentation](http://snakemake.readthedocs.io/en/stable/executable.html)
for more information.
## GATK
For license reasons, conda cannot fully install the GATK. The JAR
must be registered by running `gatk-register` after the environment is
created, which conflicts with the automated environment creation.

For this reason, hutspot **requires** you to manually specify the path to
the GATK executable JAR via `--config GATK=/path/to/gatk.jar`.
## Fastq-count
Several steps in the pipeline collect fastq metrics via
[fastq-count](https://github.com/sndrtj/fastq-count).
This is a small tool implemented in Rust for speed reasons. As this tool
is not yet in conda, it must be compiled on the user's system before
running the pipeline. When compiled, the path to the executable can be
supplied via `--config FASTQ_COUNT=/path/to/fastq-count`.

TODO: Implement drop-in replacement in python for users not wanting to
compile fastq-count.
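
The flags described above (`--use-conda`, `--conda-prefix` and the two `--config` keys) can be combined into a single invocation. The following is a minimal sketch of a helper that assembles such a command line. The function name `build_snakemake_command`, the `extra_args` parameter and the conda prefix location are assumptions made for illustration and are not part of hutspot.

```python
import shlex
from pathlib import Path


def build_snakemake_command(gatk_jar, fastq_count, conda_prefix=None, extra_args=()):
    """Assemble a snakemake argument list from the flags named in this README."""
    cmd = ["snakemake", "--use-conda"]
    if conda_prefix is not None:
        # Reuse environments stored here instead of creating them for each run.
        cmd += ["--conda-prefix", str(conda_prefix)]
    # Anything this README does not specify (Snakefile location, sample config,
    # number of cores) is passed through untouched.
    cmd += list(extra_args)
    # --config accepts several KEY=VALUE pairs; it is placed last so that no
    # later argument is parsed as a config pair.
    cmd += ["--config", f"GATK={gatk_jar}", f"FASTQ_COUNT={fastq_count}"]
    return cmd


if __name__ == "__main__":
    command = build_snakemake_command(
        gatk_jar=Path("/path/to/gatk.jar"),
        fastq_count=Path("/path/to/fastq-count"),
        conda_prefix=Path("/path/to/conda-envs"),  # hypothetical location
    )
    # Print a shell-quoted command line that can be copied into a terminal.
    print(shlex.join(command))
```

Returning a list rather than a string keeps the result directly usable with `subprocess.run` without shell quoting concerns.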
## Operating system
Hutspot has only been tested on Ubuntu 16.04.
It is expected to work on most modern Linux distributions, but this is untested.
# Requirements
For every sample you wish to analyze, we require one or more paired-end
readgroups in fastq format. They must be compressed with either
`gzip` or `bgzip`.
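
Whether an input file satisfies the compression requirement can be checked before starting the pipeline. This is a minimal sketch, not something hutspot ships: it relies on the fact that both `gzip` and `bgzip` output begin with the two gzip magic bytes. The function name `looks_gzip_compressed` is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream, including bgzip output


def looks_gzip_compressed(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

The check only inspects the header. It does not tell `gzip` and `bgzip` apart and does not verify that the rest of the file is intact.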
Samples must be passed to the pipeline through a config file. This is a
simple JSON file listing the samples and their associated readgroups/libraries.
An example config JSON can be found [here](config/example.json), and a
JSON schema describing the configuration file can be found
[here](config/schema.json).
This JSON schema can also be used to validate your configuration file.
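
Full validation needs a JSON Schema validator, for example the third-party `jsonschema` package. As a lighter first check, the sketch below uses only the standard library and reports which top-level keys listed under the schema's `required` keyword are missing from a config. It is a partial sanity check under that assumption, not a replacement for real validation, and the helper names are hypothetical.

```python
import json


def load_json(path):
    """Read a JSON file into Python objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_required_keys(config, schema):
    """List top-level keys the schema marks as required but the config lacks."""
    return [key for key in schema.get("required", []) if key not in config]
```

A call such as `missing_required_keys(load_json("my_config.json"), load_json("config/schema.json"))`, where `my_config.json` stands for your own file, returns an empty list when no required top-level key is missing.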
## Reference files
The following reference files **must** be provided:

1. A reference genome, in fasta format. Must be indexed with `samtools faidx`.
2. A dbSNP VCF file
3. A VCF file from 1000Genomes
4. A VCF file from the HapMap project.

The following reference files **may** be provided:

1. Any number of BED files to calculate coverage on.
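
The indexing requirement on the reference genome can be verified up front. `samtools faidx` writes its index next to the fasta file with `.fai` appended to the file name, so a minimal sketch of such a check could look like this. The helper names are hypothetical and not part of hutspot.

```python
from pathlib import Path


def fasta_index_path(fasta_path):
    """Return the path where `samtools faidx` writes its index: <fasta>.fai."""
    return Path(str(fasta_path) + ".fai")


def has_fasta_index(fasta_path):
    """True if both the fasta file and its .fai index exist."""
    return Path(fasta_path).is_file() and fasta_index_path(fasta_path).is_file()
```

If `has_fasta_index` returns False for an existing fasta file, running `samtools faidx` on that file creates the missing index.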
# Graph
```plantuml
digraph snakemake_dag {
...
```