Commit b6e52693 authored by Peter van 't Hof's avatar Peter van 't Hof
Browse files

Merge branch 'develop' into feature-report

parents e7e9aa43 7d5425d9
......@@ -69,10 +69,18 @@ Global setting examples are:
----
#### References
Pipelines and tools that use references should now use the reference module. This gives some more fine-grained control over references.
E.g. pipelines and tools that use a fasta reference file should now set the value `reference_fasta`.
Additionally, we can set `reference_name` for the name to be used (e.g. `hg19`). If unset, Biopet will default to `unknown`.
It is also possible to set the `species` flag. Again, we will default to `unknown` if unset.
#### Example settings config
~~~
{
"reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta",
"reference_name": "hg19_nohap",
"species": "homo_sapiens",
"dbsnp": "/references/hg19_nohap/dbsnp_137.hg19_nohap.vcf",
"joint_variantcalling": false,
"haplotypecaller": { "scattercount": 100 },
......
......@@ -69,10 +69,17 @@ Global setting examples are:
----
#### References
Pipelines and tools that use references should now use the reference module. This gives some more fine-grained control over references.
E.g. pipelines and tools that use a fasta reference file should now set the value `reference_fasta`.
Additionally, we can set `reference_name` for the name to be used (e.g. `hg19`). If unset, Biopet will default to `unknown`.
It is also possible to set the `species` flag. Again, we will default to `unknown` if unset.
#### Example settings config
~~~
{
"reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta",
"reference_name": "hg19_nohap",
"species": "homo_sapiens",
"dbsnp": "/data/LGTC/projects/vandoorn-melanoma/data/references/hg19_nohap/dbsnp_137.hg19_nohap.vcf",
"joint_variantcalling": false,
"haplotypecaller": { "scattercount": 100 },
......
......@@ -2,35 +2,45 @@
## Introduction
Bam2Wig is a small pipeline consisting of three steps that are used to convert BAM files into track coverage files: bigWig, wiggle, and TDF. While this seems like a task that a single tool should be able to handle, at the time of writing there are no command line tools that can do such a conversion in one go. Thus, the Bam2Wig pipeline was written.
## Configuration
The required configuration file for Bam2Wig is really minimal, only a single JSON file containing an `output_dir` entry:
~~~
{"output_dir": "/path/to/output/dir"}
~~~
For technical reasons, single sample pipelines such as Bam2Wig do **not** take a sample config.
Input files are instead given on the command line as flags.
Bam2Wig requires one to set the `--bamfile` command line argument to point to the to-be-converted BAM file.
## Running Bam2Wig
As with other pipelines, you can run the Bam2Wig pipeline by invoking the `pipeline` subcommand. There is also a general help available which can be invoked using the `-h` flag:
~~~bash
$ java -jar /path/to/biopet.jar pipeline bam2wig -h
Arguments for Bam2Wig:
--bamfile <bamfile> Input bam file
-config,--config_file <config_file> JSON / YAML config file(s)
-cv,--config_value <config_value> Config values, value should be formatted like 'key=value' or
'path:path:key=value'
-DSC,--disablescatter Disable all scatters
~~~
If you are on SHARK, you can also load the `biopet` module and execute `biopet pipeline` instead:
~~~bash
$ module load biopet/v0.3.0
$ biopet pipeline bam2wig
~~~
To run the pipeline:
~~~bash
biopet pipeline bam2wig -config </path/to/config.json> --bamfile </path/to/bam.bam> -qsub -jobParaEnv BWA -run
~~~
## Output Files
......
......@@ -3,8 +3,8 @@
## Introduction
Basty is a pipeline for aligning bacterial genomes and detecting structural variations on the level of SNPs.
Basty will output phylogenetic trees, which makes it very easy to look at the variations between certain species or strains.
### Tools for this pipeline
* [Shiva](../pipelines/shiva.md)
......@@ -14,7 +14,7 @@ Which makes it very easy to look at the variations between certain species or st
### Requirements
To run with a specific species, please do not forget to create the proper index files.
The index files are created from the supplied reference:
* ```.dict``` (can be produced with <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>)
......@@ -22,18 +22,59 @@ The index files are created from the supplied reference:
* ```.idxSpecificForAligner``` (depending on which aligner is used, one should create a suitable index specific for that aligner.
Each aligner has its own way of creating index files; therefore the options for creating the index files can be found inside the aligner itself)
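As a sketch, assuming Picard, samtools and BWA are available and using a hypothetical reference path, the index files could be derived like this (the actual tool invocations are shown as comments):

```bash
# Hypothetical reference location; adjust to your own setup.
REF=/references/hg19_nohap/ucsc.hg19_nohap.fasta

# Sequence dictionary:  java -jar picard.jar CreateSequenceDictionary R=$REF O=${REF%.fasta}.dict
# Fasta index:          samtools faidx $REF
# BWA index (example):  bwa index $REF

# The derived index file names placed next to the reference:
echo "${REF%.fasta}.dict"
echo "${REF}.fai"
```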
### Configuration
To run Basty, please create the proper [Config](../general/config.md) files.
Basty uses the [Shiva](../shiva.md) pipeline internally. Please check the documentation for this pipeline for the options.
#### Required configuration values
| Submodule | Name | Type | Default | Function |
| --------- | ---- | ---- | ------- | -------- |
| shiva | variantcallers | List[String] | | Which variant caller to use |
| - | output_dir | Path | | Path to output directory |
#### Other options
Specific configuration options additional to Basty are:
| Submodule | Name | Type | Default | Function |
| --------- | ---- | ---- | ------- | -------- |
| raxml | seed | Integer | 12345 | RAxML Random seed|
| raxml | raxml_ml_model | String | GTRGAMMAX | RAxML model |
| raxml | ml_runs | Integer | 20 | Number of RAxML runs |
| raxml | boot_runs | Integer | 100 | Number of RAxML bootstrap runs |
#### Example settings config
```json
{
  "output_dir": "</path/to/out_directory>",
"shiva": {
"variantcallers": ["freeBayes"]
},
"raxml" : {
"ml_runs": 50
}
}
```
### Example
##### For the help screen:
~~~
java -jar </path/to/biopet.jar> pipeline basty -h
~~~
##### Run the pipeline:
Note that one should first create the appropriate [configs](../general/config.md).
~~~
java -jar </path/to/biopet.jar> pipeline basty -run -config MySamples.json -config MySettings.json
~~~
### Result files
......
# Flexiprep
## Introduction
Flexiprep is our quality control pipeline. This pipeline checks for possible barcode contamination, clips reads, trims reads and runs
the <a href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc/" target="_blank">Fastqc</a> tool.
Adapter clipping is performed by <a href="https://github.com/marcelm/cutadapt" target="_blank">Cutadapt</a>.
For quality trimming we use <a href="https://github.com/najoshi/sickle" target="_blank">Sickle</a>.
Flexiprep works on `.fastq` files.
## Example
To get the help menu:
~~~
java -jar </path/to/biopet.jar> pipeline Flexiprep -h
Arguments for Flexiprep:
-R1,--input_r1 <input_r1> R1 fastq file (gzipped allowed)
-R2,--input_r2 <input_r2> R2 fastq file (gzipped allowed)
-sample,--sampleid <sampleid> Sample ID
-library,--libid <libid> Library ID
-config,--config_file <config_file> JSON config file(s)
-DSC,--disablescatter Disable all scatters
~~~
Note that the pipeline also works on unpaired reads where one should only provide R1.
To start the pipeline (remove `-run` for a dry run):
......@@ -36,9 +33,34 @@ java -jar Biopet-0.2.0.jar pipeline Flexiprep -run -outDir myDir \
-library myLibname -config mySettings.json
~~~
## Configuration and flags
For technical reasons, single sample pipelines such as this one do **not** take a sample config.
Input files are instead given on the command line as flags.
Command line flags for Flexiprep are:
| Flag (short)| Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --input_r1 | Path (**required**) | Path to input fastq file |
| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file. |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (**required**) | Name of library |
If `-R2` is given, the pipeline will assume a paired-end setup.
### Config
All other values should be provided in the config. Config values specific to the Flexiprep pipeline are:
| Name | Type | Function |
| ---- | ---- | -------- |
| skiptrim | Boolean | Skip the trimming step |
| skipclip | Boolean | Skip the clipping step |
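As a sketch, a settings config that skips the trimming step but keeps clipping might look like this (the output path is a placeholder):

```json
{
  "output_dir": "/path/to/output/dir",
  "skiptrim": true,
  "skipclip": false
}
```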
## Result files
The results from this pipeline will be a fastq file.
The pipeline also outputs two Fastqc runs: one before and one after quality control.
### Example output
......
......@@ -78,7 +78,7 @@ For the pipeline settings, there are some values that you need to specify while
1. `output_dir`: path to output directory (if it does not exist, Gentrap will create it for you).
2. `aligner`: which aligner to use (`gsnap` or `tophat`)
3. `reference_fasta`: this must point to a reference FASTA file and in the same directory, there must be a `.dict` file of the FASTA file.
4. `expression_measures`: this entry determines which expression measurement modes Gentrap will do. You can choose zero or more from the following: `fragments_per_gene`, `bases_per_gene`, `bases_per_exon`, `cufflinks_strict`, `cufflinks_guided`, and/or `cufflinks_blind`. If you only wish to align, you can set the value as an empty list (`[]`).
5. `strand_protocol`: this determines whether your library is prepared with a specific stranded protocol or not. There are two protocols currently supported now: `dutp` for dUTP-based protocols and `non_specific` for non-strand-specific protocols.
6. `annotation_refflat`: contains the path to an annotation refFlat file of the entire genome
......@@ -100,7 +100,7 @@ Thus, an example settings configuration is as follows:
"output_dir": "/path/to/output/dir",
"expression_measures": ["fragments_per_gene", "bases_per_gene"],
"strand_protocol": "dutp",
"reference_fasta": "/path/to/reference",
"annotation_gtf": "/path/to/gtf",
"annotation_refflat": "/path/to/refflat",
"gsnap": {
......
......@@ -17,46 +17,86 @@ After the QC, the pipeline simply maps the reads with the chosen aligner. The re
* <a href="https://github.com/alexdobin/STAR" target="_blank">Star-2pass</a>
* <a href="http://broadinstitute.github.io/picard/" target="_blank">Picard tool suite</a>
## Configuration and flags
For technical reasons, single sample pipelines, such as this mapping pipeline, do **not** take a sample config.
Input files are instead given on the command line as flags.
Command line flags for the mapping pipeline are:
| Flag (short)| Flag (long) | Type | Function |
| ------------ | ----------- | ---- | -------- |
| -R1 | --input_r1 | Path (**required**) | Path to input fastq file |
| -R2 | --input_r2 | Path (optional) | Path to second read pair fastq file. |
| -sample | --sampleid | String (**required**) | Name of sample |
| -library | --libid | String (**required**) | Name of library |
If `-R2` is given, the pipeline will assume a paired-end setup.
### Config
All other values should be provided in the config. Specific config values towards the mapping pipeline are:
| Name | Type | Function |
| ---- | ---- | -------- |
| aligner | String (optional) | Which aligner to use. Defaults to `bwa`. Choose from [`bwa`, `bwa-aln`, `bowtie`, `gsnap`, `tophat`, `stampy`, `star`, `star-2pass`] |
| skip_flexiprep | Boolean (optional) | Whether to skip the flexiprep QC step (default = False) |
| skip_markduplicates | Boolean (optional) | Whether to skip the Picard Markduplicates step (default = False) |
| skip_metrics | Boolean (optional) | Whether to skip the metrics gathering step (default = False) |
| reference_fasta | Path (**required**) | Path to indexed fasta file to be used as reference |
| platform | String (optional) | Read group Platform (defaults to `illumina`)|
| platform_unit | String (**required**) | Read group platform unit |
| readgroup_sequencing_center | String (**required**) | Read group sequencing center |
| readgroup_description | String (**required**) | Read group description |
| predicted_insertsize | Integer (**required**) | Read group predicted insert size |
It is possible to provide any config value as a command line argument as well, using the `-cv` flag.
E.g. `-cv reference_fasta=<path/to/reference>` would set the value `reference_fasta`.
## Example
Note that one should first create the appropriate [settings config](../general/config.md).
Any supplied sample config will be ignored.
### Example config
```json
{
    "reference_fasta": "<path/to/reference>",
"aligner": "bwa",
"skip_metrics": true,
"platform": "our_platform",
"platform_unit": "our_unit",
"readgroup_sequencing_center": "our_center",
"readgroup_description": "our_description",
"predicted_insertsize": 300,
    "output_dir": "<path/to/output/dir>"
}
```
### Running the pipeline
For the help menu:
~~~
java -jar </path/to/biopet.jar> pipeline mapping -h
Arguments for Mapping:
-R1,--input_r1 <input_r1> R1 fastq file
-R2,--input_r2 <input_r2> R2 fastq file
-sample,--sampleid <sampleid> Sample ID
-library,--libid <libid> Library ID
-config,--config_file <config_file> JSON / YAML config file(s)
-cv,--config_value <config_value> Config values, value should be formatted like 'key=value' or
'path:path:key=value'
-DSC,--disablescatter Disable all scatters
~~~
To run the pipeline:
~~~
java -jar </path/to/biopet.jar> pipeline mapping -run --config mySettings.json \
-R1 myReads1.fastq -R2 myReads2.fastq
~~~
Note that removing -R2 causes the pipeline to assume single end `.fastq` files.
To perform a dry run simply remove `-run` from the commandline call.
......
......@@ -15,10 +15,21 @@ This pipeline uses the following modules and tools:
* [SageCreateTagCounts](../tools/sagetools.md)
## Configuration and flags
Note that one should first create the appropriate [configs](../general/config.md).
Please see the documentation for wrapped pipelines (`Mapping` and `Flexiprep`) for their configuration options and flags.
Specific configuration values for the Sage pipeline are:
| Name | Type | Function |
| ---- | ---- | -------- |
| countbed | Path (required) | Path to count bed file |
| squishedcountbed | Path (optional) | By supplying this file the auto squish job will be skipped |
| transcriptome | Path (required) | Fasta file for transcriptome. Note: Must come from Ensembl! |
| tags_library | Path (optional) | Five-column tab-delimited file (`tag`, `firstTag`, `AllTags`, `FirstAntiTag`, `AllAntiTags`). Unsupported option |
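For example, a minimal Sage settings config using these values might look like this (all paths are placeholders; the transcriptome fasta must come from Ensembl):

```json
{
  "output_dir": "/path/to/output/dir",
  "countbed": "/path/to/count.bed",
  "transcriptome": "/path/to/ensembl_transcriptome.fasta"
}
```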
## Running Sage
As with other pipelines, you can run the Sage pipeline by invoking the `pipeline` subcommand. There is also a general help available which can be invoked using the `-h` flag:
......@@ -27,13 +38,12 @@ As with other pipelines, you can run the Sage pipeline by invoking the `pipeline
$ java -jar /path/to/biopet.jar pipeline sage -h
Arguments for Sage:
-s,--sample <sample> Only Sample
-config,--config_file <config_file> JSON / YAML config file(s)
-cv,--config_value <config_value> Config values, value should be formatted like 'key=value' or
'path:path:key=value'
-DSC,--disablescatter Disable all scatters
~~~
If you are on SHARK, you can also load the `biopet` module and execute `biopet pipeline` instead:
......
......@@ -3,7 +3,7 @@
## Introduction
This pipeline is built for variant calling on NGS data (preferably Illumina data).
It is based on the <a href="https://www.broadinstitute.org/gatk/guide/best-practices" target="_blank">best practices</a> of GATK in terms of their approach to variant calling.
The pipeline accepts ```.fastq & .bam``` files as input.
----
......@@ -26,9 +26,9 @@ Note that one should first create the appropriate [configs](../general/config.md
### Full pipeline
The full pipeline can start from fastq or from bam files. This pipeline will include pre-process steps for the bam files.
To view the help menu, execute:
~~~
java -jar </path/to/biopet.jar> pipeline shiva -h
......@@ -44,13 +44,15 @@ To run the pipeline:
java -jar </path/to/biopet.jar> pipeline shiva -config MySamples.json -config MySettings.json -run
~~~
A dry run can be performed by simply removing the `-run` flag from the command line call.
### Only variant calling
It is possible to run Shiva while only performing its variant calling steps.
This has been separated in its own pipeline named `shivavariantcalling`.
As this calling pipeline starts from BAM files, it will naturally not perform any pre-processing steps.
To view the help menu, execute:
~~~
java -jar </path/to/biopet.jar> pipeline shivavariantcalling -h
......@@ -68,13 +70,15 @@ To run the pipeline:
java -jar </path/to/biopet.jar> pipeline shivavariantcalling -config MySettings.json -run
~~~
A dry run can be performed by simply removing the `-run` flag from the command line call.
----
## Variant caller
At this moment the following variant callers can be used:
`TODO: explain them briefly`
* haplotypecaller
* haplotypecaller_gvcf
......@@ -85,36 +89,59 @@ At this moment the following variantcallers modes can be used
* freebayes
* raw
----
## Config options
To view all possible config options please navigate to our Gitlab wiki page
<a href="https://git.lumc.nl/biopet/biopet/wikis/GATK-Variantcalling-Pipeline" target="_blank">Config</a>
### Required settings
| Namespace | Name | Type | Default | Function |
| ----------- | ---- | ---- | ------- | -------- |
| - | output_dir | String | | Path to output directory |
| Shiva | variantcallers | List[String] | | Which variant callers to use |
### Config options
| Namespace | Name | Type | Default | Function |
| ----------- | ---- | ----- | ------- | -------- |
| shiva | reference_fasta | String | | reference to align to |
| shiva | dbsnp | String | | vcf file of dbsnp records |
| shiva | variantcallers | List[String] | | variantcaller to use, see list |
| shiva | use_indel_realigner | Boolean | true | Realign indels |
| shiva | use_base_recalibration | Boolean | true | Base recalibrate |
| shiva | use_analyze_covariates | Boolean | false | Analyze covariates during base recalibration step |
| shiva | bam_to_fastq | Boolean | false | Convert bam files to fastq files |
| shiva | correct_readgroups | Boolean | false | Attempt to correct read groups |
| vcffilter | min_sample_depth | Integer | 8 | Filter variants with at least x coverage |
| vcffilter | min_alternate_depth | Integer | 2 | Filter variants with at least x depth on the alternate allele |
| vcffilter | min_samples_pass | Integer | 1 | Minimum amount of samples which pass custom filter (requires additional flags) |
| vcffilter | filter_ref_calls | Boolean | true | Remove reference calls |
| vcfstats | reference | String | | Path to reference to be used by `vcfstats` |
Since Shiva uses the [Mapping](../mapping.md) pipeline internally, mapping config values can be specified as well.
For all the options, please see the corresponding documentation for the mapping pipeline.
### Modes
Shiva furthermore supports three modes. The default and recommended option is `multisample_variantcalling`.
During this mode, all bam files will be simultaneously called in one big VCF file. It will work with any number of samples.
On top of that, Shiva provides two separate modes that only work with a single sample.
Those are not recommended, but may be useful to those who need to validate replicates.
Mode `single_sample_variantcalling` calls a single sample as a merged bam file.
I.e., it will merge all libraries into one bam file, then call on that.
The other mode, `library_variantcalling`, will simultaneously call all library bam files.
The config for these therefore is:
| Namespace | Name | Type | Default | Function |
| ----------- | ---- | ---- | ------- | -------- |
| shiva | multisample_variantcalling | Boolean | true | Default, multisample calling |
| shiva | single_sample_variantcalling | Boolean | false | Not-recommended, single sample, merged bam |
| shiva | library_variantcalling | Boolean | false | Not-recommended, single sample, per library |
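As an illustration, a hypothetical config fragment that disables the default multisample calling in favour of per-library calling would be:

```json
{
  "shiva": {
    "multisample_variantcalling": false,
    "library_variantcalling": true
  }
}
```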
**Config example**
......@@ -124,13 +151,19 @@ To view all possible config options please navigate to our Gitlab wiki page
"samples": {
"SampleID": {
"libraries": {
"lib_id_1": { "bam": "YourBam.bam" },
"lib_id_2": { "R1": "file_R1.fq.gz", "R2": "file_R2.fq.gz" }
}
}
},
"shiva": {
"reference": "<location of fasta of reference>",
"variantcallers": [ "haplotypecaller", "unifiedgenotyper" ],
"dbsnp": "</path/to/dbsnp.vcf>",
"vcffilter": {
"min_alternate_depth": 1
}
},
"output_dir": "<output directory>"
}
```
......
......@@ -22,8 +22,9 @@ Arguments for Toucan:
Configuration
-------------
You can set all the usual [flags and options](http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html) of the VEP in the configuration,
with the same name used by native VEP, except those added after version 75.
The naming scheme for flags and options is identical to the one used by the VEP.
As some of these flags might conflict with other Biopet tools/pipelines, it is wise to put the VEP in its own namespace.
You **MUST** set the following fields:
......@@ -46,7 +47,7 @@ With that in mind, an example configuration using mode `standard` of the VepNorm
"vepnormalizer": {
"mode": "standard"
},
"output_dir": <path_to_output_directory>
}
~~~~
......
......@@ -15,10 +15,12 @@
*/
package nl.lumc.sasc.biopet.core
import org.apache.log4j.{ PatternLayout, Appender, WriterAppender, FileAppender }
import org.broadinstitute.gatk.queue.util.{ Logging => GatkLogging }
import java.io.{ PrintWriter, File }
import nl.lumc.sasc.biopet.core.config.Config
import nl.lumc.sasc.biopet.core.workaround.BiopetQCommandLine
import scala.collection.JavaConversions._
/** Wrapper around executable from Queue */
trait PipelineCommand extends MainCommand with GatkLogging {
......@@ -64,9 +66,18 @@ trait PipelineCommand extends MainCommand with GatkLogging {
}
}
val logDir: File = new File(Config.global.map.getOrElse("output_dir", "./").toString + File.separator + ".log")
logDir.mkdirs()
val logFile = new File(logDir, "biopet." + BiopetQCommandLine.timestamp + ".log")
val a = new WriterAppender(new PatternLayout("%-5p [%d] [%C{1}] - %m%n"), new PrintWriter(logFile))
logger.addAppender(a)
var argv: Array[String] = Array()
argv ++= Array("-S", pipeline)