diff --git a/README.md b/README.md
index f8f05d71c3c1e30f6a3a901deb4134251f757bb4..1a63d56e3994f15cdea97b62577cd32ff613e365 100755
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Biopet (Bio Pipeline Execution Toolkit) is the main pipeline development framewo
 Biopet is available as a JAR package in SHARK. The easiest way to start using it is to activate the `biopet` environment module, which sets useful aliases and environment variables:
 
 ~~~
-$ module load biopet/v0.4.0
+$ module load biopet/v0.6.0
 ~~~
 
-With each Biopet release, an accompanying environment module is also released. The latest release is version 0.4.0, thus `biopet/v0.4.0` is the module you would want to load.
+With each Biopet release, an accompanying environment module is also released. The latest release is version 0.6.0, thus `biopet/v0.6.0` is the module you would want to load.
diff --git a/docs/developer/example-pipeline.md b/docs/developer/example-pipeline.md
index 2cfbce145f2cd609e87dd10e11b4681783472ff5..f4f6bb9d76da32118d6a89c3026165bd1669211e 100644
--- a/docs/developer/example-pipeline.md
+++ b/docs/developer/example-pipeline.md
@@ -155,4 +155,46 @@ Since our pipeline is called `HelloPipeline`, the root of the configoptions will
 
 ### Summary output
 
+Any pipeline that mixes in `SummaryQscript` will produce a summary JSON.
+This summary JSON usually contains statistics and some output results.
+
+By mixing in `SummaryQscript`, the new pipeline needs to implement three functions:
+
+1. `summaryFile: File`
+2. `summaryFiles: Map[String, File]`
+3. `summarySettings: Map[String, Any]`
+
+Of those three, `summaryFile` is the most important one: it should point to the file to which the summary will be written.
+The `summaryFiles` function should contain any extra files one would like to add to the summary.
+Files are listed in a separate `files` JSON object, which by default includes any executables used in the pipelines.
+The `summarySettings` function should contain any extra settings one would like to add to the summary.
+Settings are listed in a separate `settings` JSON object.
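+A minimal sketch of these three members, using a simplified stand-in for the `SummaryQscript` trait (illustration only; the real trait lives in the Biopet framework and does considerably more):

```scala
import java.io.File

// Simplified stand-in for Biopet's SummaryQscript trait (illustration only).
trait SummaryQscript {
  def summaryFile: File
  def summaryFiles: Map[String, File]
  def summarySettings: Map[String, Any]
}

// Hypothetical pipeline implementing the three members.
class HelloPipeline(outputDir: File) extends SummaryQscript {
  // Where the summary JSON will be written.
  def summaryFile: File = new File(outputDir, "hello.summary.json")
  // Extra files to list in the summary's `files` object.
  def summaryFiles: Map[String, File] =
    Map("greeting" -> new File(outputDir, "greeting.txt"))
  // Extra settings to list in the summary's `settings` object.
  def summarySettings: Map[String, Any] = Map("greeting_language" -> "en")
}
```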
+
+
+Apart from these fields, the summary JSON will be populated with statistics from tool extensions that mix in `Summarizable`.
+To populate these statistics, one has to call `addSummarizable` on the tool.
+
+For instance, let's go back to the `fastqc` example. The original declaration was:
+
+```scala
+  val fastqc = new Fastqc(this)
+  fastqc.fastqfile = config("fastqc_input")
+  fastqc.output = new File(outputDir, "fastqc.txt")
+
+  // change the kmers setting to 9; wrap it with `Some()` because `fastqc.kmers` is an `Option` value.
+  fastqc.kmers = Some(9)
+
+  add(fastqc)
+```
+
+To add the fastqc summary to our summary JSON, all we have to do is write the following line afterwards:
+
+```scala
+  addSummarizable(fastqc)
+```
+
+Summary statistics for fastqc will then end up in a `stats` JSON object in the summary.
+See the [tool tutorial](example-tool.md) for how to make a tool extension produce any summary output.
+
 ### Reporting output (optional)
\ No newline at end of file
diff --git a/docs/developer/example-tool.md b/docs/developer/example-tool.md
index b37062d926f87276a728fd843dc6a0e35af6c0af..4f0fb17cc29a4f79b9bf6c46d2453b24a2ac8df5 100644
--- a/docs/developer/example-tool.md
+++ b/docs/developer/example-tool.md
@@ -210,4 +210,30 @@ object SimpleTool {
 
 ### Summary setup (for reporting results to JSON)
 
+Any tool extension can create summary output for use within a larger pipeline.
+To accomplish this, it first has to mix in the `Summarizable` trait.
+Once that is done, it must implement the following functions:
+
+1. `summaryFiles: Map[String, File]`
+2. `summaryStats: Map[String, Any]`
+
+The first of these can contain any files one wishes to include in the summary, but it can also be just an empty map.
+
+The second function, `summaryStats`, should create a map of statistics.
+This function is only executed after the tool has completed running, and it is therefore possible to extract values from the output.
+
+Suppose that our tool simply creates a file that lists the number of lines in the input file.
+We could then extract this value and store it in the summary through the `summaryStats` function.
+This would look like the following:
+
+```scala
+  import scala.io.Source
+
+  def summaryStats: Map[String, Any] = {
+    Map("count" -> Source.fromFile(output).getLines().next().toInt)
+  }
+```
+
+See the [pipeline tutorial](example-pipeline.md) for how to use these statistics in a pipeline.
+
diff --git a/docs/developer/scaladocs.md b/docs/developer/scaladocs.md
index 0ce38cb288967286943a15e35a98b533d88fc28a..cd25bee821f8ee1937becc5f318729d5df142483 100644
--- a/docs/developer/scaladocs.md
+++ b/docs/developer/scaladocs.md
@@ -1,2 +1,3 @@
+* [Scaladocs 0.6.0](https://humgenprojects.lumc.nl/sasc/scaladocs/v0.6.0#nl.lumc.sasc.biopet.package)
 * [Scaladocs 0.5.0](https://humgenprojects.lumc.nl/sasc/scaladocs/v0.5.0#nl.lumc.sasc.biopet.package)
 * [Scaladocs 0.4.0](https://humgenprojects.lumc.nl/sasc/scaladocs/v0.4.0#nl.lumc.sasc.biopet.package)
diff --git a/docs/general/config.md b/docs/general/config.md
index edffaefda3f66a517488aa2d1627b3e76b9273f1..d5cda2ca1341a0eeff92bffabbe5944bd69f51f0 100644
--- a/docs/general/config.md
+++ b/docs/general/config.md
@@ -125,6 +125,16 @@ It is also possible to set the `"species"` flag. Again, we will default to `unkn
 }
 ```
 
+## More advanced use of config files
+
+### 4 levels of configuring settings
+
+In Biopet, a value of a config namespace (e.g., `reference_fasta`) for a tool or a pipeline can be defined at 4 different levels:
+
+* Level-4: as a fixed value hardcoded in the Biopet source code
+* Level-3: as a user-specified value in the user config file
+* Level-2: as a system-specified value in the global config files; on LUMC's SHARK cluster, these global config files are located at /usr/local/sasc/config
+* Level-1: as a default value provided in the Biopet source code
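+The precedence of these four levels can be sketched as follows (an illustrative stand-in, not Biopet's actual resolver; `Option` values model whether a given level defines the setting):

```scala
// Illustrative sketch of the 4-level precedence (not Biopet's actual code).
// A value defined at a higher level overrides the same setting at a lower level.
object ConfigResolver {
  def resolve(
      hardcoded: Option[String],    // level-4: fixed value in source code
      userConfig: Option[String],   // level-3: user config file
      globalConfig: Option[String], // level-2: global config files
      default: String               // level-1: default in source code
  ): String =
    hardcoded.orElse(userConfig).orElse(globalConfig).getOrElse(default)
}
```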
+
+During execution, the Biopet framework will resolve the value for each config namespace following the order from level-4 to level-1. Hence, a value defined at a higher level will overwrite a value defined at a lower level for the same config namespace.
+
 ### JSON validation
 
-To check if the created JSON file is correct their are several possibilities: the simplest way is using [this](http://jsonformatter.curiousconcept.com/)
+To check if the created JSON file is correct there are several possibilities: the simplest way is using [this](http://jsonformatter.curiousconcept.com/)
diff --git a/docs/general/memory.md b/docs/general/memory.md
new file mode 100644
index 0000000000000000000000000000000000000000..95e38862e440238f0502fae40f1371cbc38459c8
--- /dev/null
+++ b/docs/general/memory.md
@@ -0,0 +1,42 @@
+# Memory behaviour of Biopet
+
+### Calculation
+
+#### Values per core
+
+- **Default memory per thread**: *core_memory* + (0.5 * *retries*)
+- **Resident limit**: (*core_memory* + (0.5 * *retries*)) * *residentFactor*
+- **Vmem limit**: (*core_memory* + (0.5 * *retries*)) * (*vmemFactor* + (0.5 * *retries*))
+
+We assume here that the cluster will multiply those values by the number of threads. If this is not the case for your cluster, please contact us.
+
+#### Total values
+
+- **Memory limit** (used for Java jobs): (*core_memory* + (0.5 * *retries*)) * *threads*
+
+### Defaults
+
+- **core_memory**: 2.0 (in GB)
+- **threads**: 1
+- **residentFactor**: 1.2
+- **vmemFactor**: 1.4, 2.0 for Java jobs
+
+These are the Biopet defaults, but each extension in Biopet can set its own defaults. For example, the *bwa mem* extension uses 8 `threads` and a `core_memory` of 6.0 by default.
+
+### Config
+
+In the config it is possible to override the resource settings:
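+The calculations in the section above can be checked with a short illustrative sketch (not Biopet code), plugging in the default values:

```scala
// Illustrative sketch of the memory calculations described above (not Biopet code).
object MemoryCalc {
  // Default memory per thread: core_memory + (0.5 * retries)
  def perThread(coreMemory: Double, retries: Int): Double =
    coreMemory + 0.5 * retries

  // Resident limit: (core_memory + (0.5 * retries)) * residentFactor
  def residentLimit(coreMemory: Double, retries: Int, residentFactor: Double): Double =
    perThread(coreMemory, retries) * residentFactor

  // Vmem limit: (core_memory + (0.5 * retries)) * (vmemFactor + (0.5 * retries))
  def vmemLimit(coreMemory: Double, retries: Int, vmemFactor: Double): Double =
    perThread(coreMemory, retries) * (vmemFactor + 0.5 * retries)

  // Memory limit (Java jobs): (core_memory + (0.5 * retries)) * threads
  def memoryLimit(coreMemory: Double, retries: Int, threads: Int): Double =
    perThread(coreMemory, retries) * threads
}
```

With the defaults (`core_memory` 2.0, `residentFactor` 1.2, `vmemFactor` 1.4) and no retries, this gives a resident limit of 2.4 GB and a vmem limit of 2.8 GB.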
+
+- **core_memory**: This overrides the default of the extension
+- **threads**: This overrides the default of the extension
+- **resident_factor**: This overrides the default of the extension
+- **vmem_factor**: This overrides the default of the extension
+
+- **vmem**: Sets a fixed vmem. **When this is set, the retries won't raise the *vmem* anymore**
+- **memory_limit**: Sets a fixed memory limit. **When this is set, the retries won't raise the *memory limit* anymore**
+- **resident_limit**: Sets a fixed resident limit. **When this is set, the retries won't raise the *resident limit* anymore**
+
+### Retry
+
+In Biopet the number of retries is set to 5 by default. The first retry does not use increased memory; starting from the 2nd
+retry, the memory will automatically be increased, according to the calculations mentioned in [Values per core](#values-per-core).
diff --git a/docs/index.md b/docs/index.md
index 7dc9d5dca6764e33e66da571b4de1c49b525c593..67798f7198abf035a5c01b774abb9403dbbf5a8b 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,10 +13,10 @@ Biopet (Bio Pipeline Execution Toolkit) is the main pipeline development framewo
 Biopet is available as a JAR package in SHARK. The easiest way to start using it is to activate the `biopet` environment module, which sets useful aliases and environment variables:
 
 ~~~
-$ module load biopet/v0.5.0
+$ module load biopet/v0.6.0
 ~~~
 
-With each Biopet release, an accompanying environment module is also released. The latest release is version 0.5.0, thus `biopet/v0.5.0` is the module you would want to load.
+With each Biopet release, an accompanying environment module is also released. The latest release is version 0.6.0, thus `biopet/v0.6.0` is the module you would want to load.
 After loading the module, you can access the biopet package by simply typing `biopet`:
diff --git a/docs/pipelines/bam2wig.md b/docs/pipelines/bam2wig.md
index 0a51eb278ad4d64365d472e45adb17007fdabb8d..a7d105f7b3a29dd799c794f871a611d7c9097716 100644
--- a/docs/pipelines/bam2wig.md
+++ b/docs/pipelines/bam2wig.md
@@ -10,9 +10,9 @@ The required configuration file for Bam2Wig is really minimal, only a single JSO
 ~~~
 {"output_dir": "/path/to/output/dir"}
 ~~~
-For technical reasons, single sample pipelines, such as this mapping pipeline do **not** take a sample config.
+For technical reasons, single sample pipelines, such as this pipeline, do **not** take a sample config.
-Input files are in stead given on the command line as a flag.
+Input files are instead given on the command line as a flag.
-Bam2wig requires a one to set the `--bamfile` command line argument to point to the to-be-converted BAM file.
+Bam2wig requires one to set the `--bamfile` command line argument to point to the to-be-converted BAM file.
 
 ## Running Bam2Wig
diff --git a/docs/pipelines/basty.md b/docs/pipelines/basty.md
index 40db3b00ea5f1ae3c55db74e05c78d565ccce43a..fdb21ae3ca1ed6cdcb1f373aa3265433b30f03e6 100644
--- a/docs/pipelines/basty.md
+++ b/docs/pipelines/basty.md
@@ -27,6 +27,11 @@ To run Basty, please create the proper [Config](../general/config.md) files.
 
-Batsy uses the [Shiva](shiva.md) pipeline internally. Please check the documentation for this pipeline for the options.
+Basty uses the [Shiva](shiva.md) pipeline internally. Please check the documentation of this pipeline for the options.
 
+#### Sample input extensions
+
+Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled.
+
 #### Required configuration values
 
 | Submodule | Name | Type | Default | Function |
@@ -63,14 +68,14 @@ Specific configuration options additional to Basty are:
 ```
 
-### Example
+### Examples
 
-##### For the help screen:
+#### For the help screen:
 ~~~
 biopet pipeline basty -h
 ~~~
 
-##### Run the pipeline:
+#### Run the pipeline:
 Note that one should first create the appropriate [configs](../general/config.md).
 ~~~
diff --git a/docs/pipelines/carp.md b/docs/pipelines/carp.md
index bc4ace74e9efda8d8058d13bcf67e6e9ae4d08a6..1391afbf6b21238a9257ba0f94e9146117317162 100644
--- a/docs/pipelines/carp.md
+++ b/docs/pipelines/carp.md
@@ -4,6 +4,9 @@ Carp is a pipeline for analyzing ChIP-seq NGS data.
 It uses the BWA MEM aligner and the MACS2 peak caller by default to align ChIP-seq data and call the peaks and allows you to run all your samples (control or otherwise) in one go.
 
+### Sample input extensions
+
+Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled.
 
 ## Configuration File
 
@@ -52,8 +55,9 @@ For the pipeline settings, there are some values that you need to specify while
 While optional settings are:
 
 1. `aligner`: which aligner to use (`bwa` or `bowtie`)
-2. `macs2`: Here only the callpeak modus is implemented. But one can set all the options from [macs2 callpeak](https://github
-.com/taoliu/MACS/#call-peaks) in this settings config. Note that the config value is: macs2_callpeak
+2. `macs2`: Here only the callpeak mode is implemented, but one can set all the options from [macs2 callpeak](https://github.com/taoliu/MACS/#call-peaks) in this settings config. Note that the config value is: `macs2_callpeak`
+
+
 ## Running Carp
 As with other pipelines in the Biopet suite, Carp can be run by specifying the pipeline after the `pipeline` subcommand:
diff --git a/docs/pipelines/flexiprep.md b/docs/pipelines/flexiprep.md
index 83b2889f3adec27bd506aeb60ef795b9ae80fb9f..cd1e3f8400e9b9bcce0e03fa8ee335e812bb562c 100644
--- a/docs/pipelines/flexiprep.md
+++ b/docs/pipelines/flexiprep.md
@@ -51,6 +51,10 @@ Command line flags for Flexiprep are:
 
 If `-R2` is given, the pipeline will assume a paired-end setup.
 
+### Sample input extensions
+
+Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled.
+
 ### Config
 
 All other values should be provided in the config.
 Specific config values towards the mapping pipeline are:
diff --git a/docs/pipelines/gears.md b/docs/pipelines/gears.md
index 2b2abf7bbf8d78ddd270c70104cb7d1d8c3cf78c..2e4cffde83657aae033d299272420f5c4deda3f8 100644
--- a/docs/pipelines/gears.md
+++ b/docs/pipelines/gears.md
@@ -65,6 +65,10 @@ Command line flags for Gears are:
 If `-R2` is given, the pipeline will assume a paired-end setup.
-`-bam` is mutualy exclusive with the `-R1` and `-R2` flags. Either specify `-bam` or `-R1` and/or `-R2`.
+`-bam` is mutually exclusive with the `-R1` and `-R2` flags. Either specify `-bam`, or `-R1` and/or `-R2`.
 
+### Sample input extensions
+
+Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled.
+
 ### Config
 
 | Key | Type | default | Function |
diff --git a/docs/pipelines/gentrap.md b/docs/pipelines/gentrap.md
index fe26fbc96c8edee9efecba3a56787085d292ea6b..baceae70e90ee1c5441fae1b48844ab99b8f5959 100644
--- a/docs/pipelines/gentrap.md
+++ b/docs/pipelines/gentrap.md
@@ -6,8 +6,9 @@ Gentrap (*generic transcriptome analysis pipeline*) is a general data analysis p
 At the moment, Gentrap supports the following aligners:
 
-1. GSNAP
-2. TopHat
+1. [GSNAP](http://research-pub.gene.com/gmap/)
+2. [TopHat](http://ccb.jhu.edu/software/tophat/index.shtml)
+3. [STAR](https://github.com/alexdobin/STAR/releases)
 
 and the following quantification modes:
 
@@ -18,10 +19,14 @@ and the following quantification modes:
 
 You can also provide a `.refFlat` file containing ribosomal sequence coordinates to measure how many of your libraries originate from ribosomal sequences. Then, you may optionally remove those regions as well.
 
+## Sample input extensions
+
+Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled.
+
 ## Configuration File
 
 As with other biopet pipelines, Gentrap relies on a JSON configuration file to run its analyses.
 There are two important parts here, the configuration for the samples (to determine the sample layout of your experiment) and the configuration for the pipeline settings (to determine which analyses are run).
-
+For help creating the appropriate [configs](../general/config.md), please refer to the config page in the general section.
 
 ### Sample Configuration
 
 Samples are single experimental units whose expression you want to measure. They usually consist of a single sequencing library, but in some cases (for example when the experiment demands each sample have a minimum library depth) a single sample may contain multiple sequencing libraries as well.
-All this is can be configured using the correct JSON nesting, with the following pattern:
+All this can be configured using the correct JSON nesting, with the following pattern:
@@ -72,6 +77,7 @@ In the example above, there is one sample (named `sample_A`) which contains one
 In this case, we have two samples (`sample_X` and `sample_Y`) and `sample_Y` has two different libraries (`lib_one` and `lib_two`). Notice that the names of the samples and libraries may change, but several keys such as `samples`, `libraries`, `R1`, and `R2` remain the same.
 
+
 ### Pipeline Settings Configuration
 
 For the pipeline settings, there are some values that you need to specify while some are optional. Required settings are:
 
@@ -79,20 +85,18 @@ For the pipeline settings, there are some values that you need to specify while
 1. `output_dir`: path to output directory (if it does not exist, Gentrap will create it for you).
 2. `aligner`: which aligner to use (`gsnap` or `tophat`)
 3. `reference_fasta`: this must point to a reference FASTA file and in the same directory, there must be a `.dict` file of the FASTA file.
-4. `expression_measures`: this entry determines which expression measurement modes Gentrap will do. You can choose zero or more from the following: `fragments_per_gene`, `bases_per_gene`, `bases_per_exon`, `cufflinks_strict`, `cufflinks_guided`, and/or `cufflinks_blind`. If you only wish to align, you can set the value as an empty list (`[]`).
+4. `expression_measures`: this entry determines which expression measurement modes Gentrap will do. You can choose zero or more from the following: `fragments_per_gene`, `base_counts`, `cufflinks_strict`, `cufflinks_guided` and/or `cufflinks_blind`. If you only wish to align, you can set the value as an empty list (`[]`).
 5. `strand_protocol`: this determines whether your library is prepared with a specific stranded protocol or not. There are two protocols currently supported now: `dutp` for dUTP-based protocols and `non_specific` for non-strand-specific protocols.
 6. `annotation_refflat`: contains the path to an annotation refFlat file of the entire genome
 
 While optional settings are:
 
 1. `annotation_gtf`: contains path to an annotation GTF file, only required when `expression_measures` contain `fragments_per_gene`, `cufflinks_strict`, and/or `cufflinks_guided`.
-2. `annotation_bed`: contains path to a flattened BED file (no overlaps), only required when `expression_measures` contain `bases_per_gene` and/or `bases_per_exon`.
+2. `annotation_bed`: contains path to a flattened BED file (no overlaps), only required when `expression_measures` contains `base_counts`.
 3. `remove_ribosomal_reads`: whether to remove reads mapping to ribosomal genes or not, defaults to `false`.
 4. `ribosomal_refflat`: contains path to a refFlat file of ribosomal gene coordinates, required when `remove_ribosomal_reads` is `true`.
 5. `call_variants`: whether to call variants on the RNA-seq data or not, defaults to `false`.
 
-In addition to these, you must also remember to supply the alignment index required by your aligner of choice. For `tophat` this is `bowtie_index`, while for `gsnap` it is `db` and `dir`.
-
 Thus, an example settings configuration is as follows:
 
 ~~~ json
@@ -100,13 +104,9 @@ Thus, an example settings configuration is as follows:
   "output_dir": "/path/to/output/dir",
-  "expression_measures": ["fragments_per_gene", "bases_per_gene"],
+  "expression_measures": ["fragments_per_gene", "base_counts"],
   "strand_protocol": "dutp",
-  "reference_fasta": "/path/to/reference",
+  "reference_fasta": "/path/to/reference/fastafile",
   "annotation_gtf": "/path/to/gtf",
-  "annotation_refflat": "/path/to/refflat",
-  "gsnap": {
-    "dir": "/path/to/gsnap/db/dir",
-    "db": "gsnap_db_name"
-  }
+  "annotation_refflat": "/path/to/refflat"
 }
 ~~~
 
@@ -133,7 +133,7 @@ It is also a good idea to specify retries (we recomend `-retry 3` up to `-retry
 
 ## Output Files
 
-The number and types of output files depend on your run configuration. What you can always expect, however, is that there will be a summary JSON file of your run called `gentrap.summary.json` and a PDF report in a `report` folder called `gentrap_report.pdf`. The summary file contains files and statistics specific to the current run, which is meant for cases when you wish to do further processing with your Gentrap run (for example, plotting some figures), while the PDF report provides a quick overview of your run results.
+The numbers and types of output files depend on your run configuration. What you can always expect, however, is that there will be a summary JSON file of your run called `gentrap.summary.json` and a PDF report in a `report` folder called `gentrap_report.pdf`. The summary file contains files and statistics specific to the current run, which is meant for cases when you wish to do further processing with your Gentrap run (for example, plotting some figures), while the PDF report provides a quick overview of your run results.
 ## Getting Help
diff --git a/docs/pipelines/mapping.md b/docs/pipelines/mapping.md
index 868bc7a4cc7491e2ce4464fa2242549bc3b0640e..f04efebc60425764eeb38cbeae90ca9f4751865b 100644
--- a/docs/pipelines/mapping.md
+++ b/docs/pipelines/mapping.md
@@ -35,6 +35,11 @@ Command line flags for the mapping pipeline are:
 
 If `-R2` is given, the pipeline will assume a paired-end setup.
 
+### Sample input extensions
+
+It is a good idea to check the format of your input files before starting any pipeline, since the pipeline expects a specific format based on the file extension.
+For example, for input files with a `fastq` or `fq` extension the pipeline expects an unzipped FASTQ file, while for extensions ending in `fastq.gz` or `fq.gz` it expects a bgzipped or gzipped FASTQ file.
+
 ### Config
 
 All other values should be provided in the config. Specific config values towards the mapping pipeline are:
diff --git a/docs/pipelines/sage.md b/docs/pipelines/sage.md
index efb90851623d285b953f6b2c56c9c62999a0a6ed..cd3a5832873ca5bdf467d4b2aea718e91143932d 100644
--- a/docs/pipelines/sage.md
+++ b/docs/pipelines/sage.md
@@ -30,6 +30,11 @@ Specific configuration values for the Sage pipeline are:
 | transcriptome | Path (required) | Fasta file for transcriptome. Note: Must come from Ensembl! |
 | tags_library | Path (optional) | Five-column tab-delimited file (<tag> <firstTag> <AllTags> <FirstAntiTag> <AllAntiTags>). Unsupported option |
 
+### Sample input extensions
+
+Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled.
+
+
 ## Running Sage
 
 As with other pipelines, you can run the Sage pipeline by invoking the `pipeline` subcommand.
 There is also a general help available which can be invoked using the `-h` flag:
diff --git a/docs/pipelines/shiva.md b/docs/pipelines/shiva.md
index 88fc86ce93ff309c2d388eb89832900f81b4be85..011bd1e7e046e9a90e48c8a743c4024289e7592f 100644
--- a/docs/pipelines/shiva.md
+++ b/docs/pipelines/shiva.md
@@ -24,6 +24,12 @@ The pipeline accepts ```.fastq & .bam``` files as input.
 
 Note that one should first create the appropriate [configs](../general/config.md).
 
+### Sample input extensions
+
+Please refer [to our mapping pipeline](mapping.md) for information about how the input samples should be handled.
+
+Shiva is a special pipeline in the sense that it can also start directly from `bam` files. Note that one should then alter the sample config field from `R1` to `bam`.
+
 ### Full pipeline
 
 The full pipeline can start from fastq or from bam file. This pipeline will include pre-process steps for the bam files.
diff --git a/docs/pipelines/toucan.md b/docs/pipelines/toucan.md
index a533bcaed4288429071dc6e32dc7fdb182df201a..12f4ea108c54402f51d37e6f673d97e633058d6f 100644
--- a/docs/pipelines/toucan.md
+++ b/docs/pipelines/toucan.md
@@ -4,11 +4,13 @@ Toucan
 Introduction
 -----------
 The Toucan pipeline is a VEP-based annotation pipeline.
-Currently, it comprises just two steps:
+Currently, it comprises just two steps by default:
 
 * Variant Effect Predictor run
 * [VEP Normalizer on the VEP output](../tools/VepNormalizer.md)
 
+Additionally, annotation and data-sharing with [Varda](http://varda.readthedocs.org/en/latest/) is possible.
+
 Example
 -----------
 
@@ -25,7 +27,7 @@ Configuration
 You can set all the usual [flags and options](http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html) of the VEP in the configuration, with the same name used by native VEP, except those added after version 75.
-The naming scheme for flags an options is indentical to the one used by the VEP
+The naming scheme for flags and options is identical to the one used by the VEP.
-As some of these flags might conflict with other Biopet tools/pipelines, it is wise to put the VEP in its own namespace.
+As some of these flags might conflict with other Biopet tools/pipelines, it is wise to put the VEP in its own config namespace.
 
 You **MUST** set the following fields:
 
@@ -53,6 +55,34 @@ With that in mind, an example configuration using mode `standard` of the VepNorm
 }
 ~~~
 
+Varda
+-----
+Annotation with a [Varda](http://varda.readthedocs.org/en/latest/) database instance is possible.
+When annotation with Varda is enabled, data-sharing of your variants into Varda is taken care of as well.
+Since Varda requires knowledge about well-covered regions, a gVCF file is additionally ***required*** when using Varda.
+This gVCF should contain the same samples as the input VCF.
+Toucan will use this gVCF file to generate a BED track of well-covered regions based on the genome quality.
+
+One can enable the use of Varda by setting the `use_varda` config value to `true`.
+
+Varda requires some additional config values. The following config values are required:
+
+ * `varda_root`: URL to the Varda root.
+ * `varda_token`: Your user token.
+
+The following config values are optional:
+
+ * `varda_verify_certificate`: Set to `true` by default.
+   Determines whether the client will verify the SSL certificate.
+   You can also set a path to a certificate file here;
+   this is useful when your Varda instance has a self-signed certificate.
+ * `varda_cache_size`: The size of the cache. Default = 20.
+ * `varda_buffer_size`: The size of the buffer when sending large files, in bytes. Default = 1 MiB.
+ * `varda_task_poll_wait`: Wait time in seconds for the Varda poller. Defaults to 2.
+
+Annotation queries can be set by the `annotation_queries` config value in the `manwe` config namespace.
+By default, a global query is returned.
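+For reference, a hypothetical config fragment enabling Varda might look as follows (the key names are taken from the list above; the URL and token values are placeholders, and the exact nesting in your config may differ):

```json
{
  "use_varda": true,
  "varda_root": "https://varda.example.org",
  "varda_token": "your-user-token",
  "varda_cache_size": 20,
  "varda_task_poll_wait": 2
}
```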
+
 Running the pipeline
 ---------------
 The command to run the pipeline is:
@@ -67,6 +97,12 @@ If one wishes to run it on a cluster, the command becomes:
 biopet pipeline Toucan -Input <input_vcf> -config <config_json> -run -qsub -jobParaEnv <PE>
 ~~~~
 
+With Varda:
+
+~~~~ bash
+biopet pipeline Toucan -Input <input_vcf> -gvcf <gvcf file> -config <config_json> -run -qsub -jobParaEnv <PE>
+~~~~
+
 ## Getting Help
diff --git a/docs/releasenotes/release_notes_0.6.0.md b/docs/releasenotes/release_notes_0.6.0.md
new file mode 100644
index 0000000000000000000000000000000000000000..24cdc8357a778ae6c6e1a18c1735b26185c9e8fd
--- /dev/null
+++ b/docs/releasenotes/release_notes_0.6.0.md
@@ -0,0 +1,29 @@
+# Release notes Biopet version 0.6.0
+
+## General code changes
+
+* Refactored Gentrap; its modules can now also be used outside of Gentrap
+* Added more unit testing
+* Upgraded to Queue 3.5
+* MultisampleMapping is now a base for all multisample pipelines with a default alignment step
+
+## Functionality
+
+* [Gears](../pipelines/gears.md): metagenomics NGS data; added support for 16S with Kraken and Qiime
+* Raise an exception at the beginning of each pipeline when not using absolute paths
+* Moved Varscan from Gentrap to Shiva (Varscan can still be used inside Gentrap)
+* [Gentrap](../pipelines/gentrap.md): now uses Shiva for variant calling and produces multisample VCF files
+* Added Bowtie 2
+* Added a fastq validator; Flexiprep now aborts when an input file is corrupted
+* Added an optional VCF validator step in Shiva
+* Added an optional Varda step in Toucan
+* Added trimming of reverse-complement adapters (Flexiprep does this automatically)
+* Added [Tinycap](../pipelines/tinycap.md) for smallRNA analysis
+* [Gentrap](../pipelines/gentrap.md): refactoring changed the "expression_measures" options
+
+## Infrastructure changes
+
+* Development environment within the LUMC is now tested with Jenkins
+    * Added integration tests for Gentrap
+    * Added integration tests for Gears
+    * Added general MultisampleMapping testing
diff --git a/mkdocs.yml b/mkdocs.yml
index d78dc48086615f485b916ad485ef4cac4e0e8a03..689c08275838e7df8947aaba1ab0b8e2d90ac68d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -4,6 +4,7 @@ pages:
 - General:
     - Config: 'general/config.md'
     - Requirements: 'general/requirements.md'
+    - Memory behaviour: 'general/memory.md'
     - About: 'general/about.md'
     - License: 'general/license.md'
 - Pipelines:
@@ -34,6 +35,7 @@ pages:
     - VcfFilter: 'tools/VcfFilter.md'
     - VepNormalizer: 'tools/VepNormalizer.md'
 - Release notes:
+    - 0.6.0: 'releasenotes/release_notes_0.6.0.md'
     - 0.5.0: 'releasenotes/release_notes_0.5.0.md'
     - 0.4.0: 'releasenotes/release_notes_0.4.0.md'
     - 0.3.2: 'releasenotes/release_notes_0.3.2.md'