diff --git a/docs/developer/example-pipeline.md b/docs/developer/example-pipeline.md
index ad075e0710f4c6d488cb274228d08c61f570a0c9..2cfbce145f2cd609e87dd10e11b4681783472ff5 100644
--- a/docs/developer/example-pipeline.md
+++ b/docs/developer/example-pipeline.md
@@ -94,33 +94,65 @@ class HelloPipeline(val root: Configurable) extends QScript with SummaryQScript
   // This method is the actual pipeline
   def biopetScript: Unit = {
+    // Executing a tool like FastQC, calling the extension in `nl.lumc.sasc.biopet.extensions.Fastqc`
-    // Executing a tool like FastQC
-    val shiva = new Shiva(this)
-    shiva.init()
-    shiva.biopetScript()
-    addAll(shiva.functions)
+    val fastqc = new Fastqc(this)
+    fastqc.fastqfile = config("fastqc_input")
+    fastqc.output = new File(outputDir, "fastqc.txt")
+    add(fastqc)
-    /* Only required when using [[SummaryQScript]] */
-    addSummaryQScript(shiva)
-
-    // From here you can use the output files of shiva as input file of other jobs
   }
 }
-//TODO: Replace object Name, must be the same as the class of the pipeline
 object HelloPipeline extends PipelineCommand
 ```
+Looking at the pipeline, you can see that it inherits from `QScript`. `QScript` is the fundamental class which gives access to the Queue scheduling system. In addition, the `SummaryQScript` trait adds another layer of functions to handle and create summary files from pipeline output.
+`class HelloPipeline(val root: Configurable)`: our pipeline is called `HelloPipeline` and takes a `root` with configuration options passed down to Biopet via a JSON file specified on the command line (--config).
+
+```
+  def biopetScript: Unit = {
+  }
+```
+
+One can start adding pipeline components in `biopetScript`; this is the programmatic equivalent of the `main` method in most popular programming languages. For example, a QC tool like `FastQC` can be added to the pipeline, as in the example shown above.
+Setting up the pipeline is done within the pipeline itself; fine-tuning is always possible by overriding settings in the following way:
+
+```
+    val fastqc = new Fastqc(this)
+    fastqc.fastqfile = config("fastqc_input")
+    fastqc.output = new File(outputDir, "fastqc.txt")
+
+    // change the kmers setting to 9, wrapped in `Some()` because `fastqc.kmers` is an `Option` value.
+    fastqc.kmers = Some(9)
+
+    add(fastqc)
+
+```
 ### Config setup
+For our new pipeline, one should set up the (default) config options.
+
+Since our pipeline is called `HelloPipeline`, the root of the config options will be called `hellopipeline` (lowercase).
+
+```json
+{
+    "output_dir": "/home/user/mypipelineoutpt",
+    "hellopipeline": {
+
+    }
+}
+
+```
+
+
 ### Test pipeline
 ### Summary output
-### Reporting output (opt)
\ No newline at end of file
+### Reporting output (optional)
\ No newline at end of file
diff --git a/docs/general/config.md b/docs/general/config.md
index ee4a7ac92aa8fba3b1ce42445e5b47437c139412..b107c1b869c1f6746902543e5a713ddef692ae99 100644
--- a/docs/general/config.md
+++ b/docs/general/config.md
@@ -8,12 +8,14 @@ The sample config should be in [__JSON__](http://www.json.org/) or [__YAML__](ht
 - Second field should contain the __"libraries"__
 - Third field contains __"R1" or "R2"__ or __"bam"__
 - The fastq input files can be provided zipped and unzipped
+- `output_dir` is a required setting that should be set either in a `config.json` or specified on the invocation command via -cv output_dir=<path/to/outputdir\>. The default value is to place the pipeline output in the current working directory.
 #### Example sample config
 ###### yaml:
 ``` yaml
+output_dir: /home/user/myoutputdir
 samples:
   Sample_ID1:
     libraries:
@@ -26,6 +28,7 @@ samples:
 ``` json
 {
+   "output_dir": "/home/user/myoutputdir",
    "samples":{
       "Sample_ID1":{
          "libraries":{
diff --git a/docs/pipelines/basty.md b/docs/pipelines/basty.md
index 4fc39ef0cb899e5d2736fa1dbac7d19267a38d45..40db3b00ea5f1ae3c55db74e05c78d565ccce43a 100644
--- a/docs/pipelines/basty.md
+++ b/docs/pipelines/basty.md
@@ -52,7 +52,7 @@ Specific configuration options additional to Basty are:
 ```json
 {
-   output_dir: </path/to/out_directory>,
+   "output_dir": </path/to/out_directory>,
    "shiva": {
       "variantcallers": ["freeBayes"]
    },
diff --git a/docs/pipelines/flexiprep.md b/docs/pipelines/flexiprep.md
index c4fd981c0ff67dd3c13c5cba995e9e9715acbd93..83b2889f3adec27bd506aeb60ef795b9ae80fb9f 100644
--- a/docs/pipelines/flexiprep.md
+++ b/docs/pipelines/flexiprep.md
@@ -30,7 +30,7 @@ Note that the pipeline also works on unpaired reads where one should only provid
 To start the pipeline (remove `-run` for a dry run):
 ``` bash
-java -jar Biopet-0.2.0.jar pipeline Flexiprep -run -outDir myDir \
+biopet pipeline Flexiprep -run -outDir myDir \
   -R1 myFirstReadPair -R2 mySecondReadPair -sample mySampleName \
   -library myLibname -config mySettings.json
 ```
diff --git a/docs/pipelines/gears.md b/docs/pipelines/gears.md
index 317505a3460b7b12462dcec870914f940ee53213..6d8d150a05625fb2a1564ea61974e0d91c6bc099 100644
--- a/docs/pipelines/gears.md
+++ b/docs/pipelines/gears.md
@@ -1,4 +1,4 @@
-# Flexiprep
+# Gears
 ## Introduction
 Gears is a metagenomics pipeline. (``GE``nome ``A``nnotation of ``R``esidual ``S``equences). One can use this pipeline to identify contamination in sequencing runs on either raw FastQ files or BAM files.
diff --git a/docs/releasenotes/release_notes_0.5.0.md b/docs/releasenotes/release_notes_0.5.0.md
index 4d5347a42d782b460355b254c3a63a16cd268dda..5280574ea647d4de71f5cf652d4d54733c6c5052 100644
--- a/docs/releasenotes/release_notes_0.5.0.md
+++ b/docs/releasenotes/release_notes_0.5.0.md
@@ -1,27 +1,37 @@
 # Release notes Biopet version 0.5.0
 
-* Our QC and mapping pipeline now use piping for the most used aligners and QC tools
-    * This decreases the disk usage and run time
-* Improvements in the reporting framework
-* Added metagenomics pipeline: [Gears](../pipelines/gears.md)
-* Development envoirment within the LUMC now get tested with Jenkins
-    * Added integration tests Flexiprep
-    * Added integration tests Mapping
-    * Added integration tests Shiva
-    * Added integration tests Toucan
+## General Code changes
+
+* Upgrade to Queue 3.4, with this also the htsjdk library to 1.132
+* Our `QC` and `Mapping` pipelines now use piping for the most used aligners and QC tools
+    * Reducing I/O over the network
+    * Reducing the disk usage (storage) and run time
 * Added version command for Star
+* Separation of the `biopet`-framework into: `Core`, `Extensions`, `Tools` and `Utils`
+* Optimized unit testing
+* Unit test coverage on `Tools` increased
+* Workaround: Added R-script files of Picard to biopet to fix picard jobs (files are not packaged in maven dependency)
+* Added external example for developers
+
+## Functionality
+
+* Retries of pipelines and tools are now enabled by default
+* Improvements in the reporting framework, allowing custom reporting elements for specific pipelines.
+* Fixed reports when metrics of Flexiprep are skipped
+* Added metagenomics pipeline: [Gears](../pipelines/gears.md)
 * Added single sample variantcalling with bcftools
-* Splitting the Framework into: Core, Extensions, Tools and Utils
-* Fixed reports when Metrics of Flexiprep is skipped
-* Upgrade to Queue 3.4, with this also the htsjdk library to 1.132
-* Added key support for GATK jobs
-* Optimizing unit testing
-* Unit test coverage on Tools increased
-* Retry is now default enabled
+* Added ET + key support for GATK job invocation, which disables the phone-home feature when a key is supplied
 * Added more debug information in the `.log` directory when `-l debug` is enabled
-* Shiva: added support for GenotypeConcordance tool to check against a Golden Standard
-* Workaround: Added Rscript files of picard to biopet to fix picard jobs (files are not packaged in maven dependency)
-* Shiva: fixed a lot of small bugs when developing integration tests
-* Gentrap: Better error handeling on missing annotation files
-* Shiva: Workaround: Fixed a dependency on rerun, with this change there can be 2 bam files in the samples folder
-* Added external example for developers
+* [Shiva](../pipelines/shiva.md): added support for `GenotypeConcordance` tool to check against a Golden Standard
+* [Shiva](../pipelines/shiva.md): fixed a lot of small bugs when developing integration tests
+* [Shiva](../pipelines/shiva.md): Workaround: Fixed a dependency on rerun, with this change there can be 2 bam files in the samples folder
+* [Gentrap](../pipelines/gentrap.md): Improved error handling on missing annotation files
+
+## Infrastructure changes
+
+* Development environment within the LUMC now gets tested with Jenkins
+    * Added integration tests Flexiprep
+    * Added integration tests Gears
+    * Added integration tests Mapping
+    * Added integration tests Shiva
+    * Added integration tests Toucan
diff --git a/external-example/src/main/scala/org/example/group/pipelines/HelloPipeline.scala b/external-example/src/main/scala/org/example/group/pipelines/HelloPipeline.scala
index e7df3db171cbba466fe7722f9af5aa288993e162..82838b2b17155cd2c4d4c2890b5414b46f896af5 100644
--- a/external-example/src/main/scala/org/example/group/pipelines/HelloPipeline.scala
+++ b/external-example/src/main/scala/org/example/group/pipelines/HelloPipeline.scala
@@ -32,7 +32,6 @@ class HelloPipeline(val root: Configurable) extends QScript with SummaryQScript
     fastqc.output = new File(outputDir, "fastqc.txt")
     add(fastqc)
-    // From here you can use the output files of shiva as input file of other jobs
   }
 }
diff --git a/mkdocs.yml b/mkdocs.yml
index e0cdc897d4af04deef4d4b2d5f47ecf3772baa49..aaa89a2535df5e06f5b5443ac73bf8334caee329 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -46,7 +46,8 @@ pages:
     - Example tool: 'developer/example-tool.md'
     - Example pipeable: 'developer/example-pipeable.md'
     - Scala docs:
-        0.4.0: 'developer/code-style.md'
+        - 0.5.0: '/sasc/scaladocs/v0.5.0.0'
+        - 0.4.0: '/sasc/scaladocs/v0.4.0.0'
 #- ['developing/Setup.md', 'Developing', 'Setting up your local development environment']
 #theme: readthedocs
 repo_url: https://github.com/biopet/biopet