- 07 Aug, 2020 1 commit
-
-
van den Berg authored
Use 8 cores instead of just 1; according to the documentation, speed should scale almost linearly with the number of cores provided. Also reduce the compression level of the output file, since cutadapt spends most of its time re-compressing the data after trimming the adapters.
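A minimal sketch of how such a call could be assembled, assuming a cutadapt version that supports `--cores` and `--compression-level`. The function name, the file names and the compression level of 1 are illustrative assumptions (the commit message does not state the level chosen), and the adapter options are left out.

```python
def cutadapt_command(r1_in, r2_in, r1_out, r2_out, cores=8, compression_level=1):
    """Build a paired-end cutadapt command line as a list of arguments.

    The compression level default of 1 is an assumption for illustration;
    adapter options (-a/-A) are intentionally omitted.
    """
    return [
        "cutadapt",
        "--cores", str(cores),                          # worker processes
        "--compression-level", str(compression_level),  # cheaper gzip output
        "-o", r1_out,                                   # trimmed forward reads
        "-p", r2_out,                                   # trimmed reverse reads
        r1_in,
        r2_in,
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed as a shell-like command line.
    print(" ".join(cutadapt_command("s1_R1.fq.gz", "s1_R2.fq.gz",
                                    "s1_R1.trim.fq.gz", "s1_R2.trim.fq.gz")))
```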
-
- 29 Jul, 2020 2 commits
-
-
van den Berg authored
-
van den Berg authored
-
- 28 Jul, 2020 5 commits
-
-
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
- 24 Jul, 2020 4 commits
-
-
van den Berg authored
-
van den Berg authored
Revert "Remove explicit tmp folder" See merge request !17
-
van den Berg authored
This reverts commit 1b7d807f
-
van den Berg authored
-
- 22 Jul, 2020 5 commits
-
-
van den Berg authored
This should no longer be needed on the slurm cluster, where each task can explicitly request the amount of tmp space it requires. Moving the tmp folder from the shared filesystem back onto the host running the analysis should also improve the performance of the pipeline.
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
This way, the same version of picard is used in all tasks in the pipeline. This fixes issue #38.
-
van den Berg authored
-
- 26 Jun, 2020 2 commits
-
-
van den Berg authored
Add optional bed coverage output files See merge request !16
-
van den Berg authored
-
- 25 Jun, 2020 2 commits
-
-
van den Berg authored
-
van den Berg authored
-
- 24 Jun, 2020 3 commits
-
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
- 23 Jun, 2020 1 commit
-
-
van den Berg authored
Unfortunately, this adds a limitation on the sample names that can be used with Hutspot, since the sample names in the MultiQC-parsed output of picard MarkDuplicates are partly ambiguous. This limitation has been documented in the readme, and a check has been added to the pipeline Snakefile that throws an error when overlapping sample names are detected.
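A check of this kind could look roughly like the sketch below. It assumes that "overlapping" means one sample name is contained in another; the function names and the error message are hypothetical and not taken from the Snakefile.

```python
from itertools import permutations


def find_overlapping_names(samples):
    """Return (contained, containing) pairs of overlapping sample names.

    Assumption: two names overlap when one is a substring of the other.
    """
    return [(a, b) for a, b in permutations(samples, 2) if a in b]


def check_sample_names(samples):
    """Raise ValueError when any sample name is contained in another."""
    overlaps = find_overlapping_names(samples)
    if overlaps:
        pairs = ", ".join(f"'{a}' is contained in '{b}'" for a, b in overlaps)
        raise ValueError(f"Overlapping sample names: {pairs}")
```

With these definitions, `check_sample_names(["micro", "micro1"])` raises an error, while `check_sample_names(["micro1", "micro2"])` passes.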
-
- 22 Jun, 2020 1 commit
-
-
van den Berg authored
Previously, Hutspot supported multiple bed files to calculate coverage against. Because of this, the stats.json file had a nested structure in which the coverage based on each bed file was stored, including the name of the bed file and the gender according to that bed file. Since the current version of Hutspot only supports a single bed file, this structure has been simplified. All coverage statistics are now directly under 'coverage' for each sample, and the 'gender' has been moved out of the 'coverage' statistics to the sample level.
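The change in layout can be illustrated with a small sketch. Only the 'coverage' and 'gender' keys come from the description above; the old per-bed-file nesting, the `sample_name` and `mean_coverage` keys, and the function name are assumptions made for the example.

```python
def simplify_sample_stats(sample):
    """Flatten per-bed-file coverage stats for a sample with one bed file.

    Assumed old layout: sample["coverage"] maps a bed file name to a dict
    holding that bed file's statistics, including "gender".
    """
    # Unpacking fails loudly unless there is exactly one bed file.
    (bed_stats,) = sample["coverage"].values()

    result = {key: value for key, value in sample.items() if key != "coverage"}
    # Coverage statistics now sit directly under 'coverage' ...
    result["coverage"] = {k: v for k, v in bed_stats.items() if k != "gender"}
    # ... and 'gender' moves up to the sample level.
    result["gender"] = bed_stats["gender"]
    return result


if __name__ == "__main__":
    old = {
        "sample_name": "s1",
        "coverage": {"targets.bed": {"gender": "female", "mean_coverage": 30.5}},
    }
    print(simplify_sample_stats(old))
```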
-
- 17 Jun, 2020 2 commits
-
-
van den Berg authored
-
van den Berg authored
-
- 03 Jun, 2020 5 commits
-
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
-
van den Berg authored
Hutspot now supports two bed files to calculate coverage. One is the `targetsfile`, which was called `bedfile` before, and holds the targets of the capture kit. The other one is the `baitsfile`, which holds the bait locations of the capture kit. It is possible to specify only the `targetsfile`, but if you specify the `baitsfile`, the `targetsfile` must be specified as well, since both are required by picard HsMetrics. Also added a test for invalid configuration files, and shortened the jsonschema validation error to only show the human-readable message.
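The dependency between the two settings can be sketched in plain Python as follows. This is not the pipeline's jsonschema-based validation; the function name and the error message are hypothetical.

```python
def validate_bed_config(config):
    """Reject configurations that set `baitsfile` without `targetsfile`.

    `targetsfile` on its own, both files together, or neither are accepted.
    """
    if "baitsfile" in config and "targetsfile" not in config:
        raise ValueError("'baitsfile' requires 'targetsfile' to be specified")


if __name__ == "__main__":
    validate_bed_config({"targetsfile": "targets.bed"})  # accepted
    validate_bed_config({"targetsfile": "targets.bed",
                         "baitsfile": "baits.bed"})      # accepted
```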
-
- 02 Jun, 2020 7 commits
-
-
van den Berg authored
Pass 'no file' to collect_stats.py by using an empty list in the Snakefile and nargs='?' in the Python script. This is cleaner than using '.' as a special file and handling that logic in the collect stats script.
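The behaviour of `nargs='?'` that this relies on can be shown with a short sketch. The option name `--covstats` is a hypothetical stand-in, not necessarily the one used in collect_stats.py.

```python
import argparse


def build_parser():
    """Parser with an option that may appear with or without a value."""
    parser = argparse.ArgumentParser()
    # With nargs="?", "--covstats" followed by no value falls back to
    # `const`, which defaults to None, so "no file" needs no sentinel.
    parser.add_argument("--covstats", nargs="?", default=None,
                        help="optional coverage statistics file")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # An empty input list in the Snakefile expands to nothing after the flag.
    print(parser.parse_args(["--covstats"]).covstats)
    print(parser.parse_args(["--covstats", "coverage.json"]).covstats)
```

The script can then test for `None` instead of comparing the argument against a special path such as '.'.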
-
van den Berg authored
-
van den Berg authored
If both a target and a bait bed file have been specified, calculate the hybrid-selection (HS) statistics using picard.
-
van den Berg authored
Multiple (pytest) processes trying to write the same image to /tmp/singularity can corrupt it, causing intermittent failures in the gitlab-ci tests. By specifying a singularity prefix in the snakemake profile, the same images can be re-used, so we only have to worry about concurrent processes writing the same image when a new image is added to the pipeline.
-
van den Berg authored
This also required reordering some snakemake rules to make sure that the correct input files are available. When using rule-based inputs, the rules in the Snakefile have to be sorted, and only rule-based inputs from rules that occur earlier in the Snakefile can be used.
-
van den Berg authored
The implementation is a bit hacky, since snakemake does not allow for optional input files. As a workaround, "." is passed when the bedfile is not defined, and the collect_stats.py script has been made aware of the special meaning of ".". Additionally, Click has been removed as a dependency for collect stats, and the structure of the stats.json file has been updated to only allow for a single entry of coverage stats instead of a list. This has been done to match an earlier change in Hutspot, in which support for multiple bed files was dropped.
-
van den Berg authored
-