hutspot issues
https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues
2018-07-09T12:58:17+02:00

Gather step may fail if too many chunks
https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/17
2018-07-09T12:58:17+02:00 Sander Bollen

The gather steps may fail with `Argument list too long` when there are too many chunks. This happens, for example, with the full GRCh38 assembly (all alt contigs etc). While not documented, it appears CatVariants also [accepts a .list file](https://github.com/broadgsa/gatk-protected/blob/master/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/CatVariants.java#L203), which may contain the file names to concatenate.

More sensible name field for covstats
https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/18
2018-02-21T10:55:21+01:00 Sander Bollen

The `name` field in the covstats metrics is just the name of the JSON file. This should be a proper name. The path can then be in another field.

Add peddy as qc step
https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/27
2018-05-08T13:58:34+02:00 Sander Bollen

http://peddy.readthedocs.io/en/latest/output.html#output

GATK4
https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/28
2018-06-27T16:56:47+02:00 Sander Bollen

GATK4 _should_ be faster than GATK3, but it requires some rewriting of the genotyping:
* `GenotypeGVCFs` no longer accepts multiple VCF files
* GVCF files must be "gathered" with `GenomicsDBImport`, which only works on a single interval per database
* Then run `GenotypeGVCFs` on the resulting databases
* `CatVariants` has been renamed to `GatherVcfs`. GATK is _NOT_ compatible with bcftools here.

split_genome can lead to tiny regions to call
https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/36
2019-09-24T08:41:26+02:00 van den Berg

To speed up variant calling, the entire genome is split into chunks (100 by default), and variants are called concurrently in all regions. This speeds up the analysis for single samples.
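The splitting described above can be sketched roughly as follows. This is not hutspot's actual `split_genome` code: the function signature, the rule that a chunk never spans two contigs, and the toy contig lengths are all assumptions made for illustration. Under those assumptions, it shows how asking for N chunks can produce more than N regions, some of them very small.

```python
from typing import Dict, List, Tuple

# (contig, start, end), 1-based and inclusive
Region = Tuple[str, int, int]


def split_genome(contig_lengths: Dict[str, int], n_chunks: int = 100) -> List[Region]:
    """Hypothetical splitter: cut every contig into pieces of at most
    total_bases / n_chunks bases. Chunks never span two contigs."""
    total = sum(contig_lengths.values())
    chunk_size = max(1, -(-total // n_chunks))  # ceiling division
    regions: List[Region] = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            regions.append((contig, start, end))
    return regions


# Toy reference (made-up lengths): two large contigs and three tiny ones.
toy = {"chr1": 1000, "chr2": 505, "alt1": 5, "alt2": 5, "alt3": 5}
regions = split_genome(toy, n_chunks=4)
print(len(regions))  # prints 8, twice the 4 chunks requested
print(min(end - start + 1 for _, start, end in regions))  # prints 5
```

In this toy case every tiny contig becomes its own region, and the tail of each large contig produces an extra, shorter region.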
There are various drawbacks to this approach:
1. For KG, we typically analyse a batch of samples, which means the speedup from this is quite small, while submitting all these tasks to the cluster adds a lot of overhead.
2. In fact, split_genome does not generate 100 chunks to call variants on, but almost 200. This is because the reference contains many small contigs, each of which gets assigned to its own chunk. This is likely to be much worse for GRCh38, which has a lot more small contigs.
3. There is no check for weird edge cases, for example when a region is very small because it is at the end of a chromosome. It is unclear how GATK behaves when it is executed on a region of, let's say, < 10 bp.

GRCh38

Update gvcf2coverage to 0.2
https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/44
2020-07-22T13:38:41+02:00 van den Berg

Hutspot currently uses a non-released version of gvcf2coverage that uses MIN_DP by default. Once this functionality is released properly, hutspot should switch to the released version.
See https://quay.io/repository/biocontainers/gvcf2coverage?tab=tags