capture-lumc issues
https://git.lumc.nl/groups/klinische-genetica/capture-lumc/-/issues
2018-07-09T12:58:17+02:00

https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/17
Gather step may fail if too many chunks
2018-07-09T12:58:17+02:00
Sander Bollen

The gather steps may fail with `Argument list too long` when there are too many chunks. This happens, for example, with the full GRCh38 assembly (all alt contigs etc.). Although it is not documented, CatVariants also appears to [accept a .list file](https://github.com/broadgsa/gatk-protected/blob/master/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/CatVariants.java#L203) containing the names of the files to concatenate.

https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/18
More sensible name field for covstats
2018-02-21T10:55:21+01:00
Sander Bollen

The `name` field in the covstats metrics is just the name of the json file. This should be a proper name. The path can then be in another field.

https://git.lumc.nl/klinische-genetica/capture-lumc/vtools/-/issues/1
Properties should be named as nouns
2019-08-26T10:07:36+02:00
Sander Bollen

`vtools.stats.Stats` names a property `as_dict`. This should simply be `dict`.

https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/27
Add peddy as qc step
2018-05-08T13:58:34+02:00
Sander Bollen

http://peddy.readthedocs.io/en/latest/output.html#output

https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/28
GATK4
2018-06-27T16:56:47+02:00
Sander Bollen

GATK4 _should_ be faster than GATK3, but it requires some rewriting of the genotyping:
* `GenotypeGVCFs` no longer accepts multiple VCF files
* GVCF files must be "gathered" with `GenomicsDBImport`, which only works on a single interval per database
* then run `GenotypeGVCFs` on the dbs
* `catvariants` has been renamed to `GatherVCFs`. GATK is _NOT_ compatible with bcftools here.

https://git.lumc.nl/klinische-genetica/capture-lumc/vtools/-/issues/2
gcoverage with Bed option instead of only refflat
2019-03-19T10:25:36+01:00
Sander Bollen

https://git.lumc.nl/klinische-genetica/capture-lumc/vtools/-/issues/4
Add optional gvcf file to vtools-evaluate
2019-08-26T18:12:18+02:00
van den Berg

Currently, `vtools-evaluate` compares two VCF files. However, the typical use case is to compare a set of known variants (for example from an OpenArray) to the variants found using NGS. In this context, the known variants can contain sites that are either:
1. called as homref in the g.vcf file
2. missing completely from the NGS data
These two cases should not be treated the same (as 'missing data'). When a g.vcf file is specified, the homref calls in this file can be matched against the calls in the OpenArray file.

https://git.lumc.nl/klinische-genetica/capture-lumc/vtools/-/issues/5
alleles_no_call is always 0
2019-10-22T09:58:41+02:00
van den Berg

vtools-evaluate only looks at calls that are present in both vcf files. Because of this, the alleles_no_call field in the output is always 0. If a record is present in the 'positive' vcf of known calls but missing from the 'called' vcf file, the alleles_no_call value should be incremented.

https://git.lumc.nl/klinische-genetica/capture-lumc/vtools/-/issues/6
vtools has no tests
2019-10-22T09:58:42+02:00
van den Berg

vtools is an important part of the quality control in clinical pipelines, as it is used to compare Array data with the called genotypes. However, there are currently no automated tests to verify that vtools performs as expected.

https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/36
split_genome can lead to tiny regions to call
2019-09-24T08:41:26+02:00
van den Berg

To speed up variant calling, the entire genome is split into chunks (100 by default), and variants are called concurrently in all regions. This speeds up the analysis for single samples.
There are various drawbacks to this approach:
1. For KG, we typically analyse a batch of samples, which means the speedup from this is quite small, while it adds a lot of overhead by submitting these tasks to the cluster.
2. In fact, split_genome does not generate 100 chunks to call variants on, but almost 200. The reason for this is the fact that there are a bunch of small contigs in the reference, which each get assigned to their own chunk. This is likely to be much worse for GRCh38, which has a lot more small contigs.
3. There is no check for weird edge cases, for example when a region is very small because it is at the end of a chromosome. It is unclear how GATK behaves when it is executed on a region of, let's say, < 10 bp.

GRCh38

https://git.lumc.nl/klinische-genetica/capture-lumc/vtools/-/issues/7
Add --version command option to vtools
2019-10-08T10:56:07+02:00
van den Berg

For reproducibility it is important that vtools can print its own version number.

https://git.lumc.nl/klinische-genetica/capture-lumc/vtools/-/issues/9
vtools-filter has assumptions on chromosome names
2019-11-25T12:37:52+01:00
van den Berg

vtools-filter assumes that canonical chromosomes are called `[1, 2, 3]` or `[chr1, chr2, chr3]`, and uses these contig names when deciding whether or not to filter certain variants. This is most likely not compatible with GRCh38, where the accession numbers and versions of the chromosomes are used.

GRCh38

https://git.lumc.nl/klinische-genetica/capture-lumc/hutspot/-/issues/44
Update gvcf2coverage to 0.2
2020-07-22T13:38:41+02:00
van den Berg

Hutspot currently uses a non-released version of gvcf2coverage that uses MIN_DP by default. Once this functionality is released properly, hutspot should switch to this version.
See https://quay.io/repository/biocontainers/gvcf2coverage?tab=tags
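
To make concrete what "uses MIN_DP by default" means for the reported coverage, here is a minimal Python sketch. It is not taken from gvcf2coverage: the function name `block_to_bed` and the `use_min_dp` parameter are made up for illustration, and it assumes a single-sample, tab-separated gVCF record in the usual GATK layout (reference blocks with `END=` in INFO and `MIN_DP` in the FORMAT column).

```python
def block_to_bed(line, use_min_dp=True):
    """Turn one single-sample gVCF record into (chrom, start, end, depth).

    Hypothetical helper for illustration only, not the gvcf2coverage code.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]

    # Pair the FORMAT keys with the values of the (single) sample column.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    # Reference blocks carry END=<last position> in INFO; any other record
    # is assumed to span the length of its REF allele.
    end = pos + len(ref) - 1
    for entry in info.split(";"):
        if entry.startswith("END="):
            end = int(entry[4:])

    # Prefer MIN_DP (the lowest depth seen inside the block) when requested,
    # falling back to DP for records without MIN_DP.
    depth = sample.get("MIN_DP") if use_min_dp else None
    if depth in (None, "."):
        depth = sample.get("DP", "0")
    if depth == ".":
        depth = "0"

    # BED is 0-based and half-open; VCF positions are 1-based and inclusive.
    return chrom, pos - 1, end, int(depth)


if __name__ == "__main__":
    ref_block = (
        "chr1\t101\t.\tA\t<NON_REF>\t.\t.\tEND=200\t"
        "GT:DP:GQ:MIN_DP:PL\t0/0:35:99:28:0,90,1350"
    )
    print(block_to_bed(ref_block))                    # depth from MIN_DP
    print(block_to_bed(ref_block, use_min_dp=False))  # depth from DP
```

For the example block, the two calls differ only in the depth column (28 from MIN_DP versus 35 from DP), which shows why switching the default changes the coverage statistics downstream.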