Add ALT_REF_LOCI contigs to GRCh38/hg38 assembly

Using fetchChromSizes [1] and selecting *Download the full sequence report*
from the NCBI assembly overview [2], we can generate a mapping from UCSC
chromosome names to accession numbers:

    ./fetchChromSizes hg38 > human.hg38.genome
    for contig in $(cut -f 1 human.hg38.genome | grep 'alt$'); do
        code=$(echo $contig | cut -d _ -f 2 | sed 's/v/./')
        echo -n $contig$'\t'
        grep $code GCF_000001405.26.assembly.txt | cut -f 7
    done > alt_chrom_names.mapping

Generate the JSON dictionary entries:

    >>> import json
    >>> entries = []
    >>> for line in open('alt_chrom_names.mapping'):
    ...     name, acc = line.strip().split()
    ...     entries.append({'organelle': 'nucleus',
    ...                     'name': name,
    ...                     'accession': acc})
    ...
    >>> print(json.dumps(entries, indent=2))
    [
      {
        "organelle": "nucleus",
        "name": "chr12_KI270837v1_alt",
        "accession": "NT_187588.1"
      },
      {
        "organelle": "nucleus",
        "name": "chr13_KI270842v1_alt",
        "accession": "NT_187596.1"
      },
      ...
    ]

[1] http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes
[2] ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_000001405.26.assembly.txt
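
For reference, the shell loop in this message could also be sketched in Python with an exact lookup instead of grep. This is only an illustration, not part of the change itself. The helper names (ucsc_to_genbank, load_report, build_mapping) are hypothetical. It assumes the assembly report is tab-separated, that comment lines start with '#', that column 5 holds the GenBank accession, and that column 7 holds the accession used above.

```python
def ucsc_to_genbank(contig):
    """Turn e.g. chr12_KI270837v1_alt into KI270837.1.

    Mirrors the `cut -d _ -f 2 | sed 's/v/./'` step: take the second
    underscore-separated field and replace the first 'v' with '.'.
    """
    return contig.split('_')[1].replace('v', '.', 1)


def load_report(lines):
    """Index the assembly report by GenBank accession.

    Assumption: tab-separated columns, with the GenBank accession in
    column 5 and the accession we want in column 7 (as in `cut -f 7`).
    """
    by_genbank = {}
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip('\n').split('\t')
        by_genbank[fields[4]] = fields[6]
    return by_genbank


def build_mapping(contigs, by_genbank):
    """Return (UCSC name, accession) pairs for contigs ending in 'alt'."""
    return [(contig, by_genbank[ucsc_to_genbank(contig)])
            for contig in contigs
            if contig.endswith('alt')]
```

To use it, pass the open GCF_000001405.26.assembly.txt file to load_report and the first column of human.hg38.genome to build_mapping. A contig whose derived accession is missing from the report raises a KeyError rather than silently producing an empty entry.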