Commit 3a90ba40 authored by Vermaat

Add ALT_REF_LOCI contigs to GRCh38/hg38 assembly

Using fetchChromSizes [1] and the full sequence report (*Download the full
sequence report* on the NCBI assembly overview [2]), we can generate a
mapping from UCSC chromosome names to RefSeq accession numbers:

    ./fetchChromSizes hg38 > human.hg38.genome
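    for contig in $(cut -f 1 human.hg38.genome | grep 'alt$'); do
        # chr12_KI270837v1_alt -> KI270837.1 (GenBank accession)
        code=$(echo "$contig" | cut -d _ -f 2 | sed 's/v/./')
        printf '%s\t' "$contig"
        # Column 7 of the assembly report holds the RefSeq accession
        grep -F "$code" GCF_000001405.26.assembly.txt | cut -f 7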
    done > alt_chrom_names.mapping
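
For reference, the same mapping can be sketched in Python. This is only an
illustration, not part of the commit: the function names are hypothetical, and
it assumes the assembly report is tab-separated with `#` comment lines, the
GenBank accession in column 5 and the RefSeq accession in column 7. The sample
rows below use `-` placeholders for every column the mapping does not need.

```python
def genbank_code(ucsc_name):
    """Turn a UCSC name such as chr12_KI270837v1_alt into KI270837.1."""
    # Second underscore-separated field, first 'v' replaced by '.',
    # mirroring `cut -d _ -f 2 | sed 's/v/./'`.
    return ucsc_name.split("_")[1].replace("v", ".", 1)


def map_alt_contigs(ucsc_names, report_lines):
    """Map UCSC alt contig names to RefSeq accessions."""
    refseq_by_genbank = {}
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: column 5 = GenBank-Accn, column 7 = RefSeq-Accn
        refseq_by_genbank[fields[4]] = fields[6]
    return {name: refseq_by_genbank[genbank_code(name)]
            for name in ucsc_names if name.endswith("alt")}


# Placeholder rows: only columns 5 and 7 carry values taken from this message.
sample_report = [
    "# placeholder header line",
    "\t".join(["-", "-", "-", "-", "KI270837.1", "=", "NT_187588.1", "-", "-", "-"]),
    "\t".join(["-", "-", "-", "-", "KI270842.1", "=", "NT_187596.1", "-", "-", "-"]),
]
mapping = map_alt_contigs(
    ["chr1", "chr12_KI270837v1_alt", "chr13_KI270842v1_alt"], sample_report)
```

Unlike the `grep` above, which matches the code anywhere in a line, this sketch
looks the code up in the GenBank column only and raises `KeyError` for a contig
that is missing from the report.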

Generate the JSON dictionary entries:

    >>> import json
    >>> entries = []
    >>> for line in open('alt_chrom_names.mapping'):
    ...     name, acc = line.strip().split()
    ...     entries.append({'organelle': 'nucleus',
    ...                     'name': name,
    ...                     'accession': acc})
    ...
    >>> print(json.dumps(entries, indent=2))
    [
      {
        "organelle": "nucleus",
        "name": "chr12_KI270837v1_alt",
        "accession": "NT_187588.1"
      },
      {
        "organelle": "nucleus",
        "name": "chr13_KI270842v1_alt",
        "accession": "NT_187596.1"
      },
      ...
    ]
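
The loop above unpacks exactly two fields per line, so it fails on any mapping
line where `grep` found no match or more than one. A minimal sketch of a more
forgiving parser follows; `parse_mapping` is a hypothetical helper, not part of
the commit, and the second sample line is made up to show a failed lookup.

```python
def parse_mapping(lines):
    """Split mapping lines into dictionary entries and problem lines."""
    entries, problems = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 2:
            # Not exactly one contig name plus one accession: report it
            problems.append((number, line.rstrip("\n")))
            continue
        name, accession = fields
        entries.append({"organelle": "nucleus",
                        "name": name,
                        "accession": accession})
    return entries, problems


sample_lines = [
    "chr12_KI270837v1_alt\tNT_187588.1\n",
    "chrN_EXAMPLEv1_alt\t\n",  # hypothetical contig with no accession found
]
entries, problems = parse_mapping(sample_lines)
```

Lines collected in `problems` can then be checked by hand against the assembly
report before the entries are added.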

[1] http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes
[2] ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_000001405.26.assembly.txt
parent 2cc108a8