Track source for reference files
Previously, the original source for a reference file was implicit:
- If the accession number starts with `LRG_`, it came from the LRG FTP archive.
- If a download URL is known, it was downloaded from there.
- If slice data is known, it was sliced from the NCBI.
- If a GI number is known, it was downloaded from the NCBI.
- Otherwise, it was uploaded.
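
To make the precedence of these implicit rules concrete, here is a minimal sketch in Python. The function name, the parameter names, and the source labels (`lrg`, `url`, `ncbi_slice`, `ncbi`, `upload`) are hypothetical and chosen only for illustration; they are not taken from the codebase.

```python
from typing import Optional


def infer_source(
    accession: str,
    download_url: Optional[str] = None,
    slice_accession: Optional[str] = None,
    geninfo_identifier: Optional[str] = None,
) -> str:
    """Derive a source label from the old fields (hypothetical names).

    The checks run in the same order as the list above, so the first
    matching rule wins.
    """
    if accession.startswith("LRG_"):
        return "lrg"          # LRG FTP archive
    if download_url:
        return "url"          # downloaded from a known URL
    if slice_accession:
        return "ncbi_slice"   # sliced from the NCBI
    if geninfo_identifier:
        return "ncbi"         # downloaded from the NCBI by GI number
    return "upload"           # nothing else known: uploaded by a user
```

The order matters: a reference with both slice data and a GI number is classified as a slice, because the slice rule is checked first.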
In preparation for the removal of GI numbers (#349 (closed)), this had to be
revisited. We now store the source explicitly in a new `source` field on the
`Reference` model. If additional information is needed to re-fetch the file
from this source (e.g., download URL), this is stored in a new `source_data`
field (always serialized as a string). This scheme should be both more
explicit and more generic.
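
As a rough illustration of the two fields, the sketch below uses a plain dataclass as a stand-in for the real model. The colon-separated encoding of slice data and the `encode_slice`/`decode_slice` helpers are assumptions made for this example; the actual serialization format may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Reference:
    """Stand-in for the Reference model, reduced to the relevant fields."""
    accession: str
    source: str                        # explicit origin, e.g. "url" (hypothetical label)
    source_data: Optional[str] = None  # extra info for re-fetching, always a string


def encode_slice(accession: str, start: int, stop: int, orientation: str) -> str:
    """Serialize slice data to a single string (hypothetical format)."""
    return ":".join([accession, str(start), str(stop), orientation])


def decode_slice(source_data: str) -> Tuple[str, int, int, str]:
    """Inverse of encode_slice."""
    accession, start, stop, orientation = source_data.split(":")
    return accession, int(start), int(stop), orientation


# A downloaded file only needs its URL; a slice needs structured data.
downloaded = Reference("UD_1", "url", "http://example.com/reference.gb")
sliced = Reference(
    "UD_2", "ncbi_slice", encode_slice("NC_000001.10", 100, 200, "forward")
)
```

Keeping `source_data` a plain string means one column can serve every source type, at the cost of each source having to parse its own format.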
Subtasks:
- Add `source` and `source_data` columns.
- Populate columns in migration.
- Load some example data for migration tests.
- Use the columns in the retriever, remove use of old columns.
- Use the columns in cache sync, remove use of old columns.
- Check use of old columns elsewhere.
- Follow-up: remove `slice_*` and `download_url` columns and make `source` NOT NULL. #388 (closed) #389 (closed)
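
Once the retriever uses the new columns, re-fetching can dispatch on `source` directly instead of probing several nullable columns. The sketch below shows that idea only; the handler functions, the source labels, and the assumption that uploaded files cannot be re-fetched are all hypothetical, and the handlers return a description instead of doing any network access.

```python
from typing import Callable, Dict, Optional


def _from_url(source_data: Optional[str]) -> str:
    return "download from {}".format(source_data)


def _from_ncbi_slice(source_data: Optional[str]) -> str:
    return "slice from NCBI using {}".format(source_data)


def _from_lrg(source_data: Optional[str]) -> str:
    return "fetch from LRG FTP archive"


# One handler per source label (labels are hypothetical).
FETCHERS: Dict[str, Callable[[Optional[str]], str]] = {
    "url": _from_url,
    "ncbi_slice": _from_ncbi_slice,
    "lrg": _from_lrg,
}


def refetch_plan(source: str, source_data: Optional[str] = None) -> str:
    """Describe how a reference would be re-fetched, based on its source."""
    try:
        fetcher = FETCHERS[source]
    except KeyError:
        # Assumed here: sources without a handler (e.g. uploads) cannot be re-fetched.
        raise ValueError("cannot re-fetch reference with source {!r}".format(source))
    return fetcher(source_data)
```

With this shape, supporting a new source means adding one entry to the mapping rather than another column and another branch.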