# Use git-annex to transfer large data files

Large data files in our Git repositories are managed by git-annex. Git itself tracks only a small symlink for each annexed file, so the file contents are not committed to the GitLab server or to any other Git repository unless we explicitly transfer them with git-annex commands.

When a project is closed and its files must be handed over to the project requester, use the git-annex commands described here; most of the procedure is automated.

Project requesters usually prefer to access the files directly on their USB hard disk or in their SHARK folder, rather than through symlinks. This requires [direct mode](http://git-annex.branchable.com/direct_mode/). When the requester's target file system is on a Windows system or is a FAT-formatted USB disk, git-annex uses direct mode by default.
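To make the handover largely automated, the commands from the sample procedure below could be wrapped in one shell function. This is a minimal sketch, not an established script: the function name `transfer_to_target`, its argument order and its defaults (`usb`, `laptop`, taken from the transcripts) are hypothetical, and it assumes `git` and `git-annex` are on the `PATH`. As the transcript of step 2 shows, `git annex sync` also pushes branches back to the project repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the sample procedure (init, remote, sync, get, fsck)
# in one go. Assumes git and git-annex are installed.
# Usage: transfer_to_target TARGET_DIR SOURCE_REPO [DESCRIPTION] [REMOTE_NAME]
transfer_to_target() {
  local target="$1" src="$2" desc="${3:-usb}" remote="${4:-laptop}"
  if [ -z "$target" ] || [ -z "$src" ]; then
    echo "usage: transfer_to_target TARGET_DIR SOURCE_REPO [DESCRIPTION] [REMOTE_NAME]" >&2
    return 2
  fi
  mkdir -p "$target" || return 1
  # Subshell: the caller's working directory is left unchanged.
  # Each step runs only if the previous one succeeded.
  (
    cd "$target" &&
      git init &&
      git annex init "$desc" &&
      git remote add "$remote" "$src" &&
      git annex sync "$remote" &&
      git annex get . &&
      git annex fsck
  )
}

# Example with the paths from the sample procedure:
# transfer_to_target /media/hmei/3E4A-01AF/annex ~/git/source_test_annex/
```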
## Sample procedure

1. Initialize a Git repository on the target file system, such as a USB hard disk.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF$ mkdir annex
hmei@Leon-LUMC:/media/hmei/3E4A-01AF$ cd annex/
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git init
Initialized empty Git repository in /media/hmei/3E4A-01AF/annex/.git/
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex init "usb"
init usb
Detected a filesystem without fifo support.

Disabling ssh connection caching.

Detected a crippled filesystem.

Enabling direct mode.
ok
(Recording state in git...)
```
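To confirm which mode git-annex chose after `git annex init`, one can look at the checked-out branch. In the transcript of step 2 the direct-mode branch is `annex/direct/master`; newer git-annex releases (v7 and later) replaced direct mode with an adjusted unlocked branch such as `adjusted/master(unlocked)`. The helper below is a hypothetical sketch that infers the mode from the branch name with plain `git` only; it does not call git-annex.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the git-annex working-tree mode of a repository,
# inferred only from the name of the checked-out branch.
annex_mode() {
  local dir="${1:-.}" branch
  # symbolic-ref works before the first commit; -q keeps it silent on a detached HEAD
  branch=$(git -C "$dir" symbolic-ref --short -q HEAD) || branch=""
  case "$branch" in
    annex/direct/*) echo "direct" ;;    # git-annex v5 direct mode
    adjusted/*)     echo "adjusted" ;;  # v7+ adjusted (e.g. unlocked) branch
    *)              echo "indirect" ;;  # regular symlink checkout (or detached HEAD)
  esac
}

# Example, run inside the new repository on the USB disk:
# annex_mode .
```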
2. Add the project folder as a remote repository and sync with it.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git remote add laptop ~/git/source_test_annex/
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex sync laptop
commit ok
pull laptop
warning: no common commits
remote: Counting objects: 99, done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 99 (delta 26), reused 0 (delta 0)
Unpacking objects: 100% (99/99), done.
From /home/hmei/git/source_test_annex
* [new branch] git-annex -> laptop/git-annex
* [new branch] master -> laptop/master
* [new branch] synced/git-annex -> laptop/synced/git-annex
* [new branch] synced/master -> laptop/synced/master

Merge made by the 'recursive' strategy.
bam/small | 1 +
cool_big.bam | 1 +
2 files changed, 2 insertions(+)
create mode 120000 bam/small
create mode 120000 cool_big.bam
ok
(merging laptop/git-annex laptop/synced/git-annex into git-annex...)
(Recording state in git...)
push laptop
Counting objects: 15, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (10/10), 1.00 KiB | 0 bytes/s, done.
Total 10 (delta 4), reused 0 (delta 0)
To /home/hmei/git/source_test_annex/
2d6a1b1..dc055d8 git-annex -> synced/git-annex
b9e6825..3d262e2 annex/direct/master -> synced/master
ok
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ ls -trl
total 8
-rw-r--r-- 1 hmei hmei 202 mrt 26 21:18 cool_big.bam
drwx------ 2 hmei hmei 4096 mrt 26 21:18 bam
```
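Step 2 assumes that the project folder is already a git-annex repository; in the transcript this shows up as the `git-annex` branch fetched from `laptop`. A guard like the one below could check for that branch before adding the remote and syncing. It is a sketch: both function names are hypothetical, and testing for a local `git-annex` branch is a heuristic, not an official git-annex check.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: succeed if the repository at $1 has a local
# "git-annex" branch, which git-annex creates when a repository is initialized.
is_annex_repo() {
  git -C "$1" rev-parse --verify --quiet refs/heads/git-annex >/dev/null 2>&1
}

# Hypothetical wrapper for step 2; run it inside the target repository.
# Usage: add_and_sync REMOTE_NAME PATH_TO_PROJECT_REPO
add_and_sync() {
  local name="$1" path="$2"
  if ! is_annex_repo "$path"; then
    echo "not a git-annex repository: $path" >&2
    return 1
  fi
  git remote add "$name" "$path" && git annex sync "$name"
}

# Example, matching the transcript:
# add_and_sync laptop ~/git/source_test_annex/
```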
3. Because the USB disk uses FAT, git-annex uses direct mode by default. Until the content is fetched, each file holds its symlink target as plain text instead of the actual data.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ cat cool_big.bam
.git/annex/objects/Mm/7V/SHA256E-s414416896--8480c5bf05b9e57663e231f6ce373ff17cf3f771f5ba2b6f63700b7287a6f2b4.bam/SHA256E-s414416896--8480c5bf05b9e57663e231f6ce373ff17cf3f771f5ba2b6f63700b7287a6f2b4.bam
```
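Before running `git annex get`, it can help to see which files are still placeholders. As the output of `cat` shows, such a file holds only a path into `.git/annex/objects/`; because git-annex symlinks are relative, a file in a subdirectory would hold a path starting with `../`. The sketch below lists such files with standard tools. The function name and the 4096-byte size limit are assumptions; it also matches the `/annex/objects/` pointer format of unlocked files in newer git-annex versions, and it does not handle file names containing newlines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list files under DIR whose content has not been fetched
# yet, i.e. small files whose first line is a path into the annex object store.
list_pointer_files() {
  local dir="${1:-.}" f
  find "$dir" -path "$dir/.git" -prune -o -type f -size -4096c -print |
    while IFS= read -r f; do
      # Matches ".git/annex/objects/...", "../.git/annex/objects/..." and
      # "/annex/objects/..." (pointer files of newer unlocked repositories).
      if head -n 1 "$f" | grep -Eq '^((\.\./)*\.git)?/annex/objects/'; then
        printf '%s\n' "$f"
      fi
    done
}

# Example, run in the repository on the USB disk before "git annex get .":
# list_pointer_files .
```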
4. Now fetch the actual file contents.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex get .
get bam/small (from laptop...)
SHA256E-s7--b08abd37753c0a39a3764a4d1bc8b2ce192751de961160140e5c5a66e2d7afb8
7 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1)
ok
get cool_big.bam (from laptop...)
SHA256E-s414416896--8480c5bf05b9e57663e231f6ce373ff17cf3f771f5ba2b6f63700b7287a6f2b4.bam
414,416,896 100% 226.47MB/s 0:00:01 (xfr#1, to-chk=0/1)
ok
(Recording state in git...)
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ ls -trl
total 404708
drwx------ 2 hmei hmei 4096 mrt 26 21:18 bam
-rw-r--r-- 1 hmei hmei 414416896 mrt 26 21:18 cool_big.bam
```
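Keys of the `SHA256E` backend encode the file size after `-s`, e.g. `SHA256E-s414416896--…` for the 414,416,896-byte `cool_big.bam` above. A quick size comparison, sketched below, can flag a truncated copy before the slower checksum in step 5. The helper names are hypothetical; `git annex find` (lists annexed files whose content is present) and `git annex lookupkey` (prints a file's key) are existing git-annex commands, but it is worth checking them against the installed version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the size field of a git-annex key,
# e.g. "SHA256E-s7--<hash>" -> 7. Fails if the key has no "-s<digits>" field.
key_size() {
  local key="$1" rest
  case "$key" in
    *-s[0-9]*) ;;
    *) return 1 ;;
  esac
  rest="${key#*-s}"            # drop the backend name and "-s"
  printf '%s\n' "${rest%%-*}"  # drop everything from the next "-" on
}

# Hypothetical helper: succeed if FILE has exactly the size recorded in KEY.
size_matches() {
  local file="$1" key="$2" expected actual
  [ -f "$file" ] || return 1
  expected=$(key_size "$key") || return 1
  actual=$(( $(wc -c < "$file") ))  # arithmetic expansion strips wc's padding
  [ "$actual" -eq "$expected" ]
}

# Requires git-annex: report present annexed files whose size looks wrong.
check_all_sizes() {
  local f key
  git annex find . | while IFS= read -r f; do
    key=$(git annex lookupkey "$f") || continue
    size_matches "$f" "$key" || printf 'size mismatch: %s\n' "$f"
  done
}
```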
5. Finally, check the integrity of the entire repository.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex fsck
fsck bam/small (checksum...)
ok
fsck cool_big.bam (checksum...)
ok
```
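`git annex fsck` is the check to rely on. For an independent spot check of a single file, note that with the `SHA256E` backend the part of the key after `--` is the SHA-256 digest of the content followed by the original file extension. The sketch below recomputes that digest with GNU coreutils' `sha256sum` (on macOS, `shasum -a 256` would be the equivalent). The function name is hypothetical and only `SHA256E` and `SHA256` keys are handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify FILE against a SHA256E (or SHA256) key by
# recomputing the SHA-256 digest with sha256sum (GNU coreutils).
verify_sha256e() {
  local file="$1" key="$2" expected actual
  case "$key" in
    SHA256E-* | SHA256-*) ;;
    *) echo "unsupported backend: $key" >&2; return 2 ;;
  esac
  expected="${key#*--}"       # digest plus extension, e.g. "<hash>.bam"
  expected="${expected%%.*}"  # strip the extension added by SHA256E
  # Read via stdin so sha256sum never escapes unusual file names.
  actual=$(sha256sum < "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Example (git-annex is needed only to look up the key):
# verify_sha256e cool_big.bam "$(git annex lookupkey cool_big.bam)"
```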