h.mei created page: home (authored by Mei)
# Use git-annex to transfer large data files
Large data files in our git repositories are managed by git-annex. The file contents themselves are therefore not committed to the GitLab web server or to other git repositories unless we explicitly transfer them with git annex commands.
When a project is closed and we want to transfer files to the project requester, the git annex commands described here make the procedure largely automated.
In most cases the project requester prefers to access the files directly, rather than through symlinks, on their USB hard disk or in their SHARK folder. For this we need to enable [direct mode](http://git-annex.branchable.com/direct_mode/). When the requester's target file system is a Windows file system or a FAT-formatted USB disk, direct mode is used by default.
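
Whether the target accepts symlinks can be checked before starting. The following is a minimal sketch, assuming a POSIX shell on Linux; `supports_symlinks` is a hypothetical helper name, not a git-annex command, and it only approximates the checks that git-annex performs itself during `git annex init`. Newer git-annex releases may also report a different mechanism than direct mode on such file systems.

```shell
# Hypothetical helper: succeed if symlinks can be created in the given directory.
supports_symlinks() {
    dir=$1
    probe="$dir/.symlink_probe_$$"
    # ln -s fails on file systems that cannot store symlinks (for example FAT).
    if ln -s probe_target "$probe" 2>/dev/null && [ -L "$probe" ]; then
        rm -f "$probe"
        return 0
    fi
    rm -f "$probe"
    return 1
}

# Example call (the mount point is taken from the transcript on this page):
# supports_symlinks /media/hmei/3E4A-01AF || echo "no symlink support"
```

If the helper fails for the target directory, the files will most likely appear as regular files rather than symlinks after the transfer, which is what the requester usually wants.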
## Sample procedure
1. Initialize a git repository and a git-annex repository on the target file system, such as a USB hard disk.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF$ mkdir annex
hmei@Leon-LUMC:/media/hmei/3E4A-01AF$ cd annex/
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git init
Initialized empty Git repository in /media/hmei/3E4A-01AF/annex/.git/
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex init "usb"
init usb
Detected a filesystem without fifo support.
Disabling ssh connection caching.
Detected a crippled filesystem.
Enabling direct mode.
ok
(Recording state in git...)
```
2. Add the project folder as a remote repository and sync with it.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git remote add laptop ~/git/source_test_annex/
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex sync laptop
commit ok
pull laptop
warning: no common commits
remote: Counting objects: 99, done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 99 (delta 26), reused 0 (delta 0)
Unpacking objects: 100% (99/99), done.
From /home/hmei/git/source_test_annex
* [new branch] git-annex -> laptop/git-annex
* [new branch] master -> laptop/master
* [new branch] synced/git-annex -> laptop/synced/git-annex
* [new branch] synced/master -> laptop/synced/master
Merge made by the 'recursive' strategy.
bam/small | 1 +
cool_big.bam | 1 +
2 files changed, 2 insertions(+)
create mode 120000 bam/small
create mode 120000 cool_big.bam
ok
(merging laptop/git-annex laptop/synced/git-annex into git-annex...)
(Recording state in git...)
push laptop
Counting objects: 15, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (10/10), 1.00 KiB | 0 bytes/s, done.
Total 10 (delta 4), reused 0 (delta 0)
To /home/hmei/git/source_test_annex/
2d6a1b1..dc055d8 git-annex -> synced/git-annex
b9e6825..3d262e2 annex/direct/master -> synced/master
ok
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ ls -trl
total 8
-rw-r--r-- 1 hmei hmei 202 mrt 26 21:18 cool_big.bam
drwx------ 2 hmei hmei 4096 mrt 26 21:18 bam
```
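3. Because the USB disk is formatted as FAT, git-annex uses direct mode by default. FAT cannot store symlinks, so until the content is fetched each annexed file is a small regular file (202 bytes in the listing above) whose content is the symlink target, that is, the path to the annex object.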
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ cat cool_big.bam
.git/annex/objects/Mm/7V/SHA256E-s414416896--8480c5bf05b9e57663e231f6ce373ff17cf3f771f5ba2b6f63700b7287a6f2b4.bam/SHA256E-s414416896--8480c5bf05b9e57663e231f6ce373ff17cf3f771f5ba2b6f63700b7287a6f2b4.bamhmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$
```
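
The last path component of the placeholder is the git-annex key, which encodes the hashing backend and the file size in bytes. As an illustration, the sketch below pulls these fields out with plain shell parameter expansion. The function names are hypothetical helpers written for this page, not git-annex commands.

```shell
# Hypothetical helpers for reading a git-annex key; not part of git-annex.

# The key is the final path component of the placeholder content.
annex_key_from_pointer() {
    basename "$1"
}

# The backend name is everything before the first "-".
annex_key_backend() {
    printf '%s\n' "${1%%-*}"
}

# The size is the run of digits after the first "-s".
annex_key_size() {
    rest=${1#*-s}
    printf '%s\n' "${rest%%-*}"
}

# Example, reading the placeholder file shown above:
# key=$(annex_key_from_pointer "$(cat cool_big.bam)")
# annex_key_backend "$key"   # SHA256E
# annex_key_size "$key"      # 414416896
```

The size field gives the number of bytes the real file should have once its content has been transferred.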
4. Now we can fetch the actual file contents from the remote.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex get .
get bam/small (from laptop...)
SHA256E-s7--b08abd37753c0a39a3764a4d1bc8b2ce192751de961160140e5c5a66e2d7afb8
7 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1)
ok
get cool_big.bam (from laptop...)
SHA256E-s414416896--8480c5bf05b9e57663e231f6ce373ff17cf3f771f5ba2b6f63700b7287a6f2b4.bam
414,416,896 100% 226.47MB/s 0:00:01 (xfr#1, to-chk=0/1)
ok
(Recording state in git...)
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ ls -trl
total 404708
drwx------ 2 hmei hmei 4096 mrt 26 21:18 bam
-rw-r--r-- 1 hmei hmei 414416896 mrt 26 21:18 cool_big.bam
```
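
Once the content is present, a single file can also be spot-checked by hand, because a SHA256E key contains the SHA-256 digest of the file content. The sketch below assumes GNU coreutils `sha256sum` is available (on other systems a different tool may be needed), and `annex_sha256_matches` is a hypothetical helper name. It does not replace a full integrity check of the repository.

```shell
# Hypothetical helper: succeed if FILE hashes to the digest embedded in a
# SHA256 or SHA256E key. Assumes GNU coreutils sha256sum.
annex_sha256_matches() {
    key=$1
    file=$2
    # Text after the "--" separator, e.g. "<digest>.bam" for SHA256E keys.
    name=${key#*--}
    # Drop the file extension that SHA256E appends, leaving the hex digest.
    expected=${name%%.*}
    # If the file cannot be read, actual stays empty and the comparison fails.
    actual=$(sha256sum "$file" 2>/dev/null | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# Example with the key printed by "git annex get" above:
# annex_sha256_matches \
#   SHA256E-s414416896--8480c5bf05b9e57663e231f6ce373ff17cf3f771f5ba2b6f63700b7287a6f2b4.bam \
#   cool_big.bam && echo "checksum matches"
```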
5. Don't forget to check the file integrity of the entire repository.
```
hmei@Leon-LUMC:/media/hmei/3E4A-01AF/annex$ git annex fsck
fsck bam/small (checksum...)
ok
fsck cool_big.bam (checksum...)
ok
```
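
The five steps above can be sketched as one shell function. This is an illustration under stated assumptions rather than a tested production script: `run`, `transfer_annex` and the `DRY_RUN` variable are hypothetical names introduced here, the remote is called `laptop` as in the transcript, and the source repository should be given as an absolute path because the function changes into the target directory.

```shell
#!/bin/sh
# Sketch only: run, transfer_annex and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# transfer_annex SOURCE_REPO TARGET_DIR [DESCRIPTION]
# The body runs in a subshell so the caller's working directory is unchanged.
transfer_annex() (
    src=$1
    dest=$2
    desc=${3:-usb}
    run mkdir -p "$dest" || exit 1
    # In a dry run the directory may not exist, so only cd for real runs.
    if [ "${DRY_RUN:-0}" != 1 ]; then
        cd "$dest" || exit 1
    fi
    run git init &&
    run git annex init "$desc" &&
    run git remote add laptop "$src" &&
    run git annex sync laptop &&
    run git annex get . &&
    run git annex fsck
)

# Example: preview the commands without touching the disk.
# DRY_RUN=1; transfer_annex ~/git/source_test_annex /media/hmei/3E4A-01AF/annex usb
```

Running it first with `DRY_RUN=1` lists the commands in order, which makes it easy to compare them with the manual procedure before doing a real transfer. The chain stops at the first failing command, so `git annex fsck` only runs after a successful `git annex get`.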