Docs
====

Put all non-data files provided by the user, together with the intake meeting document, here, e.g., the experiment design document and sample sheets from the sequencing center.

**intake_meeting_notes.md** is a mandatory file that contains the notes and the answers to the standard questions asked at the project intake meeting.

**sample_info.tsv** is a mandatory file that contains sample-level information in the following columns:
1. sample (mandatory), contains sampleID without spaces
2. subject (mandatory), contains subjectID without spaces
3. sex (mandatory), either "male", "female" or "unknown"
4. `<groups ...>` (at least one group column is mandatory), contains a group name without spaces. These groups are used to color the samples in PCA plots.
5. additional_phenotype (optional), contains additional phenotype information in free text form.
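
The column rules above can be checked before a project starts. The following is a minimal sketch, not part of any existing pipeline: it assumes the file has a header row with the column names listed above, that fields are tab-separated, and that every column other than `sample`, `subject`, `sex` and `additional_phenotype` is a group column. The function name `validate_sample_info` is hypothetical.

```python
import csv
import io

REQUIRED = ("sample", "subject", "sex")
ALLOWED_SEX = {"male", "female", "unknown"}
# Columns with a fixed meaning; any other column is treated as a group column
# (an assumption of this sketch, since group column names are user-defined).
NON_GROUP = set(REQUIRED) | {"additional_phenotype"}


def validate_sample_info(text):
    """Return a list of human-readable problems found in sample_info.tsv text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    errors = [f"missing column: {c}" for c in REQUIRED if c not in header]
    groups = [c for c in header if c not in NON_GROUP]
    if not groups:
        errors.append("at least one group column is required")
    if errors:
        return errors  # header problems make the row checks unreliable
    for lineno, row in enumerate(reader, start=2):
        # Assumption: group values, like IDs, must be present on every row.
        for col in ("sample", "subject", *groups):
            value = row[col] or ""
            if not value or any(ch.isspace() for ch in value):
                errors.append(f"line {lineno}: {col} is empty or contains spaces")
        if row["sex"] not in ALLOWED_SEX:
            errors.append(f"line {lineno}: sex must be male, female or unknown")
    return errors


# Hypothetical example data: the second sample has a space in its ID
# and an unsupported sex value.
example = (
    "sample\tsubject\tsex\ttreatment\n"
    "S1\tP1\tfemale\tcontrol\n"
    "S 2\tP2\tF\tcase\n"
)
for problem in validate_sample_info(example):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets the user fix the whole sheet in a single pass.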

**readgroup_info.tsv** is a mandatory file that contains readgroup-level information (one biological library can be sequenced as multiple readgroups) in the following columns:
1. sample (mandatory), contains sampleID without spaces, should be the same as in sample_info.tsv
2. readgroup (mandatory), contains readgroupID without spaces
3. R1 (mandatory), contains the full path to the R1 file (in either FASTQ or BAM format)
4. R2 (optional), contains the full path to the R2 file (in FASTQ format)
5. R1_md5 (mandatory), contains md5 checksum of R1 file
6. R2_md5 (optional), contains md5 checksum of R2 file
7. library_insert_size (optional), contains the library insert size (i.e., average sequence fragment size)
8. qc_db_tags (optional), contains a comma-separated list of tags that can be imported into the SASC QC DB to support meta-analysis.
9. info (optional), contains additional information about the library, e.g., sample prep time.
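
The checksum columns make it possible to verify that the files arrived intact. The sketch below shows one way this could be done with Python's standard `hashlib`; the function names `md5sum` and `checksum_mismatches` are hypothetical, and a row is assumed to be a dict keyed by the column names above (for example, as produced by `csv.DictReader`).

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ/BAM files never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_mismatches(row):
    """Return the file columns (R1, R2) whose MD5 differs from the recorded one.

    `row` is a dict for one readgroup_info.tsv line.
    """
    mismatches = []
    for col in ("R1", "R2"):
        path = row.get(col)
        expected = row.get(f"{col}_md5")
        if not path or not expected:
            # R2 and R2_md5 are optional; checking that the mandatory
            # columns are filled in is outside the scope of this sketch.
            continue
        if md5sum(path) != expected.strip().lower():
            mismatches.append(col)
    return mismatches
```

An empty list from `checksum_mismatches` means every file that has a recorded checksum matched it.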

**family_phenotype_info.ped** is an optional file in PED format (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped) that contains family-level information and additional phenotypes that are needed for performing specific tests, e.g., GWAS.
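
For orientation, a PLINK PED line starts with six whitespace-separated columns: family ID, individual ID, paternal ID, maternal ID, sex (1 = male, 2 = female, other = unknown) and phenotype. The sketch below reads only those six columns and ignores anything after them; `PedRecord`, `read_ped_records` and `sex_label` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str  # PLINK coding: "1" = male, "2" = female, other = unknown
    phenotype: str


def read_ped_records(lines):
    """Yield the six leading columns of each non-empty PED line."""
    for line in lines:
        fields = line.split()  # PED fields are separated by spaces or tabs
        if not fields:
            continue
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}")
        yield PedRecord(*fields[:6])


def sex_label(code):
    """Translate the PED sex code into the wording used in sample_info.tsv."""
    return {"1": "male", "2": "female"}.get(code, "unknown")
```

Any iterable of lines works as input, so an open file handle can be passed to `read_ped_records` directly.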

Note: Please do not include spaces in sample names or IDs. Sample and library names must not start with a digit.
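
The naming rule in this note can be expressed as a single regular expression. This is a minimal sketch with a hypothetical helper name, `is_valid_name`; it treats any whitespace character as a space and rejects empty names, both of which are assumptions.

```python
import re

# First character: neither whitespace nor a digit.
# Remaining characters: anything except whitespace.
_NAME_RE = re.compile(r"[^\s\d]\S*")


def is_valid_name(name):
    """Return True if `name` has no whitespace and does not start with a digit."""
    return _NAME_RE.fullmatch(name) is not None
```

`fullmatch` is used rather than `match` so that a trailing newline or space also makes a name invalid.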