Commit 79d45590 authored by npappas

Initial project commit

{project name}
* Owner : {owner email}
* Analyst : {sasc analyst email}
* Reviewer : {sasc reviewer email}
This folder contains a list of sub-folders. Each sub-folder is the placeholder for one analysis/pipeline run. Pipeline/sample config
files should be stored in the specific run folder to make searching easier. Pipeline logs and summary files (e.g., biopet's JSON summary files)
need to be committed to the LUMC GitLab server. Important report files and count tables (those used for communication with project requesters)
should also be committed. Text files that are too large (e.g., larger than 5 MB) can be compressed.
We do not require git annex for managing big files in this analysis folder. File transfer can be done via rsync, which verifies each transferred file with a whole-file checksum (MD4 only in rsync versions before 3.0.0; newer versions use MD5 or a negotiated algorithm).
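The compression rule above could be applied with a small helper before committing. This is only a sketch: the function name `compress_large_files` and the choice of gzip are assumptions, not part of the template; the 5 MB threshold is the example value from the text.

```python
import gzip
import shutil
from pathlib import Path

SIZE_LIMIT = 5 * 1024 * 1024  # 5 MB, the example threshold mentioned above


def compress_large_files(run_dir, limit=SIZE_LIMIT):
    """Gzip every file under run_dir that is larger than `limit` bytes.

    Hypothetical helper: the original file is replaced by `<name>.gz`.
    """
    compressed = []
    # sorted() materialises the listing first, so files created in the loop are not revisited.
    for path in sorted(Path(run_dir).rglob("*")):
        # Skip directories, files that are already gzipped, and git internals.
        if not path.is_file() or path.suffix == ".gz" or ".git" in path.parts:
            continue
        if path.stat().st_size > limit:
            target = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()  # keep only the compressed copy
            compressed.append(target)
    return compressed
```

Calling `compress_large_files("01_runxxx")` on a run folder returns the list of newly created `.gz` files, which can then be added to the commit.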
The following analyses were done:
1. {analysis name}
{name of person doing the analysis}
Put raw input files and other large static dependencies (e.g. custom genome reference, annotation files) here.
All big files should preferably be handled as git annex objects. Please refer to for how to work with git annex.
The folder structure can be either flat or hierarchical, depending on the project setup. However, the sample sheet in the ../Doc folder should explicitly specify the meaning of each input file.
Put all non-data files given by the user and intake meeting document here, e.g., experiment design document, sample sheets from sequencing center.
**** is a mandatory file that should contain the notes and answers to the standard questions of the project intake meeting.
**sample_info.tsv** is a mandatory file that contains sample level information.
1. sample (mandatory), contains sampleID without spaces
2. subject (mandatory), contains subjectID without spaces
3. sex (mandatory), either "male", "female" or "unknown"
4. <groups ...> (at least one group is mandatory), contains group name without spaces. These groups will be used to color samples in PCA plots.
5. additional_phenotype (optional), contains additional phenotype information in free text form.
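
The column rules above can be checked automatically before a run. The following is a minimal sketch, not an official SASC tool: the function name `validate_sample_info` is hypothetical, and it assumes that every column other than `sample`, `subject`, `sex` and `additional_phenotype` is a group column.

```python
import csv

REQUIRED = ["sample", "subject", "sex"]
KNOWN = set(REQUIRED + ["additional_phenotype"])
ALLOWED_SEX = {"male", "female", "unknown"}


def validate_sample_info(handle):
    """Return a list of problems found in an open sample_info.tsv handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in header]
    # Assumption: any column that is not a known fixed column is a group column.
    group_cols = [c for c in header if c not in KNOWN]
    if not group_cols:
        problems.append("at least one group column is required")
    id_cols = [c for c in ("sample", "subject") if c in header]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in id_cols + group_cols:
            value = row.get(col) or ""
            if not value or any(ch.isspace() for ch in value):
                problems.append(f"line {line_no}: {col} is empty or contains whitespace")
        if "sex" in header and row.get("sex") not in ALLOWED_SEX:
            problems.append(f"line {line_no}: sex must be male, female or unknown")
    return problems
```

An empty list means the file passed these checks, for example `validate_sample_info(open("sample_info.tsv", newline=""))`.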
**readgroup_info.tsv** is a mandatory file that contains readgroup (one biological library can be sequenced with multiple readgroups) level information.
1. sample (mandatory), contains sampleID without spaces, should be the same as in sample_info.tsv
2. readgroup (mandatory), contains readgroupID without spaces
3. R1 (mandatory), contains the full file path to R1 (either in FASTQ or BAM format)
4. R2 (optional), contains the full file path to R2 (in FASTQ format)
5. R1_md5 (mandatory), contains md5 checksum of R1 file
6. R2_md5 (optional), contains md5 checksum of R2 file
7. library_insert_size (optional), contains the library insert size (i.e., average sequence fragment size)
8. qc_db_tags (optional), contains a list of comma-separated tags that could be imported into SASC QC DB to support meta-analysis.
9. info (optional), contains additional information about the library, e.g., sample prep time.
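
Because the table stores a checksum next to each file path, the two can be compared before starting a pipeline. The sketch below assumes the column names listed above; the function names `md5sum` and `check_readgroup_md5` are hypothetical. It only compares checksums where both a path and a checksum are given and does not enforce the mandatory columns.

```python
import csv
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_readgroup_md5(tsv_path):
    """Return (readgroup, column, path) tuples whose md5 does not match the table."""
    mismatches = []
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for col in ("R1", "R2"):
                path = row.get(col)
                expected = row.get(f"{col}_md5")
                if not path or not expected:
                    continue  # nothing to compare, e.g. R2 is optional
                if md5sum(path) != expected.strip().lower():
                    mismatches.append((row.get("readgroup"), col, path))
    return mismatches
```

An empty result from `check_readgroup_md5("readgroup_info.tsv")` means every listed file matches its recorded checksum.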
**family_phenotype_info.ped** is an optional file in PED format that contains family-level information and additional phenotypes that are needed for performing specific tests, e.g., GWAS.
Note: Please do not include spaces in your sample names or IDs. Sample and library names must not start with a digit.
Notes SASC project intake meeting
Please fill in more details or N/A if not applicable for the following questions.
### Project setup
* What is your research question?
* Please describe your study design.
* How many samples? Do the subjects have family members who can be sequenced as well?
* What type of data and experiment (e.g., DNA, RNA, captured exomes, etc) are you interested in?
* For cancer project, do you have tumor normal pairs? If not, why?
* For RNAseq projects, please provide a sample sheet (refer to the other files in /doc).
* Is this project part of a larger study? How does it compare with other similar projects, if any?
* What's your publication plan?
### If you already have generated sequencing data or have decided your sequencing protocol, please answer the following questions:
* What is the species you study? Any specific genomic modifications?
* Please describe your sequencing platform (and the consistency in platforms), coverage, and sequenced region (e.g., full genome, exome, targeted).
* For RNAseq, is it a strand-specific library and how was the RNA captured (e.g., polyA or rRNA depletion)?
* Was it single-end, paired-end, mate-pair or a combination?
* What is/are the average insert size(s)?
* Which adapter-ligation protocol was used (TruSeq, Nextera, ...)? Please provide a list of used adaptors if possible.
* Did anyone already perform data analysis on the sequencing data (e.g., by the sequencing center)? If yes, what is the conclusion?
### If you are going to do sequencing, please answer the following questions
* Did you already find a sequencing center (e.g. LGTC, BGI) to do sequencing?
* Do you need advice on sequencing design?
### Expected deliverables:
* What kind of output datasets do you need and in which format?
* What kind of documents do you need? E.g. QC report, description of the pipelines.
* If your goal is to perform differential gene expression analysis, please specify your research question by stating which sample sets should be compared to each other. If samples are paired, please indicate this in the sample sheet.
SASC internal review form before delivery of result
* Reviewer:
* Date:
- [ ] /src folder clean and committed properly
- [ ] /doc folder clean and committed properly
- [ ] sample config json/yaml correct according to sample sheet (e.g., library_info.tsv)
- [ ] pipeline config json/yaml correct according to project setup (e.g.,
- [ ] no warning of unused pipeline settings in biopet pipeline log
- [ ] no unsolved ERROR in biopet pipeline log
- [ ] FASTQC report looks normal? E.g., check 3 random libraries and make sure there are no ERROR-flagged figures.
- [ ] Make sure data can be moved to some Long Term storage system (client should provide). Make this an integral part of the delivery.
- [ ] Optionally, check SPIN.
sample library R1 R2 R1_md5 R2_md5 library_insert_size qc_db_tags info
sample sex group additional_phenotype
Put all sample, pipeline configuration and all custom scripts here.
In particular, for a data analysis project (e.g., one using a biopet pipeline), this is the recommended way to organize the /src folder: put one samples.json in /src and multiple config.yml files in sub-folders such as /src/01_runxxx and /src/02_runxxx.
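
With that layout, the shared sample config can be paired with each run's pipeline config programmatically. This is only an illustrative sketch: the function name `list_runs` is hypothetical, and it assumes run folders start with a two-digit prefix followed by an underscore, as in the examples above.

```python
from pathlib import Path


def list_runs(src_dir):
    """Pair the shared samples.json with each run's config.yml, in run order."""
    src = Path(src_dir)
    samples = src / "samples.json"
    if not samples.is_file():
        raise FileNotFoundError(f"expected a shared sample config at {samples}")
    # Assumption: run folders are named like 01_runxxx, 02_runxxx, ...
    configs = sorted(src.glob("[0-9][0-9]_*/config.yml"))
    return [(samples, config) for config in configs]
```

Each returned pair holds the paths of the sample config and one run config, which is the information needed to launch or re-run a specific analysis.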