# How to create configs

### The sample config

The sample config should be in [__JSON__](http://www.json.org/) or [__YAML__](http://yaml.org/) format. For YAML, the file should be named `*.yml` or `*.yaml`.

- The first level must have the key __"samples"__, with one entry per sample ID
- Each sample must contain the key __"libraries"__, with one entry per library ID
- Each library contains __"R1"__ / __"R2"__ (FASTQ input) or __"bam"__ (BAM input)
- The FASTQ input files can be provided zipped or unzipped
- `output_dir` is a required setting. Set it either in a config file (e.g. `config.json`) or on the command line via `-cv output_dir=<path/to/outputdir>`.

#### Example sample config

###### yaml:

``` yaml
output_dir: /home/user/myoutputdir
samples:
  Sample_ID1:
    libraries:
      MySeries_1:
        R1: R1.fastq.gz
        R2: R2.fastq.gz
```

###### json:

``` json
{
  "output_dir": "/home/user/myoutputdir",
  "samples": {
    "Sample_ID1": {
      "libraries": {
        "MySeries_1": {
          "R1": "Your_R1.fastq.gz",
          "R2": "Your_R2.fastq.gz"
        }
      }
    }
  }
}
```

To use BAM files as input, use a config like this:

``` yaml
samples:
  Sample_ID_1:
    libraries:  
      Lib_ID_1:
        bam: MyFirst.bam
      Lib_ID_2:
        bam: MySecond.bam
```
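
The layout described above (`samples`, then `libraries`, then `R1`/`R2` or `bam`) can be checked with a few lines of Python before starting a pipeline. This is only a minimal sketch that works on an already parsed config (a `dict`); the function name `check_sample_config` is made up for this example and is not part of Biopet, and Biopet itself may apply stricter or different checks.

```python
# Hypothetical helper, not part of Biopet: checks only the nesting
# samples -> <sample ID> -> libraries -> <library ID> -> R1/R2 or bam.
ALLOWED_LIBRARY_KEYS = {"R1", "R2", "bam"}


def check_sample_config(config):
    """Return a list of problems found in a parsed sample config (a dict)."""
    problems = []
    samples = config.get("samples")
    if not isinstance(samples, dict) or not samples:
        return ['missing or empty top-level key "samples"']
    for sample_id, sample in samples.items():
        libraries = sample.get("libraries") if isinstance(sample, dict) else None
        if not isinstance(libraries, dict) or not libraries:
            problems.append(f'{sample_id}: missing or empty key "libraries"')
            continue
        for library_id, library in libraries.items():
            keys = set(library) if isinstance(library, dict) else set()
            if not keys & ALLOWED_LIBRARY_KEYS:
                problems.append(f"{sample_id}/{library_id}: needs R1/R2 or bam")
    return problems


# The two example configs from this page, written as Python dicts.
fastq_config = {
    "samples": {"Sample_ID1": {"libraries": {"MySeries_1": {
        "R1": "R1.fastq.gz", "R2": "R2.fastq.gz"}}}}
}
bam_config = {
    "samples": {"Sample_ID_1": {"libraries": {
        "Lib_ID_1": {"bam": "MyFirst.bam"},
        "Lib_ID_2": {"bam": "MySecond.bam"}}}}
}

print(check_sample_config(fastq_config))  # []
print(check_sample_config(bam_config))    # []
print(check_sample_config({"samples": {"S1": {}}}))
```

The last call reports the sample `S1` because it has no `libraries` entry.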


Note that the tool [SamplesTsvToJson](../tools/SamplesTsvToJson.md) can generate the sample config for you, which avoids the risk of hand-writing a wrongly formatted JSON file.


### The settings config
The settings config enables a user to alter almost all settings available in the tools used by a given pipeline.
This config file should be written in either JSON or YAML format. It can contain settings such as:

 * references,
 * cut-offs,
 * program modes and memory limits (program specific),
 * whether chunking should be used,
 * program executables (if for some reason the user does not want to use the system's default tools).

Settings can be set globally, so that they apply to all tools used in the pipeline, or per tool, one layer deeper in the config file. E.g. in the example below the setting is altered only for Picard and not globally.

``` json
"picard": { "validationstringency": "LENIENT" }
```

Global setting examples are:
~~~
"java_gc_timelimit": 98,
"numberchunks": 25,
"chunking": true
~~~
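
The difference between a global setting and a tool-specific one can be pictured as a two-step lookup: first look inside the section of the tool, then fall back to the top level of the config. The sketch below is a simplified illustration in Python and is not Biopet's actual resolver; the function `lookup` is hypothetical.

```python
def lookup(config, tool, key, default=None):
    """Prefer config[tool][key]; fall back to the global config[key]."""
    tool_section = config.get(tool)
    if isinstance(tool_section, dict) and key in tool_section:
        return tool_section[key]
    return config.get(key, default)


# Global settings plus one Picard-only setting, taken from the examples above.
settings = {
    "java_gc_timelimit": 98,
    "numberchunks": 25,
    "chunking": True,
    "picard": {"validationstringency": "LENIENT"},
}

print(lookup(settings, "picard", "validationstringency"))    # LENIENT
print(lookup(settings, "picard", "chunking"))                # True
print(lookup(settings, "bedtools", "validationstringency"))  # None
```

The Picard-only setting is visible to Picard but not to other tools, while the global `chunking` value is visible to every tool.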


----

#### References
Pipelines and tools that use references should now use the reference module.
This gives more fine-grained control over references and enables a user to curate the references in a structured way.
E.g. pipelines and tools which use a FASTA reference should now set the value `"reference_fasta"`.
Additionally, we can set `"reference_name"` for the name to be used (e.g. `"hg19"`). If unset, Biopet will default to `unknown`.
It is also possible to set the `"species"` flag. Again, we will default to `unknown` if unset.

#### Example settings config
``` json
{
    "reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta",
    "reference_name": "hg19_nohap",
    "species": "homo_sapiens",
    "dbsnp": "/data/LGTC/projects/vandoorn-melanoma/data/references/hg19_nohap/dbsnp_137.hg19_nohap.vcf",
    "joint_variantcalling": false,
    "haplotypecaller": { "scattercount": 100 },
    "multisample": { "haplotypecaller": { "scattercount": 1000 } },
    "picard": { "validationstringency": "LENIENT" },
    "library_variantcalling_temp": true,
    "target_bed_temp": "/data/LGTC/projects/vandoorn-melanoma/analysis/target.bed",
    "min_dp": 5,
    "bedtools": { "exe": "/share/isilon/system/local/BEDtools/bedtools-2.17.0/bin/bedtools" },
    "bam_to_fastq": true,
    "baserecalibrator": { "memory_limit": 8, "vmem": "16G" },
    "samtofastq": { "memory_limit": 8, "vmem": "16G" },
    "java_gc_timelimit": 98,
    "numberchunks": 25,
    "chunking": true,
    "haplotypecaller": { "scattercount": 1000 }
}
```
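
How the three reference settings behave when some of them are left out can be summarised in a short Python sketch. The defaults of `unknown` come from the References section; treating a missing `reference_fasta` as an error is an assumption of this example, and the function `reference_settings` is hypothetical rather than part of Biopet.

```python
def reference_settings(config):
    """Collect the reference settings, applying the documented defaults."""
    # Assumption of this sketch: a missing reference_fasta is an error.
    if "reference_fasta" not in config:
        raise KeyError("reference_fasta is not set")
    return {
        "reference_fasta": config["reference_fasta"],
        # Both values fall back to "unknown" when unset.
        "reference_name": config.get("reference_name", "unknown"),
        "species": config.get("species", "unknown"),
    }


only_fasta = {"reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta"}
print(reference_settings(only_fasta)["reference_name"])  # unknown
print(reference_settings(only_fasta)["species"])         # unknown
```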

# More advanced use of config files
### 4 levels of configuring settings
In biopet, a value for a ConfigNamespace (e.g., "reference_fasta") for a tool or a pipeline can be defined at 4 different levels.
 * Level-1: As a fixed value hardcoded in the biopet source code.
 * Level-2: As a user specified value in the config file (provided as an argument when running the biopet command)
 * Level-3: As a system specified value in the global config file ()
 * Level-4: As a default value provided in biopet source code.

During execution, the biopet framework resolves the value for each ConfigNamespace in order from level-1 to level-4. Hence, a value defined at an earlier level (e.g. level-1) overrides one defined at a later level (e.g. level-4).
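
The resolution order can be illustrated with Python's `collections.ChainMap`, which searches a list of dictionaries in order and returns the first match. This is only an analogy for the four levels, not how Biopet is implemented, and the values in the four dictionaries are invented for the example.

```python
from collections import ChainMap

# One dict per level; all values here are made up for illustration.
fixed = {}                                        # level-1: hardcoded
user_config = {"reference_name": "hg19_nohap"}    # level-2: user config file
system_config = {"reference_name": "hg19",        # level-3: global config file
                 "species": "homo_sapiens"}
defaults = {"reference_name": "unknown",          # level-4: source defaults
            "species": "unknown",
            "chunking": False}

# Lookups walk the levels from level-1 to level-4 and stop at the first hit.
resolved = ChainMap(fixed, user_config, system_config, defaults)

print(resolved["reference_name"])  # hg19_nohap (level-2 wins over level-3)
print(resolved["species"])         # homo_sapiens (first found at level-3)
print(resolved["chunking"])        # False (only defined at level-4)
```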

### JSON validation

To check whether the created JSON file is correct, there are several possibilities. The simplest is to use [this](http://jsonformatter.curiousconcept.com/) website.
It is also possible to use Python, Scala or any other programming language to validate JSON files, but this requires some more knowledge.
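
As an example of the Python route, the standard library `json` module is enough to check that a file parses. The sketch below only checks JSON syntax, not whether the keys make sense for a pipeline; the function name `validate_json` is chosen for this example.

```python
import json
import sys


def validate_json(path):
    """Return (True, "") if the file parses as JSON, else (False, message)."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except json.JSONDecodeError as err:
        # lineno and colno point at the position where parsing failed.
        return False, f"{path}:{err.lineno}:{err.colno}: {err.msg}"
    return True, ""


if __name__ == "__main__":
    # Check every file given on the command line.
    for config_path in sys.argv[1:]:
        ok, message = validate_json(config_path)
        print(f"{config_path}: OK" if ok else message)
```

Saved under a name of your choice, the script takes the config files to check as command line arguments. For a quick one-off check, `python -m json.tool config.json` from the standard library does a similar job: it pretty-prints valid JSON and reports an error otherwise.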