# How to create configs

### The sample config

The sample config should be in [__JSON__](http://www.json.org/) or [__YAML__](http://yaml.org/) format. A YAML file should be named `*.yml` or `*.yaml`.

- The top-level key must be __"samples"__; each entry under it is a sample ID
- Each sample contains the key __"libraries"__; each entry under it is a library ID
- Each library contains the input files under the keys __"R1"__, __"R2"__ or __"bam"__
- The FASTQ input files can be provided either compressed (e.g. `.gz`) or uncompressed

#### Example sample config

###### yaml:

``` yaml
samples:
  Sample_ID1:
    libraries:
      MySeries_1:
        R1: R1.fastq.gz
        R2: R2.fastq.gz
```

###### json:

``` json
{
  "samples": {
    "Sample_ID1": {
      "libraries": {
        "MySeries_1": {
          "R1": "Your_R1.fastq.gz",
          "R2": "Your_R2.fastq.gz"
        }
      }
    }
  }
}
```

To use BAM files as input, use a config like this:
  
``` yaml
samples:
  Sample_ID_1:
    libraries:  
      Lib_ID_1:
        bam: MyFirst.bam
      Lib_ID_2:
        bam: MySecond.bam
```
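
The nesting described above (samples, then libraries, then input files) can be checked with a few lines of Python. This is only a minimal sketch: `check_sample_config` and `INPUT_KEYS` are hypothetical names, not part of Biopet, and the check assumes the config has already been parsed into a dict.

```python
import json

INPUT_KEYS = {"R1", "R2", "bam"}  # input keys named in this document


def check_sample_config(config):
    """Return a list of problems found in a parsed sample config (empty if none)."""
    samples = config.get("samples") if isinstance(config, dict) else None
    if not isinstance(samples, dict):
        return ['top-level key "samples" is missing or not a mapping']
    problems = []
    for sample_id, sample in samples.items():
        libraries = sample.get("libraries") if isinstance(sample, dict) else None
        if not isinstance(libraries, dict):
            problems.append(f'{sample_id}: key "libraries" is missing or not a mapping')
            continue
        for lib_id, library in libraries.items():
            # Each library should name at least one input file.
            if not isinstance(library, dict) or not INPUT_KEYS.intersection(library):
                problems.append(f"{sample_id}/{lib_id}: expected R1, R2 or bam")
    return problems


example = json.loads("""
{"samples": {"Sample_ID_1": {"libraries": {
    "Lib_ID_1": {"bam": "MyFirst.bam"},
    "Lib_ID_2": {"bam": "MySecond.bam"}}}}}
""")
print(check_sample_config(example))  # prints []
```

The sketch only checks the shape of the config; it does not verify that the referenced files exist.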


Note that there is a tool called [SamplesTsvToJson](../tools/SamplesTsvToJson.md), which lets a user generate the sample config without the risk of creating a wrongly formatted JSON file.


### The settings config
The settings config lets a user alter almost all settings available in the tools used by a given pipeline.
This config file should be written in JSON format.
It can contain setup settings such as references, cut-offs, program modes, program-specific memory limits, whether chunking should be used, and many more. Program executables can also be set here, in case the user does not want to use the system's default tools.
Settings can be defined globally, so that they apply to all tools used in the pipeline, or per tool, one level deeper in the JSON file.
E.g. in the example below the `validationstringency` setting is altered only for Picard and not globally.

~~~
"picard": { "validationstringency": "LENIENT" } 
~~~

Global setting examples are:
~~~
"java_gc_timelimit": 98,
"numberchunks": 25,
"chunking": true
~~~
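
The layering of global and tool-specific settings can be illustrated with a short Python sketch. The `lookup` helper is hypothetical, and the rule that a tool-specific value takes precedence over a global one is an assumption made for illustration; Biopet's own resolution logic may differ.

```python
settings = {
    "chunking": True,
    "numberchunks": 25,
    "picard": {"validationstringency": "LENIENT"},
}


def lookup(settings, key, tool=None, default=None):
    """Return the tool-specific value for key if present, else the global one."""
    tool_block = settings.get(tool, {}) if tool else {}
    if isinstance(tool_block, dict) and key in tool_block:
        return tool_block[key]
    # Assumed fallback: use the global value when the tool block has none.
    return settings.get(key, default)


print(lookup(settings, "validationstringency", tool="picard"))  # LENIENT
print(lookup(settings, "validationstringency"))                 # None: not set globally
print(lookup(settings, "numberchunks", tool="picard"))          # 25: falls back to global
```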


----

#### References
Pipelines and tools that use references should now use the reference module.
This gives more fine-grained control over references and enables a user to curate the references in a structured way.
E.g. pipelines and tools which use FASTA references should now set the value `"reference_fasta"`.
Additionally, `"reference_name"` sets the name to be used (e.g. `"hg19"`). If unset, Biopet will default to `unknown`.
It is also possible to set the `"species"` flag. Again, Biopet will default to `unknown` if unset.
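
The defaulting behaviour described above can be sketched in Python as follows. `reference_info` is a hypothetical helper written for this illustration, not the reference module itself, and it assumes the settings config has been parsed into a dict.

```python
def reference_info(settings):
    """Collect the reference settings, using "unknown" for an unset name or species."""
    return {
        "reference_fasta": settings.get("reference_fasta"),
        "reference_name": settings.get("reference_name", "unknown"),
        "species": settings.get("species", "unknown"),
    }


# Only the FASTA is set here, so name and species fall back to "unknown".
print(reference_info({"reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta"}))
```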
#### Example settings config
~~~
{
        "reference_fasta": "/references/hg19_nohap/ucsc.hg19_nohap.fasta",
        "reference_name": "hg19_nohap",
        "species": "homo_sapiens",
        "dbsnp": "/data/LGTC/projects/vandoorn-melanoma/data/references/hg19_nohap/dbsnp_137.hg19_nohap.vcf",
        "joint_variantcalling": false,
        "multisample": { "haplotypecaller": { "scattercount": 1000 } },
        "picard": { "validationstringency": "LENIENT" },
        "library_variantcalling_temp": true,
        "target_bed_temp": "/data/LGTC/projects/vandoorn-melanoma/analysis/target.bed",
        "min_dp": 5,
        "bedtools": {"exe":"/share/isilon/system/local/BEDtools/bedtools-2.17.0/bin/bedtools"},
        "bam_to_fastq": true,
        "baserecalibrator": { "memory_limit": 8, "vmem":"16G" },
        "samtofastq": {"memory_limit": 8, "vmem": "16G"},
        "java_gc_timelimit": 98,
        "numberchunks": 25,
        "chunking": true,
        "haplotypecaller": { "scattercount": 1000 }
}
~~~

### JSON validation

To check whether the created JSON file is valid there are several possibilities: the simplest is to use [this website](http://jsonformatter.curiousconcept.com/).
It is also possible to validate JSON files with Python, Scala or any other programming language, but this requires some more knowledge.
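
As a rough illustration of the Python route, the sketch below uses only the standard `json` module. The function names are hypothetical. Because Python's parser silently keeps the last value when a key is repeated within one object, the sketch also rejects duplicate keys.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated keys within one JSON object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def validate_json_text(text):
    """Return None if text is valid JSON, otherwise a short error message."""
    try:
        json.loads(text, object_pairs_hook=reject_duplicates)
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        return str(err)
    return None


print(validate_json_text('{"chunking": true, "numberchunks": 25}'))  # None
print(validate_json_text('{"chunking": true,}') is not None)         # True
```

To check a file, read its contents first, e.g. `validate_json_text(open("settings.json").read())`, where `settings.json` stands for your own config file.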