Commit e3ae79eb authored by sajvanderzeeuw

changes in docs format

parent b1deafba
# BiopetFlagstat
## Introduction
This tool extracts alignment metrics from a BAM file, which is a required input.
It reports, for example, the number of mapped reads, duplicates, reads with an unmapped mate, and reads above a given mapping quality.
## Example
To get the help menu:
~~~
java -jar Biopet-0.2.0.jar tool BiopetFlagstat -h
Usage: BiopetFlagstat [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <file> | --inputFile <file>
out is a required file property
-r <chr:start-stop> | --region <chr:start-stop>
out is a required file property
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool BiopetFlagstat -I myBAM.bam
~~~
### Output
|Number |Total Flags| Fraction| Name|
|------ | -------- | --------- | ------|
|1 |862623034| 100.0000%| All|
|2 |861096240| 99.8230%| Mapped|
|3 |26506366| 3.0728%| Duplicates|
|4 |431233321| 49.9909%| FirstOfPair|
|5 |431389713| 50.0091%| SecondOfPair|
|6 |430909871| 49.9534%| ReadNegativeStrand|
|7 |0| 0.0000%| NotPrimaryAlignment|
|8 |862623034| 100.0000%| ReadPaired|
|9 |803603283| 93.1581%| ProperPair|
|10 |430922821| 49.9549%| MateNegativeStrand|
|11 |1584255| 0.1837%| MateUnmapped|
|12 |0| 0.0000%| ReadFailsVendorQualityCheck|
|13 |1380318| 0.1600%| SupplementaryAlignment|
|14 |1380318| 0.1600%| SecondaryOrSupplementary|
|15 |821996241| 95.2903%| MAPQ>0|
|16 |810652212| 93.9753%| MAPQ>10|
|17 |802852105| 93.0710%| MAPQ>20|
|18 |789252132| 91.4944%| MAPQ>30|
|19 |770426224| 89.3120%| MAPQ>40|
|20 |758373888| 87.9149%| MAPQ>50|
|21 |0| 0.0000%| MAPQ>60|
|22 |835092541| 96.8085%| First normal, second read inverted (paired end orientation)|
|23 |765156| 0.0887%| First normal, second read normal|
|24 |624090| 0.0723%| First inverted, second read inverted|
|25 |11537740| 1.3375%| First inverted, second read normal|
|26 |1462857| 0.1696%| Mate in same strand|
|27 |11751691| 1.3623%| Mate on other chr|
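Most of the categories in rows 1 to 13 correspond to bits of the SAM FLAG field. The following is a minimal sketch, not the tool's actual implementation, of how a single FLAG value could be mapped to those category names and how the Fraction column relates to the counts. The bit values come from the SAM specification; the function names `categories_for_flag` and `fraction` are hypothetical, and treating "Mapped" as "unmapped bit not set" is an assumption.

```python
# SAM flag bits as defined in the SAM specification, mapped to the
# category names used in the BiopetFlagstat table (the mapping is an assumption).
SAM_FLAG_BITS = {
    0x1: "ReadPaired",
    0x2: "ProperPair",
    0x8: "MateUnmapped",
    0x10: "ReadNegativeStrand",
    0x20: "MateNegativeStrand",
    0x40: "FirstOfPair",
    0x80: "SecondOfPair",
    0x100: "NotPrimaryAlignment",
    0x200: "ReadFailsVendorQualityCheck",
    0x400: "Duplicates",
    0x800: "SupplementaryAlignment",
}
UNMAPPED_BIT = 0x4


def categories_for_flag(flag):
    """Return the table categories that one read's FLAG value counts towards."""
    names = ["All"]
    # Assumption: "Mapped" means the unmapped bit (0x4) is not set.
    if not flag & UNMAPPED_BIT:
        names.append("Mapped")
    names.extend(name for bit, name in SAM_FLAG_BITS.items() if flag & bit)
    return names


def fraction(count, total):
    """Format a count as a percentage of the total, as in the Fraction column."""
    return f"{100.0 * count / total:.4f}%"


# FLAG 99 = paired + proper pair + mate on reverse strand + first of pair.
print(categories_for_flag(99))
# Row 2 of the table: mapped reads relative to all reads.
print(fraction(861096240, 862623034))
```

The MAPQ and orientation rows depend on other fields of the alignment record and are not covered by this sketch.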
# CheckAllelesVcfInBam
## Introduction
This tool has been written to check the allele frequency in BAM files.
## Example
To get the help menu:
~~~
java -jar Biopet-0.2.0.jar tool CheckAllelesVcfInBam -h
Usage: CheckAllelesVcfInBam [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <file> | --inputFile <file>
-o <file> | --outputFile <file>
-s <value> | --sample <value>
-b <value> | --bam <value>
-m <value> | --min_mapping_quality <value>
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool CheckAllelesVcfInBam --inputFile myVCF.vcf \
--bam myBam1.bam --sample bam_sample1 --outputFile myAlleles.vcf
~~~
Note that the tool can process multiple BAM files at once.
The only requirement is that each `--bam` is matched with its `--sample`, given in the same order.
For multiple BAM files:
~~~
java -jar Biopet-0.2.0.jar tool CheckAllelesVcfInBam --inputFile myVCF.vcf \
--bam myBam1.bam --sample bam_sample1 --bam myBam2.bam --sample bam_sample2 \
--bam myBam3.bam --sample bam_sample3 --outputFile myAlleles.vcf
~~~
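When the list of BAM files is generated by a script, keeping each `--bam` next to its `--sample` is easiest if the pairs are stored together. The sketch below only assembles the argument list from the options shown above; the helper name `build_check_alleles_args` is hypothetical and nothing is executed.

```python
def build_check_alleles_args(input_vcf, output_vcf, bam_samples):
    """Build a CheckAllelesVcfInBam command line from ordered (bam, sample) pairs."""
    args = [
        "java", "-jar", "Biopet-0.2.0.jar", "tool", "CheckAllelesVcfInBam",
        "--inputFile", input_vcf,
    ]
    # Emit every --bam directly followed by its --sample so the order always matches.
    for bam, sample in bam_samples:
        args += ["--bam", bam, "--sample", sample]
    args += ["--outputFile", output_vcf]
    return args


pairs = [
    ("myBam1.bam", "bam_sample1"),
    ("myBam2.bam", "bam_sample2"),
    ("myBam3.bam", "bam_sample3"),
]
print(" ".join(build_check_alleles_args("myVCF.vcf", "myAlleles.vcf", pairs)))
```

The resulting list could be passed to `subprocess.run` if one wants to launch the tool from Python.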
## Output
The `--outputFile` is a VCF file that contains an extra field with the allele frequencies for each sample given to the tool.
# ExtractAlignedFastq
## Introduction
This tool extracts reads from a BAM file based on alignment intervals.
For example, if one is interested in a specific location, this tool extracts the full reads from that location.
The tool is also very useful for creating test data sets.
## Example
To get the help menu:
~~~
java -jar Biopet-0.2.0.jar tool ExtractAlignedFastq -h
ExtractAlignedFastq - Select aligned FASTQ records
Usage: ExtractAlignedFastq [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <bam> | --input_file <bam>
Input BAM file
-r <interval> | --interval <interval>
Interval strings
-i <fastq> | --in1 <fastq>
Input FASTQ file 1
-j <fastq> | --in2 <fastq>
Input FASTQ file 2 (default: none)
-o <fastq> | --out1 <fastq>
Output FASTQ file 1
-p <fastq> | --out2 <fastq>
Output FASTQ file 2 (default: none)
-Q <value> | --min_mapq <value>
Minimum MAPQ of reads in target region to remove (default: 0)
-s <value> | --read_suffix_length <value>
Length of common suffix from each read pair (default: 0)
This tool creates FASTQ file(s) containing reads mapped to the given alignment intervals.
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool ExtractAlignedFastq \
--input_file myBam.bam --in1 myFastq_R1.fastq --out1 myOutFastq_R1.fastq --interval myTarget.bed
~~~
* Note that this tool works for single-end and paired-end data. The above example can easily be extended for paired-end data.
The only thing one should add is: `--in2 myFastq_R2.fastq --out2 myOutFastq_R2.fastq`
* The interval is one or more genomic positions from which one wants to extract the reads.
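Conceptually, the selection can be pictured as keeping only those FASTQ records whose read names were seen in the alignment intervals. The sketch below illustrates that idea on in-memory data. It is an assumption about the general approach, not the tool's implementation, and the names `read_fastq` and `select_records` are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        # The read name is the header without '@' and without any trailing comment.
        yield header[1:].split()[0], sequence, quality


def select_records(handle, wanted_names):
    """Keep only the records whose read name is in wanted_names."""
    return [record for record in read_fastq(handle) if record[0] in wanted_names]


fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n@read3\nGGCC\n+\nIIII\n"
# Hypothetical set of read names that aligned inside the requested interval.
aligned_in_interval = {"read1", "read3"}
for name, sequence, quality in select_records(io.StringIO(fastq_text), aligned_in_interval):
    print(name, sequence, quality)
```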
## Output
The output of this tool is one or more FASTQ files containing only the reads that map to the given alignment intervals in the BAM file.
# FastqSplitter
## Introduction
This tool splits a FASTQ file based on the number of output files specified. So if one specifies 5 output files, it will split the FASTQ
into 5 files. This can be very useful if one wants to use the chunking option in one of our pipelines: we can generate the exact number of FASTQ files
needed for the number of chunks specified. Note that this is done automatically inside the pipelines.
## Example
To get the help menu:
~~~
java -jar Biopet-0.2.0.jar tool FastqSplitter -h
Usage: FastqSplitter [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <file> | --inputFile <file>
out is a required file property
-o <file> | --output <file>
out is a required file property
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool FastqSplitter --inputFile myFastq.fastq \
--output mySplittedFastq_1.fastq --output mySplittedFastq_2.fastq \
--output mySplittedFastq_3.fastq
~~~
The above invocation will split the input into 3 equally sized FASTQ files.
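One simple way to obtain outputs of (nearly) equal size is to hand out records in round-robin order. The sketch below shows that strategy on a plain list. Whether FastqSplitter distributes reads this way is not stated here, so treat it as an illustrative assumption; `split_round_robin` is a hypothetical name.

```python
def split_round_robin(records, n_outputs):
    """Distribute records over n_outputs lists so that sizes differ by at most one."""
    outputs = [[] for _ in range(n_outputs)]
    for index, record in enumerate(records):
        outputs[index % n_outputs].append(record)
    return outputs


# Seven placeholder records split over three outputs.
chunks = split_round_robin(["r1", "r2", "r3", "r4", "r5", "r6", "r7"], 3)
for number, chunk in enumerate(chunks, start=1):
    print(f"mySplittedFastq_{number}.fastq", chunk)
```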
## Output
Multiple FASTQ files, one for each `--output` specified.
# FindRepeatsPacBio
## Introduction
This tool looks for and annotates repeat regions inside a BAM file. It extracts the regions of interest from a BED file and then intersects
those regions with the BAM file. On those extracted regions the tool performs an
mpileup and counts all insertions, deletions, etc. for that specific location on a per-read basis.
## Example
To get the help menu:
~~~
java -jar Biopet-0.2.0-DEV-801b72ed.jar tool FindRepeatsPacBio -h
Usage: FindRepeatsPacBio [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <file> | --inputBam <file>
-b <file> | --inputBed <file>
output file, default to stdout
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool FindRepeatsPacBio --inputBam myInputbam.bam \
--inputBed myRepeatRegions.bed > mySummary.txt
~~~
Since the program prints its output to stdout by default, we can use `>` to write the output to a text file.
## Output
The output is a tab-delimited text file which looks like this:
|chr |startPos|stopPos |Repeat_seq|repeatLength|original_Repeat_readLength|
|-----|--------|--------|----------|------------|--------------------------|
|chr4 |3076603 |3076667 |CAG |3 |65 |
|chr4 |3076665 |3076667 |GCC |3 |3 |
|chrX |66765158|66765261|GCA |3 |104 |
table continues below:
|Calculated_repeat_readLength|minLength|maxLength|inserts |
|----------------------------|---------|---------|-------------------------------------|
|61,73,68 |61 |73 |GAC,G,T/A,C,G,G,A,G,A,G/C,C,C,A,C,A,G|
|3,3,3 |3 |3 |// |
|98 |98 |98 |A,G,G |
table continues below:
|deletions |notSpan|
|--------------------|-------|
|1,1,2,1,1,1,2//2,1,1|0 |
|// |0 |
|1,1,1,1,1,1,2,1 |0 |
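In the three example rows, `minLength` and `maxLength` equal the smallest and largest value in `Calculated_repeat_readLength`. Assuming that relationship holds in general, a downstream script could re-derive or check them as sketched below; the function name `length_summary` is hypothetical.

```python
def length_summary(calculated_lengths):
    """Return (min, max) of a comma-separated Calculated_repeat_readLength value."""
    lengths = [int(value) for value in calculated_lengths.split(",") if value]
    return min(lengths), max(lengths)


# Values taken from the Calculated_repeat_readLength column of the example rows.
for column_value in ["61,73,68", "3,3,3", "98"]:
    print(column_value, length_summary(column_value))
```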
# MergeAlleles
## Introduction
This tool is used to merge overlapping alleles.
## Example
To get the help menu:
~~~
java -jar Biopet-0.2.0.jar tool MergeAlleles -h
Usage: MergeAlleles [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <file> | --inputVcf <file>
-o <file> | --outputVcf <file>
-R <file> | --reference <file>
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0-DEV-801b72ed.jar tool MergeAlleles \
--inputVcf myInput.vcf --outputVcf myOutput.vcf \
--reference /H.Sapiens/hg19/reference.fa
~~~
## Output
The output of this tool is a VCF-like file containing only the merged alleles.
# VcfFilter
## Introduction
This tool enables a user to filter VCF files, for example on sample depth and/or total depth.
It can also be used to filter out reference calls and/or to require a minimum number of passing samples.
A wide set of options is available to change the filter settings.
## Example
To open the help menu:
~~~
java -jar Biopet-0.2.0.jar tool VcfFilter -h
Usage: VcfFilter [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <file> | --inputVcf <file>
Input vcf file
-o <file> | --outputVcf <file>
Output vcf file
--minSampleDepth <int>
Min value for DP in genotype fields
--minTotalDepth <int>
Min value of DP field in INFO fields
--minAlternateDepth <int>
Min value of AD field in genotype fields
--minSamplesPass <int>
Min number of samples to pass --minAlternateDepth, --minBamAlternateDepth and --minSampleDepth
--minBamAlternateDepth <int>
--denovoInSample <sample>
Only show variants that contain unique alleles in compete set for given sample
--mustHaveVariant <sample>
Given sample must have 1 alternative allele
--diffGenotype <sample:sample>
Given samples must have a different genotype
--filterHetVarToHomVar <sample:sample>
If variants in sample 1 are heterogeneous and alternative alleles are homogeneous in sample 2 variants are filtered
--filterRefCalls
Filter when there are only ref calls
--filterNoCalls
Filter when there are only no calls
--minQualscore <value>
Min qual score
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool VcfFilter --inputVcf myInput.vcf \
--outputVcf myOutput.vcf --filterRefCalls --minSampleDepth 8
~~~
## Output
The output is a VCF file containing only the variants that pass the specified filters.
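To make the interplay of `--minSampleDepth` and `--minSamplesPass` more concrete, the sketch below checks per-sample DP values against a threshold and requires a minimum number of samples to pass. This is a simplified reading of the help text, not the tool's code. The function name `passes_sample_depth`, the sample names, and the default of one passing sample are assumptions.

```python
def passes_sample_depth(sample_depths, min_sample_depth, min_samples_pass=1):
    """Return True if enough samples have a genotype DP at or above the threshold.

    sample_depths maps sample name -> DP value from the genotype fields.
    The default min_samples_pass=1 is an assumption made for this sketch.
    """
    passing = sum(1 for depth in sample_depths.values() if depth >= min_sample_depth)
    return passing >= min_samples_pass


# Hypothetical DP values for one variant in three samples.
depths = {"sample1": 12, "sample2": 5, "sample3": 9}
print(passes_sample_depth(depths, min_sample_depth=8))
print(passes_sample_depth(depths, min_sample_depth=8, min_samples_pass=3))
```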
# VcfToTsv
## Introduction
This tool enables a user to convert a VCF file to a tab-delimited file (TSV).
This can be very useful since some programs only accept a TSV for downstream analysis.
It gets rid of the VCF header and parses all data columns into a TSV file.
It is also possible to select only specific fields from your VCF and parse only those fields to the TSV.
## Example
To open the help menu:
~~~
java -jar Biopet-0.2.0.jar tool VcfToTsv -h
Usage: VcfToTsv [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <file> | --inputFile <file>
-o <file> | --outputFile <file>
output file, default to stdout
-f <value> | --field <value>
-i <value> | --info_field <value>
--all_info
--all_format
-s <value> | --sample_field <value>
-d | --disable_defaults
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool VcfToTsv --inputFile myVCF.vcf \
--outputFile my_tabDelimited_VCF.tsv --all_info
~~~
## Output
The output of this tool is a TSV file produced from the input VCF file.
Depending on which options are enabled, some fields may be discarded.
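As an illustration of the kind of conversion involved, the sketch below turns one VCF data line into a flat row with selected INFO keys as extra columns. The column order, the handling of flag-type INFO entries, and the names `info_to_dict` and `vcf_line_to_row` are assumptions made for this example; the real tool's output layout may differ.

```python
def info_to_dict(info):
    """Split a VCF INFO string such as 'DP=12;DB' into a dictionary."""
    fields = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, separator, value = entry.partition("=")
        # Flag-type entries have no value; represent them as "true" (an assumption).
        fields[key] = value if separator else "true"
    return fields


def vcf_line_to_row(line, info_keys):
    """Return the fixed VCF columns followed by the requested INFO values."""
    columns = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = columns[:8]
    info_fields = info_to_dict(info)
    return [chrom, pos, variant_id, ref, alt, qual, filt] + [
        info_fields.get(key, "") for key in info_keys
    ]


line = "chr1\t100\t.\tA\tG\t50\tPASS\tDP=12;DB"
print("\t".join(vcf_line_to_row(line, ["DP", "DB", "AF"])))
```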
# WipeReads
## Introduction
WipeReads is a tool for removing reads from indexed BAM files.
It respects pairing information and can be set to remove reads whose duplicate
maps outside of the target region. The main use case is to remove reads mapping
to known ribosomal RNA regions (using a supplied BED file containing intervals for these regions).
## Example
To open the help menu:
~~~
java -jar Biopet-0.2.0.jar tool WipeReads -h
WipeReads - Region-based reads removal from an indexed BAM file
Usage: WipeReads [options]
-l <value> | --log_level <value>
Log level
-h | --help
Print usage
-v | --version
Print version
-I <bam> | --input_file <bam>
Input BAM file
-r <bed/gtf/refflat> | --interval_file <bed/gtf/refflat>
Interval BED file
-o <bam> | --output_file <bam>
Output BAM file
-f <bam> | --discarded_file <bam>
Discarded reads BAM file (default: none)
-Q <value> | --min_mapq <value>
Minimum MAPQ of reads in target region to remove (default: 0)
-G <rgid> | --read_group <rgid>
Read group IDs to be removed (default: remove reads from all read groups)
--limit_removal
Whether to remove multiple-mapped reads outside the target regions (default: yes)
--no_make_index
Whether to index output BAM file or not (default: yes)
GTF-only options:
-t <gtf_feature_type> | --feature_type <gtf_feature_type>
GTF feature containing intervals (default: exon)
Advanced options:
--bloom_size <value>
Expected maximum number of reads in target regions (default: 7e7)
--false_positive <value>
False positive rate (default: 4e-7)
This tool will remove BAM records that overlaps a set of given regions.
By default, if the removed reads are also mapped to other regions outside
the given ones, they will also be removed.
~~~
To run the tool:
~~~
java -jar Biopet-0.2.0.jar tool WipeReads --input_file myBam.bam \
--interval_file myRibosomal_regions.bed --output_file myFilteredBam.bam
~~~
## Output
This tool outputs a BAM file containing all the reads that do not overlap the given regions (for example, ribosomal regions),
and optionally a BAM file with only the discarded reads.
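The advanced options `--bloom_size` and `--false_positive` suggest that a Bloom filter is used to remember which reads were seen in the target regions. For a rough idea of the memory this implies, the standard Bloom filter sizing formulas can be applied to the documented defaults. This is a back-of-the-envelope sketch; whether WipeReads sizes its filter exactly this way is an assumption, and `bloom_parameters` is a hypothetical name.

```python
import math


def bloom_parameters(expected_items, false_positive_rate):
    """Return (bits, hash_functions) from the textbook Bloom filter formulas.

    bits   = -n * ln(p) / (ln 2)^2
    hashes = (bits / n) * ln 2
    """
    bits = math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


# Defaults taken from the help text: 7e7 expected reads, 4e-7 false positive rate.
bits, hashes = bloom_parameters(7e7, 4e-7)
print(f"{bits / 8 / 1024 ** 2:.0f} MiB, {hashes} hash functions")
```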
......@@ -3,6 +3,7 @@ These tools are written to create the appropriate files for the SAGE pipeline.
Note that these tools are already implemented in the pipeline.
## SageCountFastq
To open the help menu:
~~~
java -jar Biopet-0.2.0.jar tool SageCountFastq -h
Usage: SageCountFastq [options]
......@@ -19,6 +20,7 @@ Usage: SageCountFastq [options]
~~~
## SageCreateLibrary
To open the help menu:
~~~
java -jar Biopet-0.2.0.jar tool SageCreateLibrary -h
Usage: SageCreateLibrary [options]
......@@ -42,10 +44,10 @@ Usage: SageCreateLibrary [options]
--noAntiTagsOutput <file>
--allGenesOutput <file>
~~~
## SageCreateTagCounts
To open the help menu:
~~~
java -jar Biopet-0.2.0.jar tool SageCreateTagCounts -h
Usage: SageCreateTagCounts [options]
......
......@@ -11,10 +11,17 @@ pages:
- ['tools/BastyGenerateFasta.md','tools','BastyGenerateFasta']
- ['tools/bedtointerval.md','tools','BedToInterval']
- ['tools/bedtoolscoveragetocounts.md','tools','BedtoolsCoverageToCounts']
- ['tools/BiopetFlagstat.md','tools','BiopetFlagstat']
- ['tools/CheckAllelesVcfInBam.md','tools','CheckAllelesVcfInBam']
- ['tools/ExtractAlignedFastq.md','tools','ExtractAlignedFastq']
- ['tools/FastqSplitter.md', 'tools','FastqSplitter']
- ['tools/FindRepeatsPacBio.md','tools','FindRepeatsPacBio']
- ['tools/VcfFilter.md','tools','VcfFilter']
- ['tools/MpileupToVcf.md', 'tools', 'MpileupToVcf']
- ['tools/sagetools.md', 'tools', 'Sagetools']
- ['tools/WipeReads.md', 'tools', 'WipeReads']
- ['cluster/oge.md', 'OpenGridEngine']
- ['about.md', 'About']
- ['license.md', 'License']
theme: readthedocs
repo_url: https://git.lumc.nl/biopet/biopet