Klinische Genetica / capture-lumc / hutspot

Commit a191f581, authored 7 years ago by Sander Bollen

write seqtk as bash script

Parent: 9256dcbf · Merge request: !2
Showing 4 changed files with 26 additions and 51 deletions:

- Snakefile: +7, -4
- envs/seqtk.yml: +3, -1
- src/seqtk.py: +0, -46 (deleted)
- src/seqtk.sh: +16, -0 (new file)
Snakefile (+7, -4)

```diff
@@ -26,6 +26,7 @@ def fsrc_dir(*args):
 covpy = fsrc_dir("src", "covstats.py")
 colpy = fsrc_dir("src", "collect_stats.py")
 mpy = fsrc_dir("src", "merge_stats.py")
+seq = fsrc_dir("src", "seqtk.sh")
 
 if FASTQ_COUNT is None:
     fqc = "python {0}".format(fsrc_dir("src", "fastq-count.py"))
@@ -144,26 +145,28 @@ rule seqtk_r1:
     """Either subsample or link forward fastq file"""
     input:
         stats=out_path("{sample}/pre_process/{sample}.preqc_count.json"),
-        fastq=out_path("{sample}/pre_process/{sample}.merged_R1.fastq.gz")
+        fastq=out_path("{sample}/pre_process/{sample}.merged_R1.fastq.gz"),
+        seqtk=seq
     params:
         max_bases=MAX_BASES
     output:
         fastq=temp(out_path("{sample}/pre_process/{sample}.sampled_R1.fastq.gz"))
     conda: "envs/seqtk.yml"
-    script: "src/seqtk.py"
+    shell: "bash {input.seqtk} {input.stats} {input.fastq} {output.fastq} {params.max_bases}"
 
 
 rule seqtk_r2:
     """Either subsample or link reverse fastq file"""
     input:
         stats = out_path("{sample}/pre_process/{sample}.preqc_count.json"),
-        fastq = out_path("{sample}/pre_process/{sample}.merged_R2.fastq.gz")
+        fastq = out_path("{sample}/pre_process/{sample}.merged_R2.fastq.gz"),
+        seqtk=seq
     params:
         max_bases = MAX_BASES
     output:
         fastq = temp(out_path("{sample}/pre_process/{sample}.sampled_R2.fastq.gz"))
     conda: "envs/seqtk.yml"
-    script: "src/seqtk.py"
+    shell: "bash {input.seqtk} {input.stats} {input.fastq} {output.fastq} {params.max_bases}"
 
 
 # contains original merged fastq files as input to prevent them from being prematurely deleted
```
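The new `shell:` directive hands its values to src/seqtk.sh positionally, in the order the script reads them (`${1}` count JSON, `${2}` input FASTQ, `${3}` output FASTQ, `${4}` max bases). The sketch below approximates how the placeholders expand, using plain `str.format` attribute lookup with `types.SimpleNamespace` objects standing in for Snakemake's `input`, `output` and `params`. It is not Snakemake's own formatting code (which also handles wildcards, quoting and list-valued fields); `render_shell` and the sample paths are hypothetical.

```python
from types import SimpleNamespace

# Template copied from the shell directive of rules seqtk_r1 / seqtk_r2.
SHELL_TEMPLATE = (
    "bash {input.seqtk} {input.stats} {input.fastq} "
    "{output.fastq} {params.max_bases}"
)


def render_shell(template, inputs, outputs, params):
    """Expand {input.x}-style placeholders via str.format attribute lookup.

    Hypothetical helper: a rough stand-in for Snakemake's formatting, which
    additionally handles wildcards, quoting and list-valued fields.
    """
    return template.format(input=inputs, output=outputs, params=params)


if __name__ == "__main__":
    # Hypothetical paths for a sample called "s1".
    cmd = render_shell(
        SHELL_TEMPLATE,
        inputs=SimpleNamespace(
            seqtk="src/seqtk.sh",
            stats="s1/pre_process/s1.preqc_count.json",
            fastq="s1/pre_process/s1.merged_R1.fastq.gz",
        ),
        outputs=SimpleNamespace(fastq="s1/pre_process/s1.sampled_R1.fastq.gz"),
        params=SimpleNamespace(max_bases=1000000),
    )
    print(cmd)
```

Listing the script under `input` (as `seqtk=seq`) also makes it a tracked dependency, so Snakemake's modification-time check treats the sampled outputs as out of date when src/seqtk.sh is newer. Invoking it as `bash {input.seqtk}` means the script does not need the executable bit, which fits the 100644 mode it is added with.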
envs/seqtk.yml (+3, -1)

```diff
@@ -6,4 +6,6 @@ channels:
   - r
 dependencies:
   - seqtk=1.2=0
-  - zlib=1.2.11=0
+  - bc=1.06=0
+  - sed=4.4=1
+  - zlib=1.2.11=0
\ No newline at end of file
```
src/seqtk.py (deleted, previously mode 100644; +0, -46; viewed at 9256dcbf)

```python
"""
Little script from running seqtk with conda

Conda directives can't be used with a run directive,
so must be combined with script directive in stead.

This script assumes the following:
    - a `snakemake` object exists,
    - this object has the following attributes:
        - input: a list of two items:
            1. output of fastq-count as path to json file
            2. a fastq file to be sub-sampled
        - output: a list of one item containing path to output file
        - params: a list of one item containing the max number of bases
    - a `shell` function exists

This will _not_ work outside of a snakemake context.
"""

import json

from snakemake import shell


def subsample(json_path, fastq_path, opath, max_bases):
    with open(json_path) as handle:
        bases = json.load(handle)['bases']
    if max_bases == "" or max_bases is None:
        frac = 100
    else:
        frac = int(max_bases) / float(bases)
    if frac >= 1:
        cmd = "ln -s {0} {1}".format(fastq_path, opath)
    else:
        cmd = "seqtk sample -s100 {0} {1} | gzip -c > {2}".format(
            fastq_path, frac, opath)
    print("executing")
    print(cmd)
    shell(cmd)


subsample(snakemake.input[0], snakemake.input[1],
          snakemake.output[0], snakemake.params[0])
```
src/seqtk.sh (new file, mode 100644; +16, -0)

```bash
#!/usr/bin/env bash

count_json=${1}
input_fastq=${2}
output_fastq=${3}
max_bases=${4}

bases=$(jq '.bases' $count_json)
frac=$(jq -n "$max_bases / $bases" | sed -e "s:e:E:g")
echo $frac

if (( $(echo "$frac > 1" | bc -l) )); then
    ln -s $input_fastq $output_fastq
else
    seqtk sample -s100 $frac $input_fastq | gzip -c > $output_fastq
fi
```
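The decision src/seqtk.sh makes can be sketched in Python to make the arithmetic explicit: the fraction is `max_bases / bases`, where `bases` is the total read from the fastq-count JSON (the `jq '.bases'` step), and a fraction above 1 means the input already fits the budget, so it is linked rather than sampled. The function names are hypothetical, the empty-`max_bases` handling is borrowed from the deleted src/seqtk.py (the bash script does not special-case it), and the `shlex.quote` calls are an addition that neither original script uses.

```python
import json
import shlex


def read_total_bases(count_json):
    """Read the 'bases' field from the fastq-count JSON (like `jq '.bases'`)."""
    with open(count_json) as handle:
        return json.load(handle)["bases"]


def subsample_fraction(max_bases, total_bases):
    """Fraction of bases to keep.

    An empty or missing max_bases means "keep everything"; this follows the
    deleted src/seqtk.py, since src/seqtk.sh does not special-case it.
    """
    if max_bases in ("", None):
        return float("inf")
    return int(max_bases) / float(total_bases)


def choose_command(total_bases, input_fastq, output_fastq, max_bases):
    """Return the command src/seqtk.sh would run for these arguments (sketch)."""
    frac = subsample_fraction(max_bases, total_bases)
    # Quoting is an addition; the original scripts pass paths unquoted.
    src, dst = shlex.quote(input_fastq), shlex.quote(output_fastq)
    if frac > 1:  # strict comparison, as in src/seqtk.sh
        return "ln -s {0} {1}".format(src, dst)
    return "seqtk sample -s100 {0} {1} | gzip -c > {2}".format(src, frac, dst)
```

With 200 total bases and `max_bases=100` this yields `seqtk sample -s100 in.fq.gz 0.5 | gzip -c > out.fq.gz`. At a fraction of exactly 1.0, the strict `> 1` test copied from the bash script chooses sampling, whereas the deleted Python script's `>= 1` linked the file in that case.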