Commit 08cf6ddd authored by Hoogenboom, Jerry

Various bug fixes and refinements throughout FDSTools

* Global changes in v0.0.4:
  * FDSTools now prints profiling information to stdout when the -d/--debug
    option is specified.
  * Fixed a bug where specifying '-' as the output filename was taken
    literally instead of being interpreted as 'write to standard output'
    (affected tools: BGCorrect, Samplestats, Seqconvert, Stuttermark).
  * Added more detailed license information to FDSTools.
* BGEstimate v1.1.0:
  * Added a new option -g/--min-genotypes (default: 3). Only alleles that occur
    in at least this number of unique heterozygous genotypes will be
    considered. This is to avoid 'contamination' of the noise profile of one
    allele with the noise of another. If homozygous samples are available for
    an allele, this filter is not applied to that allele. Setting this option
    to 1 effectively disables it. This option has the same cascading effect as
    the -s/--min-samples option, that is, if one allele does not meet...
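A minimal sketch of the genotype-count filter described in the bullet above (illustrative only; the function and data layout below are hypothetical, not BGEstimate's actual code):

# Hypothetical illustration of the -g/--min-genotypes rule: an allele is kept
# if homozygous samples exist for it, or if it was seen in at least
# min_genotypes distinct heterozygous genotypes.
def passes_genotype_filter(het_genotypes, has_homozygous_samples, min_genotypes=3):
    if has_homozygous_samples:
        return True
    return len(set(het_genotypes)) >= min_genotypes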
parent 3a495653
This file contains the licenses of third-party projects included with FDSTools.
*******************************************************************************
Vega: A Visualization Grammar
Copyright (c) 2013, Trifacta Inc.
Copyright (c) 2015, University of Washington Interactive Data Lab
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*******************************************************************************
D3: Data-Driven Documents
Copyright 2010-2016 Mike Bostock
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the author nor the names of contributors may be used to
endorse or promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Data analysis tools for Next Generation Sequencing of forensic DNA markers,
including tools for characterisation and filtering of PCR stutter artefacts and
other systemic noise, and for automatic detection of the alleles in a sample.
"""
__version_info__ = ('0', '0', '4')
__version__ = '.'.join(__version_info__)
usage = __doc__.split("\n\n\n")
...
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
import argparse, pkgutil, os, re, textwrap
#import cProfile # Imported only if the -d/--debug option is specified
import tools
from . import usage, version
@@ -95,6 +116,11 @@ def main():
__tools__[args.tool].error(
"The following arguments are not known. Please check spelling "
"and argument order: '%s'." % "', '".join(unknowns))
if args.debug:
import cProfile
cProfile.runctx(
"args.func(args)", globals(), locals(), sort="tottime")
else:
args.func(args)
except Exception as error:
if args.debug:
...
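The debug branch added above wraps the selected tool in Python's standard cProfile module. A standalone sketch of the same pattern, with a dummy workload in place of args.func(args), showing the kind of per-function timing table that is printed to stdout:

import cProfile

def workload():
    # Dummy stand-in for args.func(args).
    return sum(i * i for i in range(10 ** 6))

# Run the statement under the profiler and print statistics sorted by
# total time spent in each function ("tottime"), as the new code does.
cProfile.runctx("workload()", globals(), locals(), sort="tottime")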
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
import re, sys, argparse, random, itertools
#import numpy as np # Imported only when calling nnls()
@@ -46,6 +66,8 @@ PAT_STR_DEF_BLOCK = re.compile("([ACGT]+)\s+(\d+)\s+(\d+)")
# Pattern to split a comma-, semicolon-, or space-separated list.
PAT_SPLIT = re.compile("\s*[,; \t]\s*")
PAT_SPLIT_QUOTED = re.compile(
r""""((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\S+)""")
# Pattern that matches a chromosome name/number.
PAT_CHROMOSOME = re.compile(
@@ -689,6 +711,7 @@ def parse_library_ini(handle):
def load_profiles(profilefile, library=None):
# TODO: To be replaced with load_profiles_new (less memory).
column_names = profilefile.readline().rstrip("\r\n").split("\t")
(colid_marker, colid_allele, colid_sequence, colid_fmean, colid_rmean,
colid_tool) = get_column_ids(column_names, "marker", "allele", "sequence",
@@ -752,6 +775,41 @@ def load_profiles(profilefile, library=None):
#load_profiles
def load_profiles_new(profilefile, library=None):
# TODO, rename this to load_profiles to complete transition.
column_names = profilefile.readline().rstrip("\r\n").split("\t")
(colid_marker, colid_allele, colid_sequence, colid_fmean, colid_rmean,
colid_tool) = get_column_ids(column_names, "marker", "allele", "sequence",
"fmean", "rmean", "tool")
profiles = {}
for line in profilefile:
line = line.rstrip("\r\n").split("\t")
if line == [""]:
continue
marker = line[colid_marker]
if marker not in profiles:
profiles[marker] = {}
allele = ensure_sequence_format(line[colid_allele], "raw",
library=library, marker=marker)
sequence = ensure_sequence_format(line[colid_sequence], "raw",
library=library, marker=marker)
if allele not in profiles[marker]:
profiles[marker][allele] = {}
elif sequence in profiles[marker][allele]:
raise ValueError(
"Invalid background noise profiles file: encountered "
"multiple values for marker '%s' allele '%s' sequence '%s'" %
(marker, allele, sequence))
profiles[marker][allele][sequence] = {
"forward": float(line[colid_fmean]),
"reverse": float(line[colid_rmean]),
"tool": line[colid_tool]}
return profiles
#load_profiles_new
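To illustrate the structure that load_profiles_new returns (the marker name, sequences, and numbers below are made up): each row of the tab-separated profiles file becomes one entry under profiles[marker][allele][sequence].

# Hypothetical input row (columns: marker, allele, sequence, fmean, rmean, tool):
#   MyMarker <TAB> AGATAGATAGAT <TAB> AGATAGAT <TAB> 1.234 <TAB> 0.987 <TAB> bgestimate
# yields this entry in the returned dict:
profiles = {
    "MyMarker": {
        "AGATAGATAGAT": {             # allele sequence (raw format)
            "AGATAGAT": {             # noise sequence observed with this allele
                "forward": 1.234,     # mean forward-strand value (fmean column)
                "reverse": 0.987,     # mean reverse-strand value (rmean column)
                "tool": "bgestimate", # tool that produced this profile entry
            },
        },
    },
}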
def pattern_longest_match(pattern, subject):
"""Return the longest match of the pattern in the subject string."""
# FIXME, this function tries only one match at each position in the
@@ -1261,21 +1319,32 @@ def get_repeat_pattern(seq):
def read_sample_data_file(infile, data, annotation_column=None, seqformat=None,
library=None, default_marker=None,
drop_special_seq=False, after_correction=False,
extra_columns=None):
"""Add data from infile to data dict as [marker, sequence]=reads."""
# Get column numbers.
column_names = infile.readline().rstrip("\r\n").split("\t")
colid_sequence = get_column_ids(column_names, "sequence")
colid_forward = None
colid_reverse = None
numtype = int
if after_correction:
colid_forward, colid_reverse = get_column_ids(column_names,
"forward_corrected", "reverse_corrected",
optional=(after_correction != "require"))
if colid_forward is None:
colid_forward = get_column_ids(column_names, "forward")
else:
numtype = float
if colid_reverse is None:
colid_reverse = get_column_ids(column_names, "reverse")
else:
numtype = float
if extra_columns is not None:
extra_colids = {c: i for c, i in
((c, get_column_ids(column_names, c, optional=extra_columns[c]))
for c in extra_columns)
if i is not None}
# Get marker name column if it exists.
colid_marker = get_column_ids(column_names, "marker", optional=True)
@@ -1300,8 +1369,11 @@ def read_sample_data_file(infile, data, annotation_column=None, seqformat=None,
if (annotation_column is not None and
line[colid_annotation].startswith("ALLELE")):
found_alleles.append((marker, sequence))
data[marker, sequence] = map(numtype,
(line[colid_forward], line[colid_reverse]))
if extra_columns is not None:
data[marker, sequence].append(
{c: line[extra_colids[c]] for c in extra_colids})
return found_alleles
#read_sample_data_file
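The new extra_columns parameter is a mapping from column name to an 'optional' flag that is passed straight on to get_column_ids; values from any of those columns that are actually present are appended to each data entry as a dict. A hypothetical call (the column names here are just examples, not a fixed API):

# Also collect the 'flags' column (may be absent) and the 'total' column
# (must be present) for every marker/sequence combination.
read_sample_data_file(infile, data,
                      after_correction=True,
                      extra_columns={"flags": True,    # True  = column is optional
                                     "total": False})  # False = column is required
# Each data[marker, sequence] entry then ends with a dict of the extra
# values, e.g. [forward, reverse, {"flags": "...", "total": "..."}].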
@@ -1330,7 +1402,7 @@ def get_sample_data(tags_to_files, callback, allelelist=None,
annotation_column=None, seqformat=None, library=None,
marker=None, homozygotes=False, limit_reads=sys.maxint,
drop_samples=0, drop_special_seq=False,
after_correction=False, extra_columns=None):
if drop_samples:
sample_tags = tags_to_files.keys()
for tag in random.sample(xrange(len(sample_tags)),
@@ -1344,7 +1416,7 @@ def get_sample_data(tags_to_files, callback, allelelist=None,
infile = sys.stdin if infile == "-" else open(infile, "r")
alleles.update(read_sample_data_file(
infile, data, annotation_column, seqformat, library, marker,
drop_special_seq, after_correction, extra_columns))
if infile != sys.stdin:
infile.close()
if limit_reads < sys.maxint:
@@ -1620,7 +1692,8 @@ def get_input_output_files(args, single=False, batch_support=False):
for infile in infiles]
if len(outfiles) == 1:
outfile = sys.stdout if outfiles[0] == "-" else outfiles[0]
if outfile == sys.stdout and len(set(tags)) == 1:
# Write output of single sample to stdout.
return ((tag, infiles, outfile) for tag in set(tags))
@@ -1647,6 +1720,16 @@ def get_input_output_files(args, single=False, batch_support=False):
#get_input_output_files
def split_quoted_string(text):
return reduce(
lambda x, y: x + ["".join([
y[0].replace("\\\"", "\""),
y[1].replace("\\'", "'"),
y[2]])],
PAT_SPLIT_QUOTED.findall(text), [])
#split_quoted_string
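A quick usage sketch of the new split_quoted_string helper (made-up input): it splits on whitespace, keeps double- or single-quoted parts together, and unescapes \" and \' inside them.

print(split_quoted_string('tool "input file.txt" extra'))
# -> ['tool', 'input file.txt', 'extra']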
def print_db(text, debug):
"""Print text if debug is True."""
if debug:
...
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Find true alleles in reference samples and detect possible
contaminations.
...
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Match background noise profiles (obtained from e.g., bgestimate) to
samples.
@@ -28,7 +49,7 @@ from ..lib import load_profiles, ensure_sequence_format, get_column_ids, \
nnls, add_sequence_format_args, SEQ_SPECIAL_VALUES, \
add_input_output_args, get_input_output_files
__version__ = "1.0.1"
def get_sample_data(infile, convert_to_raw=False, library=None):
@@ -47,6 +68,7 @@ def get_sample_data(infile, convert_to_raw=False, library=None):
column_names.append("reverse_corrected")
column_names.append("total_corrected")
column_names.append("correction_flags")
column_names.append("weight")
colid_marker, colid_sequence, colid_forward, colid_reverse =get_column_ids(
column_names, "marker", "sequence", "forward", "reverse")
data = {}
@@ -64,10 +86,11 @@ def get_sample_data(infile, convert_to_raw=False, library=None):
cols.append(0)
cols.append(0)
cols.append(0)
cols.append(cols[colid_forward])
cols.append(cols[colid_reverse])
cols.append(cols[colid_forward] + cols[colid_reverse])
cols.append("not_corrected")
cols.append((cols[colid_forward] + cols[colid_reverse]) / 100.)
if marker not in data:
data[marker] = []
data[marker].append(cols)
@@ -81,11 +104,11 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
colid_forward_noise, colid_reverse_noise, colid_total_noise,
colid_forward_add, colid_reverse_add, colid_total_add,
colid_forward_corrected, colid_reverse_corrected,
colid_total_corrected, colid_correction_flags, colid_weight) = \
get_column_ids(column_names, "marker", "sequence", "forward",
"reverse", "total", "forward_noise", "reverse_noise", "total_noise",
"forward_add", "reverse_add", "total_add", "forward_corrected",
"reverse_corrected", "total_corrected", "correction_flags", "weight")
# Enter profiles into P.
P1 = np.matrix(profile["forward"])
@@ -127,10 +150,11 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
reverse_add = np.multiply(A, P2.sum(1).T)
# Round values to 3 decimal positions.
A.round(3, A)
forward_noise.round(3, forward_noise)
reverse_noise.round(3, reverse_noise)
forward_add.round(3, forward_add)
reverse_add.round(3, reverse_add)
j = 0
for line in data:
@@ -163,8 +187,10 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
line[colid_correction_flags] = "corrected_bgpredict"
else:
line[colid_correction_flags] = "corrected"
line[colid_weight] = A[0, i]
else:
line[colid_correction_flags] = "corrected_as_background_only"
line[colid_weight] = line[colid_total_corrected] / 100.
# Add sequences that are in the profile but not in the sample.
for i in range(profile["m"]):
@@ -201,11 +227,13 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
line[colid_correction_flags] = "corrected_bgpredict"
else:
line[colid_correction_flags] = "corrected"
line[colid_weight] = A[0, i]
else:
line[colid_forward_add] = 0
line[colid_reverse_add] = 0
line[colid_total_add] = 0
line[colid_correction_flags] = "corrected_as_background_only"
line[colid_weight] = line[colid_total_corrected] / 100.
data.append(line)
#match_profile
...
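The new weight column reports A[0, i], the per-allele coefficient found by the non-negative least-squares fit of the sample's reads against the noise profiles (or total_corrected / 100 for sequences treated as background only). A rough standalone illustration of such a fit, using scipy's nnls and made-up numbers rather than the nnls() helper from fdstools.lib:

import numpy as np
from scipy.optimize import nnls  # illustrative stand-in for fdstools.lib.nnls

# Rows of P: background noise profile of each allele over three sequences
# (made-up values, roughly "per 100 reads of that allele").
P = np.array([[100.0,  12.0, 3.0],
              [  0.0, 100.0, 8.0]])
reads = np.array([2000.0, 1840.0, 200.0])  # observed reads per sequence

# Find non-negative per-allele weights so that P.T.dot(weights) ~= reads.
weights, residual = nnls(P.T, reads)
print(weights)  # roughly the kind of values written to the 'weight' column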
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Estimate allele-centric background noise profiles (means) from reference
samples.
@@ -17,7 +38,7 @@ from ..lib import pos_int_arg, add_input_output_args, get_input_output_files,\
parse_allelelist, get_sample_data, \
add_random_subsampling_args
__version__ = "1.1.0"
# Default values for parameters are specified below.
@@ -41,6 +62,10 @@ _DEF_MIN_SAMPLES = 2