Commit 08cf6ddd authored by Hoogenboom, Jerry

Various bug fixes and refinements throughout FDSTools

* Global changes in v0.0.4:
  * FDSTools now prints profiling information to stdout when the -d/--debug
    option is specified.
  * Fixed a bug where specifying '-' as the output filename was taken
    literally instead of being interpreted as 'write to standard output'
    (affected tools: BGCorrect, Samplestats, Seqconvert, Stuttermark).
  * Added more detailed license information to FDSTools.
* BGEstimate v1.1.0:
  * Added a new option -g/--min-genotypes (default: 3). Only alleles that occur
    in at least this number of unique heterozygous genotypes will be
    considered. This avoids 'contamination' of the noise profile of one allele
    with the noise of another. If homozygous samples are available for an
    allele, this filter is not applied to that allele. Setting this option to 1
    effectively disables it. This option has the same cascading effect as the
    -s/--min-samples option (see the illustrative sketch after this list), that
    is, if one allele does not meet...
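The cascading effect mentioned in the last item can be illustrated with a
short, self-contained sketch for a single marker. The function name, data
layout and thresholds below are invented for this illustration; the actual
implementation is the filter_data function in bgestimate.py further down in
this diff.

def cascade_filter(samples, min_samples=2, min_genotypes=3):
    """samples: dict of sample tag -> frozenset of true alleles (one marker)."""
    samples = dict(samples)
    while True:
        counts = {}     # allele -> number of samples carrying it
        genotypes = {}  # allele -> set of heterozygous genotypes, or None
                        # if a homozygous sample was seen for this allele
        for tag in samples:
            alleles = samples[tag]
            for allele in alleles:
                counts[allele] = counts.get(allele, 0) + 1
                if len(alleles) == 1:
                    genotypes[allele] = None  # homozygote: exempt from -g
                elif genotypes.get(allele, set()) is not None:
                    genotypes.setdefault(allele, set()).add(alleles)
        failing = set(allele for allele in counts
                      if counts[allele] < min_samples or (
                          genotypes[allele] is not None and
                          len(genotypes[allele]) < min_genotypes))
        if not failing:
            return samples
        # Dropping every sample that carries a failing allele can push other
        # alleles below the thresholds, hence the loop (the cascade).
        samples = {tag: alleles for tag, alleles in samples.items()
                   if not alleles & failing}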
parent 3a495653
This file contains the licenses of third-party projects included with FDSTools.
*******************************************************************************
Vega: A Visualization Grammar
Copyright (c) 2013, Trifacta Inc.
Copyright (c) 2015, University of Washington Interactive Data Lab
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*******************************************************************************
D3: Data-Driven Documents
Copyright 2010-2016 Mike Bostock
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the author nor the names of contributors may be used to
endorse or promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Data analysis tools for Next Generation Sequencing of forensic DNA markers,
including tools for characterisation and filtering of PCR stutter artefacts and
other systemic noise, and for automatic detection of the alleles in a sample.
"""
__version_info__ = ('0', '0', '3')
__version_info__ = ('0', '0', '4')
__version__ = '.'.join(__version_info__)
usage = __doc__.split("\n\n\n")
......
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
import argparse, pkgutil, os, re, textwrap
#import cProfile # Imported only if the -d/--debug option is specified
import tools
from . import usage, version
@@ -95,6 +116,11 @@ def main():
__tools__[args.tool].error(
"The following arguments are not known. Please check spelling "
"and argument order: '%s'." % "', '".join(unknowns))
if args.debug:
import cProfile
cProfile.runctx(
"args.func(args)", globals(), locals(), sort="tottime")
else:
args.func(args)
except Exception as error:
if args.debug:
......
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
import re, sys, argparse, random, itertools
#import numpy as np # Imported only when calling nnls()
@@ -46,6 +66,8 @@ PAT_STR_DEF_BLOCK = re.compile("([ACGT]+)\s+(\d+)\s+(\d+)")
# Pattern to split a comma-, semicolon-, or space-separated list.
PAT_SPLIT = re.compile("\s*[,; \t]\s*")
PAT_SPLIT_QUOTED = re.compile(
r""""((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\S+)""")
# Pattern that matches a chromosome name/number.
PAT_CHROMOSOME = re.compile(
@@ -689,6 +711,7 @@ def parse_library_ini(handle):
def load_profiles(profilefile, library=None):
# TODO: To be replaced with load_profiles_new (less memory).
column_names = profilefile.readline().rstrip("\r\n").split("\t")
(colid_marker, colid_allele, colid_sequence, colid_fmean, colid_rmean,
colid_tool) = get_column_ids(column_names, "marker", "allele", "sequence",
@@ -752,6 +775,41 @@ def load_profiles(profilefile, library=None):
#load_profiles
def load_profiles_new(profilefile, library=None):
# TODO, rename this to load_profiles to complete transition.
column_names = profilefile.readline().rstrip("\r\n").split("\t")
(colid_marker, colid_allele, colid_sequence, colid_fmean, colid_rmean,
colid_tool) = get_column_ids(column_names, "marker", "allele", "sequence",
"fmean", "rmean", "tool")
profiles = {}
for line in profilefile:
line = line.rstrip("\r\n").split("\t")
if line == [""]:
continue
marker = line[colid_marker]
if marker not in profiles:
profiles[marker] = {}
allele = ensure_sequence_format(line[colid_allele], "raw",
library=library, marker=marker)
sequence = ensure_sequence_format(line[colid_sequence], "raw",
library=library, marker=marker)
if allele not in profiles[marker]:
profiles[marker][allele] = {}
elif sequence in profiles[marker][allele]:
raise ValueError(
"Invalid background noise profiles file: encountered "
"multiple values for marker '%s' allele '%s' sequence '%s'" %
(marker, allele, sequence))
profiles[marker][allele][sequence] = {
"forward": float(line[colid_fmean]),
"reverse": float(line[colid_rmean]),
"tool": line[colid_tool]}
return profiles
#load_profiles_new
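# Illustration (not part of the commit): load_profiles_new returns a nested
# dict keyed as profiles[marker][allele][sequence], where both allele and
# sequence are raw sequence strings. For example (all values invented):
#   profiles["MarkerX"]["AGAT" * 10]["AGAT" * 9] == {
#       "forward": 1.25, "reverse": 1.10, "tool": "bgestimate"}
# with "forward" and "reverse" holding the fmean and rmean columns as floats.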
def pattern_longest_match(pattern, subject):
"""Return the longest match of the pattern in the subject string."""
# FIXME, this function tries only one match at each position in the
@@ -1261,21 +1319,32 @@ def get_repeat_pattern(seq):
def read_sample_data_file(infile, data, annotation_column=None, seqformat=None,
library=None, default_marker=None,
drop_special_seq=False, after_correction=False):
drop_special_seq=False, after_correction=False,
extra_columns=None):
"""Add data from infile to data dict as [marker, sequence]=reads."""
# Get column numbers.
column_names = infile.readline().rstrip("\r\n").split("\t")
colid_sequence = get_column_ids(column_names, "sequence")
colid_forward = None
colid_reverse = None
numtype = int
if after_correction:
colid_forward, colid_reverse = get_column_ids(column_names,
"forward_corrected", "reverse_corrected",
optional=(after_correction != "require"))
if colid_forward is None:
colid_forward = get_column_ids(column_names, "forward")
else:
numtype = float
if colid_reverse is None:
colid_reverse = get_column_ids(column_names, "reverse")
else:
numtype = float
if extra_columns is not None:
extra_colids = {c: i for c, i in
((c, get_column_ids(column_names, c, optional=extra_columns[c]))
for c in extra_columns)
if i is not None}
# Get marker name column if it exists.
colid_marker = get_column_ids(column_names, "marker", optional=True)
@@ -1300,8 +1369,11 @@ def read_sample_data_file(infile, data, annotation_column=None, seqformat=None,
if (annotation_column is not None and
line[colid_annotation].startswith("ALLELE")):
found_alleles.append((marker, sequence))
data[marker, sequence] = map(int,
data[marker, sequence] = map(numtype,
(line[colid_forward], line[colid_reverse]))
if extra_columns is not None:
data[marker, sequence].append(
{c: line[extra_colids[c]] for c in extra_colids})
return found_alleles
#read_sample_data_file
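# Illustrative call (not part of the commit); "flags" is a hypothetical extra
# column name here, and the True value marks that column as optional:
#   read_sample_data_file(infile, data, after_correction=True,
#                         extra_columns={"flags": True})
# For each (marker, sequence) key, the forward and reverse read counts are
# then followed by a dict with the values of whichever requested extra
# columns were actually present in the file.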
@@ -1330,7 +1402,7 @@ def get_sample_data(tags_to_files, callback, allelelist=None,
annotation_column=None, seqformat=None, library=None,
marker=None, homozygotes=False, limit_reads=sys.maxint,
drop_samples=0, drop_special_seq=False,
after_correction=False):
after_correction=False, extra_columns=None):
if drop_samples:
sample_tags = tags_to_files.keys()
for tag in random.sample(xrange(len(sample_tags)),
@@ -1344,7 +1416,7 @@ def get_sample_data(tags_to_files, callback, allelelist=None,
infile = sys.stdin if infile == "-" else open(infile, "r")
alleles.update(read_sample_data_file(
infile, data, annotation_column, seqformat, library, marker,
drop_special_seq, after_correction))
drop_special_seq, after_correction, extra_columns))
if infile != sys.stdin:
infile.close()
if limit_reads < sys.maxint:
@@ -1620,7 +1692,8 @@ def get_input_output_files(args, single=False, batch_support=False):
for infile in infiles]
if len(outfiles) == 1:
outfile = outfiles[0]
outfile = sys.stdout if outfiles[0] == "-" else outfiles[0]
if outfile == sys.stdout and len(set(tags)) == 1:
# Write output of single sample to stdout.
return ((tag, infiles, outfile) for tag in set(tags))
@@ -1647,6 +1720,16 @@ def get_input_output_files(args, single=False, batch_support=False):
#get_input_output_files
def split_quoted_string(text):
return reduce(
lambda x, y: x + ["".join([
y[0].replace("\\\"", "\""),
y[1].replace("\\'", "'"),
y[2]])],
PAT_SPLIT_QUOTED.findall(text), [])
#split_quoted_string
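# Illustration (not part of the commit) of the intended behaviour:
#   split_quoted_string('blue "light blue" \'dark "navy" blue\'')
#   == ['blue', 'light blue', 'dark "navy" blue']
# Single- and double-quoted groups are kept together and escaped quotes
# (\" and \') inside them are unescaped.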
def print_db(text, debug):
"""Print text if debug is True."""
if debug:
......
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Find true alleles in reference samples and detect possible
contaminations.
......
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Match background noise profiles (obtained from e.g., bgestimate) to
samples.
@@ -28,7 +49,7 @@ from ..lib import load_profiles, ensure_sequence_format, get_column_ids, \
nnls, add_sequence_format_args, SEQ_SPECIAL_VALUES, \
add_input_output_args, get_input_output_files
__version__ = "1.0.0"
__version__ = "1.0.1"
def get_sample_data(infile, convert_to_raw=False, library=None):
@@ -47,6 +68,7 @@ def get_sample_data(infile, convert_to_raw=False, library=None):
column_names.append("reverse_corrected")
column_names.append("total_corrected")
column_names.append("correction_flags")
column_names.append("weight")
colid_marker, colid_sequence, colid_forward, colid_reverse = get_column_ids(
column_names, "marker", "sequence", "forward", "reverse")
data = {}
@@ -64,10 +86,11 @@ def get_sample_data(infile, convert_to_raw=False, library=None):
cols.append(0)
cols.append(0)
cols.append(0)
cols.append(int(cols[colid_forward]))
cols.append(int(cols[colid_reverse]))
cols.append(int(cols[colid_forward]) + int(cols[colid_reverse]))
cols.append(cols[colid_forward])
cols.append(cols[colid_reverse])
cols.append(cols[colid_forward] + cols[colid_reverse])
cols.append("not_corrected")
cols.append((cols[colid_forward] + cols[colid_reverse]) / 100.)
if marker not in data:
data[marker] = []
data[marker].append(cols)
@@ -81,11 +104,11 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
colid_forward_noise, colid_reverse_noise, colid_total_noise,
colid_forward_add, colid_reverse_add, colid_total_add,
colid_forward_corrected, colid_reverse_corrected,
colid_total_corrected, colid_correction_flags) = get_column_ids(
column_names, "marker", "sequence", "forward", "reverse", "total",
"forward_noise", "reverse_noise", "total_noise", "forward_add",
"reverse_add", "total_add", "forward_corrected", "reverse_corrected",
"total_corrected", "correction_flags")
colid_total_corrected, colid_correction_flags, colid_weight) = \
get_column_ids(column_names, "marker", "sequence", "forward",
"reverse", "total", "forward_noise", "reverse_noise", "total_noise",
"forward_add", "reverse_add", "total_add", "forward_corrected",
"reverse_corrected", "total_corrected", "correction_flags", "weight")
# Enter profiles into P.
P1 = np.matrix(profile["forward"])
@@ -127,10 +150,11 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
reverse_add = np.multiply(A, P2.sum(1).T)
# Round values to 3 decimal positions.
forward_noise.round(3, forward_noise);
reverse_noise.round(3, reverse_noise);
forward_add.round(3, forward_add);
reverse_add.round(3, reverse_add);
A.round(3, A)
forward_noise.round(3, forward_noise)
reverse_noise.round(3, reverse_noise)
forward_add.round(3, forward_add)
reverse_add.round(3, reverse_add)
j = 0
for line in data:
@@ -163,8 +187,10 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
line[colid_correction_flags] = "corrected_bgpredict"
else:
line[colid_correction_flags] = "corrected"
line[colid_weight] = A[0, i]
else:
line[colid_correction_flags] = "corrected_as_background_only"
line[colid_weight] = line[colid_total_corrected] / 100.
# Add sequences that are in the profile but not in the sample.
for i in range(profile["m"]):
@@ -201,11 +227,13 @@ def match_profile(column_names, data, profile, convert_to_raw, library,
line[colid_correction_flags] = "corrected_bgpredict"
else:
line[colid_correction_flags] = "corrected"
line[colid_weight] = A[0, i]
else:
line[colid_forward_add] = 0
line[colid_reverse_add] = 0
line[colid_total_add] = 0
line[colid_correction_flags] = "corrected_as_background_only"
line[colid_weight] = line[colid_total_corrected] / 100.
data.append(line)
#match_profile
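# Note on the new 'weight' column (editorial illustration, not part of the
# commit): sequences that match the noise profile get the fitted coefficient
# A[0, i] as their weight, which appears to be the allele's estimated
# contribution from the least-squares fit; sequences treated as background
# only get total_corrected / 100 instead, and get_sample_data above
# initialises the column to (forward + reverse) / 100 for uncorrected data.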
......
#!/usr/bin/env python
#
# Copyright (C) 2016 Jerry Hoogenboom
#
# This file is part of FDSTools, data analysis tools for Next
# Generation Sequencing of forensic DNA markers.
#
# FDSTools is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# FDSTools is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with FDSTools. If not, see <http://www.gnu.org/licenses/>.
#
"""
Estimate allele-centric background noise profiles (means) from reference
samples.
@@ -17,7 +38,7 @@ from ..lib import pos_int_arg, add_input_output_args, get_input_output_files,\
parse_allelelist, get_sample_data, \
add_random_subsampling_args
__version__ = "1.0.0"
__version__ = "1.1.0"
# Default values for parameters are specified below.
@@ -41,6 +62,10 @@ _DEF_MIN_SAMPLES = 2
# This value can be overridden by the -S command line option.
_DEF_MIN_SAMPLE_PCT = 80.
# Default minimum number of heterozygous genotypes for each true allele.
# This value can be overridden by the -g command line option.
_DEF_MIN_GENOTYPES = 3
def solve_profile_mixture(forward, reverse, genotypes, n, variance=False,
reportfile=None):
@@ -246,11 +271,11 @@ def solve_profile_mixture_single(samples, genotypes, n, variance=False,
return P.tolist(), V.tolist()
else:
return P.tolist()
#solve_profile_mixture
#solve_profile_mixture_single
def ensure_min_samples(allelelist, min_samples):
if min_samples <= 1:
def filter_data(allelelist, min_samples, min_genotypes):
if min_samples <= 1 and min_genotypes <= 1:
return
marker_names = set()
@@ -258,33 +283,51 @@ def ensure_min_samples(allelelist, min_samples):
marker_names.update(allelelist[tag].keys())
for marker in marker_names:
# Get a sample count of each true allele of this marker.
# Get a sample count of each true allele of this marker
# and a sample count of each unique genotype of each allele.
true_alleles = {}
for tag in allelelist:
if marker not in allelelist[tag]:
continue
for true_allele in allelelist[tag][marker]:
try:
true_alleles[true_allele] += 1
true_alleles[true_allele][0] += 1
except KeyError:
true_alleles[true_allele] = 1
true_alleles[true_allele] = [1, {}]
if len(allelelist[tag][marker]) == 1: # Got homozygote!
true_alleles[true_allele][1] = None
elif true_alleles[true_allele][1] is not None:
genotype = "\t".join(sorted(allelelist[tag][marker]))
try:
true_alleles[true_allele][1][genotype] += 1
except KeyError:
true_alleles[true_allele][1][genotype] = 1
# Drop any alleles that occur in less than min_samples samples
# or in less than min_genotypes unique heterozygous genotypes
# (by dropping the sample for this marker completely).
repeat = True
while repeat:
repeat = False
for true_allele in true_alleles:
if 0 < true_alleles[true_allele] < min_samples:
if 0 < true_alleles[true_allele][0] < min_samples or (
true_alleles[true_allele][1] is not None and
0 < len(true_alleles[true_allele][1]) < min_genotypes):
repeat = True
for tag in allelelist:
if marker not in allelelist[tag]:
continue
if true_allele in allelelist[tag][marker]:
genotype = "\t".join(
sorted(allelelist[tag][marker]))
for allele in allelelist[tag][marker]:
true_alleles[allele] -= 1
true_alleles[allele][0] -= 1
if true_alleles[allele][1] is not None:
true_alleles[allele][1][genotype] -= 1
if not true_alleles[allele][1][genotype]:
del true_alleles[allele][1][genotype]
del allelelist[tag][marker]
#ensure_min_samples
#filter_data
def add_sample_data(data, sample_data, sample_alleles, min_pct, min_abs, tag):
@@ -398,8 +441,8 @@ def preprocess_data(data, min_sample_pct):
def generate_profiles(samples_in, outfile, reportfile, allelefile,
annotation_column, min_pct, min_abs, min_samples,
min_sample_pct, seqformat, library, marker, homozygotes,
limit_reads, drop_samples):
min_sample_pct, min_genotypes, seqformat, library,
marker, homozygotes, limit_reads, drop_samples):
if reportfile:
t0 = time.time()
@@ -414,9 +457,9 @@ def generate_profiles(samples_in, outfile, reportfile, allelefile,
allelelist, annotation_column, seqformat, library, marker,
homozygotes, limit_reads, drop_samples, True)
# Ensure minimum number of samples per allele.
# Ensure minimum number of samples and genotypes per allele.
allelelist = {tag: allelelist[tag] for tag in sample_data}
ensure_min_samples(allelelist, min_samples)
filter_data(allelelist, min_samples, min_genotypes)
# Combine data from all samples. This takes most time.
data = {}
@@ -501,6 +544,11 @@ def add_arguments(parser):
help="require this minimum number of samples for each background "
"product, as a percentage of the number of samples with a "
"particular true allele (default: %(default)s)")
filtergroup.add_argument('-g', '--min-genotypes', metavar="N",
type=pos_int_arg, default=_DEF_MIN_GENOTYPES,
help="require this minimum number of unique heterozygous genotypes "
"for each allele for which no homozygous samples are available "
"(default: %(default)s)")
filtergroup.add_argument('-M', '--marker', metavar="MARKER",
help="work only on MARKER")
filtergroup.add_argument('-H', '--homozygotes', action="store_true",
@@ -522,6 +570,7 @@ def run(args):
generate_profiles(files[0], files[1], args.report, args.allelelist,
args.annotation_column, args.min_pct, args.min_abs,
args.min_samples, args.min_sample_pct,
args.sequence_format, args.library, args.marker,
args.homozygotes, args.limit_reads, args.drop_samples)
args.min_genotypes, args.sequence_format, args.library,
args.marker, args.homozygotes, args.limit_reads,
args.drop_samples)
#run