@@ -35,11 +35,7 @@ It is assumed that these inputfiles are stored under `data/raw/`, as `all_taxCla
Species of interest must be given by the user, by name. For example, '_Escherichia coli_' is a valid name. Genus names such as '_Clostridium_' work as well.
These should be entered into the pipeline configuration file: `config/config.yaml`.
Sequences to screen for may be any sequence in a fasta file. This file should be listed in the `config/config.yaml` file. Additionally, a table with alternative names for the sequences may be provided. This is useful when screening for genes from GenBank, e.g. when you are looking for the _colibactin A_ gene (clbA), your reference fasta has the label "lcl|CP025328.1_cds_AUG64981.1_5 [gene=clbA] [locus_tag=CXG97_11595] [protein=colibactin biosynthesis phosphopantetheinyl transferase ClbA] [protein_id=AUG64981.1] [location=2190549..2191283] [gbkey=CDS]", and you want to list just "clbA". A tab-separated table like the one below can be used to insert more readable names in figures.
Sequences to screen for may be any sequences in fasta format. The directory containing these fasta files should be listed in the `config/config.yaml` file, next to `reference_directory: `.
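
If you want to derive short names from GenBank-style labels such as the one quoted above, a small script along these lines might help. This is only a sketch and not part of the pipeline: the helper names `short_name` and `name_table` are made up for illustration, and the column order (original label first, short name second) is an assumption, so check it against the table format the pipeline expects.

```python
import re

# Hypothetical helpers, not part of the pipeline.
# Matches the "[gene=...]" tag that GenBank CDS fasta labels often carry.
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def short_name(header):
    """Return the [gene=...] value, or the first word of the label if no tag is found."""
    header = header.lstrip(">").strip()
    match = GENE_TAG.search(header)
    if match:
        return match.group(1)
    parts = header.split()
    return parts[0] if parts else ""


def name_table(headers):
    """Build tab-separated lines: original label, then short name (assumed column order)."""
    return ["{}\t{}".format(h.lstrip(">").strip(), short_name(h)) for h in headers]


header = (
    ">lcl|CP025328.1_cds_AUG64981.1_5 [gene=clbA] [locus_tag=CXG97_11595] "
    "[protein=colibactin biosynthesis phosphopantetheinyl transferase ClbA] "
    "[protein_id=AUG64981.1] [location=2190549..2191283] [gbkey=CDS]"
)
print(short_name(header))  # prints "clbA"
```

Labels without a `[gene=...]` tag fall back to the first whitespace-separated word, so they may still need manual editing.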
When these files are provided to the pipeline, it will try to: