author CoprDistGit <infra@openeuler.org> 2023-05-18 05:05:29 +0000
committer CoprDistGit <infra@openeuler.org> 2023-05-18 05:05:29 +0000
commit 5c6d2f5b3a0c732a425ad2f4edf3621d8fe22816
tree 3486d3429f03c64e2c780e9776ac1641027163c5
parent 94da93d27ee4c2b9eee2fffe38273ec7350e15d3
automatic import of python-bactinspectormax
-rw-r--r-- .gitignore 1
-rw-r--r-- python-bactinspectormax.spec 611
-rw-r--r-- sources 1
3 files changed, 613 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..370449a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/BactInspectorMax-0.1.3.tar.gz
diff --git a/python-bactinspectormax.spec b/python-bactinspectormax.spec
new file mode 100644
index 0000000..da0e6a3
--- /dev/null
+++ b/python-bactinspectormax.spec
@@ -0,0 +1,611 @@
+%global _empty_manifest_terminate_build 0
+Name: python-BactInspectorMax
+Version: 0.1.3
+Release: 1
+Summary: Package to investigate mash hits against refseq
+License: MIT
+URL: https://pypi.org/project/BactInspectorMax/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/97/e3/087fb0be0414f1442d01e481e0ad22e8596680c98e6462b8513cbe307ee9/BactInspectorMax-0.1.3.tar.gz
+BuildArch: noarch
+
+Requires: python3-pandas
+Requires: python3-numpy
+
+%description
+# Bactinspector
+A package to
+
+1. determine the most probable species based on sequence in fasta/fastq files using refseq and Mash (https://mash.readthedocs.io/en/latest/index.html)
+It will count the species of the top refseq mash matches and report the most frequent.
+2. determine the closest reference in refseq to a set of fasta/fastq files
+
+The PyPI package is called BactInspectorMax since my original BactInspector asked for the path to a sketch
+of refseq genomes. In May 2019 a new curated mash sketch of complete bacterial refseq genomes was
+created and bundled into the package. This required a special request to increase the PyPI package limit,
+hence the Max.
+The command is still `bactinspector <sub command>`, however.
+
+## Data
+The data bundled into bactinspector are the complete bacterial refseq genomes from May 2019. The species assignments have been corrected by changing species where the majority does not match the species described within the refseq info found [here](ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) and the excellent [Bacsort resource](https://github.com/rrwick/Bacsort) from [Ryan Wick](https://twitter.com/rrwick).
+
+## Dependencies
+[Mash](https://github.com/marbl/Mash/) (> v2.1)
+Installation with conda is recommended.
+
+## Installation
+pip3 install bactinspectorMax
+
+## Usage
+```
+usage: bactinspector [-h] [-v]
+ {check_species,closest_match,info,create_species_info} ...
+
+A module to determine the most probable species based on sequence in fasta files using refseq and Mash (https://mash.readthedocs.io/en/latest/index.html)
+It will count the species of the top ref seq mash matches and report most frequent.
+
+In order to use the module:
+ • Specify an input directory and output directory (default is current directory)
+ • Specify either a
+ • fasta file pattern with -f (e.g "*.fas") or
+ • mash sketch file pattern with -m (e.g "*.msh") if you have already sketched the fasta files
+ • By default the top 10 matches will be used. Change this with -n
+ • Speed things up by changing the number of parallel processes to match the cores on your computer using -p
+ • If mash is not in your PATH specify the directory containing the mash executable with -mp
+
+If you want to update the genomes used, follow the instructions on https://gitlab.com/antunderwood/bactinspector/wikis/Updating-the-genomes-in-BactInspector
+and use the create_species_info command to make the required file
+
+positional arguments:
+ {check_species,closest_match,create_species_info}
+ The following commands are available. Type
+ bactinspector <COMMAND> -h for more help on a specific
+ commands
+ check_species Check the most frequent matches to a species in refseq
+ closest_match Report the closest matches to a set of sequences
+ create_species_info
+ Create species info TSV for locally created mash
+ sketches
+ info Provide information about the data in bactinspector
+optional arguments:
+ -h, --help show this help message and exit
+ -v, --version print out software version
+```
+
+### check_species
+Assign a species using matches to refseq. Based on observed intra-species mash distances, a result may be marked as uncertain if the distance to the best hit is greater than 1.2 times the observed maximum intra-species distance.
+```
+usage: bactinspector check_species [-h] [-i INPUT_DIR] [-o OUTPUT_DIR]
+ [-p PARALLEL_PROCESSES]
+ [-n NUM_BEST_MATCHES] [-d DISTANCE_CUTOFF]
+ [-v ALLOWED_VARIANCE]
+ [-vl ALLOWED_VARIANCE_RARER_SPECIES] [-s]
+ [-l LOCAL_MASH_AND_INFO_FILE_PREFIX]
+ [-mp MASH_PATH]
+ (-f FASTA_FILE_PATTERN | -fq FASTQ_FILE_PATTERN | -m MASH_SKETCH_FILE_PATTERN)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -i INPUT_DIR, --input_dir INPUT_DIR
+ path to input_directory
+ -o OUTPUT_DIR, --output_dir OUTPUT_DIR
+ path to output_directory
+ -p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
+ number of processes to run in parallel
+ -n NUM_BEST_MATCHES, --num_best_matches NUM_BEST_MATCHES
+ number of best matches to return
+ -d DISTANCE_CUTOFF, --distance_cutoff DISTANCE_CUTOFF
+ mash distance cutoff (default 0.05)
+ -v ALLOWED_VARIANCE, --allowed_variance ALLOWED_VARIANCE
+ proportion of max_distance allowed over which a result
+ will be marked as uncertain (default 0.1)
+ -vl ALLOWED_VARIANCE_RARER_SPECIES, --allowed_variance_rarer_species ALLOWED_VARIANCE_RARER_SPECIES
+ proportion of max_distance allowed over which a result
+ will be marked as uncertain for species which have
+ fewer than 10 representatives in refseq (default 0.5)
+ -s, --stdout_summary output a summary of the result to STDOUT
+ -l LOCAL_MASH_AND_INFO_FILE_PREFIX, --local_mash_and_info_file_prefix LOCAL_MASH_AND_INFO_FILE_PREFIX
+ the path prefix to the mash sketch file and
+ corresponding info file
+ -mp MASH_PATH, --mash_path MASH_PATH
+ path to the mash executable. If not provided it is
+ assumed mash is in the PATH
+ -f FASTA_FILE_PATTERN, --fasta_file_pattern FASTA_FILE_PATTERN
+ pattern to match fasta files e.g "*.fas"
+ -fq FASTQ_FILE_PATTERN, --fastq_file_pattern FASTQ_FILE_PATTERN
+ pattern to match fastq files e.g "*.fastq.gz"
+ -m MASH_SKETCH_FILE_PATTERN, --mash_sketch_file_pattern MASH_SKETCH_FILE_PATTERN
+ pattern to match mash sketch files e.g "*.msh"
+```
+
+### closest_match
+Find the closest match of a set of genomes to genomes within refseq. Useful as an objective way of choosing a reference genome when mapping.
+
+```
+usage: bactinspector closest_match [-h] [-i INPUT_DIR] [-o OUTPUT_DIR]
+ [-p PARALLEL_PROCESSES] [-r]
+ [-l LOCAL_MASH_AND_INFO_FILE_PREFIX]
+ [-mp MASH_PATH]
+ (-f FASTA_FILE_PATTERN | -fq FASTQ_FILE_PATTERN | -m MASH_SKETCH_FILE_PATTERN)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -i INPUT_DIR, --input_dir INPUT_DIR
+ path to input_directory
+ -o OUTPUT_DIR, --output_dir OUTPUT_DIR
+ path to output_directory
+ -p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
+ number of processes to run in parallel
+ -r, --ref_and_rep_only
+ only include reference and representative sequences
+ -l LOCAL_MASH_AND_INFO_FILE_PREFIX, --local_mash_and_info_file_prefix LOCAL_MASH_AND_INFO_FILE_PREFIX
+ the path prefix to the mash sketch file and
+ corresponding info file
+ -mp MASH_PATH, --mash_path MASH_PATH
+ path to the mash executable. If not provided it is
+ assumed mash is in the PATH
+ -f FASTA_FILE_PATTERN, --fasta_file_pattern FASTA_FILE_PATTERN
+ pattern to match fasta files e.g "*.fas"
+ -fq FASTQ_FILE_PATTERN, --fastq_file_pattern FASTQ_FILE_PATTERN
+ pattern to match fastq files e.g "*.fastq.gz"
+ -m MASH_SKETCH_FILE_PATTERN, --mash_sketch_file_pattern MASH_SKETCH_FILE_PATTERN
+ pattern to match mash sketch files e.g "*.msh"
+```
+### info
+Find out what sequences are present in the mash sketch using a term to search the species name. You can either specify `-s` to search the aggregated species information or `-i` to search the individual refseq records.
+
+```
+usage: bactinspector info [-h] -t SEARCH_TERM (-s | -i)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -t SEARCH_TERM, --search_term SEARCH_TERM
+ search term to use when searching species within
+ bactinspector
+ -s, --summary search the aggregate data
+ -i, --individual_records
+ search the individual refseq records
+```
+
+### create_species_info
+
+An updated mash sketch file and corresponding info file can be created using the process described [here](https://gitlab.com/antunderwood/bactinspector/wikis/Updating-the-genomes-in-BactInspector).
+This process uses the create_species_info command.
+
+```
+usage: bactinspector create_species_info [-h] -m MASH_INFO_FILE -r
+ REFSEQ_SUMMARY_FILE -b
+ BACSORT_SPECIES_FILE -x
+ BACSORT_EXCLUDED_ASSEMBLIES_FILE
+
+optional arguments:
+ -h, --help show this help message and exit
+ -m MASH_INFO_FILE, --mash_info_file MASH_INFO_FILE
+ path to info file created using mash info -t
+ -r REFSEQ_SUMMARY_FILE, --refseq_summary_file REFSEQ_SUMMARY_FILE
+ path to refseq assembly summary file downloaded via
+ wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteri
+ a/assembly_summary.txt
+ -b BACSORT_SPECIES_FILE, --bacsort_species_file BACSORT_SPECIES_FILE
+ path to bacsort_species_definitions.txt
+ -x BACSORT_EXCLUDED_ASSEMBLIES_FILE, --bacsort_excluded_assemblies_file BACSORT_EXCLUDED_ASSEMBLIES_FILE
+ path to bacsort_excluded_assemblies.txt
+```
+
+
+
+%package -n python3-BactInspectorMax
+Summary: Package to investigate mash hits against refseq
+Provides: python-BactInspectorMax
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-BactInspectorMax
+# Bactinspector
+A package to
+
+1. determine the most probable species based on sequence in fasta/fastq files using refseq and Mash (https://mash.readthedocs.io/en/latest/index.html)
+It will count the species of the top refseq mash matches and report the most frequent.
+2. determine the closest reference in refseq to a set of fasta/fastq files
+
+The PyPI package is called BactInspectorMax since my original BactInspector asked for the path to a sketch
+of refseq genomes. In May 2019 a new curated mash sketch of complete bacterial refseq genomes was
+created and bundled into the package. This required a special request to increase the PyPI package limit,
+hence the Max.
+The command is still `bactinspector <sub command>`, however.
+
+## Data
+The data bundled into bactinspector are the complete bacterial refseq genomes from May 2019. The species assignments have been corrected by changing species where the majority does not match the species described within the refseq info found [here](ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) and the excellent [Bacsort resource](https://github.com/rrwick/Bacsort) from [Ryan Wick](https://twitter.com/rrwick).
+
+## Dependencies
+[Mash](https://github.com/marbl/Mash/) (> v2.1)
+Installation with conda is recommended.
+
+## Installation
+pip3 install bactinspectorMax
+
+## Usage
+```
+usage: bactinspector [-h] [-v]
+ {check_species,closest_match,info,create_species_info} ...
+
+A module to determine the most probable species based on sequence in fasta files using refseq and Mash (https://mash.readthedocs.io/en/latest/index.html)
+It will count the species of the top ref seq mash matches and report most frequent.
+
+In order to use the module:
+ • Specify an input directory and output directory (default is current directory)
+ • Specify either a
+ • fasta file pattern with -f (e.g "*.fas") or
+ • mash sketch file pattern with -m (e.g "*.msh") if you have already sketched the fasta files
+ • By default the top 10 matches will be used. Change this with -n
+ • Speed things up by changing the number of parallel processes to match the cores on your computer using -p
+ • If mash is not in your PATH specify the directory containing the mash executable with -mp
+
+If you want to update the genomes used, follow the instructions on https://gitlab.com/antunderwood/bactinspector/wikis/Updating-the-genomes-in-BactInspector
+and use the create_species_info command to make the required file
+
+positional arguments:
+ {check_species,closest_match,create_species_info}
+ The following commands are available. Type
+ bactinspector <COMMAND> -h for more help on a specific
+ commands
+ check_species Check the most frequent matches to a species in refseq
+ closest_match Report the closest matches to a set of sequences
+ create_species_info
+ Create species info TSV for locally created mash
+ sketches
+ info Provide information about the data in bactinspector
+optional arguments:
+ -h, --help show this help message and exit
+ -v, --version print out software version
+```
+
+### check_species
+Assign a species using matches to refseq. Based on observed intra-species mash distances, a result may be marked as uncertain if the distance to the best hit is greater than 1.2 times the observed maximum intra-species distance.
+```
+usage: bactinspector check_species [-h] [-i INPUT_DIR] [-o OUTPUT_DIR]
+ [-p PARALLEL_PROCESSES]
+ [-n NUM_BEST_MATCHES] [-d DISTANCE_CUTOFF]
+ [-v ALLOWED_VARIANCE]
+ [-vl ALLOWED_VARIANCE_RARER_SPECIES] [-s]
+ [-l LOCAL_MASH_AND_INFO_FILE_PREFIX]
+ [-mp MASH_PATH]
+ (-f FASTA_FILE_PATTERN | -fq FASTQ_FILE_PATTERN | -m MASH_SKETCH_FILE_PATTERN)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -i INPUT_DIR, --input_dir INPUT_DIR
+ path to input_directory
+ -o OUTPUT_DIR, --output_dir OUTPUT_DIR
+ path to output_directory
+ -p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
+ number of processes to run in parallel
+ -n NUM_BEST_MATCHES, --num_best_matches NUM_BEST_MATCHES
+ number of best matches to return
+ -d DISTANCE_CUTOFF, --distance_cutoff DISTANCE_CUTOFF
+ mash distance cutoff (default 0.05)
+ -v ALLOWED_VARIANCE, --allowed_variance ALLOWED_VARIANCE
+ proportion of max_distance allowed over which a result
+ will be marked as uncertain (default 0.1)
+ -vl ALLOWED_VARIANCE_RARER_SPECIES, --allowed_variance_rarer_species ALLOWED_VARIANCE_RARER_SPECIES
+ proportion of max_distance allowed over which a result
+ will be marked as uncertain for species which have
+ fewer than 10 representatives in refseq (default 0.5)
+ -s, --stdout_summary output a summary of the result to STDOUT
+ -l LOCAL_MASH_AND_INFO_FILE_PREFIX, --local_mash_and_info_file_prefix LOCAL_MASH_AND_INFO_FILE_PREFIX
+ the path prefix to the mash sketch file and
+ corresponding info file
+ -mp MASH_PATH, --mash_path MASH_PATH
+ path to the mash executable. If not provided it is
+ assumed mash is in the PATH
+ -f FASTA_FILE_PATTERN, --fasta_file_pattern FASTA_FILE_PATTERN
+ pattern to match fasta files e.g "*.fas"
+ -fq FASTQ_FILE_PATTERN, --fastq_file_pattern FASTQ_FILE_PATTERN
+ pattern to match fastq files e.g "*.fastq.gz"
+ -m MASH_SKETCH_FILE_PATTERN, --mash_sketch_file_pattern MASH_SKETCH_FILE_PATTERN
+ pattern to match mash sketch files e.g "*.msh"
+```
+
+### closest_match
+Find the closest match of a set of genomes to genomes within refseq. Useful as an objective way of choosing a reference genome when mapping.
+
+```
+usage: bactinspector closest_match [-h] [-i INPUT_DIR] [-o OUTPUT_DIR]
+ [-p PARALLEL_PROCESSES] [-r]
+ [-l LOCAL_MASH_AND_INFO_FILE_PREFIX]
+ [-mp MASH_PATH]
+ (-f FASTA_FILE_PATTERN | -fq FASTQ_FILE_PATTERN | -m MASH_SKETCH_FILE_PATTERN)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -i INPUT_DIR, --input_dir INPUT_DIR
+ path to input_directory
+ -o OUTPUT_DIR, --output_dir OUTPUT_DIR
+ path to output_directory
+ -p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
+ number of processes to run in parallel
+ -r, --ref_and_rep_only
+ only include reference and representative sequences
+ -l LOCAL_MASH_AND_INFO_FILE_PREFIX, --local_mash_and_info_file_prefix LOCAL_MASH_AND_INFO_FILE_PREFIX
+ the path prefix to the mash sketch file and
+ corresponding info file
+ -mp MASH_PATH, --mash_path MASH_PATH
+ path to the mash executable. If not provided it is
+ assumed mash is in the PATH
+ -f FASTA_FILE_PATTERN, --fasta_file_pattern FASTA_FILE_PATTERN
+ pattern to match fasta files e.g "*.fas"
+ -fq FASTQ_FILE_PATTERN, --fastq_file_pattern FASTQ_FILE_PATTERN
+ pattern to match fastq files e.g "*.fastq.gz"
+ -m MASH_SKETCH_FILE_PATTERN, --mash_sketch_file_pattern MASH_SKETCH_FILE_PATTERN
+ pattern to match mash sketch files e.g "*.msh"
+```
+### info
+Find out what sequences are present in the mash sketch using a term to search the species name. You can either specify `-s` to search the aggregated species information or `-i` to search the individual refseq records.
+
+```
+usage: bactinspector info [-h] -t SEARCH_TERM (-s | -i)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -t SEARCH_TERM, --search_term SEARCH_TERM
+ search term to use when searching species within
+ bactinspector
+ -s, --summary search the aggregate data
+ -i, --individual_records
+ search the individual refseq records
+```
+
+### create_species_info
+
+An updated mash sketch file and corresponding info file can be created using the process described [here](https://gitlab.com/antunderwood/bactinspector/wikis/Updating-the-genomes-in-BactInspector).
+This process uses the create_species_info command.
+
+```
+usage: bactinspector create_species_info [-h] -m MASH_INFO_FILE -r
+ REFSEQ_SUMMARY_FILE -b
+ BACSORT_SPECIES_FILE -x
+ BACSORT_EXCLUDED_ASSEMBLIES_FILE
+
+optional arguments:
+ -h, --help show this help message and exit
+ -m MASH_INFO_FILE, --mash_info_file MASH_INFO_FILE
+ path to info file created using mash info -t
+ -r REFSEQ_SUMMARY_FILE, --refseq_summary_file REFSEQ_SUMMARY_FILE
+ path to refseq assembly summary file downloaded via
+ wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteri
+ a/assembly_summary.txt
+ -b BACSORT_SPECIES_FILE, --bacsort_species_file BACSORT_SPECIES_FILE
+ path to bacsort_species_definitions.txt
+ -x BACSORT_EXCLUDED_ASSEMBLIES_FILE, --bacsort_excluded_assemblies_file BACSORT_EXCLUDED_ASSEMBLIES_FILE
+ path to bacsort_excluded_assemblies.txt
+```
+
+
+
+%package help
+Summary: Development documents and examples for BactInspectorMax
+Provides: python3-BactInspectorMax-doc
+%description help
+# Bactinspector
+A package to
+
+1. determine the most probable species based on sequence in fasta/fastq files using refseq and Mash (https://mash.readthedocs.io/en/latest/index.html)
+It will count the species of the top refseq mash matches and report the most frequent.
+2. determine the closest reference in refseq to a set of fasta/fastq files
+
+The PyPI package is called BactInspectorMax since my original BactInspector asked for the path to a sketch
+of refseq genomes. In May 2019 a new curated mash sketch of complete bacterial refseq genomes was
+created and bundled into the package. This required a special request to increase the PyPI package limit,
+hence the Max.
+The command is still `bactinspector <sub command>`, however.
+
+## Data
+The data bundled into bactinspector are the complete bacterial refseq genomes from May 2019. The species assignments have been corrected by changing species where the majority does not match the species described within the refseq info found [here](ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) and the excellent [Bacsort resource](https://github.com/rrwick/Bacsort) from [Ryan Wick](https://twitter.com/rrwick).
+
+## Dependencies
+[Mash](https://github.com/marbl/Mash/) (> v2.1)
+Installation with conda is recommended.
+
+## Installation
+pip3 install bactinspectorMax
+
+## Usage
+```
+usage: bactinspector [-h] [-v]
+ {check_species,closest_match,info,create_species_info} ...
+
+A module to determine the most probable species based on sequence in fasta files using refseq and Mash (https://mash.readthedocs.io/en/latest/index.html)
+It will count the species of the top ref seq mash matches and report most frequent.
+
+In order to use the module:
+ • Specify an input directory and output directory (default is current directory)
+ • Specify either a
+ • fasta file pattern with -f (e.g "*.fas") or
+ • mash sketch file pattern with -m (e.g "*.msh") if you have already sketched the fasta files
+ • By default the top 10 matches will be used. Change this with -n
+ • Speed things up by changing the number of parallel processes to match the cores on your computer using -p
+ • If mash is not in your PATH specify the directory containing the mash executable with -mp
+
+If you want to update the genomes used, follow the instructions on https://gitlab.com/antunderwood/bactinspector/wikis/Updating-the-genomes-in-BactInspector
+and use the create_species_info command to make the required file
+
+positional arguments:
+ {check_species,closest_match,create_species_info}
+ The following commands are available. Type
+ bactinspector <COMMAND> -h for more help on a specific
+ commands
+ check_species Check the most frequent matches to a species in refseq
+ closest_match Report the closest matches to a set of sequences
+ create_species_info
+ Create species info TSV for locally created mash
+ sketches
+ info Provide information about the data in bactinspector
+optional arguments:
+ -h, --help show this help message and exit
+ -v, --version print out software version
+```
+
+### check_species
+Assign a species using matches to refseq. Based on observed intra-species mash distances, a result may be marked as uncertain if the distance to the best hit is greater than 1.2 times the observed maximum intra-species distance.
+```
+usage: bactinspector check_species [-h] [-i INPUT_DIR] [-o OUTPUT_DIR]
+ [-p PARALLEL_PROCESSES]
+ [-n NUM_BEST_MATCHES] [-d DISTANCE_CUTOFF]
+ [-v ALLOWED_VARIANCE]
+ [-vl ALLOWED_VARIANCE_RARER_SPECIES] [-s]
+ [-l LOCAL_MASH_AND_INFO_FILE_PREFIX]
+ [-mp MASH_PATH]
+ (-f FASTA_FILE_PATTERN | -fq FASTQ_FILE_PATTERN | -m MASH_SKETCH_FILE_PATTERN)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -i INPUT_DIR, --input_dir INPUT_DIR
+ path to input_directory
+ -o OUTPUT_DIR, --output_dir OUTPUT_DIR
+ path to output_directory
+ -p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
+ number of processes to run in parallel
+ -n NUM_BEST_MATCHES, --num_best_matches NUM_BEST_MATCHES
+ number of best matches to return
+ -d DISTANCE_CUTOFF, --distance_cutoff DISTANCE_CUTOFF
+ mash distance cutoff (default 0.05)
+ -v ALLOWED_VARIANCE, --allowed_variance ALLOWED_VARIANCE
+ proportion of max_distance allowed over which a result
+ will be marked as uncertain (default 0.1)
+ -vl ALLOWED_VARIANCE_RARER_SPECIES, --allowed_variance_rarer_species ALLOWED_VARIANCE_RARER_SPECIES
+ proportion of max_distance allowed over which a result
+ will be marked as uncertain for species which have
+ fewer than 10 representatives in refseq (default 0.5)
+ -s, --stdout_summary output a summary of the result to STDOUT
+ -l LOCAL_MASH_AND_INFO_FILE_PREFIX, --local_mash_and_info_file_prefix LOCAL_MASH_AND_INFO_FILE_PREFIX
+ the path prefix to the mash sketch file and
+ corresponding info file
+ -mp MASH_PATH, --mash_path MASH_PATH
+ path to the mash executable. If not provided it is
+ assumed mash is in the PATH
+ -f FASTA_FILE_PATTERN, --fasta_file_pattern FASTA_FILE_PATTERN
+ pattern to match fasta files e.g "*.fas"
+ -fq FASTQ_FILE_PATTERN, --fastq_file_pattern FASTQ_FILE_PATTERN
+ pattern to match fastq files e.g "*.fastq.gz"
+ -m MASH_SKETCH_FILE_PATTERN, --mash_sketch_file_pattern MASH_SKETCH_FILE_PATTERN
+ pattern to match mash sketch files e.g "*.msh"
+```
+
+### closest_match
+Find the closest match of a set of genomes to genomes within refseq. Useful as an objective way of choosing a reference genome when mapping.
+
+```
+usage: bactinspector closest_match [-h] [-i INPUT_DIR] [-o OUTPUT_DIR]
+ [-p PARALLEL_PROCESSES] [-r]
+ [-l LOCAL_MASH_AND_INFO_FILE_PREFIX]
+ [-mp MASH_PATH]
+ (-f FASTA_FILE_PATTERN | -fq FASTQ_FILE_PATTERN | -m MASH_SKETCH_FILE_PATTERN)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -i INPUT_DIR, --input_dir INPUT_DIR
+ path to input_directory
+ -o OUTPUT_DIR, --output_dir OUTPUT_DIR
+ path to output_directory
+ -p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
+ number of processes to run in parallel
+ -r, --ref_and_rep_only
+ only include reference and representative sequences
+ -l LOCAL_MASH_AND_INFO_FILE_PREFIX, --local_mash_and_info_file_prefix LOCAL_MASH_AND_INFO_FILE_PREFIX
+ the path prefix to the mash sketch file and
+ corresponding info file
+ -mp MASH_PATH, --mash_path MASH_PATH
+ path to the mash executable. If not provided it is
+ assumed mash is in the PATH
+ -f FASTA_FILE_PATTERN, --fasta_file_pattern FASTA_FILE_PATTERN
+ pattern to match fasta files e.g "*.fas"
+ -fq FASTQ_FILE_PATTERN, --fastq_file_pattern FASTQ_FILE_PATTERN
+ pattern to match fastq files e.g "*.fastq.gz"
+ -m MASH_SKETCH_FILE_PATTERN, --mash_sketch_file_pattern MASH_SKETCH_FILE_PATTERN
+ pattern to match mash sketch files e.g "*.msh"
+```
+### info
+Find out what sequences are present in the mash sketch using a term to search the species name. You can either specify `-s` to search the aggregated species information or `-i` to search the individual refseq records.
+
+```
+usage: bactinspector info [-h] -t SEARCH_TERM (-s | -i)
+
+optional arguments:
+ -h, --help show this help message and exit
+ -t SEARCH_TERM, --search_term SEARCH_TERM
+ search term to use when searching species within
+ bactinspector
+ -s, --summary search the aggregate data
+ -i, --individual_records
+ search the individual refseq records
+```
+
+### create_species_info
+
+An updated mash sketch file and corresponding info file can be created using the process described [here](https://gitlab.com/antunderwood/bactinspector/wikis/Updating-the-genomes-in-BactInspector).
+This process uses the create_species_info command.
+
+```
+usage: bactinspector create_species_info [-h] -m MASH_INFO_FILE -r
+ REFSEQ_SUMMARY_FILE -b
+ BACSORT_SPECIES_FILE -x
+ BACSORT_EXCLUDED_ASSEMBLIES_FILE
+
+optional arguments:
+ -h, --help show this help message and exit
+ -m MASH_INFO_FILE, --mash_info_file MASH_INFO_FILE
+ path to info file created using mash info -t
+ -r REFSEQ_SUMMARY_FILE, --refseq_summary_file REFSEQ_SUMMARY_FILE
+ path to refseq assembly summary file downloaded via
+ wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteri
+ a/assembly_summary.txt
+ -b BACSORT_SPECIES_FILE, --bacsort_species_file BACSORT_SPECIES_FILE
+ path to bacsort_species_definitions.txt
+ -x BACSORT_EXCLUDED_ASSEMBLIES_FILE, --bacsort_excluded_assemblies_file BACSORT_EXCLUDED_ASSEMBLIES_FILE
+ path to bacsort_excluded_assemblies.txt
+```
+
+
+
+%prep
+%autosetup -n BactInspectorMax-0.1.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-BactInspectorMax -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Thu May 18 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..392f49b
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+c52597deeb6185088a486479e9fac797 BactInspectorMax-0.1.3.tar.gz
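
Note on the `check_species` description in the spec above: it says the tool counts the species among the top refseq Mash matches, reports the most frequent, and marks a result as uncertain when the best-hit distance exceeds the observed maximum intra-species distance by some margin. The following is a minimal sketch of that logic, not the package's actual code. The function names `most_frequent_species` and `is_uncertain` are hypothetical, and reading "allowed variance" as `max_distance * (1 + allowed_variance)` is an assumption. The description text mentions a factor of 1.2 while the help output lists a default of 0.1; the sketch uses the help-text default and does not try to reconcile the two.

```python
from collections import Counter


def most_frequent_species(top_hits):
    """Pick the species that occurs most often among the top matches.

    top_hits is a list of (species, mash_distance) tuples, e.g. the top 10
    matches. Returns the winning species and its smallest distance.
    Hypothetical helper, not part of the bactinspector API.
    """
    counts = Counter(species for species, _ in top_hits)
    species, _ = counts.most_common(1)[0]
    best_distance = min(dist for name, dist in top_hits if name == species)
    return species, best_distance


def is_uncertain(best_distance, max_intra_species_distance, allowed_variance=0.1):
    """Flag a result whose best hit lies beyond the tolerated distance.

    Assumption: the tolerated distance is the observed maximum intra-species
    distance scaled by (1 + allowed_variance).
    """
    return best_distance > max_intra_species_distance * (1 + allowed_variance)


if __name__ == "__main__":
    # Made-up distances, for illustration only.
    hits = [
        ("Escherichia coli", 0.01),
        ("Escherichia coli", 0.02),
        ("Shigella flexneri", 0.015),
    ]
    species, best = most_frequent_species(hits)
    print(species, best, is_uncertain(best, max_intra_species_distance=0.03))
```

Under the same assumption, passing `allowed_variance=0.5` would correspond to the looser margin that the help text describes for species with fewer than 10 representatives in refseq.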