author CoprDistGit <infra@openeuler.org> 2023-06-20 04:27:19 +0000
committer CoprDistGit <infra@openeuler.org> 2023-06-20 04:27:19 +0000
commit 9075c9ff3909d371b32699a423592040a9712428
tree 43184bc3ea903da3625d322d7fe51803b2d33673
parent d09143c2122ccdd3fa44786796a2e18f4a40478c
automatic import of python-GeneGrouper (openeuler20.03)
-rw-r--r-- .gitignore 1
-rw-r--r-- python-genegrouper.spec 585
-rw-r--r-- sources 1
3 files changed, 587 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..16ce8d6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/GeneGrouper-1.0.3.tar.gz
diff --git a/python-genegrouper.spec b/python-genegrouper.spec
new file mode 100644
index 0000000..8303f54
--- /dev/null
+++ b/python-genegrouper.spec
@@ -0,0 +1,585 @@
+%global _empty_manifest_terminate_build 0
+Name: python-GeneGrouper
+Version: 1.0.3
+Release: 1
+Summary: Find and cluster genomic regions containing a seed gene
+License: MIT
+URL: https://github.com/agmcfarland/GeneGrouper
+Source0: https://mirrors.aliyun.com/pypi/web/packages/26/9b/a432e0124b851931e00c00871b667a06f318bc23c46edab6fb7eb24a6c64/GeneGrouper-1.0.3.tar.gz
+BuildArch: noarch
+
+
+%description
+<img src="docs/overview_figure.png" alt="GeneGrouper overview figure" width=1000>
+[Why use GeneGrouper?](https://github.com/agmcfarland/GeneGrouper/wiki#what-is-genegrouper)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper outputs](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+[See FAQs](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions)
+# Installation
+GeneGrouper can be installed using pip:
+```pip install GeneGrouper```
+[GeneGrouper has multiple dependencies.](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies#requirements-and-dependencies)
+**Recommended:** follow the code below to create a self-contained conda environment for GeneGrouper.
+**Installing Python and bioinformatic dependencies for grouping**
+```
+conda create -n GeneGrouper_env python=3.9
+source activate GeneGrouper_env #or try: conda activate GeneGrouper_env
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+pip install biopython scipy scikit-learn pandas matplotlib GeneGrouper
+conda install -c bioconda mcl blast mmseqs2 fasttree mafft
+```
+**Installing R and required packages for visualizations**
+```
+conda install -c conda-forge r-base=4.1.1 r-svglite r-reshape r-ggplot2 r-cowplot r-dplyr r-gggenes r-ape r-phytools r-BiocManager r-codetools
+# enter R environment
+R
+# install additional packages from CRAN
+install.packages('groupdata2',repos='https://cloud.r-project.org/', quiet=TRUE)
+# install additional packages from Bioconductor
+BiocManager::install("ggtree")
+# quit
+q(save="no")
+```
+[For more information, see the installation wiki page](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies)
+# Inputs
+### GeneGrouper has two required inputs:
+1. A translated gene sequence in fasta format (with file extension .fasta/.txt)
+2. A folder containing RefSeq GenBank-format genomes (with the file extension .gbff). [See instructions to download many RefSeq genomes at a time.](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions#1-where-can-i-download-genbank-format-refseq-genomes-with-file-extension-gbff)
+# Basic usage
+#### Use `build_database` to make a GeneGrouper database of your RefSeq .gbff genomes
+```
+GeneGrouper -g /path/to/gbff -d /path/to/main_directory \
+build_database
+```
+#### Use `find_regions` to search for regions containing a gene of interest and output to a search-specific directory
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+find_regions \
+-f /path/to/query_gene.fasta
+```
+#### Use `visualize --visual_type main` to output visualizations of group gene architectures and their distribution within genomes and taxa
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type main
+```
+#### Use `visualize --visual_type group` to inspect a GeneGrouper group more closely. Replace `<n>` with a group ID number.
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type group --group_label <n>
+```
+#### Use `visualize --visual_type tree` to make a phylogenetic tree of each group's seed gene
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type tree
+```
+[See advanced usage examples](https://github.com/agmcfarland/GeneGrouper/wiki/Advanced-usage)
+[See tutorial with provided example data](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+# Outputs
+1. **For each search, ```find_regions``` outputs:**
+* **Four** tabular files with quantitative and qualitative descriptions of grouping results.
+* **One** fasta file containing all genes used in the analysis.
+2. **For each search, ```visualize --visual_type main``` outputs:**
+* **Three** main visualizations.
+3. **For each search, ```visualize --visual_type group --group_label <n>``` outputs:**
+* **One** additional visualization for the requested group, where `<n>` in ```--group_label <n>``` is replaced with the group number.
+* **Two** tabular files containing subgroup information for each ```--group_label <n>``` supplied.
+4. **For each search, ```visualize --visual_type tree``` outputs:**
+* **One** phylogenetic tree of the representative seed gene sequences from each group.
+[See complete output file descriptions](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+Each search and visualization will have the following file structure. Files under `visualizations` may differ.
+```
+├── main_directory
+│ ├── search_results
+│ │ ├── group_statistics_summmary.csv
+│ │ ├── representative_group_member_summary.csv
+│ │ ├── group_taxa_summary.csv
+│ │ ├── group_regions.csv
+│ │ ├── group_region_seqs.faa
+│ │ ├── visualizations
+│ │ │ ├── group_summary.png
+│ │ │ ├── groups_by_taxa.png
+│ │ │ ├── taxa_searched.png
+│ │ │ ├── inspect_group_-1.png
+│ │ │ ├── representative_seed_phylogeny.png
+│ │ ├── internal_data
+│ │ ├── subgroups
+│ │ ├── seed_results.db
+```
+# Usage options
+### Global flags
+```
+usage: GeneGrouper [-h] [-d] [-n] [-g] [-t]
+ {build_database,find_regions,visualize} ...
+ -h, --help show this help message and exit
+ -d , --project_directory
+ Main directory to contain the base files used for
+ region searching and clustering. Default=current
+ directory.
+ -n , --search_name Name of the directory to contain search-specific
+ results. Default=region_search
+ -g , --genomes_directory
+ Directory containing genbank-file format genomes with
+ the suffix .gbff. Default=./genomes.
+ -t , --threads Number of threads to use. Default=all threads.
+```
+### Subcommands
+```
+ build_database Convert a set of genomes into a useable format for
+ GeneGrouper
+ find_regions Find regions given a translated gene and a set of
+ genomes
+ visualize Visualize GeneGrouper outputs. Three visualization options are provided.
+ Check the --visual_type help description.
+```
+### Subcommand flags
+```build_database```
+```
+usage: GeneGrouper build_database [-h]
+ -h, --help show this help message and exit
+```
+```find_regions```
+```
+usage: GeneGrouper find_regions [-h] -f [-us] [-ds] [-i] [-c] [-hk] [--min_group_size] [-re] [--force]
+ -h, --help show this help message and exit
+ -f , --query_file Provide the absolute path to a fasta file containing a translated gene sequence.
+ -us , --upstream_search
+ Upstream search length in basepairs. Default=10000
+ -ds , --downstream_search
+ Downstream search length in basepairs. Default=10000
+ -i , --seed_identity
+ Identity cutoff for initial blast search. Default=60
+ -c , --seed_coverage
+ Coverage cutoff for initial blast search. Default=90
+ -hk , --seed_hits_kept
+ Number of blast hits to keep. Default=None
+ --min_group_size
+ The minimum number of gene regions to constitute a group. Default=ln(jaccard distance length)
+ -re , --recluster_iterations
+ Number of region re-clustering attempts after the initial clustering. Default=0
+ --force Flag to overwrite search name directory.
+```
+```visualize```
+```
+usage: GeneGrouper visualize [-h] [--visual_type] [--group_label] [--image_format] [--tip_label_type] [--tip_label_size]
+ --visual_type Choices: [main, group, tree]. Use main for main visualizations. Use group to
+ inspect specific group. Use tree for a phylogenetic tree of representative
+ seed sequences. Default=main
+ --group_label The integer identifier of the group you wish to inspect. Default=-1
+ --image_format Choices: [png, svg]. Output image format. Use svg if you want to edit the
+ images. Default=png.
+ --tip_label_type Choices: [full, group]. Use full to include the sequence ID followed by group
+ ID. Use group to only have the group ID. Default=full
+ --tip_label_size Specify the tip label size in the output image. Default=2
+```
+# Citation
+Alexander G McFarland, Nolan W Kennedy, Carolyn E Mills, Danielle Tullman-Ercek, Curtis Huttenhower, Erica M Hartmann, **Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper**, Bioinformatics, 2021, btab752, https://doi.org/10.1093/bioinformatics/btab752
+# Contact
+Please message me at alexandermcfarland2022@u.northwestern.edu
+Follow me on Twitter [@alexmcfarland_](https://twitter.com/alexmcfarland_)!
+
+%package -n python3-GeneGrouper
+Summary: Find and cluster genomic regions containing a seed gene
+Provides: python-GeneGrouper
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-GeneGrouper
+<img src="docs/overview_figure.png" alt="GeneGrouper overview figure" width=1000>
+[Why use GeneGrouper?](https://github.com/agmcfarland/GeneGrouper/wiki#what-is-genegrouper)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper outputs](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+[See FAQs](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions)
+# Installation
+GeneGrouper can be installed using pip:
+```pip install GeneGrouper```
+[GeneGrouper has multiple dependencies.](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies#requirements-and-dependencies)
+**Recommended:** follow the code below to create a self-contained conda environment for GeneGrouper.
+**Installing Python and bioinformatic dependencies for grouping**
+```
+conda create -n GeneGrouper_env python=3.9
+source activate GeneGrouper_env #or try: conda activate GeneGrouper_env
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+pip install biopython scipy scikit-learn pandas matplotlib GeneGrouper
+conda install -c bioconda mcl blast mmseqs2 fasttree mafft
+```
+**Installing R and required packages for visualizations**
+```
+conda install -c conda-forge r-base=4.1.1 r-svglite r-reshape r-ggplot2 r-cowplot r-dplyr r-gggenes r-ape r-phytools r-BiocManager r-codetools
+# enter R environment
+R
+# install additional packages from CRAN
+install.packages('groupdata2',repos='https://cloud.r-project.org/', quiet=TRUE)
+# install additional packages from Bioconductor
+BiocManager::install("ggtree")
+# quit
+q(save="no")
+```
+[For more information, see the installation wiki page](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies)
+# Inputs
+### GeneGrouper has two required inputs:
+1. A translated gene sequence in fasta format (with file extension .fasta/.txt)
+2. A folder containing RefSeq GenBank-format genomes (with the file extension .gbff). [See instructions to download many RefSeq genomes at a time.](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions#1-where-can-i-download-genbank-format-refseq-genomes-with-file-extension-gbff)
+# Basic usage
+#### Use `build_database` to make a GeneGrouper database of your RefSeq .gbff genomes
+```
+GeneGrouper -g /path/to/gbff -d /path/to/main_directory \
+build_database
+```
+#### Use `find_regions` to search for regions containing a gene of interest and output to a search-specific directory
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+find_regions \
+-f /path/to/query_gene.fasta
+```
+#### Use `visualize --visual_type main` to output visualizations of group gene architectures and their distribution within genomes and taxa
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type main
+```
+#### Use `visualize --visual_type group` to inspect a GeneGrouper group more closely. Replace `<n>` with a group ID number.
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type group --group_label <n>
+```
+#### Use `visualize --visual_type tree` to make a phylogenetic tree of each group's seed gene
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type tree
+```
+[See advanced usage examples](https://github.com/agmcfarland/GeneGrouper/wiki/Advanced-usage)
+[See tutorial with provided example data](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+# Outputs
+1. **For each search, ```find_regions``` outputs:**
+* **Four** tabular files with quantitative and qualitative descriptions of grouping results.
+* **One** fasta file containing all genes used in the analysis.
+2. **For each search, ```visualize --visual_type main``` outputs:**
+* **Three** main visualizations.
+3. **For each search, ```visualize --visual_type group --group_label <n>``` outputs:**
+* **One** additional visualization for the requested group, where `<n>` in ```--group_label <n>``` is replaced with the group number.
+* **Two** tabular files containing subgroup information for each ```--group_label <n>``` supplied.
+4. **For each search, ```visualize --visual_type tree``` outputs:**
+* **One** phylogenetic tree of the representative seed gene sequences from each group.
+[See complete output file descriptions](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+Each search and visualization will have the following file structure. Files under `visualizations` may differ.
+```
+├── main_directory
+│ ├── search_results
+│ │ ├── group_statistics_summmary.csv
+│ │ ├── representative_group_member_summary.csv
+│ │ ├── group_taxa_summary.csv
+│ │ ├── group_regions.csv
+│ │ ├── group_region_seqs.faa
+│ │ ├── visualizations
+│ │ │ ├── group_summary.png
+│ │ │ ├── groups_by_taxa.png
+│ │ │ ├── taxa_searched.png
+│ │ │ ├── inspect_group_-1.png
+│ │ │ ├── representative_seed_phylogeny.png
+│ │ ├── internal_data
+│ │ ├── subgroups
+│ │ ├── seed_results.db
+```
+# Usage options
+### Global flags
+```
+usage: GeneGrouper [-h] [-d] [-n] [-g] [-t]
+ {build_database,find_regions,visualize} ...
+ -h, --help show this help message and exit
+ -d , --project_directory
+ Main directory to contain the base files used for
+ region searching and clustering. Default=current
+ directory.
+ -n , --search_name Name of the directory to contain search-specific
+ results. Default=region_search
+ -g , --genomes_directory
+ Directory containing genbank-file format genomes with
+ the suffix .gbff. Default=./genomes.
+ -t , --threads Number of threads to use. Default=all threads.
+```
+### Subcommands
+```
+ build_database Convert a set of genomes into a useable format for
+ GeneGrouper
+ find_regions Find regions given a translated gene and a set of
+ genomes
+ visualize Visualize GeneGrouper outputs. Three visualization options are provided.
+ Check the --visual_type help description.
+```
+### Subcommand flags
+```build_database```
+```
+usage: GeneGrouper build_database [-h]
+ -h, --help show this help message and exit
+```
+```find_regions```
+```
+usage: GeneGrouper find_regions [-h] -f [-us] [-ds] [-i] [-c] [-hk] [--min_group_size] [-re] [--force]
+ -h, --help show this help message and exit
+ -f , --query_file Provide the absolute path to a fasta file containing a translated gene sequence.
+ -us , --upstream_search
+ Upstream search length in basepairs. Default=10000
+ -ds , --downstream_search
+ Downstream search length in basepairs. Default=10000
+ -i , --seed_identity
+ Identity cutoff for initial blast search. Default=60
+ -c , --seed_coverage
+ Coverage cutoff for initial blast search. Default=90
+ -hk , --seed_hits_kept
+ Number of blast hits to keep. Default=None
+ --min_group_size
+ The minimum number of gene regions to constitute a group. Default=ln(jaccard distance length)
+ -re , --recluster_iterations
+ Number of region re-clustering attempts after the initial clustering. Default=0
+ --force Flag to overwrite search name directory.
+```
+```visualize```
+```
+usage: GeneGrouper visualize [-h] [--visual_type] [--group_label] [--image_format] [--tip_label_type] [--tip_label_size]
+ --visual_type Choices: [main, group, tree]. Use main for main visualizations. Use group to
+ inspect specific group. Use tree for a phylogenetic tree of representative
+ seed sequences. Default=main
+ --group_label The integer identifier of the group you wish to inspect. Default=-1
+ --image_format Choices: [png, svg]. Output image format. Use svg if you want to edit the
+ images. Default=png.
+ --tip_label_type Choices: [full, group]. Use full to include the sequence ID followed by group
+ ID. Use group to only have the group ID. Default=full
+ --tip_label_size Specify the tip label size in the output image. Default=2
+```
+# Citation
+Alexander G McFarland, Nolan W Kennedy, Carolyn E Mills, Danielle Tullman-Ercek, Curtis Huttenhower, Erica M Hartmann, **Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper**, Bioinformatics, 2021, btab752, https://doi.org/10.1093/bioinformatics/btab752
+# Contact
+Please message me at alexandermcfarland2022@u.northwestern.edu
+Follow me on Twitter [@alexmcfarland_](https://twitter.com/alexmcfarland_)!
+
+%package help
+Summary: Development documents and examples for GeneGrouper
+Provides: python3-GeneGrouper-doc
+%description help
+<img src="docs/overview_figure.png" alt="GeneGrouper overview figure" width=1000>
+[Why use GeneGrouper?](https://github.com/agmcfarland/GeneGrouper/wiki#what-is-genegrouper)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper outputs](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+[See FAQs](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions)
+# Installation
+GeneGrouper can be installed using pip:
+```pip install GeneGrouper```
+[GeneGrouper has multiple dependencies.](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies#requirements-and-dependencies)
+**Recommended:** follow the code below to create a self-contained conda environment for GeneGrouper.
+**Installing Python and bioinformatic dependencies for grouping**
+```
+conda create -n GeneGrouper_env python=3.9
+source activate GeneGrouper_env #or try: conda activate GeneGrouper_env
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+pip install biopython scipy scikit-learn pandas matplotlib GeneGrouper
+conda install -c bioconda mcl blast mmseqs2 fasttree mafft
+```
+**Installing R and required packages for visualizations**
+```
+conda install -c conda-forge r-base=4.1.1 r-svglite r-reshape r-ggplot2 r-cowplot r-dplyr r-gggenes r-ape r-phytools r-BiocManager r-codetools
+# enter R environment
+R
+# install additional packages from CRAN
+install.packages('groupdata2',repos='https://cloud.r-project.org/', quiet=TRUE)
+# install additional packages from Bioconductor
+BiocManager::install("ggtree")
+# quit
+q(save="no")
+```
+[For more information, see the installation wiki page](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies)
+# Inputs
+### GeneGrouper has two required inputs:
+1. A translated gene sequence in fasta format (with file extension .fasta/.txt)
+2. A folder containing RefSeq GenBank-format genomes (with the file extension .gbff). [See instructions to download many RefSeq genomes at a time.](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions#1-where-can-i-download-genbank-format-refseq-genomes-with-file-extension-gbff)
+# Basic usage
+#### Use `build_database` to make a GeneGrouper database of your RefSeq .gbff genomes
+```
+GeneGrouper -g /path/to/gbff -d /path/to/main_directory \
+build_database
+```
+#### Use `find_regions` to search for regions containing a gene of interest and output to a search-specific directory
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+find_regions \
+-f /path/to/query_gene.fasta
+```
+#### Use `visualize --visual_type main` to output visualizations of group gene architectures and their distribution within genomes and taxa
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type main
+```
+#### Use `visualize --visual_type group` to inspect a GeneGrouper group more closely. Replace `<n>` with a group ID number.
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type group --group_label <n>
+```
+#### Use `visualize --visual_type tree` to make a phylogenetic tree of each group's seed gene
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type tree
+```
+[See advanced usage examples](https://github.com/agmcfarland/GeneGrouper/wiki/Advanced-usage)
+[See tutorial with provided example data](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+# Outputs
+1. **For each search, ```find_regions``` outputs:**
+* **Four** tabular files with quantitative and qualitative descriptions of grouping results.
+* **One** fasta file containing all genes used in the analysis.
+2. **For each search, ```visualize --visual_type main``` outputs:**
+* **Three** main visualizations.
+3. **For each search, ```visualize --visual_type group --group_label <n>``` outputs:**
+* **One** additional visualization for the requested group, where `<n>` in ```--group_label <n>``` is replaced with the group number.
+* **Two** tabular files containing subgroup information for each ```--group_label <n>``` supplied.
+4. **For each search, ```visualize --visual_type tree``` outputs:**
+* **One** phylogenetic tree of the representative seed gene sequences from each group.
+[See complete output file descriptions](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+Each search and visualization will have the following file structure. Files under `visualizations` may differ.
+```
+├── main_directory
+│ ├── search_results
+│ │ ├── group_statistics_summmary.csv
+│ │ ├── representative_group_member_summary.csv
+│ │ ├── group_taxa_summary.csv
+│ │ ├── group_regions.csv
+│ │ ├── group_region_seqs.faa
+│ │ ├── visualizations
+│ │ │ ├── group_summary.png
+│ │ │ ├── groups_by_taxa.png
+│ │ │ ├── taxa_searched.png
+│ │ │ ├── inspect_group_-1.png
+│ │ │ ├── representative_seed_phylogeny.png
+│ │ ├── internal_data
+│ │ ├── subgroups
+│ │ ├── seed_results.db
+```
+# Usage options
+### Global flags
+```
+usage: GeneGrouper [-h] [-d] [-n] [-g] [-t]
+ {build_database,find_regions,visualize} ...
+ -h, --help show this help message and exit
+ -d , --project_directory
+ Main directory to contain the base files used for
+ region searching and clustering. Default=current
+ directory.
+ -n , --search_name Name of the directory to contain search-specific
+ results. Default=region_search
+ -g , --genomes_directory
+ Directory containing genbank-file format genomes with
+ the suffix .gbff. Default=./genomes.
+ -t , --threads Number of threads to use. Default=all threads.
+```
+### Subcommands
+```
+ build_database Convert a set of genomes into a useable format for
+ GeneGrouper
+ find_regions Find regions given a translated gene and a set of
+ genomes
+ visualize Visualize GeneGrouper outputs. Three visualization options are provided.
+ Check the --visual_type help description.
+```
+### Subcommand flags
+```build_database```
+```
+usage: GeneGrouper build_database [-h]
+ -h, --help show this help message and exit
+```
+```find_regions```
+```
+usage: GeneGrouper find_regions [-h] -f [-us] [-ds] [-i] [-c] [-hk] [--min_group_size] [-re] [--force]
+ -h, --help show this help message and exit
+ -f , --query_file Provide the absolute path to a fasta file containing a translated gene sequence.
+ -us , --upstream_search
+ Upstream search length in basepairs. Default=10000
+ -ds , --downstream_search
+ Downstream search length in basepairs. Default=10000
+ -i , --seed_identity
+ Identity cutoff for initial blast search. Default=60
+ -c , --seed_coverage
+ Coverage cutoff for initial blast search. Default=90
+ -hk , --seed_hits_kept
+ Number of blast hits to keep. Default=None
+ --min_group_size
+ The minimum number of gene regions to constitute a group. Default=ln(jaccard distance length)
+ -re , --recluster_iterations
+ Number of region re-clustering attempts after the initial clustering. Default=0
+ --force Flag to overwrite search name directory.
+```
+```visualize```
+```
+usage: GeneGrouper visualize [-h] [--visual_type] [--group_label] [--image_format] [--tip_label_type] [--tip_label_size]
+ --visual_type Choices: [main, group, tree]. Use main for main visualizations. Use group to
+ inspect specific group. Use tree for a phylogenetic tree of representative
+ seed sequences. Default=main
+ --group_label The integer identifier of the group you wish to inspect. Default=-1
+ --image_format Choices: [png, svg]. Output image format. Use svg if you want to edit the
+ images. Default=png.
+ --tip_label_type Choices: [full, group]. Use full to include the sequence ID followed by group
+ ID. Use group to only have the group ID. Default=full
+ --tip_label_size Specify the tip label size in the output image. Default=2
+```
+# Citation
+Alexander G McFarland, Nolan W Kennedy, Carolyn E Mills, Danielle Tullman-Ercek, Curtis Huttenhower, Erica M Hartmann, **Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper**, Bioinformatics, 2021, btab752, https://doi.org/10.1093/bioinformatics/btab752
+# Contact
+Please message me at alexandermcfarland2022@u.northwestern.edu
+Follow me on Twitter [@alexmcfarland_](https://twitter.com/alexmcfarland_)!
+
+%prep
+%autosetup -n GeneGrouper-1.0.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-GeneGrouper -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.3-1
+- Package Spec generated
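The `%install` scriptlet in this spec writes `filelist.lst` by listing every regular file under `usr/lib`, `usr/lib64`, `usr/bin` and `usr/sbin` in the buildroot as a double-quoted absolute path. It writes `doclist.lst` from any man pages, appending `.gz` in anticipation of RPM compressing them after `%install`. As an illustration only, the same logic can be sketched in Python. The function names `regular_files` and `build_file_lists` are hypothetical and are not part of RPM or of this spec. The output is sorted for readability, which `find` itself does not guarantee.

```python
from __future__ import annotations

import os
from pathlib import Path

# Buildroot-relative directories scanned by the %install scriptlet for package files.
PACKAGE_DIRS = ("usr/lib", "usr/lib64", "usr/bin", "usr/sbin")
# Buildroot-relative directory scanned for man pages (these go into doclist.lst).
MAN_DIR = "usr/share/man"


def regular_files(buildroot: Path, rel_dir: str) -> list[str]:
    """Return "/rel/path" strings for regular files under buildroot/rel_dir.

    Approximates `find rel_dir -type f`: symlinks are skipped, and a missing
    directory yields nothing, like the spec's `if [ -d ... ]` guards.
    Results are sorted for determinism; find itself does not sort.
    """
    top = buildroot / rel_dir
    if not top.is_dir():
        return []
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                found.append("/" + path.relative_to(buildroot).as_posix())
    return sorted(found)


def build_file_lists(buildroot: Path) -> tuple[list[str], list[str]]:
    """Return (filelist, doclist) entries in the quoted format the spec writes."""
    filelist: list[str] = []
    for rel_dir in PACKAGE_DIRS:
        # Equivalent of: find <dir> -type f -printf "\"/%h/%f\"\n"
        filelist.extend(f'"{p}"' for p in regular_files(buildroot, rel_dir))
    # Man pages get ".gz" appended because RPM compresses them after %install.
    doclist = [f'"{p}.gz"' for p in regular_files(buildroot, MAN_DIR)]
    return filelist, doclist


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "usr/bin").mkdir(parents=True)
        (root / "usr/bin/GeneGrouper").write_text("#!/bin/sh\n")
        (root / "usr/share/man/man1").mkdir(parents=True)
        (root / "usr/share/man/man1/example.1").write_text("")
        files, docs = build_file_lists(root)
        print(files)  # ['"/usr/bin/GeneGrouper"']
        print(docs)  # ['"/usr/share/man/man1/example.1.gz"']
```

The quoting matters because `%files -f filelist.lst` reads each line as a file entry, and quoting keeps paths containing spaces intact.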
diff --git a/sources b/sources
new file mode 100644
index 0000000..3dc8e13
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+335a17ebf09267c83ce8ca658b222657 GeneGrouper-1.0.3.tar.gz
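The `sources` file pairs the MD5 digest of the upstream tarball with its file name. The following minimal Python sketch checks a downloaded `GeneGrouper-1.0.3.tar.gz` against that entry. It assumes the one-line `<md5> <filename>` layout seen here. The helper names `md5_of`, `parse_sources_line` and `verify_source` are hypothetical rather than part of any openEuler build tool.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sources_line(line: str) -> tuple[str, str]:
    """Split a '<md5> <filename>' line from a sources file into (md5, filename)."""
    checksum, filename = line.split(maxsplit=1)
    return checksum.lower(), filename.strip()


def verify_source(line: str, directory: Path) -> bool:
    """Return True if the file named in `line` exists in `directory` and its MD5 matches."""
    expected, filename = parse_sources_line(line)
    candidate = directory / filename
    return candidate.is_file() and md5_of(candidate) == expected


if __name__ == "__main__":
    # The entry recorded in this commit's `sources` file.
    entry = "335a17ebf09267c83ce8ca658b222657 GeneGrouper-1.0.3.tar.gz"
    ok = verify_source(entry, Path.cwd())
    print("checksum OK" if ok else "missing file or checksum mismatch")
```

Reading the file in fixed-size chunks keeps memory use constant regardless of tarball size.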