automatic import of python-GeneGrouper
openeuler20.03

author: CoprDistGit <infra@openeuler.org> 2023-06-20 04:27:19 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-06-20 04:27:19 +0000
commit: 9075c9ff3909d371b32699a423592040a9712428 (patch)
tree: 43184bc3ea903da3625d322d7fe51803b2d33673
parent: d09143c2122ccdd3fa44786796a2e18f4a40478c (diff)
3 files changed, 587 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..16ce8d6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/GeneGrouper-1.0.3.tar.gz
diff --git a/python-genegrouper.spec b/python-genegrouper.spec
new file mode 100644
index 0000000..8303f54
--- /dev/null
+++ b/python-genegrouper.spec
@@ -0,0 +1,585 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-GeneGrouper
+Version:	1.0.3
+Release:	1
+Summary:	Find and cluster genomic regions containing a seed gene
+License:	MIT License
+URL:		https://github.com/agmcfarland/GeneGrouper
+Source0:	https://mirrors.aliyun.com/pypi/web/packages/26/9b/a432e0124b851931e00c00871b667a06f318bc23c46edab6fb7eb24a6c64/GeneGrouper-1.0.3.tar.gz
+BuildArch:	noarch
+
+
+%description
+<img src="docs/overview_figure.png" alt="GeneGrouper overview figure" width=1000>
+[Why use GeneGrouper?](https://github.com/agmcfarland/GeneGrouper/wiki#what-is-genegrouper)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper outputs](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+[See FAQs](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions)
+# Installation
+GeneGrouper can be installed using pip:
+```pip install GeneGrouper```
+[GeneGrouper has multiple dependencies.](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies#requirements-and-dependencies)
+Follow the code below to create a self-contained conda environment for GeneGrouper. **Recommended**
+**Installing Python and bioinformatic dependencies for grouping**
+```
+conda create -n GeneGrouper_env python=3.9
+source activate GeneGrouper_env #or try: conda activate GeneGrouper_env
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+pip install biopython scipy scikit-learn pandas matplotlib GeneGrouper
+conda install -c bioconda mcl blast mmseqs2 fasttree mafft
+```
+**Installing R and required packages for visualizations**
+```
+conda install -c conda-forge r-base=4.1.1 r-svglite r-reshape r-ggplot2 r-cowplot r-dplyr r-gggenes r-ape r-phytools r-BiocManager r-codetools
+# enter R environment
+R
+# install additional packages from CRAN
+install.packages('groupdata2',repos='https://cloud.r-project.org/', quiet=TRUE)
+# install additional packages from Bioconductor
+BiocManager::install("ggtree")
+# quit
+q(save="no")
+```
+[For more information, see the installation wiki page](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies)
+# Inputs
+### GeneGrouper has two required inputs:
+1. A translated gene sequence in fasta format (with file extension .fasta/.txt)
+2. A folder containing RefSeq GenBank-format genomes (with the file extension .gbff). [See instructions to download many RefSeq genomes at a time.](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions#1-where-can-i-download-genbank-format-refseq-genomes-with-file-extension-gbff)
+# Basic usage
+#### Use `build_database` to make a GeneGrouper database of your RefSeq .gbff genomes
+```
+GeneGrouper -g /path/to/gbff -d /path/to/main_directory \
+build_database
+```
+#### Use `find_regions` to search for regions containing a gene of interest and output to a search-specific directory
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+find_regions \
+-f /path/to/query_gene.fasta
+```
+#### Use `visualize --visual_type main` to output visualizations of group gene architectures and their distribution within genomes and taxa
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type main
+```
+#### Use `visualize --visual_type group` to inspect a GeneGrouper group more closely. Replace <n> with a group ID number.
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type group --group_label <n>
+```
+#### Use `visualize --visual_type tree` to make a phylogenetic tree of each group's seed gene
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type tree
+```
+[See advanced usage examples](https://github.com/agmcfarland/GeneGrouper/wiki/Advanced-usage)
+[See tutorial with provided example data](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+# Outputs
+1. **For each search, ```find_regions``` outputs:**
+* **Four** tabular files with quantitative and qualitative descriptions of grouping results.
+* **One** fasta file containing all genes used in the analysis.
+2. **For each search, ```visualize --visual_type main``` outputs:**
+* **Three** main visualizations.
+3. **For each search, ```visualize --visual_type group --group_label <n>``` outputs:**
+* **One** additional visualization per group, where `<n>` in ```--group_label <n>``` is replaced with the group number.
+* **Two** tabular files containing subgroup information for each ```--group_label <n>``` supplied.
+4. **For each search, ```visualize --visual_type tree``` outputs:**
+* **One** phylogenetic tree of the representative seed gene sequences from each group.
+[See complete output file descriptions](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+Each search and visualization will have the following file structure. Files under `visualizations` may differ.
+```
+├── main_directory
+│   ├── search_results
+│   │   ├── group_statistics_summary.csv
+│   │   ├── representative_group_member_summary.csv
+│   │   ├── group_taxa_summary.csv
+│   │   ├── group_regions.csv
+│   │   ├── group_region_seqs.faa
+│   │   ├── visualizations
+│   │   │   ├── group_summary.png
+│   │   │   ├── groups_by_taxa.png
+│   │   │   ├── taxa_searched.png
+│   │   │   ├── inspect_group_-1.png
+│   │   │   ├── representative_seed_phylogeny.png
+│   │   ├── internal_data
+│   │   ├── subgroups
+│   │   ├── seed_results.db
+```
+# Usage options
+### Global flags
+```
+usage: GeneGrouper [-h] [-d] [-n] [-g] [-t]
+                   {build_database,find_regions,visualize} ...
+  -h, --help            show this help message and exit
+  -d , --project_directory
+                        Main directory to contain the base files used for
+                        region searching and clustering. Default=current
+                        directory.
+  -n , --search_name    Name of the directory to contain search-specific
+                        results. Default=region_search
+  -g , --genomes_directory
+                        Directory containing GenBank-format genomes with
+                        the suffix .gbff. Default=./genomes.
+  -t , --threads        Number of threads to use. Default=all threads.
+```
+### Subcommands
+```
+    build_database      Convert a set of genomes into a usable format for
+                        GeneGrouper
+    find_regions        Find regions given a translated gene and a set of
+                        genomes
+    visualize           Visualize GeneGrouper outputs. Three visualization options are provided.
+                        Check the --visual_type help description.
+```
+### Subcommand flags
+```build_database```
+```
+usage: GeneGrouper build_database [-h]
+  -h, --help  show this help message and exit
+```
+```find_regions```
+```
+usage: GeneGrouper find_regions [-h] -f  [-us] [-ds] [-i] [-c] [-hk] [--min_group_size] [-re] [--force]
+  -h, --help            show this help message and exit
+  -f , --query_file     Provide the absolute path to a fasta file containing a translated gene sequence.
+  -us , --upstream_search
+                        Upstream search length in basepairs. Default=10000
+  -ds , --downstream_search
+                        Downstream search length in basepairs. Default=10000
+  -i , --seed_identity
+                        Identity cutoff for initial blast search. Default=60
+  -c , --seed_coverage
+                        Coverage cutoff for initial blast search. Default=90
+  -hk , --seed_hits_kept
+                        Number of blast hits to keep. Default=None
+  --min_group_size
+                        The minimum number of gene regions to constitute a group. Default=ln(jaccard distance length)
+  -re , --recluster_iterations
+                        Number of region re-clustering attempts after the initial clustering. Default=0
+  --force               Flag to overwrite search name directory.
+```
+```visualize```
+```
+usage: GeneGrouper visualize [-h] [--visual_type] [--group_label]
+  --visual_type      Choices: [main, group, tree]. Use main for main visualizations. Use group to
+                     inspect specific group. Use tree for a phylogenetic tree of representative
+                     seed sequences. Default=main
+  --group_label      The integer identifier of the group you wish to inspect. Default=-1
+  --image_format     Choices: [png, svg]. Output image format. Use svg if you want to edit the
+                     images. Default=png.
+  --tip_label_type   Choices: [full, group]. Use full to include the sequence ID followed by group
+                     ID. Use group to only have the group ID. Default=full
+  --tip_label_size   Specify the tip label size in the output image. Default=2
+```
+# Citation
+Alexander G McFarland, Nolan W Kennedy, Carolyn E Mills, Danielle Tullman-Ercek, Curtis Huttenhower, Erica M Hartmann, **Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper**, Bioinformatics, 2021, btab752, https://doi.org/10.1093/bioinformatics/btab752
+# Contact
+Please message me at alexandermcfarland2022@u.northwestern.edu 
+Follow me on Twitter [@alexmcfarland_](https://twitter.com/alexmcfarland_)!
+
+%package -n python3-GeneGrouper
+Summary:	Find and cluster genomic regions containing a seed gene
+Provides:	python-GeneGrouper
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-GeneGrouper
+<img src="docs/overview_figure.png" alt="GeneGrouper overview figure" width=1000>
+[Why use GeneGrouper?](https://github.com/agmcfarland/GeneGrouper/wiki#what-is-genegrouper)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper outputs](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+[See FAQs](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions)
+# Installation
+GeneGrouper can be installed using pip:
+```pip install GeneGrouper```
+[GeneGrouper has multiple dependencies.](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies#requirements-and-dependencies)
+Follow the code below to create a self-contained conda environment for GeneGrouper. **Recommended**
+**Installing Python and bioinformatic dependencies for grouping**
+```
+conda create -n GeneGrouper_env python=3.9
+source activate GeneGrouper_env #or try: conda activate GeneGrouper_env
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+pip install biopython scipy scikit-learn pandas matplotlib GeneGrouper
+conda install -c bioconda mcl blast mmseqs2 fasttree mafft
+```
+**Installing R and required packages for visualizations**
+```
+conda install -c conda-forge r-base=4.1.1 r-svglite r-reshape r-ggplot2 r-cowplot r-dplyr r-gggenes r-ape r-phytools r-BiocManager r-codetools
+# enter R environment
+R
+# install additional packages from CRAN
+install.packages('groupdata2',repos='https://cloud.r-project.org/', quiet=TRUE)
+# install additional packages from Bioconductor
+BiocManager::install("ggtree")
+# quit
+q(save="no")
+```
+[For more information, see the installation wiki page](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies)
+# Inputs
+### GeneGrouper has two required inputs:
+1. A translated gene sequence in fasta format (with file extension .fasta/.txt)
+2. A folder containing RefSeq GenBank-format genomes (with the file extension .gbff). [See instructions to download many RefSeq genomes at a time.](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions#1-where-can-i-download-genbank-format-refseq-genomes-with-file-extension-gbff)
+# Basic usage
+#### Use `build_database` to make a GeneGrouper database of your RefSeq .gbff genomes
+```
+GeneGrouper -g /path/to/gbff -d /path/to/main_directory \
+build_database
+```
+#### Use `find_regions` to search for regions containing a gene of interest and output to a search-specific directory
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+find_regions \
+-f /path/to/query_gene.fasta
+```
+#### Use `visualize --visual_type main` to output visualizations of group gene architectures and their distribution within genomes and taxa
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type main
+```
+#### Use `visualize --visual_type group` to inspect a GeneGrouper group more closely. Replace <n> with a group ID number.
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type group --group_label <n>
+```
+#### Use `visualize --visual_type tree` to make a phylogenetic tree of each group's seed gene
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type tree
+```
+[See advanced usage examples](https://github.com/agmcfarland/GeneGrouper/wiki/Advanced-usage)
+[See tutorial with provided example data](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+# Outputs
+1. **For each search, ```find_regions``` outputs:**
+* **Four** tabular files with quantitative and qualitative descriptions of grouping results.
+* **One** fasta file containing all genes used in the analysis.
+2. **For each search, ```visualize --visual_type main``` outputs:**
+* **Three** main visualizations.
+3. **For each search, ```visualize --visual_type group --group_label <n>``` outputs:**
+* **One** additional visualization per group, where `<n>` in ```--group_label <n>``` is replaced with the group number.
+* **Two** tabular files containing subgroup information for each ```--group_label <n>``` supplied.
+4. **For each search, ```visualize --visual_type tree``` outputs:**
+* **One** phylogenetic tree of the representative seed gene sequences from each group.
+[See complete output file descriptions](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+Each search and visualization will have the following file structure. Files under `visualizations` may differ.
+```
+├── main_directory
+│   ├── search_results
+│   │   ├── group_statistics_summary.csv
+│   │   ├── representative_group_member_summary.csv
+│   │   ├── group_taxa_summary.csv
+│   │   ├── group_regions.csv
+│   │   ├── group_region_seqs.faa
+│   │   ├── visualizations
+│   │   │   ├── group_summary.png
+│   │   │   ├── groups_by_taxa.png
+│   │   │   ├── taxa_searched.png
+│   │   │   ├── inspect_group_-1.png
+│   │   │   ├── representative_seed_phylogeny.png
+│   │   ├── internal_data
+│   │   ├── subgroups
+│   │   ├── seed_results.db
+```
+# Usage options
+### Global flags
+```
+usage: GeneGrouper [-h] [-d] [-n] [-g] [-t]
+                   {build_database,find_regions,visualize} ...
+  -h, --help            show this help message and exit
+  -d , --project_directory
+                        Main directory to contain the base files used for
+                        region searching and clustering. Default=current
+                        directory.
+  -n , --search_name    Name of the directory to contain search-specific
+                        results. Default=region_search
+  -g , --genomes_directory
+                        Directory containing GenBank-format genomes with
+                        the suffix .gbff. Default=./genomes.
+  -t , --threads        Number of threads to use. Default=all threads.
+```
+### Subcommands
+```
+    build_database      Convert a set of genomes into a usable format for
+                        GeneGrouper
+    find_regions        Find regions given a translated gene and a set of
+                        genomes
+    visualize           Visualize GeneGrouper outputs. Three visualization options are provided.
+                        Check the --visual_type help description.
+```
+### Subcommand flags
+```build_database```
+```
+usage: GeneGrouper build_database [-h]
+  -h, --help  show this help message and exit
+```
+```find_regions```
+```
+usage: GeneGrouper find_regions [-h] -f  [-us] [-ds] [-i] [-c] [-hk] [--min_group_size] [-re] [--force]
+  -h, --help            show this help message and exit
+  -f , --query_file     Provide the absolute path to a fasta file containing a translated gene sequence.
+  -us , --upstream_search
+                        Upstream search length in basepairs. Default=10000
+  -ds , --downstream_search
+                        Downstream search length in basepairs. Default=10000
+  -i , --seed_identity
+                        Identity cutoff for initial blast search. Default=60
+  -c , --seed_coverage
+                        Coverage cutoff for initial blast search. Default=90
+  -hk , --seed_hits_kept
+                        Number of blast hits to keep. Default=None
+  --min_group_size
+                        The minimum number of gene regions to constitute a group. Default=ln(jaccard distance length)
+  -re , --recluster_iterations
+                        Number of region re-clustering attempts after the initial clustering. Default=0
+  --force               Flag to overwrite search name directory.
+```
+```visualize```
+```
+usage: GeneGrouper visualize [-h] [--visual_type] [--group_label]
+  --visual_type      Choices: [main, group, tree]. Use main for main visualizations. Use group to
+                     inspect specific group. Use tree for a phylogenetic tree of representative
+                     seed sequences. Default=main
+  --group_label      The integer identifier of the group you wish to inspect. Default=-1
+  --image_format     Choices: [png, svg]. Output image format. Use svg if you want to edit the
+                     images. Default=png.
+  --tip_label_type   Choices: [full, group]. Use full to include the sequence ID followed by group
+                     ID. Use group to only have the group ID. Default=full
+  --tip_label_size   Specify the tip label size in the output image. Default=2
+```
+# Citation
+Alexander G McFarland, Nolan W Kennedy, Carolyn E Mills, Danielle Tullman-Ercek, Curtis Huttenhower, Erica M Hartmann, **Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper**, Bioinformatics, 2021, btab752, https://doi.org/10.1093/bioinformatics/btab752
+# Contact
+Please message me at alexandermcfarland2022@u.northwestern.edu 
+Follow me on Twitter [@alexmcfarland_](https://twitter.com/alexmcfarland_)!
+
+%package help
+Summary:	Development documents and examples for GeneGrouper
+Provides:	python3-GeneGrouper-doc
+%description help
+<img src="docs/overview_figure.png" alt="GeneGrouper overview figure" width=1000>
+[Why use GeneGrouper?](https://github.com/agmcfarland/GeneGrouper/wiki#what-is-genegrouper)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper tutorial](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+[See GeneGrouper outputs](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+[See FAQs](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions)
+# Installation
+GeneGrouper can be installed using pip:
+```pip install GeneGrouper```
+[GeneGrouper has multiple dependencies.](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies#requirements-and-dependencies)
+Follow the code below to create a self-contained conda environment for GeneGrouper. **Recommended**
+**Installing Python and bioinformatic dependencies for grouping**
+```
+conda create -n GeneGrouper_env python=3.9
+source activate GeneGrouper_env #or try: conda activate GeneGrouper_env
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+pip install biopython scipy scikit-learn pandas matplotlib GeneGrouper
+conda install -c bioconda mcl blast mmseqs2 fasttree mafft
+```
+**Installing R and required packages for visualizations**
+```
+conda install -c conda-forge r-base=4.1.1 r-svglite r-reshape r-ggplot2 r-cowplot r-dplyr r-gggenes r-ape r-phytools r-BiocManager r-codetools
+# enter R environment
+R
+# install additional packages from CRAN
+install.packages('groupdata2',repos='https://cloud.r-project.org/', quiet=TRUE)
+# install additional packages from Bioconductor
+BiocManager::install("ggtree")
+# quit
+q(save="no")
+```
+[For more information, see the installation wiki page](https://github.com/agmcfarland/GeneGrouper/wiki/Installation-and-dependencies)
+# Inputs
+### GeneGrouper has two required inputs:
+1. A translated gene sequence in fasta format (with file extension .fasta/.txt)
+2. A folder containing RefSeq GenBank-format genomes (with the file extension .gbff). [See instructions to download many RefSeq genomes at a time.](https://github.com/agmcfarland/GeneGrouper/wiki/Frequently-Asked-Questions#1-where-can-i-download-genbank-format-refseq-genomes-with-file-extension-gbff)
+# Basic usage
+#### Use `build_database` to make a GeneGrouper database of your RefSeq .gbff genomes
+```
+GeneGrouper -g /path/to/gbff -d /path/to/main_directory \
+build_database
+```
+#### Use `find_regions` to search for regions containing a gene of interest and output to a search-specific directory
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+find_regions \
+-f /path/to/query_gene.fasta
+```
+#### Use `visualize --visual_type main` to output visualizations of group gene architectures and their distribution within genomes and taxa
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type main
+```
+#### Use `visualize --visual_type group` to inspect a GeneGrouper group more closely. Replace <n> with a group ID number.
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type group --group_label <n>
+```
+#### Use `visualize --visual_type tree` to make a phylogenetic tree of each group's seed gene
+```
+GeneGrouper -d /path/to/main_directory -n gene_search \
+visualize \
+--visual_type tree
+```
+[See advanced usage examples](https://github.com/agmcfarland/GeneGrouper/wiki/Advanced-usage)
+[See tutorial with provided example data](https://github.com/agmcfarland/GeneGrouper/wiki/GeneGrouper-tutorial-with-data)
+# Outputs
+1. **For each search, ```find_regions``` outputs:**
+* **Four** tabular files with quantitative and qualitative descriptions of grouping results.
+* **One** fasta file containing all genes used in the analysis.
+2. **For each search, ```visualize --visual_type main``` outputs:**
+* **Three** main visualizations.
+3. **For each search, ```visualize --visual_type group --group_label <n>``` outputs:**
+* **One** additional visualization per group, where `<n>` in ```--group_label <n>``` is replaced with the group number.
+* **Two** tabular files containing subgroup information for each ```--group_label <n>``` supplied.
+4. **For each search, ```visualize --visual_type tree``` outputs:**
+* **One** phylogenetic tree of the representative seed gene sequences from each group.
+[See complete output file descriptions](https://github.com/agmcfarland/GeneGrouper/wiki/Output-file-descriptions)
+Each search and visualization will have the following file structure. Files under `visualizations` may differ.
+```
+├── main_directory
+│   ├── search_results
+│   │   ├── group_statistics_summary.csv
+│   │   ├── representative_group_member_summary.csv
+│   │   ├── group_taxa_summary.csv
+│   │   ├── group_regions.csv
+│   │   ├── group_region_seqs.faa
+│   │   ├── visualizations
+│   │   │   ├── group_summary.png
+│   │   │   ├── groups_by_taxa.png
+│   │   │   ├── taxa_searched.png
+│   │   │   ├── inspect_group_-1.png
+│   │   │   ├── representative_seed_phylogeny.png
+│   │   ├── internal_data
+│   │   ├── subgroups
+│   │   ├── seed_results.db
+```
+# Usage options
+### Global flags
+```
+usage: GeneGrouper [-h] [-d] [-n] [-g] [-t]
+                   {build_database,find_regions,visualize} ...
+  -h, --help            show this help message and exit
+  -d , --project_directory
+                        Main directory to contain the base files used for
+                        region searching and clustering. Default=current
+                        directory.
+  -n , --search_name    Name of the directory to contain search-specific
+                        results. Default=region_search
+  -g , --genomes_directory
+                        Directory containing GenBank-format genomes with
+                        the suffix .gbff. Default=./genomes.
+  -t , --threads        Number of threads to use. Default=all threads.
+```
+### Subcommands
+```
+    build_database      Convert a set of genomes into a usable format for
+                        GeneGrouper
+    find_regions        Find regions given a translated gene and a set of
+                        genomes
+    visualize           Visualize GeneGrouper outputs. Three visualization options are provided.
+                        Check the --visual_type help description.
+```
+### Subcommand flags
+```build_database```
+```
+usage: GeneGrouper build_database [-h]
+  -h, --help  show this help message and exit
+```
+```find_regions```
+```
+usage: GeneGrouper find_regions [-h] -f  [-us] [-ds] [-i] [-c] [-hk] [--min_group_size] [-re] [--force]
+  -h, --help            show this help message and exit
+  -f , --query_file     Provide the absolute path to a fasta file containing a translated gene sequence.
+  -us , --upstream_search
+                        Upstream search length in basepairs. Default=10000
+  -ds , --downstream_search
+                        Downstream search length in basepairs. Default=10000
+  -i , --seed_identity
+                        Identity cutoff for initial blast search. Default=60
+  -c , --seed_coverage
+                        Coverage cutoff for initial blast search. Default=90
+  -hk , --seed_hits_kept
+                        Number of blast hits to keep. Default=None
+  --min_group_size
+                        The minimum number of gene regions to constitute a group. Default=ln(jaccard distance length)
+  -re , --recluster_iterations
+                        Number of region re-clustering attempts after the initial clustering. Default=0
+  --force               Flag to overwrite search name directory.
+```
+```visualize```
+```
+usage: GeneGrouper visualize [-h] [--visual_type] [--group_label]
+  --visual_type      Choices: [main, group, tree]. Use main for main visualizations. Use group to
+                     inspect specific group. Use tree for a phylogenetic tree of representative
+                     seed sequences. Default=main
+  --group_label      The integer identifier of the group you wish to inspect. Default=-1
+  --image_format     Choices: [png, svg]. Output image format. Use svg if you want to edit the
+                     images. Default=png.
+  --tip_label_type   Choices: [full, group]. Use full to include the sequence ID followed by group
+                     ID. Use group to only have the group ID. Default=full
+  --tip_label_size   Specify the tip label size in the output image. Default=2
+```
+# Citation
+Alexander G McFarland, Nolan W Kennedy, Carolyn E Mills, Danielle Tullman-Ercek, Curtis Huttenhower, Erica M Hartmann, **Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper**, Bioinformatics, 2021, btab752, https://doi.org/10.1093/bioinformatics/btab752
+# Contact
+Please message me at alexandermcfarland2022@u.northwestern.edu 
+Follow me on Twitter [@alexmcfarland_](https://twitter.com/alexmcfarland_)!
+
+%prep
+%autosetup -n GeneGrouper-1.0.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-GeneGrouper -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..3dc8e13
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+335a17ebf09267c83ce8ca658b222657  GeneGrouper-1.0.3.tar.gz
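
The `sources` file added in this commit records the MD5 digest of `GeneGrouper-1.0.3.tar.gz` as the digest, two spaces, then the file name, which is the checklist layout that GNU `md5sum -c` reads. A minimal sketch of checking a downloaded tarball against it, assuming GNU coreutils is installed; the helper name `verify_sources` is hypothetical and not part of any openEuler tooling:

```shell
#!/bin/sh
# Minimal sketch (assumption: GNU coreutils md5sum is available).
# Each line of an openEuler-style 'sources' file reads "<md5 digest>  <file name>",
# the same layout that `md5sum -c` accepts as a checklist.
set -eu

# verify_sources is a hypothetical helper name used only for illustration.
verify_sources() {
  sources_file=${1:-sources}
  # Recompute the digest of every listed file and compare it with the recorded
  # value; md5sum exits non-zero on a mismatch or a missing file.
  md5sum -c "$sources_file"
}
```

Run it from a directory that holds both `sources` and `GeneGrouper-1.0.3.tar.gz` (fetched from the `Source0` URL in the spec): `verify_sources sources` prints `GeneGrouper-1.0.3.tar.gz: OK` when the tarball matches the recorded digest.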