author | CoprDistGit <infra@openeuler.org> | 2023-05-31 04:31:54 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-05-31 04:31:54 +0000 |
commit | 178956eebe248466ead0f262da67bf72e8e2ac75 (patch) | |
tree | 23ab495027489fc1b5b7c9f6f4b3909416b0f58a /python-vicinator.spec | |
parent | 636e4669567998c7ddd56f047b7ce8c760247929 (diff) |
automatic import of python-vicinator
Diffstat (limited to 'python-vicinator.spec')
-rw-r--r-- | python-vicinator.spec | 551 |
1 files changed, 551 insertions, 0 deletions
diff --git a/python-vicinator.spec b/python-vicinator.spec
new file mode 100644
index 0000000..210478f
--- /dev/null
+++ b/python-vicinator.spec
@@ -0,0 +1,551 @@
+%global _empty_manifest_terminate_build 0
+Name: python-Vicinator
+Version: 0.0.32
+Release: 1
+Summary: A small python package to trace orthology neighborhood across feature files
+License: MIT License
+URL: https://github.com/ba1/vicinator
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/fb/20/1bc6dd3bc088bfdd933b1ace6c2a384c7da34491dc54d7f2907621d2b01f/Vicinator-0.0.32.tar.gz
+BuildArch: noarch
+
+Requires: python3-ete3
+Requires: python3-ansi2html
+Requires: python3-colorama
+Requires: python3-pandas
+Requires: python3-importlib-metadata
+
+%description
+[](https://www.travis-ci.org/ba1/Vicinator)
+[](https://codecov.io/gh/ba1/Vicinator)
+[](https://badge.fury.io/py/Vicinator)
+[](https://requires.io/github/ba1/Vicinator/requirements/?branch=master)
+[](https://vicinator.readthedocs.io/en/latest/?badge=latest)
+[](https://github.com/psf/black)
+
+# Vicinator
+
+### What is Vicinator for?
+
+Vicinator visualizes the microsynteny of grouped proteins (e.g. orthologs) across a large collection of genomes.
+As input, it requires a mapping of the genomes' proteins to the respective protein groups and a directory containing
+the genomes' feature files, i.e. files of the format *\*.gff* or *\*_feature_table.txt*.
+
+
+
+
+### What is Vicinator not for?
+
+As stated above, Vicinator relies on a pre-computed grouping of proteins across genomes. It can not find these
+groups of genes for you.
+
+### Installation
+
+Vicinator is written for Python 3.6+
+
+It is recommended to install Vicinator inside a virtual environment, e.g. with venv:
+
+`python3 -m venv myenv`
+
+This activates the new environment called *myenv*. While activated, you can install the latest version via pip.
+The following command installs the latest version and all unmet requirements automatically.
+
+`pip install --upgrades vicinator`
+
+Requirements:
+ - ansi2html>=1.5.2
+ - colorama>=0.4.4
+ - ete3>=3.1.2
+ - pandas>=1.1.3
+ - importlib-metadata>=3.1.1
+ - setuptools-scm>=5.0.1
+
+### Options
+
+```
+python3 vicinator/vicinator.py --help
+
+usage: vicinator [-h] --tabular-ortholog-groups <orthology_table> --feat-tables-dir <dir_path>
+ --reference <file_path> --centerprotein-accession <str>
+ (--extension-size <int> | --extension-mask <int> [<int> ...])
+ [--tree <newick_tree_file_path>] [--outdir <dir_path>] [--prefix <str>]
+ [--outputlabel-map <file_path>] [--nprocs <int>] [--force] [--version]
+
+Track Microsynteny of target proteins and its orthologs across genomes.
+
+required arguments:
+ --tabular-ortholog-groups <orthology_table>
+ path to mapping file with format
+ ortholog_group_id<tab>genome_id<tab>protein_seq_id
+ --feat-tables-dir <dir_path>
+ path to directory of *.feature_tables.txt or *.gff3 files that shall be
+ screen
+
+required arguments (neighborhood):
+ --reference <file_path>
+ path to a ncbi style feature table or gff file that acts as a reference
+ --centerprotein-accession <str>
+ unique identifier of the central gene of the window
+ --extension-size <int>
+ defines the #features that are co-checked to the left and right of the
+ centerprotein
+ --extension-mask <int> [<int> ...]
+ defines the position of features that are co-checked to the left and right
+ relative to the centerprotein (position 0).
+
+optional arguments (output):
+ --tree <newick_tree_file_path>
+ path to newick tree that includes all taxa to be screened
+ --outdir <dir_path> path to desired output directory
+ --prefix <str> if option is set, shows intergenic distances of genes surrounding the
+ center gene
+ --outputlabel-map <file_path>
+ Attempts to replace genome accessions in the outputs with a replacement
+ string. Requires a two-column map file formatted like so: 'genome file
+ accession' <tab> 'replacement string'. The replacement will automatically
+ be cut to a maximum of 30 chars.
+
+optional arguments (run):
+ --nprocs <int> Number of CPUs for parallel processing of genomes. Default: Number of
+ CPUs-1
+ --force if option is set, existing ortholog databases in the output dir are
+ ignored and will be overwritten
+```
+
+### Input: Required Arguments
+
+<br/>
+
+`--tabular-ortholog-groups <orthology_table>`
+
+>Vicinator requires a tab-separated three-column mapping of orthologs that is formatted like so:
+>
+> **group_id** \tab **genome_id** \tab **protein_id**
+>
+
+<br/>
+
+` --feat-tables-dir <dir_path>`
+
+>Vicinator expects the path to a directory containing *.gff* format or *_feature_table.txt*
+> files of all the genomes you want to trace the microsynteny in.
+>
+> A recommended source for these files is NCBI RefSeq. In order for the mapping to work, the filenames
+> should correspond to the **genome_ids** specified in the mapping file:
+>
+> E.g. line 7: **OG_2 genomeB protein_X011**
+> <br/>
+> triggers a search in a feature file named **genomeB.gff** or **genomeB_genomic.gff** or **genomeB_feature_table.txt**
+> in the directory specified with `--feat-tables-dir`. Effectively, it tries to locate the protein_X011 in this feature file.
+
+<br/>
+
+`--reference <file_path>`
+> the path to a reference genome feature file where the center-protein accession must be found
+
+<br/>
+
+`--centerprotein-accession` & `--extension-size <int>`
+
+>Identifies the window of vicinity around a center-protein which is traced based on the findings in the reference
+> genome.
+>
+
+<br/>
+
+## Example Basic Usage
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-size 3`
+
+## Example Advanced Usage
+
+When vicinator receives a phylogenetic tree (with genome_ids as leaf labels) it will trace the microsynteny in order of
+increasing phylogentic distance to the reference genome specified.
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-size 3 --tree phylogeny.nwk`
+
+
+## Example Advanced Usage 2
+
+When vicinator is started with the `--extension-mask` parameter it excpects a space-separated list of integers representing
+the relative positions of proteins to the center-protein vicinator will trace. You don't have to give
+them in order since they will be sorted automatically with 0 representing the center protein (always included).
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-mask -35 -1 0 7 9`
+
+
+
+
+%package -n python3-Vicinator
+Summary: A small python package to trace orthology neighborhood across feature files
+Provides: python-Vicinator
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-Vicinator
+[](https://www.travis-ci.org/ba1/Vicinator)
+[](https://codecov.io/gh/ba1/Vicinator)
+[](https://badge.fury.io/py/Vicinator)
+[](https://requires.io/github/ba1/Vicinator/requirements/?branch=master)
+[](https://vicinator.readthedocs.io/en/latest/?badge=latest)
+[](https://github.com/psf/black)
+
+# Vicinator
+
+### What is Vicinator for?
+
+Vicinator visualizes the microsynteny of grouped proteins (e.g. orthologs) across a large collection of genomes.
+As input, it requires a mapping of the genomes' proteins to the respective protein groups and a directory containing
+the genomes' feature files, i.e. files of the format *\*.gff* or *\*_feature_table.txt*.
+
+
+
+
+### What is Vicinator not for?
+
+As stated above, Vicinator relies on a pre-computed grouping of proteins across genomes. It can not find these
+groups of genes for you.
+
+### Installation
+
+Vicinator is written for Python 3.6+
+
+It is recommended to install Vicinator inside a virtual environment, e.g. with venv:
+
+`python3 -m venv myenv`
+
+This activates the new environment called *myenv*. While activated, you can install the latest version via pip.
+The following command installs the latest version and all unmet requirements automatically.
+
+`pip install --upgrades vicinator`
+
+Requirements:
+ - ansi2html>=1.5.2
+ - colorama>=0.4.4
+ - ete3>=3.1.2
+ - pandas>=1.1.3
+ - importlib-metadata>=3.1.1
+ - setuptools-scm>=5.0.1
+
+### Options
+
+```
+python3 vicinator/vicinator.py --help
+
+usage: vicinator [-h] --tabular-ortholog-groups <orthology_table> --feat-tables-dir <dir_path>
+ --reference <file_path> --centerprotein-accession <str>
+ (--extension-size <int> | --extension-mask <int> [<int> ...])
+ [--tree <newick_tree_file_path>] [--outdir <dir_path>] [--prefix <str>]
+ [--outputlabel-map <file_path>] [--nprocs <int>] [--force] [--version]
+
+Track Microsynteny of target proteins and its orthologs across genomes.
+
+required arguments:
+ --tabular-ortholog-groups <orthology_table>
+ path to mapping file with format
+ ortholog_group_id<tab>genome_id<tab>protein_seq_id
+ --feat-tables-dir <dir_path>
+ path to directory of *.feature_tables.txt or *.gff3 files that shall be
+ screen
+
+required arguments (neighborhood):
+ --reference <file_path>
+ path to a ncbi style feature table or gff file that acts as a reference
+ --centerprotein-accession <str>
+ unique identifier of the central gene of the window
+ --extension-size <int>
+ defines the #features that are co-checked to the left and right of the
+ centerprotein
+ --extension-mask <int> [<int> ...]
+ defines the position of features that are co-checked to the left and right
+ relative to the centerprotein (position 0).
+
+optional arguments (output):
+ --tree <newick_tree_file_path>
+ path to newick tree that includes all taxa to be screened
+ --outdir <dir_path> path to desired output directory
+ --prefix <str> if option is set, shows intergenic distances of genes surrounding the
+ center gene
+ --outputlabel-map <file_path>
+ Attempts to replace genome accessions in the outputs with a replacement
+ string. Requires a two-column map file formatted like so: 'genome file
+ accession' <tab> 'replacement string'. The replacement will automatically
+ be cut to a maximum of 30 chars.
+
+optional arguments (run):
+ --nprocs <int> Number of CPUs for parallel processing of genomes. Default: Number of
+ CPUs-1
+ --force if option is set, existing ortholog databases in the output dir are
+ ignored and will be overwritten
+```
+
+### Input: Required Arguments
+
+<br/>
+
+`--tabular-ortholog-groups <orthology_table>`
+
+>Vicinator requires a tab-separated three-column mapping of orthologs that is formatted like so:
+>
+> **group_id** \tab **genome_id** \tab **protein_id**
+>
+
+<br/>
+
+` --feat-tables-dir <dir_path>`
+
+>Vicinator expects the path to a directory containing *.gff* format or *_feature_table.txt*
+> files of all the genomes you want to trace the microsynteny in.
+>
+> A recommended source for these files is NCBI RefSeq. In order for the mapping to work, the filenames
+> should correspond to the **genome_ids** specified in the mapping file:
+>
+> E.g. line 7: **OG_2 genomeB protein_X011**
+> <br/>
+> triggers a search in a feature file named **genomeB.gff** or **genomeB_genomic.gff** or **genomeB_feature_table.txt**
+> in the directory specified with `--feat-tables-dir`. Effectively, it tries to locate the protein_X011 in this feature file.
+
+<br/>
+
+`--reference <file_path>`
+> the path to a reference genome feature file where the center-protein accession must be found
+
+<br/>
+
+`--centerprotein-accession` & `--extension-size <int>`
+
+>Identifies the window of vicinity around a center-protein which is traced based on the findings in the reference
+> genome.
+>
+
+<br/>
+
+## Example Basic Usage
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-size 3`
+
+## Example Advanced Usage
+
+When vicinator receives a phylogenetic tree (with genome_ids as leaf labels) it will trace the microsynteny in order of
+increasing phylogentic distance to the reference genome specified.
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-size 3 --tree phylogeny.nwk`
+
+
+## Example Advanced Usage 2
+
+When vicinator is started with the `--extension-mask` parameter it excpects a space-separated list of integers representing
+the relative positions of proteins to the center-protein vicinator will trace. You don't have to give
+them in order since they will be sorted automatically with 0 representing the center protein (always included).
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-mask -35 -1 0 7 9`
+
+
+
+
+%package help
+Summary: Development documents and examples for Vicinator
+Provides: python3-Vicinator-doc
+%description help
+[](https://www.travis-ci.org/ba1/Vicinator)
+[](https://codecov.io/gh/ba1/Vicinator)
+[](https://badge.fury.io/py/Vicinator)
+[](https://requires.io/github/ba1/Vicinator/requirements/?branch=master)
+[](https://vicinator.readthedocs.io/en/latest/?badge=latest)
+[](https://github.com/psf/black)
+
+# Vicinator
+
+### What is Vicinator for?
+
+Vicinator visualizes the microsynteny of grouped proteins (e.g. orthologs) across a large collection of genomes.
+As input, it requires a mapping of the genomes' proteins to the respective protein groups and a directory containing
+the genomes' feature files, i.e. files of the format *\*.gff* or *\*_feature_table.txt*.
+
+
+
+
+### What is Vicinator not for?
+
+As stated above, Vicinator relies on a pre-computed grouping of proteins across genomes. It can not find these
+groups of genes for you.
+
+### Installation
+
+Vicinator is written for Python 3.6+
+
+It is recommended to install Vicinator inside a virtual environment, e.g. with venv:
+
+`python3 -m venv myenv`
+
+This activates the new environment called *myenv*. While activated, you can install the latest version via pip.
+The following command installs the latest version and all unmet requirements automatically.
+
+`pip install --upgrades vicinator`
+
+Requirements:
+ - ansi2html>=1.5.2
+ - colorama>=0.4.4
+ - ete3>=3.1.2
+ - pandas>=1.1.3
+ - importlib-metadata>=3.1.1
+ - setuptools-scm>=5.0.1
+
+### Options
+
+```
+python3 vicinator/vicinator.py --help
+
+usage: vicinator [-h] --tabular-ortholog-groups <orthology_table> --feat-tables-dir <dir_path>
+ --reference <file_path> --centerprotein-accession <str>
+ (--extension-size <int> | --extension-mask <int> [<int> ...])
+ [--tree <newick_tree_file_path>] [--outdir <dir_path>] [--prefix <str>]
+ [--outputlabel-map <file_path>] [--nprocs <int>] [--force] [--version]
+
+Track Microsynteny of target proteins and its orthologs across genomes.
+
+required arguments:
+ --tabular-ortholog-groups <orthology_table>
+ path to mapping file with format
+ ortholog_group_id<tab>genome_id<tab>protein_seq_id
+ --feat-tables-dir <dir_path>
+ path to directory of *.feature_tables.txt or *.gff3 files that shall be
+ screen
+
+required arguments (neighborhood):
+ --reference <file_path>
+ path to a ncbi style feature table or gff file that acts as a reference
+ --centerprotein-accession <str>
+ unique identifier of the central gene of the window
+ --extension-size <int>
+ defines the #features that are co-checked to the left and right of the
+ centerprotein
+ --extension-mask <int> [<int> ...]
+ defines the position of features that are co-checked to the left and right
+ relative to the centerprotein (position 0).
+
+optional arguments (output):
+ --tree <newick_tree_file_path>
+ path to newick tree that includes all taxa to be screened
+ --outdir <dir_path> path to desired output directory
+ --prefix <str> if option is set, shows intergenic distances of genes surrounding the
+ center gene
+ --outputlabel-map <file_path>
+ Attempts to replace genome accessions in the outputs with a replacement
+ string. Requires a two-column map file formatted like so: 'genome file
+ accession' <tab> 'replacement string'. The replacement will automatically
+ be cut to a maximum of 30 chars.
+
+optional arguments (run):
+ --nprocs <int> Number of CPUs for parallel processing of genomes. Default: Number of
+ CPUs-1
+ --force if option is set, existing ortholog databases in the output dir are
+ ignored and will be overwritten
+```
+
+### Input: Required Arguments
+
+<br/>
+
+`--tabular-ortholog-groups <orthology_table>`
+
+>Vicinator requires a tab-separated three-column mapping of orthologs that is formatted like so:
+>
+> **group_id** \tab **genome_id** \tab **protein_id**
+>
+
+<br/>
+
+` --feat-tables-dir <dir_path>`
+
+>Vicinator expects the path to a directory containing *.gff* format or *_feature_table.txt*
+> files of all the genomes you want to trace the microsynteny in.
+>
+> A recommended source for these files is NCBI RefSeq. In order for the mapping to work, the filenames
+> should correspond to the **genome_ids** specified in the mapping file:
+>
+> E.g. line 7: **OG_2 genomeB protein_X011**
+> <br/>
+> triggers a search in a feature file named **genomeB.gff** or **genomeB_genomic.gff** or **genomeB_feature_table.txt**
+> in the directory specified with `--feat-tables-dir`. Effectively, it tries to locate the protein_X011 in this feature file.
+
+<br/>
+
+`--reference <file_path>`
+> the path to a reference genome feature file where the center-protein accession must be found
+
+<br/>
+
+`--centerprotein-accession` & `--extension-size <int>`
+
+>Identifies the window of vicinity around a center-protein which is traced based on the findings in the reference
+> genome.
+>
+
+<br/>
+
+## Example Basic Usage
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-size 3`
+
+## Example Advanced Usage
+
+When vicinator receives a phylogenetic tree (with genome_ids as leaf labels) it will trace the microsynteny in order of
+increasing phylogentic distance to the reference genome specified.
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-size 3 --tree phylogeny.nwk`
+
+
+## Example Advanced Usage 2
+
+When vicinator is started with the `--extension-mask` parameter it excpects a space-separated list of integers representing
+the relative positions of proteins to the center-protein vicinator will trace. You don't have to give
+them in order since they will be sorted automatically with 0 representing the center protein (always included).
+
+`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein XP_006539605.1 --extension-mask -35 -1 0 7 9`
+
+
+
+
+%prep
+%autosetup -n Vicinator-0.0.32
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-Vicinator -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.32-1
+- Package Spec generated
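
Note on the packaged README: the description text in this spec states two input conventions. Positions given to `--extension-mask` are sorted automatically with 0 (the center protein) always included, and a `genome_id` from the mapping file is searched for as `<genome_id>.gff`, `<genome_id>_genomic.gff`, or `<genome_id>_feature_table.txt`. The following is a minimal Python sketch of those two rules, for illustration only. The function names `normalize_mask` and `candidate_feature_files` are hypothetical and are not taken from Vicinator's code, and dropping repeated positions is an assumption the README does not confirm.

```python
from pathlib import Path


def normalize_mask(positions):
    """Return sorted relative positions, always including 0 (the center protein).

    Removing duplicates is an assumption; the README only says the
    positions are sorted and that 0 is always included.
    """
    return sorted(set(positions) | {0})


def candidate_feature_files(feat_dir, genome_id):
    """Build the three file names the README says are searched for a genome_id."""
    suffixes = (".gff", "_genomic.gff", "_feature_table.txt")
    return [Path(feat_dir) / f"{genome_id}{suffix}" for suffix in suffixes]


if __name__ == "__main__":
    # Positions may be given in any order and without an explicit 0.
    print(normalize_mask([9, -1, 7, -35]))
    for path in candidate_feature_files("gff_dir", "genomeB"):
        print(path)
```

A helper like `candidate_feature_files` could be used to check, before running the tool, that every `genome_id` in the mapping file has at least one matching feature file in the directory passed to `--feat-tables-dir`.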