%global _empty_manifest_terminate_build 0
Name:		python-pyensembl
Version:	2.2.8
Release:	1
Summary:	Python interface to ensembl reference genome metadata
License:	http://www.apache.org/licenses/LICENSE-2.0.html
URL:		https://github.com/openvax/pyensembl
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/03/f6/e1c92ecebfda950d1fc15c73a83e2474330b62e9a82a3927e910bea801fd/pyensembl-2.2.8.tar.gz
BuildArch:	noarch

%description
PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.

# Example Usage

```python
from pyensembl import EnsemblRelease

# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)

# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')
```

# Installation

You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):

```sh
pip install pyensembl
```

This should also install any required packages such as [datacache](https://github.com/openvax/datacache).

Before using PyEnsembl, run the following command to download and install Ensembl data:

```
pyensembl install --release <list of Ensembl release numbers> --species <species name>
```

For example, `pyensembl install --release 75 76 --species human` will download and install all human reference data from Ensembl releases 75 and 76.

Alternatively, you can create the `EnsemblRelease` object from inside a Python process and call `ensembl_object.download()` followed by `ensembl_object.index()`.
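The download-then-index workflow amounts to parsing GTF rows and loading them into a queryable local store. The sketch below illustrates that idea with Python's standard-library `sqlite3` module. The helper names (`parse_gtf_line`, `build_gene_index`, `gene_names_at_locus`) and the single `genes` table are hypothetical; they are not PyEnsembl's internal parser or database schema. The sketch assumes well-formed, tab-separated GTF lines whose attribute values contain no semicolons.

```python
import sqlite3

# Hypothetical helpers illustrating GTF parsing and local indexing;
# this is not PyEnsembl's parser or database schema.


def parse_gtf_line(line):
    """Split one tab-separated GTF data line into a dict."""
    (contig, _source, feature, start, end,
     _score, strand, _frame, attrs) = line.rstrip("\n").split("\t")
    attributes = {}
    # Attributes look like: gene_id "G1"; gene_name "GENE_A";
    for item in attrs.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip().strip('"')
    return {
        "contig": contig,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def build_gene_index(gtf_lines):
    """Load 'gene' rows into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE genes (gene_id TEXT, gene_name TEXT, contig TEXT, "
        "start_pos INTEGER, end_pos INTEGER, strand TEXT)"
    )
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header/comment lines
        row = parse_gtf_line(line)
        if row["feature"] != "gene":
            continue  # only gene rows go into this toy table
        attrs = row["attributes"]
        db.execute(
            "INSERT INTO genes VALUES (?, ?, ?, ?, ?, ?)",
            (attrs.get("gene_id"), attrs.get("gene_name"), row["contig"],
             row["start"], row["end"], row["strand"]),
        )
    db.commit()
    return db


def gene_names_at_locus(db, contig, position):
    """Names of genes whose [start_pos, end_pos] interval contains position."""
    cursor = db.execute(
        "SELECT DISTINCT gene_name FROM genes "
        "WHERE contig = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY gene_name",
        (str(contig), position, position),
    )
    return [name for (name,) in cursor]
```

Real GTF files differ in which attributes they carry, so a production parser has to be more tolerant than this sketch.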
## Cache Location

By default, PyEnsembl uses the platform-specific `Cache` folder and caches the files into the `pyensembl` sub-directory. You can override this default by setting the environment variable `PYENSEMBL_CACHE_DIR` to your preferred cache location:

```sh
export PYENSEMBL_CACHE_DIR=/custom/cache/dir
```

or

```python
import os

os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
# ... PyEnsembl API usage
```

# Non-Ensembl Data

PyEnsembl also supports arbitrary genomes: you can specify local file paths or remote URLs of Ensembl or non-Ensembl GTF and FASTA files. (Warning: GTF formats can vary, and handling of non-Ensembl data is still very much in development.)

For example:

```python
from pyensembl import Genome

data = Genome(
    reference_name='GRCh38',
    annotation_name='my_genome_features',
    gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
# parse GTF and construct database of genomic features
data.index()
gene_names = data.gene_names_at_locus(contig=6, position=29945884)
```

# API

The `EnsemblRelease` object has methods that give access to all combinations of the annotation features *gene\_name*, *gene\_id*, *transcript\_name*, *transcript\_id*, and *exon\_id*, as well as the locations of these genomic elements (contig, start position, end position, strand).

## Genes
genes(contig=None, strand=None)
Returns a list of Gene objects, optionally restricted to a particular contig or strand.
genes_at_locus(contig, position, end=None, strand=None)
Returns a list of Gene objects overlapping a particular position on a contig. Pass the end parameter to extend the query to a range, and pass strand='+' or strand='-' to restrict results to the forward or reverse strand.
gene_by_id(gene_id)
Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").
gene_names(contig=None, strand=None)
Returns all gene names in the annotation database, optionally restricted to a particular contig or strand.
genes_by_name(gene_name)
Returns all unique genes with the given name (there may be several because of copies in the genome), as a list containing one Gene object per distinct gene ID.
gene_by_protein_id(protein_id)
Returns the Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283").
gene_names_at_locus(contig, position, end=None, strand=None)
Returns the names of genes overlapping the given locus (a position, or a range if end is given), optionally restricted by strand. The result is a list because several genes can overlap the same locus.
gene_name_of_gene_id(gene_id)
Returns name of gene with given gene ID.
gene_name_of_transcript_id(transcript_id)
Returns name of gene associated with given transcript ID.
gene_name_of_transcript_name(transcript_name)
Returns name of gene associated with given transcript name.
gene_name_of_exon_id(exon_id)
Returns name of gene associated with given exon ID.
gene_ids(contig=None, strand=None)
Return all gene IDs in the annotation database, optionally restricted by chromosome name or strand.
gene_ids_of_gene_name(gene_name)
Returns all Ensembl gene IDs with the given name.
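To illustrate the locus semantics described for `genes_at_locus` and `gene_names_at_locus`, here is a pure-Python sketch over hypothetical in-memory records. `ToyGene` and both functions are illustrative stand-ins, not PyEnsembl classes. The sketch assumes 1-based, inclusive coordinates (the GTF convention) and treats a single `position` as a one-base range when `end` is omitted.

```python
from typing import NamedTuple


class ToyGene(NamedTuple):
    # Hypothetical stand-in for a Gene record; not PyEnsembl's Gene class.
    gene_id: str
    gene_name: str
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def genes_at_locus(genes, contig, position, end=None, strand=None):
    """Genes overlapping [position, end] on contig, optionally on one strand."""
    if end is None:
        end = position  # a single position is a one-base range
    contig = str(contig)  # accept contig=6 as well as contig="6"
    return [
        g for g in genes
        if g.contig == contig
        and g.start <= end and g.end >= position  # closed-interval overlap
        and (strand is None or g.strand == strand)
    ]


def gene_names_at_locus(genes, contig, position, end=None, strand=None):
    # Several genes can overlap one locus, so names come back as a list
    # (sorted and de-duplicated in this sketch).
    hits = genes_at_locus(genes, contig, position, end, strand)
    return sorted({g.gene_name for g in hits})
```

The ordering and de-duplication here are choices made for the sketch; the real methods' result order is not specified in this document.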
## Transcripts
transcripts(contig=None, strand=None)
Returns a list of Transcript objects for all transcript entries in the Ensembl database, optionally restricted to a particular contig or strand.
transcript_by_id(transcript_id)
Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985").
transcripts_by_name(transcript_name)
Returns a list of Transcript objects for every transcript matching the given name.
transcript_names(contig=None, strand=None)
Returns all transcript names in the annotation database, optionally restricted to a particular contig or strand.
transcript_ids(contig=None, strand=None)
Returns all transcript IDs in the annotation database, optionally restricted to a particular contig or strand.
transcript_ids_of_gene_id(gene_id)
Return IDs of all transcripts associated with given gene ID.
transcript_ids_of_gene_name(gene_name)
Return IDs of all transcripts associated with given gene name.
transcript_ids_of_transcript_name(transcript_name)
Find all Ensembl transcript IDs with the given name.
transcript_ids_of_exon_id(exon_id)
Return IDs of all transcripts associated with given exon ID.
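Because alternative isoforms can share exons, a lookup such as `transcript_ids_of_exon_id` naturally returns a list. A minimal sketch of that many-to-many relationship, assuming a hypothetical transcript-to-exon mapping with made-up IDs (not PyEnsembl's data model), inverts the mapping into an exon-to-transcripts index:

```python
from collections import defaultdict

# Hypothetical toy data (made-up IDs): transcript ID -> exon IDs.
TRANSCRIPT_EXONS = {
    "T1": ["E1", "E2", "E3"],
    "T2": ["E1", "E3"],  # alternative isoform that skips E2
    "T3": ["E4"],
}


def build_exon_index(transcript_exons):
    """Invert transcript -> exons into exon -> transcripts."""
    index = defaultdict(list)
    for transcript_id, exon_ids in transcript_exons.items():
        for exon_id in exon_ids:
            index[exon_id].append(transcript_id)
    return dict(index)


def transcript_ids_of_exon_id(index, exon_id):
    """Sorted IDs of every transcript containing exon_id ([] if unknown)."""
    return sorted(index.get(exon_id, []))
```

Building the inverted index once makes each later lookup a dictionary access instead of a scan over all transcripts.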
## Exons
exon_ids(contig=None, strand=None)
Returns a list of exon IDs in the annotation database, optionally restricted by chromosome or strand.
exon_by_id(exon_id)
Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410").
exon_ids_of_gene_id(gene_id)
Returns a list of exon IDs associated with a given gene ID.
exon_ids_of_gene_name(gene_name)
Returns a list of exon IDs associated with a given gene name.
exon_ids_of_transcript_id(transcript_id)
Returns a list of exon IDs associated with a given transcript ID.
exon_ids_of_transcript_name(transcript_name)
Returns a list of exon IDs associated with a given transcript name.
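Gene-level exon lookups such as `exon_ids_of_gene_id` can be thought of as the union of exons across a gene's transcripts, with shared exons reported once. The sketch below assumes hypothetical toy mappings and an order-preserving de-duplication. It is not how PyEnsembl computes the result, and the real method's ordering may differ.

```python
# Hypothetical toy data (made-up IDs), not real Ensembl records.
GENE_TRANSCRIPTS = {"G1": ["T1", "T2"], "G2": ["T3"]}
TRANSCRIPT_EXONS = {
    "T1": ["E1", "E2", "E3"],
    "T2": ["E1", "E3"],  # shares E1 and E3 with T1
    "T3": ["E4"],
}


def exon_ids_of_gene_id(gene_id, gene_transcripts, transcript_exons):
    """Union of exon IDs across a gene's transcripts, each reported once."""
    exon_ids = (
        exon_id
        for transcript_id in gene_transcripts.get(gene_id, [])
        for exon_id in transcript_exons.get(transcript_id, [])
    )
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(exon_ids))
```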
%package -n python3-pyensembl
Summary:	Python interface to ensembl reference genome metadata
Provides:	python-pyensembl
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-pyensembl
PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.

# Example Usage

```python
from pyensembl import EnsemblRelease

# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)

# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')
```

# Installation

You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):

```sh
pip install pyensembl
```

This should also install any required packages such as [datacache](https://github.com/openvax/datacache).

Before using PyEnsembl, run the following command to download and install Ensembl data:

```
pyensembl install --release <list of Ensembl release numbers> --species <species name>
```

For example, `pyensembl install --release 75 76 --species human` will download and install all human reference data from Ensembl releases 75 and 76.

Alternatively, you can create the `EnsemblRelease` object from inside a Python process and call `ensembl_object.download()` followed by `ensembl_object.index()`.

## Cache Location

By default, PyEnsembl uses the platform-specific `Cache` folder and caches the files into the `pyensembl` sub-directory.
You can override this default by setting the environment variable `PYENSEMBL_CACHE_DIR` to your preferred cache location:

```sh
export PYENSEMBL_CACHE_DIR=/custom/cache/dir
```

or

```python
import os

os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
# ... PyEnsembl API usage
```

# Non-Ensembl Data

PyEnsembl also supports arbitrary genomes: you can specify local file paths or remote URLs of Ensembl or non-Ensembl GTF and FASTA files. (Warning: GTF formats can vary, and handling of non-Ensembl data is still very much in development.)

For example:

```python
from pyensembl import Genome

data = Genome(
    reference_name='GRCh38',
    annotation_name='my_genome_features',
    gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
# parse GTF and construct database of genomic features
data.index()
gene_names = data.gene_names_at_locus(contig=6, position=29945884)
```

# API

The `EnsemblRelease` object has methods that give access to all combinations of the annotation features *gene\_name*, *gene\_id*, *transcript\_name*, *transcript\_id*, and *exon\_id*, as well as the locations of these genomic elements (contig, start position, end position, strand).

## Genes
genes(contig=None, strand=None)
Returns a list of Gene objects, optionally restricted to a particular contig or strand.
genes_at_locus(contig, position, end=None, strand=None)
Returns a list of Gene objects overlapping a particular position on a contig. Pass the end parameter to extend the query to a range, and pass strand='+' or strand='-' to restrict results to the forward or reverse strand.
gene_by_id(gene_id)
Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").
gene_names(contig=None, strand=None)
Returns all gene names in the annotation database, optionally restricted to a particular contig or strand.
genes_by_name(gene_name)
Returns all unique genes with the given name (there may be several because of copies in the genome), as a list containing one Gene object per distinct gene ID.
gene_by_protein_id(protein_id)
Returns the Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283").
gene_names_at_locus(contig, position, end=None, strand=None)
Returns the names of genes overlapping the given locus (a position, or a range if end is given), optionally restricted by strand. The result is a list because several genes can overlap the same locus.
gene_name_of_gene_id(gene_id)
Returns name of gene with given gene ID.
gene_name_of_transcript_id(transcript_id)
Returns name of gene associated with given transcript ID.
gene_name_of_transcript_name(transcript_name)
Returns name of gene associated with given transcript name.
gene_name_of_exon_id(exon_id)
Returns name of gene associated with given exon ID.
gene_ids(contig=None, strand=None)
Return all gene IDs in the annotation database, optionally restricted by chromosome name or strand.
gene_ids_of_gene_name(gene_name)
Returns all Ensembl gene IDs with the given name.
## Transcripts
transcripts(contig=None, strand=None)
Returns a list of Transcript objects for all transcript entries in the Ensembl database, optionally restricted to a particular contig or strand.
transcript_by_id(transcript_id)
Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985").
transcripts_by_name(transcript_name)
Returns a list of Transcript objects for every transcript matching the given name.
transcript_names(contig=None, strand=None)
Returns all transcript names in the annotation database, optionally restricted to a particular contig or strand.
transcript_ids(contig=None, strand=None)
Returns all transcript IDs in the annotation database, optionally restricted to a particular contig or strand.
transcript_ids_of_gene_id(gene_id)
Return IDs of all transcripts associated with given gene ID.
transcript_ids_of_gene_name(gene_name)
Return IDs of all transcripts associated with given gene name.
transcript_ids_of_transcript_name(transcript_name)
Find all Ensembl transcript IDs with the given name.
transcript_ids_of_exon_id(exon_id)
Return IDs of all transcripts associated with given exon ID.
## Exons
exon_ids(contig=None, strand=None)
Returns a list of exon IDs in the annotation database, optionally restricted by chromosome or strand.
exon_by_id(exon_id)
Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410").
exon_ids_of_gene_id(gene_id)
Returns a list of exon IDs associated with a given gene ID.
exon_ids_of_gene_name(gene_name)
Returns a list of exon IDs associated with a given gene name.
exon_ids_of_transcript_id(transcript_id)
Returns a list of exon IDs associated with a given transcript ID.
exon_ids_of_transcript_name(transcript_name)
Returns a list of exon IDs associated with a given transcript name.
%package help
Summary:	Development documents and examples for pyensembl
Provides:	python3-pyensembl-doc

%description help
PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.

# Example Usage

```python
from pyensembl import EnsemblRelease

# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)

# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')
```

# Installation

You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):

```sh
pip install pyensembl
```

This should also install any required packages such as [datacache](https://github.com/openvax/datacache).

Before using PyEnsembl, run the following command to download and install Ensembl data:

```
pyensembl install --release <list of Ensembl release numbers> --species <species name>
```

For example, `pyensembl install --release 75 76 --species human` will download and install all human reference data from Ensembl releases 75 and 76.

Alternatively, you can create the `EnsemblRelease` object from inside a Python process and call `ensembl_object.download()` followed by `ensembl_object.index()`.

## Cache Location

By default, PyEnsembl uses the platform-specific `Cache` folder and caches the files into the `pyensembl` sub-directory. You can override this default by setting the environment variable `PYENSEMBL_CACHE_DIR` to your preferred cache location:

```sh
export PYENSEMBL_CACHE_DIR=/custom/cache/dir
```

or

```python
import os

os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
# ... PyEnsembl API usage
```

# Non-Ensembl Data

PyEnsembl also supports arbitrary genomes: you can specify local file paths or remote URLs of Ensembl or non-Ensembl GTF and FASTA files. (Warning: GTF formats can vary, and handling of non-Ensembl data is still very much in development.)

For example:

```python
from pyensembl import Genome

data = Genome(
    reference_name='GRCh38',
    annotation_name='my_genome_features',
    gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
# parse GTF and construct database of genomic features
data.index()
gene_names = data.gene_names_at_locus(contig=6, position=29945884)
```

# API

The `EnsemblRelease` object has methods that give access to all combinations of the annotation features *gene\_name*, *gene\_id*, *transcript\_name*, *transcript\_id*, and *exon\_id*, as well as the locations of these genomic elements (contig, start position, end position, strand).

## Genes
genes(contig=None, strand=None)
Returns a list of Gene objects, optionally restricted to a particular contig or strand.
genes_at_locus(contig, position, end=None, strand=None)
Returns a list of Gene objects overlapping a particular position on a contig. Pass the end parameter to extend the query to a range, and pass strand='+' or strand='-' to restrict results to the forward or reverse strand.
gene_by_id(gene_id)
Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").
gene_names(contig=None, strand=None)
Returns all gene names in the annotation database, optionally restricted to a particular contig or strand.
genes_by_name(gene_name)
Returns all unique genes with the given name (there may be several because of copies in the genome), as a list containing one Gene object per distinct gene ID.
gene_by_protein_id(protein_id)
Returns the Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283").
gene_names_at_locus(contig, position, end=None, strand=None)
Returns the names of genes overlapping the given locus (a position, or a range if end is given), optionally restricted by strand. The result is a list because several genes can overlap the same locus.
gene_name_of_gene_id(gene_id)
Returns name of gene with given gene ID.
gene_name_of_transcript_id(transcript_id)
Returns name of gene associated with given transcript ID.
gene_name_of_transcript_name(transcript_name)
Returns name of gene associated with given transcript name.
gene_name_of_exon_id(exon_id)
Returns name of gene associated with given exon ID.
gene_ids(contig=None, strand=None)
Return all gene IDs in the annotation database, optionally restricted by chromosome name or strand.
gene_ids_of_gene_name(gene_name)
Returns all Ensembl gene IDs with the given name.
## Transcripts
transcripts(contig=None, strand=None)
Returns a list of Transcript objects for all transcript entries in the Ensembl database, optionally restricted to a particular contig or strand.
transcript_by_id(transcript_id)
Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985").
transcripts_by_name(transcript_name)
Returns a list of Transcript objects for every transcript matching the given name.
transcript_names(contig=None, strand=None)
Returns all transcript names in the annotation database, optionally restricted to a particular contig or strand.
transcript_ids(contig=None, strand=None)
Returns all transcript IDs in the annotation database, optionally restricted to a particular contig or strand.
transcript_ids_of_gene_id(gene_id)
Return IDs of all transcripts associated with given gene ID.
transcript_ids_of_gene_name(gene_name)
Return IDs of all transcripts associated with given gene name.
transcript_ids_of_transcript_name(transcript_name)
Find all Ensembl transcript IDs with the given name.
transcript_ids_of_exon_id(exon_id)
Return IDs of all transcripts associated with given exon ID.
## Exons
exon_ids(contig=None, strand=None)
Returns a list of exon IDs in the annotation database, optionally restricted by chromosome or strand.
exon_by_id(exon_id)
Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410").
exon_ids_of_gene_id(gene_id)
Returns a list of exon IDs associated with a given gene ID.
exon_ids_of_gene_name(gene_name)
Returns a list of exon IDs associated with a given gene name.
exon_ids_of_transcript_id(transcript_id)
Returns a list of exon IDs associated with a given transcript ID.
exon_ids_of_transcript_name(transcript_name)
Returns a list of exon IDs associated with a given transcript name.
%prep
%autosetup -n pyensembl-2.2.8

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pyensembl -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Mon May 15 2023 Python_Bot - 2.2.8-1
- Package Spec generated