author | CoprDistGit <infra@openeuler.org> | 2023-04-11 17:43:41 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-04-11 17:43:41 +0000 |
commit | 01098c86272e2bb2d69e7a7ad54e75a9b0efde90 | |
tree | 784c20c0f26dcebf6f2e19ce7e8c44ad460e07b1 | |
parent | 8c4e956179d65505400868a94ff8e1e54fc4c05c |
automatic import of python-biocommons-seqrepo
-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-biocommons-seqrepo.spec | 652 |
-rw-r--r-- | sources | 1 |
3 files changed, 654 insertions, 0 deletions
@@ -0,0 +1 @@ +/biocommons.seqrepo-0.6.5.tar.gz diff --git a/python-biocommons-seqrepo.spec b/python-biocommons-seqrepo.spec new file mode 100644 index 0000000..7d33ef6 --- /dev/null +++ b/python-biocommons-seqrepo.spec @@ -0,0 +1,652 @@ +%global _empty_manifest_terminate_build 0 +Name: python-biocommons.seqrepo +Version: 0.6.5 +Release: 1 +Summary: Non-redundant, compressed, journalled, file-based storage for biological sequences +License: Apache Software License +URL: https://github.com/biocommons/biocommons.seqrepo +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/f1/55/7bff2f7bd3971925d18bdfb9e666b6d0bb1abd7f28da7a82a19fc5a6529f/biocommons.seqrepo-0.6.5.tar.gz +BuildArch: noarch + +Requires: python3-bioutils +Requires: python3-coloredlogs +Requires: python3-ipython +Requires: python3-pysam +Requires: python3-requests +Requires: python3-requests-html +Requires: python3-six +Requires: python3-tqdm +Requires: python3-yoyo-migrations +Requires: python3-tox + +%description +biocommons.seqrepo +!!!!!!!!!!!!!!!!!! + +Python package for writing and reading a local collection of +biological sequences. The repository is non-redundant, compressed, +and journalled, making it efficient to store and transfer multiple +snapshots. + +Clients refer to sequences and metadata using familiar identifiers, +such as NM_000551.3 or GRCh38:1, or any of several hash-based +identifiers. The interface supports fast slicing of arbitrary regions +of large sequences. + +A "fully-qualified" identifier includes a namespace to disambiguate +accessions (e.g., "1" in GRCh37 and GRCh38). If the namespace is +provided, seqrepo uses it as-is. If the namespace is not provided and +the unqualified identifier refers to a unique sequence, it is +returned; otherwise, ambiguous identifiers will raise an error. + +SeqRepo favors identifiers from [identifiers.org](identifiers.org) +whenever available. 
Examples include +[refseq](https://registry.identifiers.org/registry/refseq) and +[ensembl](https://registry.identifiers.org/registry/ensembl). + +`seqrepo-rest-service +<https://github.com/biocommons/seqrepo-rest-service>`__ provides a +REST interface and docker image. + +Released under the Apache License, 2.0. + +|ci_rel| | |cov| | |pypi_rel| | `ChangeLog <https://github.com/biocommons/biocommons.seqrepo/tree/master/docs/changelog/0.5>`_ + +Citation +!!!!!!!! + +| Hart RK, Prlić A (2020) +| SeqRepo: A system for managing local collections of biological sequences. +| PLoS ONE 15(12): e0239883. https://doi.org/10.1371/journal.pone.0239883 + + +Features +!!!!!!!! + +* Timestamped, read-only snapshots. +* Space-efficient storage of sequences within a single snapshot and + across snapshots. +* Bandwidth-efficient transfer incremental updates. +* Fast fetching of sequence slices on chromosome-scale sequences. +* Precomputed digests that may be used as sequence aliases. +* Mappings of external aliases (i.e., accessions or identifiers like + NM_013305.4) to sequences. + + +Deployments Scenarios +!!!!!!!!!!!!!!!!!!!!! +* Local read-only archive, mirrored from public site, + accessed via Python API (see `Mirroring documentation <docs/mirror.rst>`__) +* Local read-write archive, maintained with command + line utility and/or API (see `Command Line Interface documentation + <docs/cli.rst>`__). +* Docker data-only container that may be linked to application container. +* SeqRepo and refget REST API for local or remote access (see `seqrepo-rest-service <https://github.com/biocommons/seqrepo-rest-service>`__) + + +Technical Quick Peek +!!!!!!!!!!!!!!!!!!!! + +Within a single snapshot, sequences are stored *non-redundantly* and +*compressed* in an add-only journalled filesystem structure. A +truncated SHA-512 hash is used to assess uniquness and as an +internal id. (The digest is truncated for space efficiency.) 
+ +Sequences are compressed using the Block GZipped Format (`BGZF +<https://samtools.github.io/hts-specs/SAMv1.pdf>`__)), which enables +pysam to provide fast random access to compressed sequences. (Variable +compression typically makes random access impossible.) + +Sequence files are immutable, thereby enabling the use of hardlinks +across snapshots and eliminating redundant transfers (e.g., with +rsync). + +Each sequence id is associated with a namespaced alias in a sqlite +database. Such as ``<seguid,rvvuhY0FxFLNwf10FXFIrSQ7AvQ>``, +``<NCBI,NP_004009.1>``, ``<gi,5032303>``, +``<ensembl-75ENSP00000354464>``, ``<ensembl-85,ENSP00000354464.4>``. +The sqlite database is mutable across releases. + +For calibration, recent releases that include 3 human genome +assemblies (including patches), and full RefSeq sets (NM, NR, NP, NT, +XM, and XP) consumes approximately 8GB. The minimum marginal size for +additional snapshots is approximately 2GB (for the sqlite database, +which is not hardlinked). + +For more information, see `<docs/design.rst>`__. + + + +Requirements +!!!!!!!!!!!! + +Reading a sequence repository requires several Python packages, all of +which are available from pypi. Installation should be as simple as +`pip install biocommons.seqrepo`. + +*Writing* sequence files also requires ``bgzip``, which provided in +the `htslib <https://github.com/samtools/htslib>`__ repo. Ubuntu users +should install the ``tabix`` package with ``sudo apt install tabix``. + +Development and deployments are on Ubuntu. Other systems may work but +are not tested. Patches to get other systems working would be +welcomed. + +**Mac Developers** If you get "xcrun: error: invalid active developer +path", you need to install XCode. See this `StackOverflow answer +<https://apple.stackexchange.com/questions/254380/why-am-i-getting-an-invalid-active-developer-path-when-attempting-to-use-git-a>`__. + + +Quick Start +!!!!!!!!!!! 
+ +On Ubuntu 16.04:: + + $ sudo apt install -y python3-dev gcc zlib1g-dev tabix + $ pip install seqrepo + $ sudo mkdir /usr/local/share/seqrepo + $ sudo chown $USER /usr/local/share/seqrepo + $ seqrepo pull -i 2018-11-26 + $ seqrepo show-status -i 2018-11-26 + seqrepo 0.2.3.post3.dev8+nb8298bd62283 + root directory: /usr/local/share/seqrepo/2018-11-26, 7.9 GB + backends: fastadir (schema 1), seqaliasdb (schema 1) + sequences: 773587 sequences, 93051609959 residues, 192 files + aliases: 5579572 aliases, 5480085 current, 26 namespaces, 773587 sequences + + # Simple Pythonic interface to sequences + >> from biocommons.seqrepo import SeqRepo + >> sr = SeqRepo("/usr/local/share/seqrepo/latest") + >> sr["NC_000001.11"][780000:780020] + 'TGGTGGCACGCGCTTGTAGT' + + # Or, use the seqrepo shell for even easier access + $ seqrepo start-shell -i 2018-11-26 + In [1]: sr["NC_000001.11"][780000:780020] + Out[1]: 'TGGTGGCACGCGCTTGTAGT' + + # N.B. The following output is edited for simplicity + $ seqrepo export -i 2018-11-26 | head -n100 + >SHA1:9a2acba3dd7603f... SEGUID:mirLo912A/MppLuS1cUyFMduLUQ Ensembl-85:GENSCAN00000003538 ... + MDSPLREDDSQTCARLWEAEVKRHSLEGLTVFGTAVQIHNVQRRAIRAKGTQEAQAELLCRGPRLLDRFLEDACILKEGRGTDTGQHCRGDARISSHLEA + SGTHIQLLALFLVSSSDTPPSLLRFCHALEHDIRYNSSFDSYYPLSPHSRHNDDLQTPSSHLGYIITVPDPTLPLTFASLYLGMAPCTSMGSSSMGIFQS + QRIHAFMKGKNKWDEYEGRKESWKIRSNSQTGEPTF + >SHA1:ca996b263102b1... SEGUID:yplrJjECsVqQufeYy0HkDD16z58 NCBI:XR_001733142.1 gi:1034683989 + TTTACGTCTTTCTGGGAATTTATACTGGAAGTATACTTACCTCTGTGCAAAATTGCAAATATATAAGGTAATTCATTCCAGCATTGCTTATATTAGGTTG + AACTATGTAACATTGACATTGATGTGAATCAAAAATGGTTGAAGGCTGGCAGTTTCATATGATTCAGCCTATAATAGCAAAAGATTGAAAAAATCCATTA + ATACAGTGTGGTTCAAAAAAATTTGTTGTATCAAGGTAAAATAATAGCCTGAATATAATTAAGATAGTCTGTGTATACATCGATGAAAACATTGCCAATA + + +See `Installation <docs/installation.rst>`__ and `Mirroring +<docs/mirror.rst>`__ for more information. + +Environment Variables +!!!!!!!!!!!!!!!!!!!!! 
+ +SEQREPO_LRU_CACHE_MAXSIZE sets the lru_cache maxsize for the sqlite query response caching. It defaults to 1 million but can also be set to "none" to be unlimited. + +Developing +!!!!!!!!!! + +Here's how to get started developing:: + + python3.6 -m venv + source venv/bin/activate + pip install -U setuptools pip + make develop + + + + +.. |pypi_rel| image:: https://badge.fury.io/py/biocommons.seqrepo.png + :target: https://pypi.org/pypi?name=biocommons.seqrepo + :align: middle + +.. |ci_rel| image:: https://travis-ci.org/biocommons/biocommons.seqrepo.svg?branch=master + :target: https://travis-ci.org/biocommons/biocommons.seqrepo + :align: middle + +.. |cov| image:: https://coveralls.io/repos/github/biocommons/biocommons.seqrepo/badge.svg?branch= + :target: https://coveralls.io/github/biocommons/biocommons.seqrepo?branch= + + + + +%package -n python3-biocommons.seqrepo +Summary: Non-redundant, compressed, journalled, file-based storage for biological sequences +Provides: python-biocommons.seqrepo +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-biocommons.seqrepo +biocommons.seqrepo +!!!!!!!!!!!!!!!!!! + +Python package for writing and reading a local collection of +biological sequences. The repository is non-redundant, compressed, +and journalled, making it efficient to store and transfer multiple +snapshots. + +Clients refer to sequences and metadata using familiar identifiers, +such as NM_000551.3 or GRCh38:1, or any of several hash-based +identifiers. The interface supports fast slicing of arbitrary regions +of large sequences. + +A "fully-qualified" identifier includes a namespace to disambiguate +accessions (e.g., "1" in GRCh37 and GRCh38). If the namespace is +provided, seqrepo uses it as-is. If the namespace is not provided and +the unqualified identifier refers to a unique sequence, it is +returned; otherwise, ambiguous identifiers will raise an error. 
+ +SeqRepo favors identifiers from [identifiers.org](identifiers.org) +whenever available. Examples include +[refseq](https://registry.identifiers.org/registry/refseq) and +[ensembl](https://registry.identifiers.org/registry/ensembl). + +`seqrepo-rest-service +<https://github.com/biocommons/seqrepo-rest-service>`__ provides a +REST interface and docker image. + +Released under the Apache License, 2.0. + +|ci_rel| | |cov| | |pypi_rel| | `ChangeLog <https://github.com/biocommons/biocommons.seqrepo/tree/master/docs/changelog/0.5>`_ + +Citation +!!!!!!!! + +| Hart RK, Prlić A (2020) +| SeqRepo: A system for managing local collections of biological sequences. +| PLoS ONE 15(12): e0239883. https://doi.org/10.1371/journal.pone.0239883 + + +Features +!!!!!!!! + +* Timestamped, read-only snapshots. +* Space-efficient storage of sequences within a single snapshot and + across snapshots. +* Bandwidth-efficient transfer incremental updates. +* Fast fetching of sequence slices on chromosome-scale sequences. +* Precomputed digests that may be used as sequence aliases. +* Mappings of external aliases (i.e., accessions or identifiers like + NM_013305.4) to sequences. + + +Deployments Scenarios +!!!!!!!!!!!!!!!!!!!!! +* Local read-only archive, mirrored from public site, + accessed via Python API (see `Mirroring documentation <docs/mirror.rst>`__) +* Local read-write archive, maintained with command + line utility and/or API (see `Command Line Interface documentation + <docs/cli.rst>`__). +* Docker data-only container that may be linked to application container. +* SeqRepo and refget REST API for local or remote access (see `seqrepo-rest-service <https://github.com/biocommons/seqrepo-rest-service>`__) + + +Technical Quick Peek +!!!!!!!!!!!!!!!!!!!! + +Within a single snapshot, sequences are stored *non-redundantly* and +*compressed* in an add-only journalled filesystem structure. A +truncated SHA-512 hash is used to assess uniquness and as an +internal id. 
(The digest is truncated for space efficiency.) + +Sequences are compressed using the Block GZipped Format (`BGZF +<https://samtools.github.io/hts-specs/SAMv1.pdf>`__)), which enables +pysam to provide fast random access to compressed sequences. (Variable +compression typically makes random access impossible.) + +Sequence files are immutable, thereby enabling the use of hardlinks +across snapshots and eliminating redundant transfers (e.g., with +rsync). + +Each sequence id is associated with a namespaced alias in a sqlite +database. Such as ``<seguid,rvvuhY0FxFLNwf10FXFIrSQ7AvQ>``, +``<NCBI,NP_004009.1>``, ``<gi,5032303>``, +``<ensembl-75ENSP00000354464>``, ``<ensembl-85,ENSP00000354464.4>``. +The sqlite database is mutable across releases. + +For calibration, recent releases that include 3 human genome +assemblies (including patches), and full RefSeq sets (NM, NR, NP, NT, +XM, and XP) consumes approximately 8GB. The minimum marginal size for +additional snapshots is approximately 2GB (for the sqlite database, +which is not hardlinked). + +For more information, see `<docs/design.rst>`__. + + + +Requirements +!!!!!!!!!!!! + +Reading a sequence repository requires several Python packages, all of +which are available from pypi. Installation should be as simple as +`pip install biocommons.seqrepo`. + +*Writing* sequence files also requires ``bgzip``, which provided in +the `htslib <https://github.com/samtools/htslib>`__ repo. Ubuntu users +should install the ``tabix`` package with ``sudo apt install tabix``. + +Development and deployments are on Ubuntu. Other systems may work but +are not tested. Patches to get other systems working would be +welcomed. + +**Mac Developers** If you get "xcrun: error: invalid active developer +path", you need to install XCode. See this `StackOverflow answer +<https://apple.stackexchange.com/questions/254380/why-am-i-getting-an-invalid-active-developer-path-when-attempting-to-use-git-a>`__. + + +Quick Start +!!!!!!!!!!! 
+ +On Ubuntu 16.04:: + + $ sudo apt install -y python3-dev gcc zlib1g-dev tabix + $ pip install seqrepo + $ sudo mkdir /usr/local/share/seqrepo + $ sudo chown $USER /usr/local/share/seqrepo + $ seqrepo pull -i 2018-11-26 + $ seqrepo show-status -i 2018-11-26 + seqrepo 0.2.3.post3.dev8+nb8298bd62283 + root directory: /usr/local/share/seqrepo/2018-11-26, 7.9 GB + backends: fastadir (schema 1), seqaliasdb (schema 1) + sequences: 773587 sequences, 93051609959 residues, 192 files + aliases: 5579572 aliases, 5480085 current, 26 namespaces, 773587 sequences + + # Simple Pythonic interface to sequences + >> from biocommons.seqrepo import SeqRepo + >> sr = SeqRepo("/usr/local/share/seqrepo/latest") + >> sr["NC_000001.11"][780000:780020] + 'TGGTGGCACGCGCTTGTAGT' + + # Or, use the seqrepo shell for even easier access + $ seqrepo start-shell -i 2018-11-26 + In [1]: sr["NC_000001.11"][780000:780020] + Out[1]: 'TGGTGGCACGCGCTTGTAGT' + + # N.B. The following output is edited for simplicity + $ seqrepo export -i 2018-11-26 | head -n100 + >SHA1:9a2acba3dd7603f... SEGUID:mirLo912A/MppLuS1cUyFMduLUQ Ensembl-85:GENSCAN00000003538 ... + MDSPLREDDSQTCARLWEAEVKRHSLEGLTVFGTAVQIHNVQRRAIRAKGTQEAQAELLCRGPRLLDRFLEDACILKEGRGTDTGQHCRGDARISSHLEA + SGTHIQLLALFLVSSSDTPPSLLRFCHALEHDIRYNSSFDSYYPLSPHSRHNDDLQTPSSHLGYIITVPDPTLPLTFASLYLGMAPCTSMGSSSMGIFQS + QRIHAFMKGKNKWDEYEGRKESWKIRSNSQTGEPTF + >SHA1:ca996b263102b1... SEGUID:yplrJjECsVqQufeYy0HkDD16z58 NCBI:XR_001733142.1 gi:1034683989 + TTTACGTCTTTCTGGGAATTTATACTGGAAGTATACTTACCTCTGTGCAAAATTGCAAATATATAAGGTAATTCATTCCAGCATTGCTTATATTAGGTTG + AACTATGTAACATTGACATTGATGTGAATCAAAAATGGTTGAAGGCTGGCAGTTTCATATGATTCAGCCTATAATAGCAAAAGATTGAAAAAATCCATTA + ATACAGTGTGGTTCAAAAAAATTTGTTGTATCAAGGTAAAATAATAGCCTGAATATAATTAAGATAGTCTGTGTATACATCGATGAAAACATTGCCAATA + + +See `Installation <docs/installation.rst>`__ and `Mirroring +<docs/mirror.rst>`__ for more information. + +Environment Variables +!!!!!!!!!!!!!!!!!!!!! 
+ +SEQREPO_LRU_CACHE_MAXSIZE sets the lru_cache maxsize for the sqlite query response caching. It defaults to 1 million but can also be set to "none" to be unlimited. + +Developing +!!!!!!!!!! + +Here's how to get started developing:: + + python3.6 -m venv + source venv/bin/activate + pip install -U setuptools pip + make develop + + + + +.. |pypi_rel| image:: https://badge.fury.io/py/biocommons.seqrepo.png + :target: https://pypi.org/pypi?name=biocommons.seqrepo + :align: middle + +.. |ci_rel| image:: https://travis-ci.org/biocommons/biocommons.seqrepo.svg?branch=master + :target: https://travis-ci.org/biocommons/biocommons.seqrepo + :align: middle + +.. |cov| image:: https://coveralls.io/repos/github/biocommons/biocommons.seqrepo/badge.svg?branch= + :target: https://coveralls.io/github/biocommons/biocommons.seqrepo?branch= + + + + +%package help +Summary: Development documents and examples for biocommons.seqrepo +Provides: python3-biocommons.seqrepo-doc +%description help +biocommons.seqrepo +!!!!!!!!!!!!!!!!!! + +Python package for writing and reading a local collection of +biological sequences. The repository is non-redundant, compressed, +and journalled, making it efficient to store and transfer multiple +snapshots. + +Clients refer to sequences and metadata using familiar identifiers, +such as NM_000551.3 or GRCh38:1, or any of several hash-based +identifiers. The interface supports fast slicing of arbitrary regions +of large sequences. + +A "fully-qualified" identifier includes a namespace to disambiguate +accessions (e.g., "1" in GRCh37 and GRCh38). If the namespace is +provided, seqrepo uses it as-is. If the namespace is not provided and +the unqualified identifier refers to a unique sequence, it is +returned; otherwise, ambiguous identifiers will raise an error. + +SeqRepo favors identifiers from [identifiers.org](identifiers.org) +whenever available. 
Examples include +[refseq](https://registry.identifiers.org/registry/refseq) and +[ensembl](https://registry.identifiers.org/registry/ensembl). + +`seqrepo-rest-service +<https://github.com/biocommons/seqrepo-rest-service>`__ provides a +REST interface and docker image. + +Released under the Apache License, 2.0. + +|ci_rel| | |cov| | |pypi_rel| | `ChangeLog <https://github.com/biocommons/biocommons.seqrepo/tree/master/docs/changelog/0.5>`_ + +Citation +!!!!!!!! + +| Hart RK, Prlić A (2020) +| SeqRepo: A system for managing local collections of biological sequences. +| PLoS ONE 15(12): e0239883. https://doi.org/10.1371/journal.pone.0239883 + + +Features +!!!!!!!! + +* Timestamped, read-only snapshots. +* Space-efficient storage of sequences within a single snapshot and + across snapshots. +* Bandwidth-efficient transfer incremental updates. +* Fast fetching of sequence slices on chromosome-scale sequences. +* Precomputed digests that may be used as sequence aliases. +* Mappings of external aliases (i.e., accessions or identifiers like + NM_013305.4) to sequences. + + +Deployments Scenarios +!!!!!!!!!!!!!!!!!!!!! +* Local read-only archive, mirrored from public site, + accessed via Python API (see `Mirroring documentation <docs/mirror.rst>`__) +* Local read-write archive, maintained with command + line utility and/or API (see `Command Line Interface documentation + <docs/cli.rst>`__). +* Docker data-only container that may be linked to application container. +* SeqRepo and refget REST API for local or remote access (see `seqrepo-rest-service <https://github.com/biocommons/seqrepo-rest-service>`__) + + +Technical Quick Peek +!!!!!!!!!!!!!!!!!!!! + +Within a single snapshot, sequences are stored *non-redundantly* and +*compressed* in an add-only journalled filesystem structure. A +truncated SHA-512 hash is used to assess uniquness and as an +internal id. (The digest is truncated for space efficiency.) 
+ +Sequences are compressed using the Block GZipped Format (`BGZF +<https://samtools.github.io/hts-specs/SAMv1.pdf>`__)), which enables +pysam to provide fast random access to compressed sequences. (Variable +compression typically makes random access impossible.) + +Sequence files are immutable, thereby enabling the use of hardlinks +across snapshots and eliminating redundant transfers (e.g., with +rsync). + +Each sequence id is associated with a namespaced alias in a sqlite +database. Such as ``<seguid,rvvuhY0FxFLNwf10FXFIrSQ7AvQ>``, +``<NCBI,NP_004009.1>``, ``<gi,5032303>``, +``<ensembl-75ENSP00000354464>``, ``<ensembl-85,ENSP00000354464.4>``. +The sqlite database is mutable across releases. + +For calibration, recent releases that include 3 human genome +assemblies (including patches), and full RefSeq sets (NM, NR, NP, NT, +XM, and XP) consumes approximately 8GB. The minimum marginal size for +additional snapshots is approximately 2GB (for the sqlite database, +which is not hardlinked). + +For more information, see `<docs/design.rst>`__. + + + +Requirements +!!!!!!!!!!!! + +Reading a sequence repository requires several Python packages, all of +which are available from pypi. Installation should be as simple as +`pip install biocommons.seqrepo`. + +*Writing* sequence files also requires ``bgzip``, which provided in +the `htslib <https://github.com/samtools/htslib>`__ repo. Ubuntu users +should install the ``tabix`` package with ``sudo apt install tabix``. + +Development and deployments are on Ubuntu. Other systems may work but +are not tested. Patches to get other systems working would be +welcomed. + +**Mac Developers** If you get "xcrun: error: invalid active developer +path", you need to install XCode. See this `StackOverflow answer +<https://apple.stackexchange.com/questions/254380/why-am-i-getting-an-invalid-active-developer-path-when-attempting-to-use-git-a>`__. + + +Quick Start +!!!!!!!!!!! 
+ +On Ubuntu 16.04:: + + $ sudo apt install -y python3-dev gcc zlib1g-dev tabix + $ pip install seqrepo + $ sudo mkdir /usr/local/share/seqrepo + $ sudo chown $USER /usr/local/share/seqrepo + $ seqrepo pull -i 2018-11-26 + $ seqrepo show-status -i 2018-11-26 + seqrepo 0.2.3.post3.dev8+nb8298bd62283 + root directory: /usr/local/share/seqrepo/2018-11-26, 7.9 GB + backends: fastadir (schema 1), seqaliasdb (schema 1) + sequences: 773587 sequences, 93051609959 residues, 192 files + aliases: 5579572 aliases, 5480085 current, 26 namespaces, 773587 sequences + + # Simple Pythonic interface to sequences + >> from biocommons.seqrepo import SeqRepo + >> sr = SeqRepo("/usr/local/share/seqrepo/latest") + >> sr["NC_000001.11"][780000:780020] + 'TGGTGGCACGCGCTTGTAGT' + + # Or, use the seqrepo shell for even easier access + $ seqrepo start-shell -i 2018-11-26 + In [1]: sr["NC_000001.11"][780000:780020] + Out[1]: 'TGGTGGCACGCGCTTGTAGT' + + # N.B. The following output is edited for simplicity + $ seqrepo export -i 2018-11-26 | head -n100 + >SHA1:9a2acba3dd7603f... SEGUID:mirLo912A/MppLuS1cUyFMduLUQ Ensembl-85:GENSCAN00000003538 ... + MDSPLREDDSQTCARLWEAEVKRHSLEGLTVFGTAVQIHNVQRRAIRAKGTQEAQAELLCRGPRLLDRFLEDACILKEGRGTDTGQHCRGDARISSHLEA + SGTHIQLLALFLVSSSDTPPSLLRFCHALEHDIRYNSSFDSYYPLSPHSRHNDDLQTPSSHLGYIITVPDPTLPLTFASLYLGMAPCTSMGSSSMGIFQS + QRIHAFMKGKNKWDEYEGRKESWKIRSNSQTGEPTF + >SHA1:ca996b263102b1... SEGUID:yplrJjECsVqQufeYy0HkDD16z58 NCBI:XR_001733142.1 gi:1034683989 + TTTACGTCTTTCTGGGAATTTATACTGGAAGTATACTTACCTCTGTGCAAAATTGCAAATATATAAGGTAATTCATTCCAGCATTGCTTATATTAGGTTG + AACTATGTAACATTGACATTGATGTGAATCAAAAATGGTTGAAGGCTGGCAGTTTCATATGATTCAGCCTATAATAGCAAAAGATTGAAAAAATCCATTA + ATACAGTGTGGTTCAAAAAAATTTGTTGTATCAAGGTAAAATAATAGCCTGAATATAATTAAGATAGTCTGTGTATACATCGATGAAAACATTGCCAATA + + +See `Installation <docs/installation.rst>`__ and `Mirroring +<docs/mirror.rst>`__ for more information. + +Environment Variables +!!!!!!!!!!!!!!!!!!!!! 
+ +SEQREPO_LRU_CACHE_MAXSIZE sets the lru_cache maxsize for the sqlite query response caching. It defaults to 1 million but can also be set to "none" to be unlimited. + +Developing +!!!!!!!!!! + +Here's how to get started developing:: + + python3.6 -m venv + source venv/bin/activate + pip install -U setuptools pip + make develop + + + + +.. |pypi_rel| image:: https://badge.fury.io/py/biocommons.seqrepo.png + :target: https://pypi.org/pypi?name=biocommons.seqrepo + :align: middle + +.. |ci_rel| image:: https://travis-ci.org/biocommons/biocommons.seqrepo.svg?branch=master + :target: https://travis-ci.org/biocommons/biocommons.seqrepo + :align: middle + +.. |cov| image:: https://coveralls.io/repos/github/biocommons/biocommons.seqrepo/badge.svg?branch= + :target: https://coveralls.io/github/biocommons/biocommons.seqrepo?branch= + + + + +%prep +%autosetup -n biocommons.seqrepo-0.6.5 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . 
+ +%files -n python3-biocommons.seqrepo -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.6.5-1 +- Package Spec generated @@ -0,0 +1 @@ +2a21f81efde5d3998eaea8ad7eb54ca3 biocommons.seqrepo-0.6.5.tar.gz |
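Note on the packaged README: its "Technical Quick Peek" section says a truncated SHA-512 hash serves both to assess uniqueness and as the internal sequence id, with namespaced aliases mapped to that id. A minimal sketch of that idea follows. The 24-byte truncation, the URL-safe base64 encoding, and the `TinyStore` class with its method names are assumptions made for illustration; they are not taken from this commit and are not the seqrepo API.

```python
import base64
import hashlib


def truncated_digest(seq: str, nbytes: int = 24) -> str:
    """Return a URL-safe base64 encoding of the first nbytes of SHA-512(seq).

    nbytes=24 and the base64 encoding are assumptions of this sketch.
    """
    digest = hashlib.sha512(seq.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest[:nbytes]).decode("ascii")


class TinyStore:
    """Hypothetical in-memory stand-in for a non-redundant sequence store."""

    def __init__(self):
        self._seqs = {}     # internal id (digest) -> sequence
        self._aliases = {}  # (namespace, alias) -> internal id

    def add(self, seq, namespace, alias):
        """Store seq once, keyed by its digest, and record the alias."""
        seq_id = truncated_digest(seq)
        self._seqs.setdefault(seq_id, seq)  # identical sequences share one entry
        self._aliases[(namespace, alias)] = seq_id
        return seq_id

    def fetch(self, namespace, alias, start=None, end=None):
        """Resolve a namespaced alias and return the requested slice."""
        return self._seqs[self._aliases[(namespace, alias)]][start:end]
```

Adding the same sequence under two aliases yields one stored copy and one shared id, which is the property the README relies on for space-efficient snapshots.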