automatic import of python-biocommons-seqrepo

author: CoprDistGit <infra@openeuler.org> 2023-04-11 17:43:41 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-04-11 17:43:41 +0000
commit: 01098c86272e2bb2d69e7a7ad54e75a9b0efde90 (patch)
tree: 784c20c0f26dcebf6f2e19ce7e8c44ad460e07b1
parent: 8c4e956179d65505400868a94ff8e1e54fc4c05c (diff)
3 files changed, 654 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..c0c6342 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/biocommons.seqrepo-0.6.5.tar.gz
diff --git a/python-biocommons-seqrepo.spec b/python-biocommons-seqrepo.spec
new file mode 100644
index 0000000..7d33ef6
--- /dev/null
+++ b/python-biocommons-seqrepo.spec
@@ -0,0 +1,652 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-biocommons.seqrepo
+Version:	0.6.5
+Release:	1
+Summary:	Non-redundant, compressed, journalled, file-based storage for biological sequences
+License:	Apache Software License
+URL:		https://github.com/biocommons/biocommons.seqrepo
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/f1/55/7bff2f7bd3971925d18bdfb9e666b6d0bb1abd7f28da7a82a19fc5a6529f/biocommons.seqrepo-0.6.5.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-bioutils
+Requires:	python3-coloredlogs
+Requires:	python3-ipython
+Requires:	python3-pysam
+Requires:	python3-requests
+Requires:	python3-requests-html
+Requires:	python3-six
+Requires:	python3-tqdm
+Requires:	python3-yoyo-migrations
+Requires:	python3-tox
+
+%description
+biocommons.seqrepo
+!!!!!!!!!!!!!!!!!!
+
+Python package for writing and reading a local collection of
+biological sequences.  The repository is non-redundant, compressed,
+and journalled, making it efficient to store and transfer multiple
+snapshots.
+
+Clients refer to sequences and metadata using familiar identifiers,
+such as NM_000551.3 or GRCh38:1, or any of several hash-based
+identifiers. The interface supports fast slicing of arbitrary regions
+of large sequences.
+
+A "fully-qualified" identifier includes a namespace to disambiguate
+accessions (e.g., "1" in GRCh37 and GRCh38). If the namespace is
+provided, seqrepo uses it as-is. If the namespace is not provided and
+the unqualified identifier refers to a unique sequence, it is
+returned; otherwise, ambiguous identifiers will raise an error.
+
+SeqRepo favors identifiers from `identifiers.org <https://identifiers.org>`__
+whenever available.  Examples include
+`refseq <https://registry.identifiers.org/registry/refseq>`__ and
+`ensembl <https://registry.identifiers.org/registry/ensembl>`__.
+
+`seqrepo-rest-service
+<https://github.com/biocommons/seqrepo-rest-service>`__ provides a
+REST interface and docker image.
+
+Released under the Apache License, 2.0.
+
+|ci_rel| | |cov| | |pypi_rel| | `ChangeLog <https://github.com/biocommons/biocommons.seqrepo/tree/master/docs/changelog/0.5>`_
+
+Citation
+!!!!!!!!
+
+| Hart RK, Prlić A (2020)
+| SeqRepo: A system for managing local collections of biological sequences.
+| PLoS ONE 15(12): e0239883. https://doi.org/10.1371/journal.pone.0239883
+
+
+Features
+!!!!!!!!
+
+* Timestamped, read-only snapshots.
+* Space-efficient storage of sequences within a single snapshot and
+  across snapshots.
+* Bandwidth-efficient transfer of incremental updates.
+* Fast fetching of sequence slices on chromosome-scale sequences.
+* Precomputed digests that may be used as sequence aliases.
+* Mappings of external aliases (i.e., accessions or identifiers like
+  NM_013305.4) to sequences.
+
+
+Deployment Scenarios
+!!!!!!!!!!!!!!!!!!!!
+* Local read-only archive, mirrored from public site,
+  accessed via Python API (see `Mirroring documentation <docs/mirror.rst>`__)
+* Local read-write archive, maintained with command
+  line utility and/or API (see `Command Line Interface documentation
+  <docs/cli.rst>`__).
+* Docker data-only container that may be linked to application container.
+* SeqRepo and refget REST API for local or remote access (see `seqrepo-rest-service <https://github.com/biocommons/seqrepo-rest-service>`__)
+
+
+Technical Quick Peek
+!!!!!!!!!!!!!!!!!!!!
+
+Within a single snapshot, sequences are stored *non-redundantly* and
+*compressed* in an add-only journalled filesystem structure.  A
+truncated SHA-512 hash is used to assess uniqueness and as an
+internal id.  (The digest is truncated for space efficiency.)
+
+Sequences are compressed using the Block GZipped Format (`BGZF
+<https://samtools.github.io/hts-specs/SAMv1.pdf>`__), which enables
+pysam to provide fast random access to compressed sequences. (Variable
+compression typically makes random access impossible.)
+
+Sequence files are immutable, thereby enabling the use of hardlinks
+across snapshots and eliminating redundant transfers (e.g., with
+rsync).
+
+Each sequence id is associated with namespaced aliases in a sqlite
+database, such as ``<seguid,rvvuhY0FxFLNwf10FXFIrSQ7AvQ>``,
+``<NCBI,NP_004009.1>``, ``<gi,5032303>``,
+``<ensembl-75,ENSP00000354464>``, and ``<ensembl-85,ENSP00000354464.4>``.
+The sqlite database is mutable across releases.
+
+For calibration, recent releases that include 3 human genome
+assemblies (including patches) and full RefSeq sets (NM, NR, NP, NT,
+XM, and XP) consume approximately 8GB.  The minimum marginal size for
+additional snapshots is approximately 2GB (for the sqlite database,
+which is not hardlinked).
+
+For more information, see `<docs/design.rst>`__.
+
+
+
+Requirements
+!!!!!!!!!!!!
+
+Reading a sequence repository requires several Python packages, all of
+which are available from PyPI. Installation should be as simple as
+``pip install biocommons.seqrepo``.
+
+*Writing* sequence files also requires ``bgzip``, which is provided in
+the `htslib <https://github.com/samtools/htslib>`__ repo. Ubuntu users
+should install the ``tabix`` package with ``sudo apt install tabix``.
+
+Development and deployments are on Ubuntu. Other systems may work but
+are not tested.  Patches to get other systems working would be
+welcomed.
+
+**Mac Developers** If you get "xcrun: error: invalid active developer
+path", you need to install Xcode. See this `Stack Exchange answer
+<https://apple.stackexchange.com/questions/254380/why-am-i-getting-an-invalid-active-developer-path-when-attempting-to-use-git-a>`__.
+
+
+Quick Start
+!!!!!!!!!!!
+
+On Ubuntu 16.04::
+
+  $ sudo apt install -y python3-dev gcc zlib1g-dev tabix
+  $ pip install biocommons.seqrepo
+  $ sudo mkdir /usr/local/share/seqrepo
+  $ sudo chown $USER /usr/local/share/seqrepo
+  $ seqrepo pull -i 2018-11-26 
+  $ seqrepo show-status -i 2018-11-26 
+  seqrepo 0.2.3.post3.dev8+nb8298bd62283
+  root directory: /usr/local/share/seqrepo/2018-11-26, 7.9 GB
+  backends: fastadir (schema 1), seqaliasdb (schema 1) 
+  sequences: 773587 sequences, 93051609959 residues, 192 files
+  aliases: 5579572 aliases, 5480085 current, 26 namespaces, 773587 sequences
+  
+  # Simple Pythonic interface to sequences
+  >>> from biocommons.seqrepo import SeqRepo
+  >>> sr = SeqRepo("/usr/local/share/seqrepo/latest")
+  >>> sr["NC_000001.11"][780000:780020]
+  'TGGTGGCACGCGCTTGTAGT'
+
+  # Or, use the seqrepo shell for even easier access
+  $ seqrepo start-shell -i 2018-11-26
+  In [1]: sr["NC_000001.11"][780000:780020]
+  Out[1]: 'TGGTGGCACGCGCTTGTAGT'
+  
+  # N.B. The following output is edited for simplicity
+  $ seqrepo export -i 2018-11-26 | head -n100
+  >SHA1:9a2acba3dd7603f... SEGUID:mirLo912A/MppLuS1cUyFMduLUQ Ensembl-85:GENSCAN00000003538 ...
+  MDSPLREDDSQTCARLWEAEVKRHSLEGLTVFGTAVQIHNVQRRAIRAKGTQEAQAELLCRGPRLLDRFLEDACILKEGRGTDTGQHCRGDARISSHLEA
+  SGTHIQLLALFLVSSSDTPPSLLRFCHALEHDIRYNSSFDSYYPLSPHSRHNDDLQTPSSHLGYIITVPDPTLPLTFASLYLGMAPCTSMGSSSMGIFQS
+  QRIHAFMKGKNKWDEYEGRKESWKIRSNSQTGEPTF
+  >SHA1:ca996b263102b1... SEGUID:yplrJjECsVqQufeYy0HkDD16z58 NCBI:XR_001733142.1 gi:1034683989
+  TTTACGTCTTTCTGGGAATTTATACTGGAAGTATACTTACCTCTGTGCAAAATTGCAAATATATAAGGTAATTCATTCCAGCATTGCTTATATTAGGTTG
+  AACTATGTAACATTGACATTGATGTGAATCAAAAATGGTTGAAGGCTGGCAGTTTCATATGATTCAGCCTATAATAGCAAAAGATTGAAAAAATCCATTA
+  ATACAGTGTGGTTCAAAAAAATTTGTTGTATCAAGGTAAAATAATAGCCTGAATATAATTAAGATAGTCTGTGTATACATCGATGAAAACATTGCCAATA
+
+
+See `Installation <docs/installation.rst>`__ and `Mirroring
+<docs/mirror.rst>`__ for more information.
+
+Environment Variables
+!!!!!!!!!!!!!!!!!!!!!
+
+SEQREPO_LRU_CACHE_MAXSIZE sets the lru_cache maxsize for the sqlite query response caching. It defaults to 1 million but can also be set to "none" to be unlimited.
+
+Developing
+!!!!!!!!!!
+
+Here's how to get started developing::
+
+  python3.6 -m venv venv
+  source venv/bin/activate
+  pip install -U setuptools pip
+  make develop
+
+
+
+
+.. |pypi_rel| image:: https://badge.fury.io/py/biocommons.seqrepo.png
+  :target: https://pypi.org/pypi?name=biocommons.seqrepo
+  :align: middle
+
+.. |ci_rel| image:: https://travis-ci.org/biocommons/biocommons.seqrepo.svg?branch=master
+  :target: https://travis-ci.org/biocommons/biocommons.seqrepo
+  :align: middle 
+
+.. |cov| image:: https://coveralls.io/repos/github/biocommons/biocommons.seqrepo/badge.svg?branch=
+  :target: https://coveralls.io/github/biocommons/biocommons.seqrepo?branch=
+
+
+
+
+%package -n python3-biocommons.seqrepo
+Summary:	Non-redundant, compressed, journalled, file-based storage for biological sequences
+Provides:	python-biocommons.seqrepo
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-biocommons.seqrepo
+biocommons.seqrepo
+!!!!!!!!!!!!!!!!!!
+
+Python package for writing and reading a local collection of
+biological sequences.  The repository is non-redundant, compressed,
+and journalled, making it efficient to store and transfer multiple
+snapshots.
+
+Clients refer to sequences and metadata using familiar identifiers,
+such as NM_000551.3 or GRCh38:1, or any of several hash-based
+identifiers. The interface supports fast slicing of arbitrary regions
+of large sequences.
+
+A "fully-qualified" identifier includes a namespace to disambiguate
+accessions (e.g., "1" in GRCh37 and GRCh38). If the namespace is
+provided, seqrepo uses it as-is. If the namespace is not provided and
+the unqualified identifier refers to a unique sequence, it is
+returned; otherwise, ambiguous identifiers will raise an error.
+
+SeqRepo favors identifiers from `identifiers.org <https://identifiers.org>`__
+whenever available.  Examples include
+`refseq <https://registry.identifiers.org/registry/refseq>`__ and
+`ensembl <https://registry.identifiers.org/registry/ensembl>`__.
+
+`seqrepo-rest-service
+<https://github.com/biocommons/seqrepo-rest-service>`__ provides a
+REST interface and docker image.
+
+Released under the Apache License, 2.0.
+
+|ci_rel| | |cov| | |pypi_rel| | `ChangeLog <https://github.com/biocommons/biocommons.seqrepo/tree/master/docs/changelog/0.5>`_
+
+Citation
+!!!!!!!!
+
+| Hart RK, Prlić A (2020)
+| SeqRepo: A system for managing local collections of biological sequences.
+| PLoS ONE 15(12): e0239883. https://doi.org/10.1371/journal.pone.0239883
+
+
+Features
+!!!!!!!!
+
+* Timestamped, read-only snapshots.
+* Space-efficient storage of sequences within a single snapshot and
+  across snapshots.
+* Bandwidth-efficient transfer of incremental updates.
+* Fast fetching of sequence slices on chromosome-scale sequences.
+* Precomputed digests that may be used as sequence aliases.
+* Mappings of external aliases (i.e., accessions or identifiers like
+  NM_013305.4) to sequences.
+
+
+Deployment Scenarios
+!!!!!!!!!!!!!!!!!!!!
+* Local read-only archive, mirrored from public site,
+  accessed via Python API (see `Mirroring documentation <docs/mirror.rst>`__)
+* Local read-write archive, maintained with command
+  line utility and/or API (see `Command Line Interface documentation
+  <docs/cli.rst>`__).
+* Docker data-only container that may be linked to application container.
+* SeqRepo and refget REST API for local or remote access (see `seqrepo-rest-service <https://github.com/biocommons/seqrepo-rest-service>`__)
+
+
+Technical Quick Peek
+!!!!!!!!!!!!!!!!!!!!
+
+Within a single snapshot, sequences are stored *non-redundantly* and
+*compressed* in an add-only journalled filesystem structure.  A
+truncated SHA-512 hash is used to assess uniqueness and as an
+internal id.  (The digest is truncated for space efficiency.)
+
+Sequences are compressed using the Block GZipped Format (`BGZF
+<https://samtools.github.io/hts-specs/SAMv1.pdf>`__), which enables
+pysam to provide fast random access to compressed sequences. (Variable
+compression typically makes random access impossible.)
+
+Sequence files are immutable, thereby enabling the use of hardlinks
+across snapshots and eliminating redundant transfers (e.g., with
+rsync).
+
+Each sequence id is associated with namespaced aliases in a sqlite
+database, such as ``<seguid,rvvuhY0FxFLNwf10FXFIrSQ7AvQ>``,
+``<NCBI,NP_004009.1>``, ``<gi,5032303>``,
+``<ensembl-75,ENSP00000354464>``, and ``<ensembl-85,ENSP00000354464.4>``.
+The sqlite database is mutable across releases.
+
+For calibration, recent releases that include 3 human genome
+assemblies (including patches) and full RefSeq sets (NM, NR, NP, NT,
+XM, and XP) consume approximately 8GB.  The minimum marginal size for
+additional snapshots is approximately 2GB (for the sqlite database,
+which is not hardlinked).
+
+For more information, see `<docs/design.rst>`__.
+
+
+
+Requirements
+!!!!!!!!!!!!
+
+Reading a sequence repository requires several Python packages, all of
+which are available from PyPI. Installation should be as simple as
+``pip install biocommons.seqrepo``.
+
+*Writing* sequence files also requires ``bgzip``, which is provided in
+the `htslib <https://github.com/samtools/htslib>`__ repo. Ubuntu users
+should install the ``tabix`` package with ``sudo apt install tabix``.
+
+Development and deployments are on Ubuntu. Other systems may work but
+are not tested.  Patches to get other systems working would be
+welcomed.
+
+**Mac Developers** If you get "xcrun: error: invalid active developer
+path", you need to install Xcode. See this `Stack Exchange answer
+<https://apple.stackexchange.com/questions/254380/why-am-i-getting-an-invalid-active-developer-path-when-attempting-to-use-git-a>`__.
+
+
+Quick Start
+!!!!!!!!!!!
+
+On Ubuntu 16.04::
+
+  $ sudo apt install -y python3-dev gcc zlib1g-dev tabix
+  $ pip install biocommons.seqrepo
+  $ sudo mkdir /usr/local/share/seqrepo
+  $ sudo chown $USER /usr/local/share/seqrepo
+  $ seqrepo pull -i 2018-11-26 
+  $ seqrepo show-status -i 2018-11-26 
+  seqrepo 0.2.3.post3.dev8+nb8298bd62283
+  root directory: /usr/local/share/seqrepo/2018-11-26, 7.9 GB
+  backends: fastadir (schema 1), seqaliasdb (schema 1) 
+  sequences: 773587 sequences, 93051609959 residues, 192 files
+  aliases: 5579572 aliases, 5480085 current, 26 namespaces, 773587 sequences
+  
+  # Simple Pythonic interface to sequences
+  >>> from biocommons.seqrepo import SeqRepo
+  >>> sr = SeqRepo("/usr/local/share/seqrepo/latest")
+  >>> sr["NC_000001.11"][780000:780020]
+  'TGGTGGCACGCGCTTGTAGT'
+
+  # Or, use the seqrepo shell for even easier access
+  $ seqrepo start-shell -i 2018-11-26
+  In [1]: sr["NC_000001.11"][780000:780020]
+  Out[1]: 'TGGTGGCACGCGCTTGTAGT'
+  
+  # N.B. The following output is edited for simplicity
+  $ seqrepo export -i 2018-11-26 | head -n100
+  >SHA1:9a2acba3dd7603f... SEGUID:mirLo912A/MppLuS1cUyFMduLUQ Ensembl-85:GENSCAN00000003538 ...
+  MDSPLREDDSQTCARLWEAEVKRHSLEGLTVFGTAVQIHNVQRRAIRAKGTQEAQAELLCRGPRLLDRFLEDACILKEGRGTDTGQHCRGDARISSHLEA
+  SGTHIQLLALFLVSSSDTPPSLLRFCHALEHDIRYNSSFDSYYPLSPHSRHNDDLQTPSSHLGYIITVPDPTLPLTFASLYLGMAPCTSMGSSSMGIFQS
+  QRIHAFMKGKNKWDEYEGRKESWKIRSNSQTGEPTF
+  >SHA1:ca996b263102b1... SEGUID:yplrJjECsVqQufeYy0HkDD16z58 NCBI:XR_001733142.1 gi:1034683989
+  TTTACGTCTTTCTGGGAATTTATACTGGAAGTATACTTACCTCTGTGCAAAATTGCAAATATATAAGGTAATTCATTCCAGCATTGCTTATATTAGGTTG
+  AACTATGTAACATTGACATTGATGTGAATCAAAAATGGTTGAAGGCTGGCAGTTTCATATGATTCAGCCTATAATAGCAAAAGATTGAAAAAATCCATTA
+  ATACAGTGTGGTTCAAAAAAATTTGTTGTATCAAGGTAAAATAATAGCCTGAATATAATTAAGATAGTCTGTGTATACATCGATGAAAACATTGCCAATA
+
+
+See `Installation <docs/installation.rst>`__ and `Mirroring
+<docs/mirror.rst>`__ for more information.
+
+Environment Variables
+!!!!!!!!!!!!!!!!!!!!!
+
+SEQREPO_LRU_CACHE_MAXSIZE sets the lru_cache maxsize for the sqlite query response caching. It defaults to 1 million but can also be set to "none" to be unlimited.
+
+Developing
+!!!!!!!!!!
+
+Here's how to get started developing::
+
+  python3.6 -m venv venv
+  source venv/bin/activate
+  pip install -U setuptools pip
+  make develop
+
+
+
+
+.. |pypi_rel| image:: https://badge.fury.io/py/biocommons.seqrepo.png
+  :target: https://pypi.org/pypi?name=biocommons.seqrepo
+  :align: middle
+
+.. |ci_rel| image:: https://travis-ci.org/biocommons/biocommons.seqrepo.svg?branch=master
+  :target: https://travis-ci.org/biocommons/biocommons.seqrepo
+  :align: middle 
+
+.. |cov| image:: https://coveralls.io/repos/github/biocommons/biocommons.seqrepo/badge.svg?branch=
+  :target: https://coveralls.io/github/biocommons/biocommons.seqrepo?branch=
+
+
+
+
+%package help
+Summary:	Development documents and examples for biocommons.seqrepo
+Provides:	python3-biocommons.seqrepo-doc
+%description help
+biocommons.seqrepo
+!!!!!!!!!!!!!!!!!!
+
+Python package for writing and reading a local collection of
+biological sequences.  The repository is non-redundant, compressed,
+and journalled, making it efficient to store and transfer multiple
+snapshots.
+
+Clients refer to sequences and metadata using familiar identifiers,
+such as NM_000551.3 or GRCh38:1, or any of several hash-based
+identifiers. The interface supports fast slicing of arbitrary regions
+of large sequences.
+
+A "fully-qualified" identifier includes a namespace to disambiguate
+accessions (e.g., "1" in GRCh37 and GRCh38). If the namespace is
+provided, seqrepo uses it as-is. If the namespace is not provided and
+the unqualified identifier refers to a unique sequence, it is
+returned; otherwise, ambiguous identifiers will raise an error.
+
+SeqRepo favors identifiers from `identifiers.org <https://identifiers.org>`__
+whenever available.  Examples include
+`refseq <https://registry.identifiers.org/registry/refseq>`__ and
+`ensembl <https://registry.identifiers.org/registry/ensembl>`__.
+
+`seqrepo-rest-service
+<https://github.com/biocommons/seqrepo-rest-service>`__ provides a
+REST interface and docker image.
+
+Released under the Apache License, 2.0.
+
+|ci_rel| | |cov| | |pypi_rel| | `ChangeLog <https://github.com/biocommons/biocommons.seqrepo/tree/master/docs/changelog/0.5>`_
+
+Citation
+!!!!!!!!
+
+| Hart RK, Prlić A (2020)
+| SeqRepo: A system for managing local collections of biological sequences.
+| PLoS ONE 15(12): e0239883. https://doi.org/10.1371/journal.pone.0239883
+
+
+Features
+!!!!!!!!
+
+* Timestamped, read-only snapshots.
+* Space-efficient storage of sequences within a single snapshot and
+  across snapshots.
+* Bandwidth-efficient transfer of incremental updates.
+* Fast fetching of sequence slices on chromosome-scale sequences.
+* Precomputed digests that may be used as sequence aliases.
+* Mappings of external aliases (i.e., accessions or identifiers like
+  NM_013305.4) to sequences.
+
+
+Deployment Scenarios
+!!!!!!!!!!!!!!!!!!!!
+* Local read-only archive, mirrored from public site,
+  accessed via Python API (see `Mirroring documentation <docs/mirror.rst>`__)
+* Local read-write archive, maintained with command
+  line utility and/or API (see `Command Line Interface documentation
+  <docs/cli.rst>`__).
+* Docker data-only container that may be linked to application container.
+* SeqRepo and refget REST API for local or remote access (see `seqrepo-rest-service <https://github.com/biocommons/seqrepo-rest-service>`__)
+
+
+Technical Quick Peek
+!!!!!!!!!!!!!!!!!!!!
+
+Within a single snapshot, sequences are stored *non-redundantly* and
+*compressed* in an add-only journalled filesystem structure.  A
+truncated SHA-512 hash is used to assess uniqueness and as an
+internal id.  (The digest is truncated for space efficiency.)
+
+Sequences are compressed using the Block GZipped Format (`BGZF
+<https://samtools.github.io/hts-specs/SAMv1.pdf>`__), which enables
+pysam to provide fast random access to compressed sequences. (Variable
+compression typically makes random access impossible.)
+
+Sequence files are immutable, thereby enabling the use of hardlinks
+across snapshots and eliminating redundant transfers (e.g., with
+rsync).
+
+Each sequence id is associated with namespaced aliases in a sqlite
+database, such as ``<seguid,rvvuhY0FxFLNwf10FXFIrSQ7AvQ>``,
+``<NCBI,NP_004009.1>``, ``<gi,5032303>``,
+``<ensembl-75,ENSP00000354464>``, and ``<ensembl-85,ENSP00000354464.4>``.
+The sqlite database is mutable across releases.
+
+For calibration, recent releases that include 3 human genome
+assemblies (including patches) and full RefSeq sets (NM, NR, NP, NT,
+XM, and XP) consume approximately 8GB.  The minimum marginal size for
+additional snapshots is approximately 2GB (for the sqlite database,
+which is not hardlinked).
+
+For more information, see `<docs/design.rst>`__.
+
+
+
+Requirements
+!!!!!!!!!!!!
+
+Reading a sequence repository requires several Python packages, all of
+which are available from PyPI. Installation should be as simple as
+``pip install biocommons.seqrepo``.
+
+*Writing* sequence files also requires ``bgzip``, which is provided in
+the `htslib <https://github.com/samtools/htslib>`__ repo. Ubuntu users
+should install the ``tabix`` package with ``sudo apt install tabix``.
+
+Development and deployments are on Ubuntu. Other systems may work but
+are not tested.  Patches to get other systems working would be
+welcomed.
+
+**Mac Developers** If you get "xcrun: error: invalid active developer
+path", you need to install Xcode. See this `Stack Exchange answer
+<https://apple.stackexchange.com/questions/254380/why-am-i-getting-an-invalid-active-developer-path-when-attempting-to-use-git-a>`__.
+
+
+Quick Start
+!!!!!!!!!!!
+
+On Ubuntu 16.04::
+
+  $ sudo apt install -y python3-dev gcc zlib1g-dev tabix
+  $ pip install biocommons.seqrepo
+  $ sudo mkdir /usr/local/share/seqrepo
+  $ sudo chown $USER /usr/local/share/seqrepo
+  $ seqrepo pull -i 2018-11-26 
+  $ seqrepo show-status -i 2018-11-26 
+  seqrepo 0.2.3.post3.dev8+nb8298bd62283
+  root directory: /usr/local/share/seqrepo/2018-11-26, 7.9 GB
+  backends: fastadir (schema 1), seqaliasdb (schema 1) 
+  sequences: 773587 sequences, 93051609959 residues, 192 files
+  aliases: 5579572 aliases, 5480085 current, 26 namespaces, 773587 sequences
+  
+  # Simple Pythonic interface to sequences
+  >>> from biocommons.seqrepo import SeqRepo
+  >>> sr = SeqRepo("/usr/local/share/seqrepo/latest")
+  >>> sr["NC_000001.11"][780000:780020]
+  'TGGTGGCACGCGCTTGTAGT'
+
+  # Or, use the seqrepo shell for even easier access
+  $ seqrepo start-shell -i 2018-11-26
+  In [1]: sr["NC_000001.11"][780000:780020]
+  Out[1]: 'TGGTGGCACGCGCTTGTAGT'
+  
+  # N.B. The following output is edited for simplicity
+  $ seqrepo export -i 2018-11-26 | head -n100
+  >SHA1:9a2acba3dd7603f... SEGUID:mirLo912A/MppLuS1cUyFMduLUQ Ensembl-85:GENSCAN00000003538 ...
+  MDSPLREDDSQTCARLWEAEVKRHSLEGLTVFGTAVQIHNVQRRAIRAKGTQEAQAELLCRGPRLLDRFLEDACILKEGRGTDTGQHCRGDARISSHLEA
+  SGTHIQLLALFLVSSSDTPPSLLRFCHALEHDIRYNSSFDSYYPLSPHSRHNDDLQTPSSHLGYIITVPDPTLPLTFASLYLGMAPCTSMGSSSMGIFQS
+  QRIHAFMKGKNKWDEYEGRKESWKIRSNSQTGEPTF
+  >SHA1:ca996b263102b1... SEGUID:yplrJjECsVqQufeYy0HkDD16z58 NCBI:XR_001733142.1 gi:1034683989
+  TTTACGTCTTTCTGGGAATTTATACTGGAAGTATACTTACCTCTGTGCAAAATTGCAAATATATAAGGTAATTCATTCCAGCATTGCTTATATTAGGTTG
+  AACTATGTAACATTGACATTGATGTGAATCAAAAATGGTTGAAGGCTGGCAGTTTCATATGATTCAGCCTATAATAGCAAAAGATTGAAAAAATCCATTA
+  ATACAGTGTGGTTCAAAAAAATTTGTTGTATCAAGGTAAAATAATAGCCTGAATATAATTAAGATAGTCTGTGTATACATCGATGAAAACATTGCCAATA
+
+
+See `Installation <docs/installation.rst>`__ and `Mirroring
+<docs/mirror.rst>`__ for more information.
+
+Environment Variables
+!!!!!!!!!!!!!!!!!!!!!
+
+SEQREPO_LRU_CACHE_MAXSIZE sets the lru_cache maxsize for the sqlite query response caching. It defaults to 1 million but can also be set to "none" to be unlimited.
+
+Developing
+!!!!!!!!!!
+
+Here's how to get started developing::
+
+  python3.6 -m venv venv
+  source venv/bin/activate
+  pip install -U setuptools pip
+  make develop
+
+
+
+
+.. |pypi_rel| image:: https://badge.fury.io/py/biocommons.seqrepo.png
+  :target: https://pypi.org/pypi?name=biocommons.seqrepo
+  :align: middle
+
+.. |ci_rel| image:: https://travis-ci.org/biocommons/biocommons.seqrepo.svg?branch=master
+  :target: https://travis-ci.org/biocommons/biocommons.seqrepo
+  :align: middle 
+
+.. |cov| image:: https://coveralls.io/repos/github/biocommons/biocommons.seqrepo/badge.svg?branch=
+  :target: https://coveralls.io/github/biocommons/biocommons.seqrepo?branch=
+
+
+
+
+%prep
+%autosetup -n biocommons.seqrepo-0.6.5
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-biocommons.seqrepo -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.6.5-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..1619caf
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+2a21f81efde5d3998eaea8ad7eb54ca3  biocommons.seqrepo-0.6.5.tar.gz
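
The `sources` entry above pairs an MD5 digest with the tarball name. The following is a minimal sketch of how such an entry could be checked locally, assuming the tarball has already been downloaded into the same directory. The helper names `parse_sources_line`, `md5_of_file`, and `verify_source` are hypothetical and are not part of any packaging tool.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_sources_line(line: str) -> Tuple[str, str]:
    """Split a '<md5>  <filename>' line into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify_source(line: str, directory: Path) -> bool:
    """Check that the file named in a sources line matches its digest."""
    digest, filename = parse_sources_line(line)
    return md5_of_file(directory / filename) == digest
```

For this commit, `verify_source("2a21f81efde5d3998eaea8ad7eb54ca3  biocommons.seqrepo-0.6.5.tar.gz", Path("."))` would return `True` only if the downloaded tarball matches the recorded digest.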