automatic import of python-chembl-downloaderopeneuler20.03

author: CoprDistGit <infra@openeuler.org> 2023-05-05 03:25:27 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-05 03:25:27 +0000
commit: 7b8f3da36b371a228e634b18bccea1b8e28d2aaf (patch)
tree: 0eec743bc541d0c8ca37c25347afc19a40f864d0 /python-chembl-downloader.spec
parent: 27594c60577daecd52c57ffe68e55de92491c189 (diff)
1 files changed, 955 insertions, 0 deletions
diff --git a/python-chembl-downloader.spec b/python-chembl-downloader.spec
new file mode 100644
index 0000000..33a8ea9
--- /dev/null
+++ b/python-chembl-downloader.spec
@@ -0,0 +1,955 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-chembl-downloader
+Version:	0.4.2
+Release:	1
+Summary:	Reproducibly download, open, parse, and query ChEMBL
+License:	MIT
+URL:		https://github.com/cthoyt/chembl_downloader
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/c6/70/5f1b39298e2a1ee4f64f362967d125e254ff7b283a97a0352168c42cca54/chembl_downloader-0.4.2.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-click
+Requires:	python3-more-click
+Requires:	python3-pystow
+Requires:	python3-tqdm
+Requires:	python3-sphinx
+Requires:	python3-sphinx-rtd-theme
+Requires:	python3-sphinx-click
+Requires:	python3-sphinx-autodoc-typehints
+Requires:	python3-sphinx-automodapi
+Requires:	python3-pandas
+Requires:	python3-rdkit-pypi
+Requires:	python3-pytest
+Requires:	python3-coverage
+
+%description
+<h1 align="center">
+    chembl_downloader
+</h1>
+
+<p align="center">
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI" src="https://img.shields.io/pypi/v/chembl_downloader" />
+    </a>
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/chembl_downloader" />
+    </a>
+    <a href="https://github.com/cthoyt/chembl_downloader/blob/main/LICENSE">
+        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/chembl_downloader" />
+    </a>
+    <a href="https://zenodo.org/badge/latestdoi/390113187">
+        <img src="https://zenodo.org/badge/390113187.svg" alt="DOI" />
+    </a>
+    <a href="https://github.com/psf/black">
+        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" />
+    </a>
+    <a href='https://chembl-downloader.readthedocs.io/en/latest/?badge=latest'>
+        <img src='https://readthedocs.org/projects/chembl-downloader/badge/?version=latest' alt='Documentation Status' />
+    </a>
+</p>
+
+Don't worry about downloading/extracting ChEMBL or versioning - just use ``chembl_downloader`` to write code that knows
+how to download it and use it automatically.
+
+Install with:
+
+```bash
+$ pip install chembl-downloader
+```
+
+Full technical documentation can be found on
+[ReadTheDocs](https://chembl-downloader.readthedocs.io). Tutorials can be found
+in Jupyter notebooks in the [notebooks/](notebooks/) directory of the
+repository.
+
+## Database Usage
+
+### Download A Specific Version
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite(version='28')
+```
+
+After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored
+using [`pystow`](https://github.com/cthoyt/pystow) automatically in the `~/.data/chembl`
+directory.
+
+We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is
+a [paid feature](https://sqlite.org/purchase/zipvfs).
+
+### Download the Latest Version
+
+You can modify the previous code slightly by omitting the `version` keyword
+argument to automatically find the latest version of ChEMBL:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite()
+```
+
+The `version` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`), but will be omitted below for brevity.
+
+### Automate Connection
+
+Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with
+the resulting file. Don't do this, it's not reproducible!
+Instead, the file can be downloaded and a connection can be opened automatically with:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.connect() as conn:
+    with conn.cursor() as cursor:
+        cursor.execute(...)  # run your query string
+        rows = cursor.fetchall()  # get your results
+```
+
+The `cursor()` function provides a convenient wrapper around this operation:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.cursor() as cursor:
+    cursor.execute(...)  # run your query string
+    rows = cursor.fetchall()  # get your results
+```
+
+### Run a query and get a pandas DataFrame
+
+The most powerful function is `query()` which builds on the previous `connect()` function in combination
+with [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
+to make a query and load the results into a pandas DataFrame for any downstream use.
+
+```python
+import chembl_downloader
+
+sql = """
+SELECT
+    MOLECULE_DICTIONARY.chembl_id,
+    MOLECULE_DICTIONARY.pref_name
+FROM MOLECULE_DICTIONARY
+JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
+WHERE molecule_dictionary.pref_name IS NOT NULL
+LIMIT 5
+"""
+
+df = chembl_downloader.query(sql)
+df.to_csv(..., sep='\t', index=False)
+```
+
+Suggestion 1: use `pystow` to make a reproducible file path that's portable to other people's machines
+(e.g., it doesn't have your username in the path).
+
+Suggestion 2: RDKit is now pip-installable with `pip install rdkit-pypi`, which means most users don't have to muck
+around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is
+the [rdkit.Chem.PandasTools](https://rdkit.org/docs/source/rdkit.Chem.PandasTools.html)
+module.
+
+### Access an RDKit supplier over entries in the SDF dump
+
+This example is a bit more fit-for-purpose than the last two. The `supplier()` function makes sure that the latest SDF
+dump is downloaded and loads it from the gzip file into a `rdkit.Chem.ForwardSDMolSupplier`
+using a context manager to make sure the file doesn't get closed until after parsing is done. Like the previous
+examples, it can also explicitly take a `version`.
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+with chembl_downloader.supplier() as suppl:
+    data = []
+    for i, mol in enumerate(suppl):
+        if mol is None or mol.GetNumAtoms() > 50:
+            continue
+        fp = Chem.PatternFingerprint(mol, fpSize=1024, tautomerFingerprints=True)
+        smi = Chem.MolToSmiles(mol)
+        data.append((smi, fp))
+```
+
+This example was adapted from Greg Landrum's RDKit blog post
+on [generalized substructure search](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/08/03/generalized-substructure-search.html).
+
+## SDF Usage
+
+### Get an RDKit substructure library
+
+Building on the `supplier()` function, the `get_substructure_library()`
+makes the preparation of a [substructure library](https://www.rdkit.org/docs/cppapi/classRDKit_1_1SubstructLibrary.html)
+automated and reproducible. Additionally, it caches the results of the build,
+which takes on the order of tens of minutes, only has to be done once and future
+loading from a pickle object takes on the order of seconds.
+
+The implementation was inspired by Greg Landrum's RDKit blog post,
+[Some new features in the SubstructLibrary](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/12/20/substructlibrary-search-order.html).
+The following example shows how it can be used to accomplish some of the first
+tasks presented in the post:
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+library = chembl_downloader.get_substructure_library()
+query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
+matches = library.GetMatches(query)
+```
+
+## Morgan Fingerprints Usage
+
+### Get the Morgan Fingerprint file
+
+ChEMBL makes a file containing pre-computed 2048 bit radius 2 morgan
+fingerprints for each molecule available. It can be downloaded using:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+
+### Load fingerprints with [`chemfp`](https://chemfp.com/)
+
+The following wraps the `download_fps` function with `chemfp`'s fingerprint
+loader:
+
+```python
+import chembl_downloader
+
+arena = chembl_downloader.chemfp_load_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+More information on working with the `arena` object can be found
+[here](https://chemfp.readthedocs.io/en/latest/using-api.html#working-with-a-fingerprintarena).
+
+## Extras
+
+### Store in a Different Place
+
+If you want to store the data elsewhere using `pystow` (e.g., in [`pyobo`](https://github.com/pyobo/pyobo)
+I also keep a copy of this file), you can use the `prefix` argument.
+
+```python
+import chembl_downloader
+
+# It gets downloaded/extracted to 
+# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
+path = chembl_downloader.download_extract_sqlite(prefix=['pyobo', 'raw', 'chembl'])
+```
+
+See the `pystow` [documentation](https://github.com/cthoyt/pystow#%EF%B8%8F-configuration) on configuring the storage
+location further.
+
+The `prefix` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`).
+
+### Download via CLI
+
+After installing, run the following CLI command to ensure it and send the path to stdout
+
+```bash
+$ chembl_downloader
+```
+
+Use `--test` to show two example queries
+
+```bash
+$ chembl_downloader --test
+```
+
+## Contributing
+
+Please read the contribution guidelines in [CONTRIBUTING.md](.github/CONTRIBUTING.md).
+
+If you'd like to contribute, there's a submodule called `chembl_downloader.queries`
+where you can add a useful SQL queries along with a description of what it does for easy
+importing and reuse.
+
+## Statistics and Compatibility
+
+`chembl-downloader` is compatible with all versions of ChEMBL. However, some files are
+not available for all versions. For example, the SQLite version of the database was first
+added in release 21 (2015-02-12).
+
+|   ChEMBL Version | Release Date   |
+|------------------|----------------|
+|               31 | 2022-07-12     |
+|               30 | 2022-02-22     |
+|               29 | 2021-07-01     |
+|               28 | 2021-01-15     |
+|               27 | 2020-05-18     |
+|               26 | 2020-02-14     |
+|               25 | 2019-02-01     |
+|             24_1 | 2018-05-01     |
+|               24 |                |
+|               23 | 2017-05-18     |
+|             22_1 | 2016-11-17     |
+|               22 |                |
+|               21 | 2015-02-12     |
+|               20 | 2015-02-03     |
+|               19 | 2014-07-2333   |
+|               18 | 2014-04-02     |
+|               17 | 2013-09-16     |
+|               16 | 2013-055555-15 |
+|               15 | 2013-01-30     |
+|               14 | 2012 -07-18    |
+|               13 | 2012-02-29     |
+|               12 | 2011-11-30     |
+|               11 | 2011-06-07     |
+|               10 | 2011-06-07     |
+|               09 | 2011-01-04     |
+|               08 | 2010-11-05     |
+|               07 | 2010-09-03     |
+|               06 | 2010-09-03     |
+|               05 | 2010-06-07     |
+|               04 | 2010-05-26     |
+|               03 | 2010-04-30     |
+|               02 | 2009-12-07     |
+|               01 | 2009-10-28     |
+
+
+%package -n python3-chembl-downloader
+Summary:	Reproducibly download, open, parse, and query ChEMBL
+Provides:	python-chembl-downloader
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-chembl-downloader
+<h1 align="center">
+    chembl_downloader
+</h1>
+
+<p align="center">
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI" src="https://img.shields.io/pypi/v/chembl_downloader" />
+    </a>
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/chembl_downloader" />
+    </a>
+    <a href="https://github.com/cthoyt/chembl_downloader/blob/main/LICENSE">
+        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/chembl_downloader" />
+    </a>
+    <a href="https://zenodo.org/badge/latestdoi/390113187">
+        <img src="https://zenodo.org/badge/390113187.svg" alt="DOI" />
+    </a>
+    <a href="https://github.com/psf/black">
+        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" />
+    </a>
+    <a href='https://chembl-downloader.readthedocs.io/en/latest/?badge=latest'>
+        <img src='https://readthedocs.org/projects/chembl-downloader/badge/?version=latest' alt='Documentation Status' />
+    </a>
+</p>
+
+Don't worry about downloading/extracting ChEMBL or versioning - just use ``chembl_downloader`` to write code that knows
+how to download it and use it automatically.
+
+Install with:
+
+```bash
+$ pip install chembl-downloader
+```
+
+Full technical documentation can be found on
+[ReadTheDocs](https://chembl-downloader.readthedocs.io). Tutorials can be found
+in Jupyter notebooks in the [notebooks/](notebooks/) directory of the
+repository.
+
+## Database Usage
+
+### Download A Specific Version
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite(version='28')
+```
+
+After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored
+using [`pystow`](https://github.com/cthoyt/pystow) automatically in the `~/.data/chembl`
+directory.
+
+We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is
+a [paid feature](https://sqlite.org/purchase/zipvfs).
+
+### Download the Latest Version
+
+You can modify the previous code slightly by omitting the `version` keyword
+argument to automatically find the latest version of ChEMBL:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite()
+```
+
+The `version` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`), but will be omitted below for brevity.
+
+### Automate Connection
+
+Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with
+the resulting file. Don't do this, it's not reproducible!
+Instead, the file can be downloaded and a connection can be opened automatically with:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.connect() as conn:
+    with conn.cursor() as cursor:
+        cursor.execute(...)  # run your query string
+        rows = cursor.fetchall()  # get your results
+```
+
+The `cursor()` function provides a convenient wrapper around this operation:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.cursor() as cursor:
+    cursor.execute(...)  # run your query string
+    rows = cursor.fetchall()  # get your results
+```
+
+### Run a query and get a pandas DataFrame
+
+The most powerful function is `query()` which builds on the previous `connect()` function in combination
+with [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
+to make a query and load the results into a pandas DataFrame for any downstream use.
+
+```python
+import chembl_downloader
+
+sql = """
+SELECT
+    MOLECULE_DICTIONARY.chembl_id,
+    MOLECULE_DICTIONARY.pref_name
+FROM MOLECULE_DICTIONARY
+JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
+WHERE molecule_dictionary.pref_name IS NOT NULL
+LIMIT 5
+"""
+
+df = chembl_downloader.query(sql)
+df.to_csv(..., sep='\t', index=False)
+```
+
+Suggestion 1: use `pystow` to make a reproducible file path that's portable to other people's machines
+(e.g., it doesn't have your username in the path).
+
+Suggestion 2: RDKit is now pip-installable with `pip install rdkit-pypi`, which means most users don't have to muck
+around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is
+the [rdkit.Chem.PandasTools](https://rdkit.org/docs/source/rdkit.Chem.PandasTools.html)
+module.
+
+### Access an RDKit supplier over entries in the SDF dump
+
+This example is a bit more fit-for-purpose than the last two. The `supplier()` function makes sure that the latest SDF
+dump is downloaded and loads it from the gzip file into a `rdkit.Chem.ForwardSDMolSupplier`
+using a context manager to make sure the file doesn't get closed until after parsing is done. Like the previous
+examples, it can also explicitly take a `version`.
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+with chembl_downloader.supplier() as suppl:
+    data = []
+    for i, mol in enumerate(suppl):
+        if mol is None or mol.GetNumAtoms() > 50:
+            continue
+        fp = Chem.PatternFingerprint(mol, fpSize=1024, tautomerFingerprints=True)
+        smi = Chem.MolToSmiles(mol)
+        data.append((smi, fp))
+```
+
+This example was adapted from Greg Landrum's RDKit blog post
+on [generalized substructure search](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/08/03/generalized-substructure-search.html).
+
+## SDF Usage
+
+### Get an RDKit substructure library
+
+Building on the `supplier()` function, the `get_substructure_library()`
+makes the preparation of a [substructure library](https://www.rdkit.org/docs/cppapi/classRDKit_1_1SubstructLibrary.html)
+automated and reproducible. Additionally, it caches the results of the build,
+which takes on the order of tens of minutes, only has to be done once and future
+loading from a pickle object takes on the order of seconds.
+
+The implementation was inspired by Greg Landrum's RDKit blog post,
+[Some new features in the SubstructLibrary](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/12/20/substructlibrary-search-order.html).
+The following example shows how it can be used to accomplish some of the first
+tasks presented in the post:
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+library = chembl_downloader.get_substructure_library()
+query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
+matches = library.GetMatches(query)
+```
+
+## Morgan Fingerprints Usage
+
+### Get the Morgan Fingerprint file
+
+ChEMBL makes a file containing pre-computed 2048 bit radius 2 morgan
+fingerprints for each molecule available. It can be downloaded using:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+
+### Load fingerprints with [`chemfp`](https://chemfp.com/)
+
+The following wraps the `download_fps` function with `chemfp`'s fingerprint
+loader:
+
+```python
+import chembl_downloader
+
+arena = chembl_downloader.chemfp_load_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+More information on working with the `arena` object can be found
+[here](https://chemfp.readthedocs.io/en/latest/using-api.html#working-with-a-fingerprintarena).
+
+## Extras
+
+### Store in a Different Place
+
+If you want to store the data elsewhere using `pystow` (e.g., in [`pyobo`](https://github.com/pyobo/pyobo)
+I also keep a copy of this file), you can use the `prefix` argument.
+
+```python
+import chembl_downloader
+
+# It gets downloaded/extracted to 
+# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
+path = chembl_downloader.download_extract_sqlite(prefix=['pyobo', 'raw', 'chembl'])
+```
+
+See the `pystow` [documentation](https://github.com/cthoyt/pystow#%EF%B8%8F-configuration) on configuring the storage
+location further.
+
+The `prefix` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`).
+
+### Download via CLI
+
+After installing, run the following CLI command to ensure it and send the path to stdout
+
+```bash
+$ chembl_downloader
+```
+
+Use `--test` to show two example queries
+
+```bash
+$ chembl_downloader --test
+```
+
+## Contributing
+
+Please read the contribution guidelines in [CONTRIBUTING.md](.github/CONTRIBUTING.md).
+
+If you'd like to contribute, there's a submodule called `chembl_downloader.queries`
+where you can add a useful SQL queries along with a description of what it does for easy
+importing and reuse.
+
+## Statistics and Compatibility
+
+`chembl-downloader` is compatible with all versions of ChEMBL. However, some files are
+not available for all versions. For example, the SQLite version of the database was first
+added in release 21 (2015-02-12).
+
+|   ChEMBL Version | Release Date   |
+|------------------|----------------|
+|               31 | 2022-07-12     |
+|               30 | 2022-02-22     |
+|               29 | 2021-07-01     |
+|               28 | 2021-01-15     |
+|               27 | 2020-05-18     |
+|               26 | 2020-02-14     |
+|               25 | 2019-02-01     |
+|             24_1 | 2018-05-01     |
+|               24 |                |
+|               23 | 2017-05-18     |
+|             22_1 | 2016-11-17     |
+|               22 |                |
+|               21 | 2015-02-12     |
+|               20 | 2015-02-03     |
+|               19 | 2014-07-2333   |
+|               18 | 2014-04-02     |
+|               17 | 2013-09-16     |
+|               16 | 2013-055555-15 |
+|               15 | 2013-01-30     |
+|               14 | 2012 -07-18    |
+|               13 | 2012-02-29     |
+|               12 | 2011-11-30     |
+|               11 | 2011-06-07     |
+|               10 | 2011-06-07     |
+|               09 | 2011-01-04     |
+|               08 | 2010-11-05     |
+|               07 | 2010-09-03     |
+|               06 | 2010-09-03     |
+|               05 | 2010-06-07     |
+|               04 | 2010-05-26     |
+|               03 | 2010-04-30     |
+|               02 | 2009-12-07     |
+|               01 | 2009-10-28     |
+
+
+%package help
+Summary:	Development documents and examples for chembl-downloader
+Provides:	python3-chembl-downloader-doc
+%description help
+<h1 align="center">
+    chembl_downloader
+</h1>
+
+<p align="center">
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI" src="https://img.shields.io/pypi/v/chembl_downloader" />
+    </a>
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/chembl_downloader" />
+    </a>
+    <a href="https://github.com/cthoyt/chembl_downloader/blob/main/LICENSE">
+        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/chembl_downloader" />
+    </a>
+    <a href="https://zenodo.org/badge/latestdoi/390113187">
+        <img src="https://zenodo.org/badge/390113187.svg" alt="DOI" />
+    </a>
+    <a href="https://github.com/psf/black">
+        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" />
+    </a>
+    <a href='https://chembl-downloader.readthedocs.io/en/latest/?badge=latest'>
+        <img src='https://readthedocs.org/projects/chembl-downloader/badge/?version=latest' alt='Documentation Status' />
+    </a>
+</p>
+
+Don't worry about downloading/extracting ChEMBL or versioning - just use ``chembl_downloader`` to write code that knows
+how to download it and use it automatically.
+
+Install with:
+
+```bash
+$ pip install chembl-downloader
+```
+
+Full technical documentation can be found on
+[ReadTheDocs](https://chembl-downloader.readthedocs.io). Tutorials can be found
+in Jupyter notebooks in the [notebooks/](notebooks/) directory of the
+repository.
+
+## Database Usage
+
+### Download A Specific Version
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite(version='28')
+```
+
+After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored
+using [`pystow`](https://github.com/cthoyt/pystow) automatically in the `~/.data/chembl`
+directory.
+
+We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is
+a [paid feature](https://sqlite.org/purchase/zipvfs).
+
+### Download the Latest Version
+
+You can modify the previous code slightly by omitting the `version` keyword
+argument to automatically find the latest version of ChEMBL:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite()
+```
+
+The `version` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`), but will be omitted below for brevity.
+
+### Automate Connection
+
+Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with
+the resulting file. Don't do this, it's not reproducible!
+Instead, the file can be downloaded and a connection can be opened automatically with:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.connect() as conn:
+    with conn.cursor() as cursor:
+        cursor.execute(...)  # run your query string
+        rows = cursor.fetchall()  # get your results
+```
+
+The `cursor()` function provides a convenient wrapper around this operation:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.cursor() as cursor:
+    cursor.execute(...)  # run your query string
+    rows = cursor.fetchall()  # get your results
+```
+
+### Run a query and get a pandas DataFrame
+
+The most powerful function is `query()` which builds on the previous `connect()` function in combination
+with [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
+to make a query and load the results into a pandas DataFrame for any downstream use.
+
+```python
+import chembl_downloader
+
+sql = """
+SELECT
+    MOLECULE_DICTIONARY.chembl_id,
+    MOLECULE_DICTIONARY.pref_name
+FROM MOLECULE_DICTIONARY
+JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
+WHERE molecule_dictionary.pref_name IS NOT NULL
+LIMIT 5
+"""
+
+df = chembl_downloader.query(sql)
+df.to_csv(..., sep='\t', index=False)
+```
+
+Suggestion 1: use `pystow` to make a reproducible file path that's portable to other people's machines
+(e.g., it doesn't have your username in the path).
+
+Suggestion 2: RDKit is now pip-installable with `pip install rdkit-pypi`, which means most users don't have to muck
+around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is
+the [rdkit.Chem.PandasTools](https://rdkit.org/docs/source/rdkit.Chem.PandasTools.html)
+module.
+
+### Access an RDKit supplier over entries in the SDF dump
+
+This example is a bit more fit-for-purpose than the last two. The `supplier()` function makes sure that the latest SDF
+dump is downloaded and loads it from the gzip file into a `rdkit.Chem.ForwardSDMolSupplier`
+using a context manager to make sure the file doesn't get closed until after parsing is done. Like the previous
+examples, it can also explicitly take a `version`.
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+with chembl_downloader.supplier() as suppl:
+    data = []
+    for i, mol in enumerate(suppl):
+        if mol is None or mol.GetNumAtoms() > 50:
+            continue
+        fp = Chem.PatternFingerprint(mol, fpSize=1024, tautomerFingerprints=True)
+        smi = Chem.MolToSmiles(mol)
+        data.append((smi, fp))
+```
+
+This example was adapted from Greg Landrum's RDKit blog post
+on [generalized substructure search](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/08/03/generalized-substructure-search.html).
+
+## SDF Usage
+
+### Get an RDKit substructure library
+
+Building on the `supplier()` function, the `get_substructure_library()`
+makes the preparation of a [substructure library](https://www.rdkit.org/docs/cppapi/classRDKit_1_1SubstructLibrary.html)
+automated and reproducible. Additionally, it caches the results of the build,
+which takes on the order of tens of minutes, only has to be done once and future
+loading from a pickle object takes on the order of seconds.
+
+The implementation was inspired by Greg Landrum's RDKit blog post,
+[Some new features in the SubstructLibrary](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/12/20/substructlibrary-search-order.html).
+The following example shows how it can be used to accomplish some of the first
+tasks presented in the post:
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+library = chembl_downloader.get_substructure_library()
+query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
+matches = library.GetMatches(query)
+```
+
+## Morgan Fingerprints Usage
+
+### Get the Morgan Fingerprint file
+
+ChEMBL makes a file containing pre-computed 2048 bit radius 2 morgan
+fingerprints for each molecule available. It can be downloaded using:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+
+### Load fingerprints with [`chemfp`](https://chemfp.com/)
+
+The following wraps the `download_fps` function with `chemfp`'s fingerprint
+loader:
+
+```python
+import chembl_downloader
+
+arena = chembl_downloader.chemfp_load_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+More information on working with the `arena` object can be found
+[here](https://chemfp.readthedocs.io/en/latest/using-api.html#working-with-a-fingerprintarena).
+
+## Extras
+
+### Store in a Different Place
+
+If you want to store the data elsewhere using `pystow` (e.g., in [`pyobo`](https://github.com/pyobo/pyobo)
+I also keep a copy of this file), you can use the `prefix` argument.
+
+```python
+import chembl_downloader
+
+# It gets downloaded/extracted to 
+# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
+path = chembl_downloader.download_extract_sqlite(prefix=['pyobo', 'raw', 'chembl'])
+```
+
+See the `pystow` [documentation](https://github.com/cthoyt/pystow#%EF%B8%8F-configuration) on configuring the storage
+location further.
+
+The `prefix` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`).
+
+### Download via CLI
+
+After installing, run the following CLI command to ensure it and send the path to stdout
+
+```bash
+$ chembl_downloader
+```
+
+Use `--test` to show two example queries
+
+```bash
+$ chembl_downloader --test
+```
+
+## Contributing
+
+Please read the contribution guidelines in [CONTRIBUTING.md](.github/CONTRIBUTING.md).
+
+If you'd like to contribute, there's a submodule called `chembl_downloader.queries`
+where you can add a useful SQL queries along with a description of what it does for easy
+importing and reuse.
+
+## Statistics and Compatibility
+
+`chembl-downloader` is compatible with all versions of ChEMBL. However, some files are
+not available for all versions. For example, the SQLite version of the database was first
+added in release 21 (2015-02-12).
+
+|   ChEMBL Version | Release Date   |
+|------------------|----------------|
+|               31 | 2022-07-12     |
+|               30 | 2022-02-22     |
+|               29 | 2021-07-01     |
+|               28 | 2021-01-15     |
+|               27 | 2020-05-18     |
+|               26 | 2020-02-14     |
+|               25 | 2019-02-01     |
+|             24_1 | 2018-05-01     |
+|               24 |                |
+|               23 | 2017-05-18     |
+|             22_1 | 2016-11-17     |
+|               22 |                |
+|               21 | 2015-02-12     |
+|               20 | 2015-02-03     |
+|               19 | 2014-07-2333   |
+|               18 | 2014-04-02     |
+|               17 | 2013-09-16     |
+|               16 | 2013-055555-15 |
+|               15 | 2013-01-30     |
+|               14 | 2012 -07-18    |
+|               13 | 2012-02-29     |
+|               12 | 2011-11-30     |
+|               11 | 2011-06-07     |
+|               10 | 2011-06-07     |
+|               09 | 2011-01-04     |
+|               08 | 2010-11-05     |
+|               07 | 2010-09-03     |
+|               06 | 2010-09-03     |
+|               05 | 2010-06-07     |
+|               04 | 2010-05-26     |
+|               03 | 2010-04-30     |
+|               02 | 2009-12-07     |
+|               01 | 2009-10-28     |
+
+
+%prep
+%autosetup -n chembl-downloader-0.4.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-chembl-downloader -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.2-1
+- Package Spec generated
author	CoprDistGit <infra@openeuler.org>	2023-05-05 03:25:27 +0000
committer	CoprDistGit <infra@openeuler.org>	2023-05-05 03:25:27 +0000
commit	7b8f3da36b371a228e634b18bccea1b8e28d2aaf (patch)
tree	0eec743bc541d0c8ca37c25347afc19a40f864d0 /python-chembl-downloader.spec
parent	27594c60577daecd52c57ffe68e55de92491c189 (diff)