| author | CoprDistGit <infra@openeuler.org> | 2023-05-05 03:25:27 +0000 |
|---|---|---|
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-05 03:25:27 +0000 |
| commit | 7b8f3da36b371a228e634b18bccea1b8e28d2aaf (patch) | |
| tree | 0eec743bc541d0c8ca37c25347afc19a40f864d0 /python-chembl-downloader.spec | |
| parent | 27594c60577daecd52c57ffe68e55de92491c189 (diff) | |
automatic import of python-chembl-downloaderopeneuler20.03
Diffstat (limited to 'python-chembl-downloader.spec')
| -rw-r--r-- | python-chembl-downloader.spec | 955 |
1 files changed, 955 insertions, 0 deletions
diff --git a/python-chembl-downloader.spec b/python-chembl-downloader.spec
new file mode 100644
index 0000000..33a8ea9
--- /dev/null
+++ b/python-chembl-downloader.spec
@@ -0,0 +1,955 @@
+%global _empty_manifest_terminate_build 0
+Name: python-chembl-downloader
+Version: 0.4.2
+Release: 1
+Summary: Reproducibly download, open, parse, and query ChEMBL
+License: MIT
+URL: https://github.com/cthoyt/chembl_downloader
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/c6/70/5f1b39298e2a1ee4f64f362967d125e254ff7b283a97a0352168c42cca54/chembl_downloader-0.4.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-click
+Requires: python3-more-click
+Requires: python3-pystow
+Requires: python3-tqdm
+Requires: python3-sphinx
+Requires: python3-sphinx-rtd-theme
+Requires: python3-sphinx-click
+Requires: python3-sphinx-autodoc-typehints
+Requires: python3-sphinx-automodapi
+Requires: python3-pandas
+Requires: python3-rdkit-pypi
+Requires: python3-pytest
+Requires: python3-coverage
+
+%description
+<h1 align="center">
+  chembl_downloader
+</h1>
+
+<p align="center">
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI" src="https://img.shields.io/pypi/v/chembl_downloader" />
+    </a>
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/chembl_downloader" />
+    </a>
+    <a href="https://github.com/cthoyt/chembl_downloader/blob/main/LICENSE">
+        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/chembl_downloader" />
+    </a>
+    <a href="https://zenodo.org/badge/latestdoi/390113187">
+        <img src="https://zenodo.org/badge/390113187.svg" alt="DOI" />
+    </a>
+    <a href="https://github.com/psf/black">
+        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" />
+    </a>
+    <a href='https://chembl-downloader.readthedocs.io/en/latest/?badge=latest'>
+        <img src='https://readthedocs.org/projects/chembl-downloader/badge/?version=latest' alt='Documentation Status' />
+    </a>
+</p>
+
+Don't worry about downloading/extracting ChEMBL or versioning - just use ``chembl_downloader`` to write code that knows
+how to download it and use it automatically.
+
+Install with:
+
+```bash
+$ pip install chembl-downloader
+```
+
+Full technical documentation can be found on
+[ReadTheDocs](https://chembl-downloader.readthedocs.io). Tutorials can be found
+in Jupyter notebooks in the [notebooks/](notebooks/) directory of the
+repository.
+
+## Database Usage
+
+### Download A Specific Version
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite(version='28')
+```
+
+After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored
+using [`pystow`](https://github.com/cthoyt/pystow) automatically in the `~/.data/chembl`
+directory.
+
+We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is
+a [paid feature](https://sqlite.org/purchase/zipvfs).
+
+### Download the Latest Version
+
+You can modify the previous code slightly by omitting the `version` keyword
+argument to automatically find the latest version of ChEMBL:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite()
+```
+
+The `version` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`), but will be omitted below for brevity.
+
+### Automate Connection
+
+Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with
+the resulting file. Don't do this, it's not reproducible!
+Instead, the file can be downloaded and a connection can be opened automatically with:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.connect() as conn:
+    with conn.cursor() as cursor:
+        cursor.execute(...)  # run your query string
+        rows = cursor.fetchall()  # get your results
+```
+
+The `cursor()` function provides a convenient wrapper around this operation:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.cursor() as cursor:
+    cursor.execute(...)  # run your query string
+    rows = cursor.fetchall()  # get your results
+```
+
+### Run a query and get a pandas DataFrame
+
+The most powerful function is `query()` which builds on the previous `connect()` function in combination
+with [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
+to make a query and load the results into a pandas DataFrame for any downstream use.
+
+```python
+import chembl_downloader
+
+sql = """
+SELECT
+    MOLECULE_DICTIONARY.chembl_id,
+    MOLECULE_DICTIONARY.pref_name
+FROM MOLECULE_DICTIONARY
+JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
+WHERE molecule_dictionary.pref_name IS NOT NULL
+LIMIT 5
+"""
+
+df = chembl_downloader.query(sql)
+df.to_csv(..., sep='\t', index=False)
+```
+
+Suggestion 1: use `pystow` to make a reproducible file path that's portable to other people's machines
+(e.g., it doesn't have your username in the path).
+
+Suggestion 2: RDKit is now pip-installable with `pip install rdkit-pypi`, which means most users don't have to muck
+around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is
+the [rdkit.Chem.PandasTools](https://rdkit.org/docs/source/rdkit.Chem.PandasTools.html)
+module.
+
+### Access an RDKit supplier over entries in the SDF dump
+
+This example is a bit more fit-for-purpose than the last two. The `supplier()` function makes sure that the latest SDF
+dump is downloaded and loads it from the gzip file into a `rdkit.Chem.ForwardSDMolSupplier`
+using a context manager to make sure the file doesn't get closed until after parsing is done. Like the previous
+examples, it can also explicitly take a `version`.
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+with chembl_downloader.supplier() as suppl:
+    data = []
+    for i, mol in enumerate(suppl):
+        if mol is None or mol.GetNumAtoms() > 50:
+            continue
+        fp = Chem.PatternFingerprint(mol, fpSize=1024, tautomerFingerprints=True)
+        smi = Chem.MolToSmiles(mol)
+        data.append((smi, fp))
+```
+
+This example was adapted from Greg Landrum's RDKit blog post
+on [generalized substructure search](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/08/03/generalized-substructure-search.html).
+
+## SDF Usage
+
+### Get an RDKit substructure library
+
+Building on the `supplier()` function, the `get_substructure_library()`
+makes the preparation of a [substructure library](https://www.rdkit.org/docs/cppapi/classRDKit_1_1SubstructLibrary.html)
+automated and reproducible. Additionally, it caches the results of the build,
+which takes on the order of tens of minutes, only has to be done once and future
+loading from a pickle object takes on the order of seconds.
+
+The implementation was inspired by Greg Landrum's RDKit blog post,
+[Some new features in the SubstructLibrary](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/12/20/substructlibrary-search-order.html).
+The following example shows how it can be used to accomplish some of the first
+tasks presented in the post:
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+library = chembl_downloader.get_substructure_library()
+query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
+matches = library.GetMatches(query)
+```
+
+## Morgan Fingerprints Usage
+
+### Get the Morgan Fingerprint file
+
+ChEMBL makes a file containing pre-computed 2048 bit radius 2 morgan
+fingerprints for each molecule available. It can be downloaded using:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+
+### Load fingerprints with [`chemfp`](https://chemfp.com/)
+
+The following wraps the `download_fps` function with `chemfp`'s fingerprint
+loader:
+
+```python
+import chembl_downloader
+
+arena = chembl_downloader.chemfp_load_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+More information on working with the `arena` object can be found
+[here](https://chemfp.readthedocs.io/en/latest/using-api.html#working-with-a-fingerprintarena).
+
+## Extras
+
+### Store in a Different Place
+
+If you want to store the data elsewhere using `pystow` (e.g., in [`pyobo`](https://github.com/pyobo/pyobo)
+I also keep a copy of this file), you can use the `prefix` argument.
+
+```python
+import chembl_downloader
+
+# It gets downloaded/extracted to
+# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
+path = chembl_downloader.download_extract_sqlite(prefix=['pyobo', 'raw', 'chembl'])
+```
+
+See the `pystow` [documentation](https://github.com/cthoyt/pystow#%EF%B8%8F-configuration) on configuring the storage
+location further.
+
+The `prefix` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`).
+
+### Download via CLI
+
+After installing, run the following CLI command to ensure it and send the path to stdout
+
+```bash
+$ chembl_downloader
+```
+
+Use `--test` to show two example queries
+
+```bash
+$ chembl_downloader --test
+```
+
+## Contributing
+
+Please read the contribution guidelines in [CONTRIBUTING.md](.github/CONTRIBUTING.md).
+
+If you'd like to contribute, there's a submodule called `chembl_downloader.queries`
+where you can add a useful SQL queries along with a description of what it does for easy
+importing and reuse.
+
+## Statistics and Compatibility
+
+`chembl-downloader` is compatible with all versions of ChEMBL. However, some files are
+not available for all versions. For example, the SQLite version of the database was first
+added in release 21 (2015-02-12).
+
+| ChEMBL Version   | Release Date   |
+|------------------|----------------|
+| 31               | 2022-07-12     |
+| 30               | 2022-02-22     |
+| 29               | 2021-07-01     |
+| 28               | 2021-01-15     |
+| 27               | 2020-05-18     |
+| 26               | 2020-02-14     |
+| 25               | 2019-02-01     |
+| 24_1             | 2018-05-01     |
+| 24               |                |
+| 23               | 2017-05-18     |
+| 22_1             | 2016-11-17     |
+| 22               |                |
+| 21               | 2015-02-12     |
+| 20               | 2015-02-03     |
+| 19               | 2014-07-2333   |
+| 18               | 2014-04-02     |
+| 17               | 2013-09-16     |
+| 16               | 2013-055555-15 |
+| 15               | 2013-01-30     |
+| 14               | 2012 -07-18    |
+| 13               | 2012-02-29     |
+| 12               | 2011-11-30     |
+| 11               | 2011-06-07     |
+| 10               | 2011-06-07     |
+| 09               | 2011-01-04     |
+| 08               | 2010-11-05     |
+| 07               | 2010-09-03     |
+| 06               | 2010-09-03     |
+| 05               | 2010-06-07     |
+| 04               | 2010-05-26     |
+| 03               | 2010-04-30     |
+| 02               | 2009-12-07     |
+| 01               | 2009-10-28     |
+
+
+%package -n python3-chembl-downloader
+Summary: Reproducibly download, open, parse, and query ChEMBL
+Provides: python-chembl-downloader
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-chembl-downloader
+<h1 align="center">
+  chembl_downloader
+</h1>
+
+<p align="center">
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI" src="https://img.shields.io/pypi/v/chembl_downloader" />
+    </a>
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/chembl_downloader" />
+    </a>
+    <a href="https://github.com/cthoyt/chembl_downloader/blob/main/LICENSE">
+        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/chembl_downloader" />
+    </a>
+    <a href="https://zenodo.org/badge/latestdoi/390113187">
+        <img src="https://zenodo.org/badge/390113187.svg" alt="DOI" />
+    </a>
+    <a href="https://github.com/psf/black">
+        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" />
+    </a>
+    <a href='https://chembl-downloader.readthedocs.io/en/latest/?badge=latest'>
+        <img src='https://readthedocs.org/projects/chembl-downloader/badge/?version=latest' alt='Documentation Status' />
+    </a>
+</p>
+
+Don't worry about downloading/extracting ChEMBL or versioning - just use ``chembl_downloader`` to write code that knows
+how to download it and use it automatically.
+
+Install with:
+
+```bash
+$ pip install chembl-downloader
+```
+
+Full technical documentation can be found on
+[ReadTheDocs](https://chembl-downloader.readthedocs.io). Tutorials can be found
+in Jupyter notebooks in the [notebooks/](notebooks/) directory of the
+repository.
+
+## Database Usage
+
+### Download A Specific Version
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite(version='28')
+```
+
+After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored
+using [`pystow`](https://github.com/cthoyt/pystow) automatically in the `~/.data/chembl`
+directory.
+
+We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is
+a [paid feature](https://sqlite.org/purchase/zipvfs).
+
+### Download the Latest Version
+
+You can modify the previous code slightly by omitting the `version` keyword
+argument to automatically find the latest version of ChEMBL:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite()
+```
+
+The `version` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`), but will be omitted below for brevity.
+
+### Automate Connection
+
+Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with
+the resulting file. Don't do this, it's not reproducible!
+Instead, the file can be downloaded and a connection can be opened automatically with:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.connect() as conn:
+    with conn.cursor() as cursor:
+        cursor.execute(...)  # run your query string
+        rows = cursor.fetchall()  # get your results
+```
+
+The `cursor()` function provides a convenient wrapper around this operation:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.cursor() as cursor:
+    cursor.execute(...)  # run your query string
+    rows = cursor.fetchall()  # get your results
+```
+
+### Run a query and get a pandas DataFrame
+
+The most powerful function is `query()` which builds on the previous `connect()` function in combination
+with [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
+to make a query and load the results into a pandas DataFrame for any downstream use.
+
+```python
+import chembl_downloader
+
+sql = """
+SELECT
+    MOLECULE_DICTIONARY.chembl_id,
+    MOLECULE_DICTIONARY.pref_name
+FROM MOLECULE_DICTIONARY
+JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
+WHERE molecule_dictionary.pref_name IS NOT NULL
+LIMIT 5
+"""
+
+df = chembl_downloader.query(sql)
+df.to_csv(..., sep='\t', index=False)
+```
+
+Suggestion 1: use `pystow` to make a reproducible file path that's portable to other people's machines
+(e.g., it doesn't have your username in the path).
+
+Suggestion 2: RDKit is now pip-installable with `pip install rdkit-pypi`, which means most users don't have to muck
+around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is
+the [rdkit.Chem.PandasTools](https://rdkit.org/docs/source/rdkit.Chem.PandasTools.html)
+module.
+
+### Access an RDKit supplier over entries in the SDF dump
+
+This example is a bit more fit-for-purpose than the last two. The `supplier()` function makes sure that the latest SDF
+dump is downloaded and loads it from the gzip file into a `rdkit.Chem.ForwardSDMolSupplier`
+using a context manager to make sure the file doesn't get closed until after parsing is done. Like the previous
+examples, it can also explicitly take a `version`.
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+with chembl_downloader.supplier() as suppl:
+    data = []
+    for i, mol in enumerate(suppl):
+        if mol is None or mol.GetNumAtoms() > 50:
+            continue
+        fp = Chem.PatternFingerprint(mol, fpSize=1024, tautomerFingerprints=True)
+        smi = Chem.MolToSmiles(mol)
+        data.append((smi, fp))
+```
+
+This example was adapted from Greg Landrum's RDKit blog post
+on [generalized substructure search](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/08/03/generalized-substructure-search.html).
+
+## SDF Usage
+
+### Get an RDKit substructure library
+
+Building on the `supplier()` function, the `get_substructure_library()`
+makes the preparation of a [substructure library](https://www.rdkit.org/docs/cppapi/classRDKit_1_1SubstructLibrary.html)
+automated and reproducible. Additionally, it caches the results of the build,
+which takes on the order of tens of minutes, only has to be done once and future
+loading from a pickle object takes on the order of seconds.
+
+The implementation was inspired by Greg Landrum's RDKit blog post,
+[Some new features in the SubstructLibrary](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/12/20/substructlibrary-search-order.html).
+The following example shows how it can be used to accomplish some of the first
+tasks presented in the post:
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+library = chembl_downloader.get_substructure_library()
+query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
+matches = library.GetMatches(query)
+```
+
+## Morgan Fingerprints Usage
+
+### Get the Morgan Fingerprint file
+
+ChEMBL makes a file containing pre-computed 2048 bit radius 2 morgan
+fingerprints for each molecule available. It can be downloaded using:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+
+### Load fingerprints with [`chemfp`](https://chemfp.com/)
+
+The following wraps the `download_fps` function with `chemfp`'s fingerprint
+loader:
+
+```python
+import chembl_downloader
+
+arena = chembl_downloader.chemfp_load_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+More information on working with the `arena` object can be found
+[here](https://chemfp.readthedocs.io/en/latest/using-api.html#working-with-a-fingerprintarena).
+
+## Extras
+
+### Store in a Different Place
+
+If you want to store the data elsewhere using `pystow` (e.g., in [`pyobo`](https://github.com/pyobo/pyobo)
+I also keep a copy of this file), you can use the `prefix` argument.
+
+```python
+import chembl_downloader
+
+# It gets downloaded/extracted to
+# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
+path = chembl_downloader.download_extract_sqlite(prefix=['pyobo', 'raw', 'chembl'])
+```
+
+See the `pystow` [documentation](https://github.com/cthoyt/pystow#%EF%B8%8F-configuration) on configuring the storage
+location further.
+
+The `prefix` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`).
+
+### Download via CLI
+
+After installing, run the following CLI command to ensure it and send the path to stdout
+
+```bash
+$ chembl_downloader
+```
+
+Use `--test` to show two example queries
+
+```bash
+$ chembl_downloader --test
+```
+
+## Contributing
+
+Please read the contribution guidelines in [CONTRIBUTING.md](.github/CONTRIBUTING.md).
+
+If you'd like to contribute, there's a submodule called `chembl_downloader.queries`
+where you can add a useful SQL queries along with a description of what it does for easy
+importing and reuse.
+
+## Statistics and Compatibility
+
+`chembl-downloader` is compatible with all versions of ChEMBL. However, some files are
+not available for all versions. For example, the SQLite version of the database was first
+added in release 21 (2015-02-12).
+
+| ChEMBL Version   | Release Date   |
+|------------------|----------------|
+| 31               | 2022-07-12     |
+| 30               | 2022-02-22     |
+| 29               | 2021-07-01     |
+| 28               | 2021-01-15     |
+| 27               | 2020-05-18     |
+| 26               | 2020-02-14     |
+| 25               | 2019-02-01     |
+| 24_1             | 2018-05-01     |
+| 24               |                |
+| 23               | 2017-05-18     |
+| 22_1             | 2016-11-17     |
+| 22               |                |
+| 21               | 2015-02-12     |
+| 20               | 2015-02-03     |
+| 19               | 2014-07-2333   |
+| 18               | 2014-04-02     |
+| 17               | 2013-09-16     |
+| 16               | 2013-055555-15 |
+| 15               | 2013-01-30     |
+| 14               | 2012 -07-18    |
+| 13               | 2012-02-29     |
+| 12               | 2011-11-30     |
+| 11               | 2011-06-07     |
+| 10               | 2011-06-07     |
+| 09               | 2011-01-04     |
+| 08               | 2010-11-05     |
+| 07               | 2010-09-03     |
+| 06               | 2010-09-03     |
+| 05               | 2010-06-07     |
+| 04               | 2010-05-26     |
+| 03               | 2010-04-30     |
+| 02               | 2009-12-07     |
+| 01               | 2009-10-28     |
+
+
+%package help
+Summary: Development documents and examples for chembl-downloader
+Provides: python3-chembl-downloader-doc
+%description help
+<h1 align="center">
+  chembl_downloader
+</h1>
+
+<p align="center">
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI" src="https://img.shields.io/pypi/v/chembl_downloader" />
+    </a>
+    <a href="https://pypi.org/project/chembl_downloader">
+        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/chembl_downloader" />
+    </a>
+    <a href="https://github.com/cthoyt/chembl_downloader/blob/main/LICENSE">
+        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/chembl_downloader" />
+    </a>
+    <a href="https://zenodo.org/badge/latestdoi/390113187">
+        <img src="https://zenodo.org/badge/390113187.svg" alt="DOI" />
+    </a>
+    <a href="https://github.com/psf/black">
+        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" />
+    </a>
+    <a href='https://chembl-downloader.readthedocs.io/en/latest/?badge=latest'>
+        <img src='https://readthedocs.org/projects/chembl-downloader/badge/?version=latest' alt='Documentation Status' />
+    </a>
+</p>
+
+Don't worry about downloading/extracting ChEMBL or versioning - just use ``chembl_downloader`` to write code that knows
+how to download it and use it automatically.
+
+Install with:
+
+```bash
+$ pip install chembl-downloader
+```
+
+Full technical documentation can be found on
+[ReadTheDocs](https://chembl-downloader.readthedocs.io). Tutorials can be found
+in Jupyter notebooks in the [notebooks/](notebooks/) directory of the
+repository.
+
+## Database Usage
+
+### Download A Specific Version
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite(version='28')
+```
+
+After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored
+using [`pystow`](https://github.com/cthoyt/pystow) automatically in the `~/.data/chembl`
+directory.
+
+We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is
+a [paid feature](https://sqlite.org/purchase/zipvfs).
+
+### Download the Latest Version
+
+You can modify the previous code slightly by omitting the `version` keyword
+argument to automatically find the latest version of ChEMBL:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_extract_sqlite()
+```
+
+The `version` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`), but will be omitted below for brevity.
+
+### Automate Connection
+
+Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with
+the resulting file. Don't do this, it's not reproducible!
+Instead, the file can be downloaded and a connection can be opened automatically with:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.connect() as conn:
+    with conn.cursor() as cursor:
+        cursor.execute(...)  # run your query string
+        rows = cursor.fetchall()  # get your results
+```
+
+The `cursor()` function provides a convenient wrapper around this operation:
+
+```python
+import chembl_downloader
+
+with chembl_downloader.cursor() as cursor:
+    cursor.execute(...)  # run your query string
+    rows = cursor.fetchall()  # get your results
+```
+
+### Run a query and get a pandas DataFrame
+
+The most powerful function is `query()` which builds on the previous `connect()` function in combination
+with [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
+to make a query and load the results into a pandas DataFrame for any downstream use.
+
+```python
+import chembl_downloader
+
+sql = """
+SELECT
+    MOLECULE_DICTIONARY.chembl_id,
+    MOLECULE_DICTIONARY.pref_name
+FROM MOLECULE_DICTIONARY
+JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
+WHERE molecule_dictionary.pref_name IS NOT NULL
+LIMIT 5
+"""
+
+df = chembl_downloader.query(sql)
+df.to_csv(..., sep='\t', index=False)
+```
+
+Suggestion 1: use `pystow` to make a reproducible file path that's portable to other people's machines
+(e.g., it doesn't have your username in the path).
+
+Suggestion 2: RDKit is now pip-installable with `pip install rdkit-pypi`, which means most users don't have to muck
+around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is
+the [rdkit.Chem.PandasTools](https://rdkit.org/docs/source/rdkit.Chem.PandasTools.html)
+module.
+
+### Access an RDKit supplier over entries in the SDF dump
+
+This example is a bit more fit-for-purpose than the last two. The `supplier()` function makes sure that the latest SDF
+dump is downloaded and loads it from the gzip file into a `rdkit.Chem.ForwardSDMolSupplier`
+using a context manager to make sure the file doesn't get closed until after parsing is done. Like the previous
+examples, it can also explicitly take a `version`.
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+with chembl_downloader.supplier() as suppl:
+    data = []
+    for i, mol in enumerate(suppl):
+        if mol is None or mol.GetNumAtoms() > 50:
+            continue
+        fp = Chem.PatternFingerprint(mol, fpSize=1024, tautomerFingerprints=True)
+        smi = Chem.MolToSmiles(mol)
+        data.append((smi, fp))
+```
+
+This example was adapted from Greg Landrum's RDKit blog post
+on [generalized substructure search](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/08/03/generalized-substructure-search.html).
+
+## SDF Usage
+
+### Get an RDKit substructure library
+
+Building on the `supplier()` function, the `get_substructure_library()`
+makes the preparation of a [substructure library](https://www.rdkit.org/docs/cppapi/classRDKit_1_1SubstructLibrary.html)
+automated and reproducible. Additionally, it caches the results of the build,
+which takes on the order of tens of minutes, only has to be done once and future
+loading from a pickle object takes on the order of seconds.
+
+The implementation was inspired by Greg Landrum's RDKit blog post,
+[Some new features in the SubstructLibrary](https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/12/20/substructlibrary-search-order.html).
+The following example shows how it can be used to accomplish some of the first
+tasks presented in the post:
+
+```python
+from rdkit import Chem
+
+import chembl_downloader
+
+library = chembl_downloader.get_substructure_library()
+query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
+matches = library.GetMatches(query)
+```
+
+## Morgan Fingerprints Usage
+
+### Get the Morgan Fingerprint file
+
+ChEMBL makes a file containing pre-computed 2048 bit radius 2 morgan
+fingerprints for each molecule available. It can be downloaded using:
+
+```python
+import chembl_downloader
+
+path = chembl_downloader.download_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+
+### Load fingerprints with [`chemfp`](https://chemfp.com/)
+
+The following wraps the `download_fps` function with `chemfp`'s fingerprint
+loader:
+
+```python
+import chembl_downloader
+
+arena = chembl_downloader.chemfp_load_fps()
+```
+
+The `version` and other keyword arguments are also valid for this function.
+More information on working with the `arena` object can be found
+[here](https://chemfp.readthedocs.io/en/latest/using-api.html#working-with-a-fingerprintarena).
+
+## Extras
+
+### Store in a Different Place
+
+If you want to store the data elsewhere using `pystow` (e.g., in [`pyobo`](https://github.com/pyobo/pyobo)
+I also keep a copy of this file), you can use the `prefix` argument.
+
+```python
+import chembl_downloader
+
+# It gets downloaded/extracted to
+# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
+path = chembl_downloader.download_extract_sqlite(prefix=['pyobo', 'raw', 'chembl'])
+```
+
+See the `pystow` [documentation](https://github.com/cthoyt/pystow#%EF%B8%8F-configuration) on configuring the storage
+location further.
+
+The `prefix` keyword argument is available for all functions in this package (e.g., including
+`connect()`, `cursor()`, and `query()`).
+
+### Download via CLI
+
+After installing, run the following CLI command to ensure it and send the path to stdout
+
+```bash
+$ chembl_downloader
+```
+
+Use `--test` to show two example queries
+
+```bash
+$ chembl_downloader --test
+```
+
+## Contributing
+
+Please read the contribution guidelines in [CONTRIBUTING.md](.github/CONTRIBUTING.md).
+
+If you'd like to contribute, there's a submodule called `chembl_downloader.queries`
+where you can add a useful SQL queries along with a description of what it does for easy
+importing and reuse.
+
+## Statistics and Compatibility
+
+`chembl-downloader` is compatible with all versions of ChEMBL. However, some files are
+not available for all versions. For example, the SQLite version of the database was first
+added in release 21 (2015-02-12).
+
+| ChEMBL Version   | Release Date   |
+|------------------|----------------|
+| 31               | 2022-07-12     |
+| 30               | 2022-02-22     |
+| 29               | 2021-07-01     |
+| 28               | 2021-01-15     |
+| 27               | 2020-05-18     |
+| 26               | 2020-02-14     |
+| 25               | 2019-02-01     |
+| 24_1             | 2018-05-01     |
+| 24               |                |
+| 23               | 2017-05-18     |
+| 22_1             | 2016-11-17     |
+| 22               |                |
+| 21               | 2015-02-12     |
+| 20               | 2015-02-03     |
+| 19               | 2014-07-2333   |
+| 18               | 2014-04-02     |
+| 17               | 2013-09-16     |
+| 16               | 2013-055555-15 |
+| 15               | 2013-01-30     |
+| 14               | 2012 -07-18    |
+| 13               | 2012-02-29     |
+| 12               | 2011-11-30     |
+| 11               | 2011-06-07     |
+| 10               | 2011-06-07     |
+| 09               | 2011-01-04     |
+| 08               | 2010-11-05     |
+| 07               | 2010-09-03     |
+| 06               | 2010-09-03     |
+| 05               | 2010-06-07     |
+| 04               | 2010-05-26     |
+| 03               | 2010-04-30     |
+| 02               | 2009-12-07     |
+| 01               | 2009-10-28     |
+
+
+%prep
+%autosetup -n chembl-downloader-0.4.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-chembl-downloader -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.2-1
+- Package Spec generated
