author | CoprDistGit <infra@openeuler.org> | 2023-05-15 08:39:21 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-05-15 08:39:21 +0000 |
commit | cce559bd58cdd092934daecad56b8cbdbf66031b | |
tree | be8320a5c9bfe170e5376c0c6fde7e490e1833ee | |
parent | 6ce1de1a9d1f980c75150e94bf0be23d1ad919f5 | |
automatic import of python-deduplipy
-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-deduplipy.spec | 362 |
-rw-r--r-- | sources | 1 |
3 files changed, 364 insertions, 0 deletions
@@ -0,0 +1 @@
+/DedupliPy-0.7.10.tar.gz
diff --git a/python-deduplipy.spec b/python-deduplipy.spec
new file mode 100644
index 0000000..9dfc097
--- /dev/null
+++ b/python-deduplipy.spec
@@ -0,0 +1,362 @@
+%global _empty_manifest_terminate_build 0
+Name: python-DedupliPy
+Version: 0.7.10
+Release: 1
+Summary: End-to-end deduplication solution
+License: MIT License
+URL: https://github.com/fritshermans/deduplipy
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/20/b1/72cc8af1c02eba9a072ea7e7eeb5177610970909b767e928e58f66ffe50d/DedupliPy-0.7.10.tar.gz
+BuildArch: noarch
+
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-networkx
+Requires: python3-Levenshtein
+Requires: python3-thefuzz
+Requires: python3-modAL
+Requires: python3-openpyxl
+Requires: python3-pytest
+Requires: python3-fancyimpute
+Requires: python3-pyminhash
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-networkx
+Requires: python3-Levenshtein
+Requires: python3-thefuzz
+Requires: python3-modAL
+Requires: python3-openpyxl
+Requires: python3-pytest
+Requires: python3-fancyimpute
+Requires: python3-pyminhash
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-networkx
+Requires: python3-Levenshtein
+Requires: python3-thefuzz
+Requires: python3-modAL
+Requires: python3-openpyxl
+Requires: python3-pytest
+Requires: python3-fancyimpute
+Requires: python3-pyminhash
+Requires: python3-matplotlib
+Requires: python3-jupyterlab
+Requires: python3-sphinx
+Requires: python3-nbsphinx
+Requires: python3-sphinx-rtd-theme
+Requires: python3-sphinx
+Requires: python3-nbsphinx
+Requires: python3-sphinx-rtd-theme
+
+%description
+<!--- BADGES: START --->
+[](https://pypi.org/project/deduplipy/)
+
+[](https://pepy.tech/project/deduplipy)
+[][#conda-forge-package]
+[][#conda-forge-package]
+[][#conda-forge-feedstock]
+[][#docs-package]
+
+[#pypi-package]: https://pypi.org/project/deduplipy/
+[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
+[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
+[#docs-package]: https://deduplipy.readthedocs.io/en/latest/
+<!--- BADGES: END --->
+
+# DedupliPy
+
+<a href="https://deduplipy.readthedocs.io/en/latest/"><img src="https://deduplipy.readthedocs.io/en/latest/_images/logo.png" width="15%" height="15%" align="left" /></a>
+
+Deduplication is the task to combine different representations of the same real world entity. This package implements
+deduplication using active learning. Active learning allows for rapid training without having to provide a large,
+manually labelled dataset.
+
+DedupliPy is an end-to-end solution with advantages over existing solutions:
+
+- active learning; no large manually labelled dataset required
+- during active learning, the user gets notified when the model converged and training may be finished
+- works out of the box, advanced users can choose settings as desired (custom blocking rules, custom metrics,
+  interaction features)
+
+Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
+
+## Documentation
+
+Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
+
+## Installation
+
+### Normal installation
+
+**With pip**
+
+Install directly from PyPI.
+
+```
+pip install deduplipy
+```
+
+**With conda**
+
+Install using conda from conda-forge channel.
+
+```
+conda install -c conda-forge deduplipy
+```
+
+
+### Install to contribute
+
+Clone this Github repo and install in editable mode:
+
+```
+python -m pip install -e ".[dev]"
+python setup.py develop
+```
+
+## Usage
+
+Apply deduplication your Pandas dataframe `df` as follows:
+
+```python
+myDedupliPy = Deduplicator(col_names=['name', 'address'])
+myDedupliPy.fit(df)
+```
+
+This will start the interactive learning session in which you provide input on whether a pair is a match (y) or not (n).
+During active learning you will get the message that training may be finished once algorithm training has converged.
+Predictions on (new) data are obtained as follows:
+
+```python
+result = myDedupliPy.predict(df)
+```
+
+
+%package -n python3-DedupliPy
+Summary: End-to-end deduplication solution
+Provides: python-DedupliPy
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-DedupliPy
+<!--- BADGES: START --->
+[](https://pypi.org/project/deduplipy/)
+
+[](https://pepy.tech/project/deduplipy)
+[][#conda-forge-package]
+[][#conda-forge-package]
+[][#conda-forge-feedstock]
+[][#docs-package]
+
+[#pypi-package]: https://pypi.org/project/deduplipy/
+[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
+[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
+[#docs-package]: https://deduplipy.readthedocs.io/en/latest/
+<!--- BADGES: END --->
+
+# DedupliPy
+
+<a href="https://deduplipy.readthedocs.io/en/latest/"><img src="https://deduplipy.readthedocs.io/en/latest/_images/logo.png" width="15%" height="15%" align="left" /></a>
+
+Deduplication is the task to combine different representations of the same real world entity. This package implements
+deduplication using active learning. Active learning allows for rapid training without having to provide a large,
+manually labelled dataset.
+
+DedupliPy is an end-to-end solution with advantages over existing solutions:
+
+- active learning; no large manually labelled dataset required
+- during active learning, the user gets notified when the model converged and training may be finished
+- works out of the box, advanced users can choose settings as desired (custom blocking rules, custom metrics,
+  interaction features)
+
+Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
+
+## Documentation
+
+Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
+
+## Installation
+
+### Normal installation
+
+**With pip**
+
+Install directly from PyPI.
+
+```
+pip install deduplipy
+```
+
+**With conda**
+
+Install using conda from conda-forge channel.
+
+```
+conda install -c conda-forge deduplipy
+```
+
+
+### Install to contribute
+
+Clone this Github repo and install in editable mode:
+
+```
+python -m pip install -e ".[dev]"
+python setup.py develop
+```
+
+## Usage
+
+Apply deduplication your Pandas dataframe `df` as follows:
+
+```python
+myDedupliPy = Deduplicator(col_names=['name', 'address'])
+myDedupliPy.fit(df)
+```
+
+This will start the interactive learning session in which you provide input on whether a pair is a match (y) or not (n).
+During active learning you will get the message that training may be finished once algorithm training has converged.
+Predictions on (new) data are obtained as follows:
+
+```python
+result = myDedupliPy.predict(df)
+```
+
+
+%package help
+Summary: Development documents and examples for DedupliPy
+Provides: python3-DedupliPy-doc
+%description help
+<!--- BADGES: START --->
+[](https://pypi.org/project/deduplipy/)
+
+[](https://pepy.tech/project/deduplipy)
+[][#conda-forge-package]
+[][#conda-forge-package]
+[][#conda-forge-feedstock]
+[][#docs-package]
+
+[#pypi-package]: https://pypi.org/project/deduplipy/
+[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
+[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
+[#docs-package]: https://deduplipy.readthedocs.io/en/latest/
+<!--- BADGES: END --->
+
+# DedupliPy
+
+<a href="https://deduplipy.readthedocs.io/en/latest/"><img src="https://deduplipy.readthedocs.io/en/latest/_images/logo.png" width="15%" height="15%" align="left" /></a>
+
+Deduplication is the task to combine different representations of the same real world entity. This package implements
+deduplication using active learning. Active learning allows for rapid training without having to provide a large,
+manually labelled dataset.
+
+DedupliPy is an end-to-end solution with advantages over existing solutions:
+
+- active learning; no large manually labelled dataset required
+- during active learning, the user gets notified when the model converged and training may be finished
+- works out of the box, advanced users can choose settings as desired (custom blocking rules, custom metrics,
+  interaction features)
+
+Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
+
+## Documentation
+
+Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
+
+## Installation
+
+### Normal installation
+
+**With pip**
+
+Install directly from PyPI.
+
+```
+pip install deduplipy
+```
+
+**With conda**
+
+Install using conda from conda-forge channel.
+
+```
+conda install -c conda-forge deduplipy
+```
+
+
+### Install to contribute
+
+Clone this Github repo and install in editable mode:
+
+```
+python -m pip install -e ".[dev]"
+python setup.py develop
+```
+
+## Usage
+
+Apply deduplication your Pandas dataframe `df` as follows:
+
+```python
+myDedupliPy = Deduplicator(col_names=['name', 'address'])
+myDedupliPy.fit(df)
+```
+
+This will start the interactive learning session in which you provide input on whether a pair is a match (y) or not (n).
+During active learning you will get the message that training may be finished once algorithm training has converged.
+Predictions on (new) data are obtained as follows:
+
+```python
+result = myDedupliPy.predict(df)
+```
+
+
+%prep
+%autosetup -n DedupliPy-0.7.10
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-DedupliPy -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 0.7.10-1
+- Package Spec generated
@@ -0,0 +1 @@
+3bd6c614bad0baee37ac3854b4c290e7 DedupliPy-0.7.10.tar.gz