%global _empty_manifest_terminate_build 0
Name:		python-DedupliPy
Version:	0.7.10
Release:	1
Summary:	End-to-end deduplication solution
License:	MIT License
URL:		https://github.com/fritshermans/deduplipy
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/20/b1/72cc8af1c02eba9a072ea7e7eeb5177610970909b767e928e58f66ffe50d/DedupliPy-0.7.10.tar.gz
BuildArch:	noarch

Requires:	python3-pandas
Requires:	python3-numpy
Requires:	python3-scipy
Requires:	python3-scikit-learn
Requires:	python3-networkx
Requires:	python3-Levenshtein
Requires:	python3-thefuzz
Requires:	python3-modAL
Requires:	python3-openpyxl
Requires:	python3-pytest
Requires:	python3-fancyimpute
Requires:	python3-pyminhash
Requires:	python3-matplotlib
Requires:	python3-jupyterlab
Requires:	python3-sphinx
Requires:	python3-nbsphinx
Requires:	python3-sphinx-rtd-theme

%description
[![Version](https://img.shields.io/pypi/v/deduplipy)](https://pypi.org/project/deduplipy/)
![](https://img.shields.io/github/license/fritshermans/deduplipy)
[![Downloads](https://pepy.tech/badge/deduplipy)](https://pepy.tech/project/deduplipy)
[![Conda - Platform](https://img.shields.io/conda/pn/conda-forge/deduplipy?logo=anaconda&style=flat)][#conda-forge-package]
[![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/deduplipy?logo=anaconda&style=flat&color=orange)][#conda-forge-package]
[![Conda Recipe](https://img.shields.io/static/v1?logo=conda-forge&style=flat&color=green&label=recipe&message=deduplipy)][#conda-forge-feedstock]
[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=readthdocs&style=flat&color=pink&label=docs&message=deduplipy)][#docs-package]

[#pypi-package]: https://pypi.org/project/deduplipy/
[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
[#docs-package]: https://deduplipy.readthedocs.io/en/latest/

# DedupliPy

Deduplication is the task of combining different representations of the same real-world entity. This package implements deduplication using active learning. Active learning allows for rapid training without having to provide a large, manually labelled dataset.

DedupliPy is an end-to-end solution with advantages over existing solutions:

- active learning; no large manually labelled dataset required
- during active learning, the user is notified when the model has converged and training may be finished
- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics, interaction features)

Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)

## Documentation

Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)

## Installation

### Normal installation

**With pip**

Install directly from PyPI.

```
pip install deduplipy
```

**With conda**

Install using conda from the conda-forge channel.
```
conda install -c conda-forge deduplipy
```

### Install to contribute

Clone this GitHub repository and install in editable mode:

```
python -m pip install -e ".[dev]"
python setup.py develop
```

## Usage

Apply deduplication to your Pandas dataframe `df` as follows:

```python
from deduplipy.deduplicator import Deduplicator

myDedupliPy = Deduplicator(col_names=['name', 'address'])
myDedupliPy.fit(df)
```

This starts an interactive learning session in which you indicate whether each presented pair is a match (y) or not (n). During active learning, you are notified that training may be finished once the model has converged.

Predictions on (new) data are obtained as follows:

```python
result = myDedupliPy.predict(df)
```

%package -n python3-DedupliPy
Summary:	End-to-end deduplication solution
Provides:	python-DedupliPy
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-DedupliPy
[![Version](https://img.shields.io/pypi/v/deduplipy)](https://pypi.org/project/deduplipy/)
![](https://img.shields.io/github/license/fritshermans/deduplipy)
[![Downloads](https://pepy.tech/badge/deduplipy)](https://pepy.tech/project/deduplipy)
[![Conda - Platform](https://img.shields.io/conda/pn/conda-forge/deduplipy?logo=anaconda&style=flat)][#conda-forge-package]
[![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/deduplipy?logo=anaconda&style=flat&color=orange)][#conda-forge-package]
[![Conda Recipe](https://img.shields.io/static/v1?logo=conda-forge&style=flat&color=green&label=recipe&message=deduplipy)][#conda-forge-feedstock]
[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=readthdocs&style=flat&color=pink&label=docs&message=deduplipy)][#docs-package]

[#pypi-package]: https://pypi.org/project/deduplipy/
[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
[#docs-package]: https://deduplipy.readthedocs.io/en/latest/

# DedupliPy

Deduplication is the task of combining different representations of the same real-world entity. This package implements deduplication using active learning. Active learning allows for rapid training without having to provide a large, manually labelled dataset.

DedupliPy is an end-to-end solution with advantages over existing solutions:

- active learning; no large manually labelled dataset required
- during active learning, the user is notified when the model has converged and training may be finished
- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics, interaction features)

Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)

## Documentation

Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)

## Installation

### Normal installation

**With pip**

Install directly from PyPI.

```
pip install deduplipy
```

**With conda**

Install using conda from the conda-forge channel.

```
conda install -c conda-forge deduplipy
```

### Install to contribute

Clone this GitHub repository and install in editable mode:

```
python -m pip install -e ".[dev]"
python setup.py develop
```

## Usage

Apply deduplication to your Pandas dataframe `df` as follows:

```python
from deduplipy.deduplicator import Deduplicator

myDedupliPy = Deduplicator(col_names=['name', 'address'])
myDedupliPy.fit(df)
```

This starts an interactive learning session in which you indicate whether each presented pair is a match (y) or not (n). During active learning, you are notified that training may be finished once the model has converged.
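
The loop behind such a session can be sketched generically: the model repeatedly asks about the pair it is least certain of (uncertainty sampling) and reports convergence once its predictions stop changing. This is a minimal sketch, not DedupliPy's internal implementation or API. It assumes NumPy and scikit-learn, one row of numeric similarity features per candidate pair, and a callable `oracle` standing in for the y/n answers. The function names, the seeding rule and the "unchanged for `patience` rounds" stopping rule are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def most_uncertain(model, X_pool, labelled):
    """Index of the unlabelled pair whose predicted match probability is closest to 0.5."""
    proba = model.predict_proba(X_pool)[:, 1]
    score = -np.abs(proba - 0.5)
    score[labelled] = -np.inf  # never ask about the same pair twice
    return int(np.argmax(score))


def active_learning(X_pool, oracle, n_queries=20, patience=3):
    """Uncertainty-sampling loop; oracle(i) stands in for the user's y/n answer (1/0)."""
    # Assumption: the pair with the highest mean similarity is a match and the
    # lowest is a non-match, so the first fit sees both classes.
    mean_sim = X_pool.mean(axis=1)
    labelled = [int(np.argmax(mean_sim)), int(np.argmin(mean_sim))]
    answers = {i: oracle(i) for i in labelled}
    model = LogisticRegression()
    previous, stable_rounds, converged = None, 0, False
    for _ in range(n_queries):
        model.fit(X_pool[labelled], [answers[i] for i in labelled])
        current = model.predict(X_pool)
        # Hypothetical convergence rule: pool predictions unchanged for `patience` rounds.
        if previous is not None and np.array_equal(current, previous):
            stable_rounds += 1
            if stable_rounds >= patience:
                converged = True  # the point at which a user could stop labelling
                break
        else:
            stable_rounds = 0
        previous = current
        i = most_uncertain(model, X_pool, labelled)
        labelled.append(i)
        answers[i] = oracle(i)
    model.fit(X_pool[labelled], [answers[i] for i in labelled])  # use every collected label
    return model, labelled, converged
```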

Predictions on (new) data are obtained as follows:

```python
result = myDedupliPy.predict(df)
```

%package help
Summary:	Development documents and examples for DedupliPy
Provides:	python3-DedupliPy-doc

%description help
[![Version](https://img.shields.io/pypi/v/deduplipy)](https://pypi.org/project/deduplipy/)
![](https://img.shields.io/github/license/fritshermans/deduplipy)
[![Downloads](https://pepy.tech/badge/deduplipy)](https://pepy.tech/project/deduplipy)
[![Conda - Platform](https://img.shields.io/conda/pn/conda-forge/deduplipy?logo=anaconda&style=flat)][#conda-forge-package]
[![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/deduplipy?logo=anaconda&style=flat&color=orange)][#conda-forge-package]
[![Conda Recipe](https://img.shields.io/static/v1?logo=conda-forge&style=flat&color=green&label=recipe&message=deduplipy)][#conda-forge-feedstock]
[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=readthdocs&style=flat&color=pink&label=docs&message=deduplipy)][#docs-package]

[#pypi-package]: https://pypi.org/project/deduplipy/
[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
[#docs-package]: https://deduplipy.readthedocs.io/en/latest/

# DedupliPy

Deduplication is the task of combining different representations of the same real-world entity. This package implements deduplication using active learning. Active learning allows for rapid training without having to provide a large, manually labelled dataset.
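
Combining representations generally comes down to scoring pairs of records and then grouping every record connected by a chain of matches. The following sketch illustrates that idea under loudly stated assumptions. The standard library's `difflib.SequenceMatcher` stands in for a Levenshtein-based metric, and a fixed similarity `threshold` replaces the actively learned classifier. `networkx.connected_components` does the grouping. The helpers `similarity` and `deduplicate` are hypothetical and are not part of DedupliPy.

```python
from difflib import SequenceMatcher
from itertools import combinations

import networkx as nx


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; a stand-in for a Levenshtein-based metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(records, threshold=0.8):
    """Assign a cluster id to each record; records linked by a chain of matches share an id."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(records)))
    for i, j in combinations(range(len(records)), 2):
        # Assumption: a fixed threshold decides whether a pair is a match.
        if similarity(records[i], records[j]) >= threshold:
            graph.add_edge(i, j)
    cluster_ids = [0] * len(records)
    for cluster_id, component in enumerate(nx.connected_components(graph)):
        for node in component:
            cluster_ids[node] = cluster_id
    return cluster_ids
```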

DedupliPy is an end-to-end solution with advantages over existing solutions:

- active learning; no large manually labelled dataset required
- during active learning, the user is notified when the model has converged and training may be finished
- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics, interaction features)

Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)

## Documentation

Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)

## Installation

### Normal installation

**With pip**

Install directly from PyPI.

```
pip install deduplipy
```

**With conda**

Install using conda from the conda-forge channel.

```
conda install -c conda-forge deduplipy
```

### Install to contribute

Clone this GitHub repository and install in editable mode:

```
python -m pip install -e ".[dev]"
python setup.py develop
```

## Usage

Apply deduplication to your Pandas dataframe `df` as follows:

```python
from deduplipy.deduplicator import Deduplicator

myDedupliPy = Deduplicator(col_names=['name', 'address'])
myDedupliPy.fit(df)
```

This starts an interactive learning session in which you indicate whether each presented pair is a match (y) or not (n). During active learning, you are notified that training may be finished once the model has converged.
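
Comparing every pair of rows in `df` grows quadratically with its size. Blocking, which appears among the configurable settings listed above as custom blocking rules, is the usual way to limit this: only rows that share a block key under at least one rule become candidate pairs. The following minimal pure-Python sketch illustrates the idea. The rules `first_two_characters` and `last_word` and the function `candidate_pairs` are hypothetical illustrations, not DedupliPy's built-in blocking rules or API.

```python
from collections import defaultdict
from itertools import combinations


def first_two_characters(value: str) -> str:
    """Hypothetical rule: values starting with the same two characters share a block."""
    return value.strip().lower()[:2]


def last_word(value: str) -> str:
    """Hypothetical rule: values ending in the same word share a block."""
    return (value.split() or [""])[-1].lower()


def candidate_pairs(values, rules):
    """Index pairs (i, j) with i < j that share a block under at least one rule."""
    pairs = set()
    for rule in rules:
        blocks = defaultdict(list)
        for index, value in enumerate(values):
            blocks[rule(value)].append(index)
        for members in blocks.values():
            # members are in ascending index order, so each pair satisfies i < j
            pairs.update(combinations(members, 2))
    return pairs
```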

Predictions on (new) data are obtained as follows:

```python
result = myDedupliPy.predict(df)
```

%prep
%autosetup -n DedupliPy-0.7.10

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-DedupliPy -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 0.7.10-1
- Package Spec generated