%global _empty_manifest_terminate_build 0
Name: python-DedupliPy
Version: 0.7.10
Release: 1
Summary: End-to-end deduplication solution
License: MIT License
URL: https://github.com/fritshermans/deduplipy
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/20/b1/72cc8af1c02eba9a072ea7e7eeb5177610970909b767e928e58f66ffe50d/DedupliPy-0.7.10.tar.gz
BuildArch: noarch
Requires: python3-pandas
Requires: python3-numpy
Requires: python3-scipy
Requires: python3-scikit-learn
Requires: python3-networkx
Requires: python3-Levenshtein
Requires: python3-thefuzz
Requires: python3-modAL
Requires: python3-openpyxl
Requires: python3-pytest
Requires: python3-fancyimpute
Requires: python3-pyminhash
Requires: python3-matplotlib
Requires: python3-jupyterlab
Requires: python3-sphinx
Requires: python3-nbsphinx
Requires: python3-sphinx-rtd-theme
%description
# DedupliPy
Deduplication is the task of combining different representations of the same real-world entity. This package implements
deduplication using active learning. Active learning allows for rapid training without having to provide a large,
manually labelled dataset.
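To make the task concrete, here is a minimal sketch of deduplication in plain Python. It is not DedupliPy's implementation: a fixed similarity threshold from the standard library's `difflib` stands in for the classifier that DedupliPy trains with active learning, and the helper names `similarity` and `cluster_duplicates`, the threshold value, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_duplicates(records: list[str], threshold: float = 0.8) -> list[int]:
    """Give each record a cluster id; records linked by similar pairs share an id."""
    # Union-find: every record starts as the root of its own cluster.
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk up to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair and merge the clusters of pairs above the threshold.
    # (Hypothetical fixed threshold; DedupliPy learns this decision instead.)
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    return [find(i) for i in range(len(records))]


# Hypothetical sample data: the first two records describe the same person.
records = [
    "John Smith, 12 Main Street",
    "Jon Smith, 12 Main Street",
    "Mary Jones, 4 Oak Avenue",
]
cluster_ids = cluster_duplicates(records)
```

The first two records end up with the same cluster id and the third keeps its own. Comparing all pairs scales quadratically with the number of records, which is why blocking rules matter on real data.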
DedupliPy is an end-to-end solution with advantages over existing solutions:
- active learning; no large manually labelled dataset required
- during active learning, the user is notified when the model has converged and training can be stopped
- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics,
interaction features)
Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
## Documentation
Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
## Installation
### Normal installation
**With pip**
Install directly from PyPI.
```
pip install deduplipy
```
**With conda**
Install using conda from the conda-forge channel.
```
conda install -c conda-forge deduplipy
```
### Install to contribute
Clone this GitHub repository and install in editable mode:
```
python -m pip install -e ".[dev]"
python setup.py develop
```
## Usage
Apply deduplication to your pandas DataFrame `df` as follows:
```python
from deduplipy.deduplicator import Deduplicator
# Deduplicate on the 'name' and 'address' columns
myDedupliPy = Deduplicator(col_names=['name', 'address'])
myDedupliPy.fit(df)
```
This starts an interactive learning session in which you indicate whether each presented pair is a match (y) or not (n).
Once training has converged, you are notified that the session can be finished.
Predictions on (new) data are obtained as follows:
```python
result = myDedupliPy.predict(df)
```
%package -n python3-DedupliPy
Summary: End-to-end deduplication solution
Provides: python-DedupliPy
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-DedupliPy
# DedupliPy
Deduplication is the task of combining different representations of the same real-world entity. This package implements
deduplication using active learning. Active learning allows for rapid training without having to provide a large,
manually labelled dataset.
DedupliPy is an end-to-end solution with advantages over existing solutions:
- active learning; no large manually labelled dataset required
- during active learning, the user is notified when the model has converged and training can be stopped
- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics,
interaction features)
Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
## Documentation
Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
## Installation
### Normal installation
**With pip**
Install directly from PyPI.
```
pip install deduplipy
```
**With conda**
Install using conda from the conda-forge channel.
```
conda install -c conda-forge deduplipy
```
### Install to contribute
Clone this GitHub repository and install in editable mode:
```
python -m pip install -e ".[dev]"
python setup.py develop
```
## Usage
Apply deduplication to your pandas DataFrame `df` as follows:
```python
from deduplipy.deduplicator import Deduplicator
# Deduplicate on the 'name' and 'address' columns
myDedupliPy = Deduplicator(col_names=['name', 'address'])
myDedupliPy.fit(df)
```
This starts an interactive learning session in which you indicate whether each presented pair is a match (y) or not (n).
Once training has converged, you are notified that the session can be finished.
Predictions on (new) data are obtained as follows:
```python
result = myDedupliPy.predict(df)
```
%package help
Summary: Development documents and examples for DedupliPy
Provides: python3-DedupliPy-doc
%description help
# DedupliPy
Deduplication is the task of combining different representations of the same real-world entity. This package implements
deduplication using active learning. Active learning allows for rapid training without having to provide a large,
manually labelled dataset.
DedupliPy is an end-to-end solution with advantages over existing solutions:
- active learning; no large manually labelled dataset required
- during active learning, the user is notified when the model has converged and training can be stopped
- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics,
interaction features)
Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
## Documentation
Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
## Installation
### Normal installation
**With pip**
Install directly from PyPI.
```
pip install deduplipy
```
**With conda**
Install using conda from the conda-forge channel.
```
conda install -c conda-forge deduplipy
```
### Install to contribute
Clone this GitHub repository and install in editable mode:
```
python -m pip install -e ".[dev]"
python setup.py develop
```
## Usage
Apply deduplication to your pandas DataFrame `df` as follows:
```python
from deduplipy.deduplicator import Deduplicator
# Deduplicate on the 'name' and 'address' columns
myDedupliPy = Deduplicator(col_names=['name', 'address'])
myDedupliPy.fit(df)
```
This starts an interactive learning session in which you indicate whether each presented pair is a match (y) or not (n).
Once training has converged, you are notified that the session can be finished.
Predictions on (new) data are obtained as follows:
```python
result = myDedupliPy.predict(df)
```
%prep
%autosetup -n DedupliPy-0.7.10
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-DedupliPy -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Tue May 30 2023 Python_Bot - 0.7.10-1
- Package Spec generated