author CoprDistGit <infra@openeuler.org> 2023-05-15 08:39:21 +0000
committer CoprDistGit <infra@openeuler.org> 2023-05-15 08:39:21 +0000
commit cce559bd58cdd092934daecad56b8cbdbf66031b
tree be8320a5c9bfe170e5376c0c6fde7e490e1833ee
parent 6ce1de1a9d1f980c75150e94bf0be23d1ad919f5
automatic import of python-deduplipy
-rw-r--r-- .gitignore 1
-rw-r--r-- python-deduplipy.spec 362
-rw-r--r-- sources 1
3 files changed, 364 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..8630371 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/DedupliPy-0.7.10.tar.gz
diff --git a/python-deduplipy.spec b/python-deduplipy.spec
new file mode 100644
index 0000000..9dfc097
--- /dev/null
+++ b/python-deduplipy.spec
@@ -0,0 +1,362 @@
+%global _empty_manifest_terminate_build 0
+Name: python-DedupliPy
+Version: 0.7.10
+Release: 1
+Summary: End-to-end deduplication solution
+License: MIT
+URL: https://github.com/fritshermans/deduplipy
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/20/b1/72cc8af1c02eba9a072ea7e7eeb5177610970909b767e928e58f66ffe50d/DedupliPy-0.7.10.tar.gz
+BuildArch: noarch
+
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-networkx
+Requires: python3-Levenshtein
+Requires: python3-thefuzz
+Requires: python3-modAL
+Requires: python3-openpyxl
+Requires: python3-pytest
+Requires: python3-fancyimpute
+Requires: python3-pyminhash
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-networkx
+Requires: python3-Levenshtein
+Requires: python3-thefuzz
+Requires: python3-modAL
+Requires: python3-openpyxl
+Requires: python3-pytest
+Requires: python3-fancyimpute
+Requires: python3-pyminhash
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-networkx
+Requires: python3-Levenshtein
+Requires: python3-thefuzz
+Requires: python3-modAL
+Requires: python3-openpyxl
+Requires: python3-pytest
+Requires: python3-fancyimpute
+Requires: python3-pyminhash
+Requires: python3-matplotlib
+Requires: python3-jupyterlab
+Requires: python3-sphinx
+Requires: python3-nbsphinx
+Requires: python3-sphinx-rtd-theme
+Requires: python3-sphinx
+Requires: python3-nbsphinx
+Requires: python3-sphinx-rtd-theme
+
+%description
+<!--- BADGES: START --->
+[![Version](https://img.shields.io/pypi/v/deduplipy)](https://pypi.org/project/deduplipy/)
+![](https://img.shields.io/github/license/fritshermans/deduplipy)
+[![Downloads](https://pepy.tech/badge/deduplipy)](https://pepy.tech/project/deduplipy)
+[![Conda - Platform](https://img.shields.io/conda/pn/conda-forge/deduplipy?logo=anaconda&style=flat)][#conda-forge-package]
+[![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/deduplipy?logo=anaconda&style=flat&color=orange)][#conda-forge-package]
+[![Conda Recipe](https://img.shields.io/static/v1?logo=conda-forge&style=flat&color=green&label=recipe&message=deduplipy)][#conda-forge-feedstock]
+[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=readthdocs&style=flat&color=pink&label=docs&message=deduplipy)][#docs-package]
+
+[#pypi-package]: https://pypi.org/project/deduplipy/
+[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
+[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
+[#docs-package]: https://deduplipy.readthedocs.io/en/latest/
+<!--- BADGES: END --->
+
+# DedupliPy
+
+<a href="https://deduplipy.readthedocs.io/en/latest/"><img src="https://deduplipy.readthedocs.io/en/latest/_images/logo.png" width="15%" height="15%" align="left" /></a>
+
+Deduplication is the task of combining different representations of the same real-world entity. This package implements
+deduplication using active learning. Active learning allows for rapid training without having to provide a large,
+manually labelled dataset.
+
+DedupliPy is an end-to-end solution with advantages over existing solutions:
+
+- active learning; no large manually labelled dataset required
+- during active learning, the user is notified when the model has converged and training can be stopped
+- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics,
+ interaction features)
+
+Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
+
+## Documentation
+
+Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
+
+## Installation
+
+### Normal installation
+
+**With pip**
+
+Install directly from PyPI.
+
+```
+pip install deduplipy
+```
+
+**With conda**
+
+Install using conda from conda-forge channel.
+
+```
+conda install -c conda-forge deduplipy
+```
+
+
+### Install to contribute
+
+Clone this GitHub repo and install in editable mode:
+
+```
+python -m pip install -e ".[dev]"
+python setup.py develop
+```
+
+## Usage
+
+Apply deduplication to your pandas dataframe `df` as follows:
+
+```python
+myDedupliPy = Deduplicator(col_names=['name', 'address'])
+myDedupliPy.fit(df)
+```
+
+This will start the interactive learning session in which you provide input on whether a pair is a match (y) or not (n).
+During active learning you will get the message that training may be finished once algorithm training has converged.
+Predictions on (new) data are obtained as follows:
+
+```python
+result = myDedupliPy.predict(df)
+```
+
+
+%package -n python3-DedupliPy
+Summary: End-to-end deduplication solution
+Provides: python-DedupliPy
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-DedupliPy
+<!--- BADGES: START --->
+[![Version](https://img.shields.io/pypi/v/deduplipy)](https://pypi.org/project/deduplipy/)
+![](https://img.shields.io/github/license/fritshermans/deduplipy)
+[![Downloads](https://pepy.tech/badge/deduplipy)](https://pepy.tech/project/deduplipy)
+[![Conda - Platform](https://img.shields.io/conda/pn/conda-forge/deduplipy?logo=anaconda&style=flat)][#conda-forge-package]
+[![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/deduplipy?logo=anaconda&style=flat&color=orange)][#conda-forge-package]
+[![Conda Recipe](https://img.shields.io/static/v1?logo=conda-forge&style=flat&color=green&label=recipe&message=deduplipy)][#conda-forge-feedstock]
+[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=readthdocs&style=flat&color=pink&label=docs&message=deduplipy)][#docs-package]
+
+[#pypi-package]: https://pypi.org/project/deduplipy/
+[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
+[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
+[#docs-package]: https://deduplipy.readthedocs.io/en/latest/
+<!--- BADGES: END --->
+
+# DedupliPy
+
+<a href="https://deduplipy.readthedocs.io/en/latest/"><img src="https://deduplipy.readthedocs.io/en/latest/_images/logo.png" width="15%" height="15%" align="left" /></a>
+
+Deduplication is the task of combining different representations of the same real-world entity. This package implements
+deduplication using active learning. Active learning allows for rapid training without having to provide a large,
+manually labelled dataset.
+
+DedupliPy is an end-to-end solution with advantages over existing solutions:
+
+- active learning; no large manually labelled dataset required
+- during active learning, the user is notified when the model has converged and training can be stopped
+- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics,
+ interaction features)
+
+Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
+
+## Documentation
+
+Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
+
+## Installation
+
+### Normal installation
+
+**With pip**
+
+Install directly from PyPI.
+
+```
+pip install deduplipy
+```
+
+**With conda**
+
+Install using conda from conda-forge channel.
+
+```
+conda install -c conda-forge deduplipy
+```
+
+
+### Install to contribute
+
+Clone this GitHub repo and install in editable mode:
+
+```
+python -m pip install -e ".[dev]"
+python setup.py develop
+```
+
+## Usage
+
+Apply deduplication to your pandas dataframe `df` as follows:
+
+```python
+myDedupliPy = Deduplicator(col_names=['name', 'address'])
+myDedupliPy.fit(df)
+```
+
+This will start the interactive learning session in which you provide input on whether a pair is a match (y) or not (n).
+During active learning you will get the message that training may be finished once algorithm training has converged.
+Predictions on (new) data are obtained as follows:
+
+```python
+result = myDedupliPy.predict(df)
+```
+
+
+%package help
+Summary: Development documents and examples for DedupliPy
+Provides: python3-DedupliPy-doc
+%description help
+<!--- BADGES: START --->
+[![Version](https://img.shields.io/pypi/v/deduplipy)](https://pypi.org/project/deduplipy/)
+![](https://img.shields.io/github/license/fritshermans/deduplipy)
+[![Downloads](https://pepy.tech/badge/deduplipy)](https://pepy.tech/project/deduplipy)
+[![Conda - Platform](https://img.shields.io/conda/pn/conda-forge/deduplipy?logo=anaconda&style=flat)][#conda-forge-package]
+[![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/deduplipy?logo=anaconda&style=flat&color=orange)][#conda-forge-package]
+[![Conda Recipe](https://img.shields.io/static/v1?logo=conda-forge&style=flat&color=green&label=recipe&message=deduplipy)][#conda-forge-feedstock]
+[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=readthdocs&style=flat&color=pink&label=docs&message=deduplipy)][#docs-package]
+
+[#pypi-package]: https://pypi.org/project/deduplipy/
+[#conda-forge-package]: https://anaconda.org/conda-forge/deduplipy
+[#conda-forge-feedstock]: https://github.com/conda-forge/deduplipy-feedstock
+[#docs-package]: https://deduplipy.readthedocs.io/en/latest/
+<!--- BADGES: END --->
+
+# DedupliPy
+
+<a href="https://deduplipy.readthedocs.io/en/latest/"><img src="https://deduplipy.readthedocs.io/en/latest/_images/logo.png" width="15%" height="15%" align="left" /></a>
+
+Deduplication is the task of combining different representations of the same real-world entity. This package implements
+deduplication using active learning. Active learning allows for rapid training without having to provide a large,
+manually labelled dataset.
+
+DedupliPy is an end-to-end solution with advantages over existing solutions:
+
+- active learning; no large manually labelled dataset required
+- during active learning, the user is notified when the model has converged and training can be stopped
+- works out of the box; advanced users can choose settings as desired (custom blocking rules, custom metrics,
+ interaction features)
+
+Developed by [Frits Hermans](https://www.linkedin.com/in/frits-hermans-data-scientist/)
+
+## Documentation
+
+Documentation can be found [here](https://deduplipy.readthedocs.io/en/latest/)
+
+## Installation
+
+### Normal installation
+
+**With pip**
+
+Install directly from PyPI.
+
+```
+pip install deduplipy
+```
+
+**With conda**
+
+Install using conda from conda-forge channel.
+
+```
+conda install -c conda-forge deduplipy
+```
+
+
+### Install to contribute
+
+Clone this GitHub repo and install in editable mode:
+
+```
+python -m pip install -e ".[dev]"
+python setup.py develop
+```
+
+## Usage
+
+Apply deduplication to your pandas dataframe `df` as follows:
+
+```python
+myDedupliPy = Deduplicator(col_names=['name', 'address'])
+myDedupliPy.fit(df)
+```
+
+This will start the interactive learning session in which you provide input on whether a pair is a match (y) or not (n).
+During active learning you will get the message that training may be finished once algorithm training has converged.
+Predictions on (new) data are obtained as follows:
+
+```python
+result = myDedupliPy.predict(df)
+```
+
+
+%prep
+%autosetup -n DedupliPy-0.7.10
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-DedupliPy -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 0.7.10-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..03f3b31
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+3bd6c614bad0baee37ac3854b4c290e7 DedupliPy-0.7.10.tar.gz
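
Reviewer note: the `sources` entry above pairs a 32-hex-digit digest with the tarball name. A minimal sketch of how a downloaded tarball could be checked against that entry follows. It assumes the digest is MD5 (suggested by its length, not stated in the commit), and the helper names `parse_sources_line`, `md5_of_file` and `verify_source` are hypothetical, not part of any openEuler tooling.

```python
import hashlib
from pathlib import Path


def parse_sources_line(line):
    """Split a 'DIGEST FILENAME' line into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    # Assumption: MD5 is available; FIPS-restricted builds may refuse it.
    hasher = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_source(line, directory="."):
    """Return True if the file named on the line matches its digest."""
    digest, filename = parse_sources_line(line)
    return md5_of_file(Path(directory) / filename) == digest
```

For example, `verify_source("3bd6c614bad0baee37ac3854b4c290e7 DedupliPy-0.7.10.tar.gz", "/path/to/downloads")` would return `True` only if a tarball in that (placeholder) directory hashes to the listed value.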