-rw-r--r-- .gitignore 1
-rw-r--r-- python-kpplus.spec 289
-rw-r--r-- sources 1
3 files changed, 291 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..931b1f3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/kpplus-0.0.3.tar.gz
diff --git a/python-kpplus.spec b/python-kpplus.spec
new file mode 100644
index 0000000..571bcd2
--- /dev/null
+++ b/python-kpplus.spec
@@ -0,0 +1,289 @@
+%global _empty_manifest_terminate_build 0
+Name: python-kpplus
+Version: 0.0.3
+Release: 1
+Summary: A JIT optimized K-Prototype algorithm
+License: MIT License
+URL: https://github.com/youbao88/KPrototypes_plus
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/a2/5c/df60622dab8168d875947c28cee33c63e72f47c6559af6baccdabac5c97f/kpplus-0.0.3.tar.gz
+BuildArch: noarch
+
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-numba
+Requires: python3-joblib
+
+%description
+# KPrototype plus (kpplus)
+[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-informational.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity) [![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/) [![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://pypi.org/project/kpplus/)
+
+## Description
+
+K-prototype is a clustering method invented to support both categorical and numerical variables[1].
+
+**KPrototype plus (kpplus)** is a Python 3 package that is designed to increase the performance of [nicodv's KPrototypes function](https://github.com/nicodv/kmodes) by using [Numba](http://numba.pydata.org/).
+
+This code is part of [Stockholms diabetespreventiva program](https://www.folkhalsoguiden.se/amnesomraden1/analys-och-kartlaggning/sdpp/).
+
+### Performance improvement
+As an [example](example/example.ipynb), I used one of the [Heart Disease Data Sets](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) from [UCI](https://archive.ics.uci.edu/ml/index.php) to test the performance.
+This data set contains 4455 rows, 7 categorical variables, and 5 numerical variables.
+We compare the performance between nicodv's kprototype function and k_prototype_plus.
+
+~~~~
+< nicodv's kprototype >
+CPU times: user 2.14 s, sys: 18.2 ms, total: 2.16 s
+Wall time: 1min 41s
+~~~~
+~~~~
+< k_prototype_plus >
+CPU times: user 298 ms, sys: 9.24 ms, total: 308 ms
+Wall time: 13.4 s
+~~~~
+
+**Notice:** Only Cao initialization is supported as the initialization method[2].
+
+## System requirement
+[![Generic badge](https://img.shields.io/badge/Python-3.7.1-informational.svg)](https://www.python.org/) [![Generic badge](https://img.shields.io/badge/Pandas-0.25.3-informational.svg)](https://pandas.pydata.org/) [![Generic badge](https://img.shields.io/badge/Numpy-1.17.0-informational.svg)](https://numpy.org/) [![Generic badge](https://img.shields.io/badge/Joblib-0.13.2-informational.svg)](https://joblib.readthedocs.io/en/latest/) [![Generic badge](https://img.shields.io/badge/Numba-0.45.1-informational.svg)](http://numba.pydata.org/)
+
+## Installation
+
+```
+pip install kpplus
+```
+
+## Usage
+```python
+from kpplus import KPrototypes_plus
+model = KPrototypes_plus(n_clusters = 3, n_init = 4, gamma = None, n_jobs = -1) #initialize the model
+model.fit_predict(X=df, categorical = [0,1]) #fit the data and categorical into the model
+
+model.labels_ #return the cluster_labels
+model.cluster_centroids_ #return the cluster centroid points(prototypes)
+model.n_iter_ #return the number of iterations
+model.cost_ #return the costs
+```
+**n_clusters:** the number of clusters
+
+**n_init:** the number of parallel operations using different initializations
+
+**gamma (optional):** A value that controls how strongly the algorithm favours categorical variables. (By default, it is the mean standard deviation of all numeric variables.)
+
+**n_jobs (optional, default=-1):** The number of parallel processors. ('-1' means using all processors)
+
+**X:** 2-D numpy array (dataset)
+
+**types:** A numpy array that indicates if the variable is categorical or numerical.
+
+For example: ```types = [1,1,0,0,0,0]``` means the first two variables are categorical and the last four variables are numerical.
+
+## Acknowledgement
+I'm extremely grateful to [Dr. Diego Yacaman Mendez](https://staff.ki.se/people/dieyac?_ga=2.70810192.1199119869.1588953123-1873461028.1579027503) and [Dr. David Ebbevi](https://www.linkedin.com/in/debbevi/?originalSubdomain=se) for their support. They are two brilliant researchers who started this project with excellent knowledge of medical science, epidemiology, statistics and programming.
+
+## Reference
+[1] Huang Z. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery. 1998;2(3):283-304.
+[2] Cao F, Liang J, Bai L. A new initialization method for categorical data clustering. Expert Systems with Applications. 2009;36(7):10223-8.
+
+
+
+
+%package -n python3-kpplus
+Summary: A JIT optimized K-Prototype algorithm
+Provides: python-kpplus
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-kpplus
+# KPrototype plus (kpplus)
+[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-informational.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity) [![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/) [![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://pypi.org/project/kpplus/)
+
+## Description
+
+K-prototype is a clustering method invented to support both categorical and numerical variables[1].
+
+**KPrototype plus (kpplus)** is a Python 3 package that is designed to increase the performance of [nicodv's KPrototypes function](https://github.com/nicodv/kmodes) by using [Numba](http://numba.pydata.org/).
+
+This code is part of [Stockholms diabetespreventiva program](https://www.folkhalsoguiden.se/amnesomraden1/analys-och-kartlaggning/sdpp/).
+
+### Performance improvement
+As an [example](example/example.ipynb), I used one of the [Heart Disease Data Sets](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) from [UCI](https://archive.ics.uci.edu/ml/index.php) to test the performance.
+This data set contains 4455 rows, 7 categorical variables, and 5 numerical variables.
+We compare the performance between nicodv's kprototype function and k_prototype_plus.
+
+~~~~
+< nicodv's kprototype >
+CPU times: user 2.14 s, sys: 18.2 ms, total: 2.16 s
+Wall time: 1min 41s
+~~~~
+~~~~
+< k_prototype_plus >
+CPU times: user 298 ms, sys: 9.24 ms, total: 308 ms
+Wall time: 13.4 s
+~~~~
+
+**Notice:** Only Cao initialization is supported as the initialization method[2].
+
+## System requirement
+[![Generic badge](https://img.shields.io/badge/Python-3.7.1-informational.svg)](https://www.python.org/) [![Generic badge](https://img.shields.io/badge/Pandas-0.25.3-informational.svg)](https://pandas.pydata.org/) [![Generic badge](https://img.shields.io/badge/Numpy-1.17.0-informational.svg)](https://numpy.org/) [![Generic badge](https://img.shields.io/badge/Joblib-0.13.2-informational.svg)](https://joblib.readthedocs.io/en/latest/) [![Generic badge](https://img.shields.io/badge/Numba-0.45.1-informational.svg)](http://numba.pydata.org/)
+
+## Installation
+
+```
+pip install kpplus
+```
+
+## Usage
+```python
+from kpplus import KPrototypes_plus
+model = KPrototypes_plus(n_clusters = 3, n_init = 4, gamma = None, n_jobs = -1) #initialize the model
+model.fit_predict(X=df, categorical = [0,1]) #fit the data and categorical into the model
+
+model.labels_ #return the cluster_labels
+model.cluster_centroids_ #return the cluster centroid points(prototypes)
+model.n_iter_ #return the number of iterations
+model.cost_ #return the costs
+```
+**n_clusters:** the number of clusters
+
+**n_init:** the number of parallel operations using different initializations
+
+**gamma (optional):** A value that controls how strongly the algorithm favours categorical variables. (By default, it is the mean standard deviation of all numeric variables.)
+
+**n_jobs (optional, default=-1):** The number of parallel processors. ('-1' means using all processors)
+
+**X:** 2-D numpy array (dataset)
+
+**types:** A numpy array that indicates if the variable is categorical or numerical.
+
+For example: ```types = [1,1,0,0,0,0]``` means the first two variables are categorical and the last four variables are numerical.
+
+## Acknowledgement
+I'm extremely grateful to [Dr. Diego Yacaman Mendez](https://staff.ki.se/people/dieyac?_ga=2.70810192.1199119869.1588953123-1873461028.1579027503) and [Dr. David Ebbevi](https://www.linkedin.com/in/debbevi/?originalSubdomain=se) for their support. They are two brilliant researchers who started this project with excellent knowledge of medical science, epidemiology, statistics and programming.
+
+## Reference
+[1] Huang Z. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery. 1998;2(3):283-304.
+[2] Cao F, Liang J, Bai L. A new initialization method for categorical data clustering. Expert Systems with Applications. 2009;36(7):10223-8.
+
+
+
+
+%package help
+Summary: Development documents and examples for kpplus
+Provides: python3-kpplus-doc
+%description help
+# KPrototype plus (kpplus)
+[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-informational.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity) [![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/) [![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://pypi.org/project/kpplus/)
+
+## Description
+
+K-prototype is a clustering method invented to support both categorical and numerical variables[1].
+
+**KPrototype plus (kpplus)** is a Python 3 package that is designed to increase the performance of [nicodv's KPrototypes function](https://github.com/nicodv/kmodes) by using [Numba](http://numba.pydata.org/).
+
+This code is part of [Stockholms diabetespreventiva program](https://www.folkhalsoguiden.se/amnesomraden1/analys-och-kartlaggning/sdpp/).
+
+### Performance improvement
+As an [example](example/example.ipynb), I used one of the [Heart Disease Data Sets](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) from [UCI](https://archive.ics.uci.edu/ml/index.php) to test the performance.
+This data set contains 4455 rows, 7 categorical variables, and 5 numerical variables.
+We compare the performance between nicodv's kprototype function and k_prototype_plus.
+
+~~~~
+< nicodv's kprototype >
+CPU times: user 2.14 s, sys: 18.2 ms, total: 2.16 s
+Wall time: 1min 41s
+~~~~
+~~~~
+< k_prototype_plus >
+CPU times: user 298 ms, sys: 9.24 ms, total: 308 ms
+Wall time: 13.4 s
+~~~~
+
+**Notice:** Only Cao initialization is supported as the initialization method[2].
+
+## System requirement
+[![Generic badge](https://img.shields.io/badge/Python-3.7.1-informational.svg)](https://www.python.org/) [![Generic badge](https://img.shields.io/badge/Pandas-0.25.3-informational.svg)](https://pandas.pydata.org/) [![Generic badge](https://img.shields.io/badge/Numpy-1.17.0-informational.svg)](https://numpy.org/) [![Generic badge](https://img.shields.io/badge/Joblib-0.13.2-informational.svg)](https://joblib.readthedocs.io/en/latest/) [![Generic badge](https://img.shields.io/badge/Numba-0.45.1-informational.svg)](http://numba.pydata.org/)
+
+## Installation
+
+```
+pip install kpplus
+```
+
+## Usage
+```python
+from kpplus import KPrototypes_plus
+model = KPrototypes_plus(n_clusters = 3, n_init = 4, gamma = None, n_jobs = -1) #initialize the model
+model.fit_predict(X=df, categorical = [0,1]) #fit the data and categorical into the model
+
+model.labels_ #return the cluster_labels
+model.cluster_centroids_ #return the cluster centroid points(prototypes)
+model.n_iter_ #return the number of iterations
+model.cost_ #return the costs
+```
+**n_clusters:** the number of clusters
+
+**n_init:** the number of parallel operations using different initializations
+
+**gamma (optional):** A value that controls how strongly the algorithm favours categorical variables. (By default, it is the mean standard deviation of all numeric variables.)
+
+**n_jobs (optional, default=-1):** The number of parallel processors. ('-1' means using all processors)
+
+**X:** 2-D numpy array (dataset)
+
+**types:** A numpy array that indicates if the variable is categorical or numerical.
+
+For example: ```types = [1,1,0,0,0,0]``` means the first two variables are categorical and the last four variables are numerical.
+
+## Acknowledgement
+I'm extremely grateful to [Dr. Diego Yacaman Mendez](https://staff.ki.se/people/dieyac?_ga=2.70810192.1199119869.1588953123-1873461028.1579027503) and [Dr. David Ebbevi](https://www.linkedin.com/in/debbevi/?originalSubdomain=se) for their support. They are two brilliant researchers who started this project with excellent knowledge of medical science, epidemiology, statistics and programming.
+
+## Reference
+[1] Huang Z. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery. 1998;2(3):283-304.
+[2] Cao F, Liang J, Bai L. A new initialization method for categorical data clustering. Expert Systems with Applications. 2009;36(7):10223-8.
+
+
+
+
+%prep
+%autosetup -n kpplus-0.0.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-kpplus -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..aec7cdf
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+f7d5db2bd686ce7c6282a5f661eef8b5 kpplus-0.0.3.tar.gz
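
The `sources` entry above pairs a 32-character hex digest with the tarball name. Assuming that digest is an MD5 checksum of the file (inferred from its length, not stated in the commit), a downloaded tarball could be checked against such an entry roughly as follows. The helper names `parse_sources_line`, `md5_of_file`, and `matches_sources_entry` are hypothetical and are not part of any packaging tool.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_sources_line(line: str) -> Tuple[str, str]:
    """Split a '<hex-digest> <filename>' entry into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large tarball is not read into memory at once."""
    md5 = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def matches_sources_entry(line: str, directory: Path) -> bool:
    """Return True if the named file exists in `directory` and has the recorded digest.

    Assumption: the digest in the entry is MD5.
    """
    expected, filename = parse_sources_line(line)
    candidate = directory / filename
    return candidate.is_file() and md5_of_file(candidate) == expected
```

For example, `matches_sources_entry("f7d5db2bd686ce7c6282a5f661eef8b5 kpplus-0.0.3.tar.gz", Path("."))` would return `True` only if the tarball is present in the current directory and hashes to the recorded value.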