authorCoprDistGit <infra@openeuler.org>2023-04-11 05:22:24 +0000
committerCoprDistGit <infra@openeuler.org>2023-04-11 05:22:24 +0000
commit4ef82f9e26cbd6a5889b9efc9c7884edcf383411 (patch)
treeeeed89ab73c2457c13154d5ca21b92a5b3674173 /python-scikit-spark.spec
parente3fb228c44a6a77787c535e2a0599da40d533c86 (diff)
automatic import of python-scikit-spark
Diffstat (limited to 'python-scikit-spark.spec')
-rw-r--r--python-scikit-spark.spec341
1 files changed, 341 insertions, 0 deletions
diff --git a/python-scikit-spark.spec b/python-scikit-spark.spec
new file mode 100644
index 0000000..ed0b456
--- /dev/null
+++ b/python-scikit-spark.spec
@@ -0,0 +1,341 @@
+%global _empty_manifest_terminate_build 0
+Name: python-scikit-spark
+Version: 0.4.0
+Release: 1
+Summary: Spark acceleration for Scikit-Learn cross validation techniques
+License:	Apache-2.0
+URL: https://github.com/scikit-spark/scikit-spark
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/79/a6/376f8dc174655538d50f8b19441bf82b6c7f327dbb8261a54e6affc5433a/scikit-spark-0.4.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-six
+
+%description
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+seems to no longer be under development. It focuses specifically on the
+acceleration of Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` developers [have stated
+that they are probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
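+
+For example, the `model_selection` searches expose the richer `cv_results_`
+dictionary in place of the old `grid_scores_` list. A minimal, library-free
+sketch of reading it (the scores here are fabricated for illustration):
+
+```python
+# Shape of the cv_results_ dict exposed by model_selection searches
+# (scores fabricated for illustration).
+cv_results = {
+    "params": [{"C": 0.1}, {"C": 1}, {"C": 10}],
+    "mean_test_score": [0.91, 0.96, 0.94],
+}
+
+# Pick the setting with the best mean cross-validated score
+best_index = max(
+    range(len(cv_results["mean_test_score"])),
+    key=cv_results["mean_test_score"].__getitem__,
+)
+best_params = cv_results["params"][best_index]
+print(best_params)  # {'C': 1}
+```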
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions is the same (preferably you will have already
+created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross-validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark, as running it locally is slower than
+Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel':('linear', 'rbf'), 'C':[0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+ .master("local[*]")\
+ .appName("skspark-grid-search-doctests")\
+ .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
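+
+What the search distributes can be sketched without Spark: each pairing of a
+candidate parameter setting with a cross-validation fold is one independent
+fit-and-score task (the names below are illustrative only, not the package's
+internals):
+
+```python
+from itertools import product
+
+# Expand the same grid as above into explicit candidate settings
+param_grid = [
+    {"kernel": k, "C": c}
+    for k, c in product(("linear", "rbf"), (0.01, 0.1, 1, 10, 100))
+]
+n_folds = 3
+
+# Each (setting, fold) pair is one task: 10 settings x 3 folds = 30 fits
+tasks = list(product(param_grid, range(n_folds)))
+print(len(tasks))  # 30
+```
+
+These fits have no dependencies on one another, which is why spreading them
+over a cluster scales well.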
+
+## Current and upcoming functionality
+- Current
+ - model_selection.RandomizedSearchCV
+ - model_selection.GridSearchCV
+- Upcoming
+ - model_selection.cross_val_predict
+ - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+
+%package -n python3-scikit-spark
+Summary: Spark acceleration for Scikit-Learn cross validation techniques
+Provides: python-scikit-spark
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-scikit-spark
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+seems to no longer be under development. It focuses specifically on the
+acceleration of Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` developers [have stated
+that they are probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions is the same (preferably you will have already
+created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross-validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark, as running it locally is slower than
+Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel':('linear', 'rbf'), 'C':[0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+ .master("local[*]")\
+ .appName("skspark-grid-search-doctests")\
+ .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
+
+## Current and upcoming functionality
+- Current
+ - model_selection.RandomizedSearchCV
+ - model_selection.GridSearchCV
+- Upcoming
+ - model_selection.cross_val_predict
+ - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+
+%package help
+Summary: Development documents and examples for scikit-spark
+Provides: python3-scikit-spark-doc
+%description help
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+seems to no longer be under development. It focuses specifically on the
+acceleration of Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` developers [have stated
+that they are probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions is the same (preferably you will have already
+created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross-validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark, as running it locally is slower than
+Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel':('linear', 'rbf'), 'C':[0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+ .master("local[*]")\
+ .appName("skspark-grid-search-doctests")\
+ .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
+
+## Current and upcoming functionality
+- Current
+ - model_selection.RandomizedSearchCV
+ - model_selection.GridSearchCV
+- Upcoming
+ - model_selection.cross_val_predict
+ - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+
+%prep
+%autosetup -n scikit-spark-0.4.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-scikit-spark -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.0-1
+- Package Spec generated