authorCoprDistGit <infra@openeuler.org>2023-04-11 05:22:24 +0000
committerCoprDistGit <infra@openeuler.org>2023-04-11 05:22:24 +0000
commit4ef82f9e26cbd6a5889b9efc9c7884edcf383411 (patch)
treeeeed89ab73c2457c13154d5ca21b92a5b3674173 /python-scikit-spark.spec
parente3fb228c44a6a77787c535e2a0599da40d533c86 (diff)
automatic import of python-scikit-spark
Diffstat (limited to 'python-scikit-spark.spec')
-rw-r--r--python-scikit-spark.spec341
1 files changed, 341 insertions, 0 deletions
diff --git a/python-scikit-spark.spec b/python-scikit-spark.spec
new file mode 100644
index 0000000..ed0b456
--- /dev/null
+++ b/python-scikit-spark.spec
@@ -0,0 +1,341 @@
+%global _empty_manifest_terminate_build 0
+Name: python-scikit-spark
+Version: 0.4.0
+Release: 1
+Summary: Spark acceleration for Scikit-Learn cross validation techniques
+License:	Apache-2.0
+URL: https://github.com/scikit-spark/scikit-spark
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/79/a6/376f8dc174655538d50f8b19441bf82b6c7f327dbb8261a54e6affc5433a/scikit-spark-0.4.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-six
+
+%description
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+seems to no longer be under development. It focuses specifically on the
+acceleration of Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` developers [have stated
+that they are probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
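+
+For example, the `model_selection` searches expose the richer `cv_results_`
+dictionary in place of the old `grid_scores_` list. A minimal, library-free
+sketch of reading it (the scores here are fabricated for illustration):
+
+```python
+# Shape of the cv_results_ dict exposed by model_selection searches
+# (scores fabricated for illustration).
+cv_results = {
+    "params": [{"C": 0.1}, {"C": 1}, {"C": 10}],
+    "mean_test_score": [0.91, 0.96, 0.94],
+}
+
+# Pick the setting with the best mean cross-validated score
+best_index = max(
+    range(len(cv_results["mean_test_score"])),
+    key=cv_results["mean_test_score"].__getitem__,
+)
+best_params = cv_results["params"][best_index]
+print(best_params)  # {'C': 1}
+```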
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions is the same (preferably you will have already
+created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross-validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark, as running it locally is slower than
+Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel':('linear', 'rbf'), 'C':[0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+ .master("local[*]")\
+ .appName("skspark-grid-search-doctests")\
+ .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
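+
+What the search distributes can be sketched without Spark: each pairing of a
+candidate parameter setting with a cross-validation fold is one independent
+fit-and-score task (the names below are illustrative only, not the package's
+internals):
+
+```python
+from itertools import product
+
+# Expand the same grid as above into explicit candidate settings
+param_grid = [
+    {"kernel": k, "C": c}
+    for k, c in product(("linear", "rbf"), (0.01, 0.1, 1, 10, 100))
+]
+n_folds = 3
+
+# Each (setting, fold) pair is one task: 10 settings x 3 folds = 30 fits
+tasks = list(product(param_grid, range(n_folds)))
+print(len(tasks))  # 30
+```
+
+These fits have no dependencies on one another, which is why spreading them
+over a cluster scales well.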
+
+## Current and upcoming functionality
+- Current
+ - model_selection.RandomizedSearchCV
+ - model_selection.GridSearchCV
+- Upcoming
+ - model_selection.cross_val_predict
+ - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+
+%package -n python3-scikit-spark
+Summary: Spark acceleration for Scikit-Learn cross validation techniques
+Provides: python-scikit-spark
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-scikit-spark
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+seems to no longer be under development. It focuses specifically on the
+acceleration of Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` developers [have stated
+that they are probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions is the same (preferably you will have already
+created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross-validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark, as running it locally is slower than
+Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel':('linear', 'rbf'), 'C':[0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+ .master("local[*]")\
+ .appName("skspark-grid-search-doctests")\
+ .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
+
+## Current and upcoming functionality
+- Current
+ - model_selection.RandomizedSearchCV
+ - model_selection.GridSearchCV
+- Upcoming
+ - model_selection.cross_val_predict
+ - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+
+%package help
+Summary: Development documents and examples for scikit-spark
+Provides: python3-scikit-spark-doc
+%description help
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+seems to no longer be under development. It focuses specifically on the
+acceleration of Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` developers [have stated
+that they are probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions is the same (preferably you will have already
+created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross-validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark, as running it locally is slower than
+Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel':('linear', 'rbf'), 'C':[0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+ .master("local[*]")\
+ .appName("skspark-grid-search-doctests")\
+ .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
+
+## Current and upcoming functionality
+- Current
+ - model_selection.RandomizedSearchCV
+ - model_selection.GridSearchCV
+- Upcoming
+ - model_selection.cross_val_predict
+ - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+
+%prep
+%autosetup -n scikit-spark-0.4.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-scikit-spark -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.0-1
+- Package Spec generated