Diffstat (limited to 'python-scikit-spark.spec')
| -rw-r--r-- | python-scikit-spark.spec | 341 |
1 files changed, 341 insertions, 0 deletions
diff --git a/python-scikit-spark.spec b/python-scikit-spark.spec
new file mode 100644
index 0000000..ed0b456
--- /dev/null
+++ b/python-scikit-spark.spec
@@ -0,0 +1,341 @@
+%global _empty_manifest_terminate_build 0
+Name: python-scikit-spark
+Version: 0.4.0
+Release: 1
+Summary: Spark acceleration for Scikit-Learn cross validation techniques
+License: Apache 2.0
+URL: https://github.com/scikit-spark/scikit-spark
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/79/a6/376f8dc174655538d50f8b19441bf82b6c7f327dbb8261a54e6affc5433a/scikit-spark-0.4.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-six
+
+%description
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+no longer appears to be under development. It focuses specifically on
+accelerating Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` maintainers [have stated that they are
+probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions works the same way (preferably you will already
+have created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark to see a benefit, as running it locally is
+slower than Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel': ('linear', 'rbf'), 'C': [0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+    .master("local[*]")\
+    .appName("skspark-grid-search-doctests")\
+    .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
+
+## Current and upcoming functionality
+- Current
+  - model_selection.RandomizedSearchCV
+  - model_selection.GridSearchCV
+- Upcoming
+  - model_selection.cross_val_predict
+  - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+%package -n python3-scikit-spark
+Summary: Spark acceleration for Scikit-Learn cross validation techniques
+Provides: python-scikit-spark
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-scikit-spark
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+no longer appears to be under development. It focuses specifically on
+accelerating Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` maintainers [have stated that they are
+probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions works the same way (preferably you will already
+have created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark to see a benefit, as running it locally is
+slower than Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel': ('linear', 'rbf'), 'C': [0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+    .master("local[*]")\
+    .appName("skspark-grid-search-doctests")\
+    .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
+
+## Current and upcoming functionality
+- Current
+  - model_selection.RandomizedSearchCV
+  - model_selection.GridSearchCV
+- Upcoming
+  - model_selection.cross_val_predict
+  - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+%package help
+Summary: Development documents and examples for scikit-spark
+Provides: python3-scikit-spark-doc
+%description help
+# Spark acceleration for Scikit-Learn
+
+This project is a major re-write of the
+[spark-sklearn](https://github.com/databricks/spark-sklearn) project, which
+no longer appears to be under development. It focuses specifically on
+accelerating Scikit-Learn's cross validation functionality using PySpark.
+
+### Improvements over spark-sklearn
+`scikit-spark` supports `scikit-learn` versions past 0.19; the `spark-sklearn` maintainers [have stated that they are
+probably not going to support newer versions](https://github.com/databricks/spark-sklearn/issues/113).
+
+The functionality in `scikit-spark` is based on the `sklearn.model_selection` module rather than the
+deprecated and soon-to-be-removed `sklearn.grid_search`. The new `model_selection` versions
+contain several nicer features, and `scikit-spark` maintains full compatibility with them.
+
+## Installation
+The package can be installed through pip:
+```bash
+pip install scikit-spark
+```
+
+It has so far only been tested with Spark 2.2.0 and up, but may work with
+older versions.
+
+### Supported scikit-learn versions
+- 0.18 untested, likely doesn't work
+- 0.19 supported
+- 0.20 supported
+- 0.21 supported (Python 3 only)
+- 0.22 supported (Python 3 only)
+
+## Usage
+
+The functionality here is meant to resemble using Scikit-Learn as closely as
+possible. By default (with `spark=True`) the `SparkSession` is obtained
+internally by calling `SparkSession.builder.getOrCreate()`, so instantiating
+and calling the functions works the same way (preferably you will already
+have created a `SparkSession`).
+
+This example is adapted from the Scikit-Learn documentation. It instantiates
+a local `SparkSession` and uses it to distribute the cross validation folds
+and iterations. In actual use, the package should be run distributed across
+several machines with Spark to see a benefit, as running it locally is
+slower than Scikit-Learn's own parallelisation.
+
+```python
+from sklearn import svm, datasets
+from pyspark.sql import SparkSession
+
+iris = datasets.load_iris()
+parameters = {'kernel': ('linear', 'rbf'), 'C': [0.01, 0.1, 1, 10, 100]}
+svc = svm.SVC()
+
+spark = SparkSession.builder\
+    .master("local[*]")\
+    .appName("skspark-grid-search-doctests")\
+    .getOrCreate()
+
+# How to run grid search
+from skspark.model_selection import GridSearchCV
+
+gs = GridSearchCV(svc, parameters)
+gs.fit(iris.data, iris.target)
+
+# How to run random search
+from skspark.model_selection import RandomizedSearchCV
+
+rs = RandomizedSearchCV(svc, parameters)
+rs.fit(iris.data, iris.target)
+```
+
+## Current and upcoming functionality
+- Current
+  - model_selection.RandomizedSearchCV
+  - model_selection.GridSearchCV
+- Upcoming
+  - model_selection.cross_val_predict
+  - model_selection.cross_val_score
+
+*The docstrings are modifications of the Scikit-Learn ones and are still being
+converted to specifically refer to this project.*
+
+## Performance optimisations
+
+### Reducing RAM usage
+*Coming soon*
+
+
+
+
+%prep
+%autosetup -n scikit-spark-0.4.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-scikit-spark -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.0-1
+- Package Spec generated
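
After fitting, the README's search objects expose the standard scikit-learn result attributes (the README states `scikit-spark` maintains full compatibility with `sklearn.model_selection`). A minimal sketch of inspecting those results, using scikit-learn's own `GridSearchCV` so it runs without a `SparkSession`; the skspark class is assumed to mirror this interface:

```python
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

# Same setup as the README example, but with plain scikit-learn;
# skspark.model_selection.GridSearchCV is a drop-in replacement here.
iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [0.1, 1, 10]}
gs = GridSearchCV(svm.SVC(), parameters, cv=3)
gs.fit(iris.data, iris.target)

# Standard result attributes, shared with the skspark version:
print(gs.best_params_)                      # best parameter combination found
print('mean_test_score' in gs.cv_results_)  # full per-candidate results table
```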
