# Do not abort the build if a subpackage's file manifest ends up empty
%global _empty_manifest_terminate_build 0
Name:		python-spark-sklearn
Version:	0.3.0
Release:	1
Summary:	Integration tools for running scikit-learn on Spark
License:	Apache-2.0
URL:		https://github.com/databricks/spark-sklearn
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/b0/3f/34b8dec7d2cfcfe0ba99d637b4f2d306c1ca0b404107c07c829e085f6b38/spark-sklearn-0.3.0.tar.gz
BuildArch:	noarch


%description
This package contains some tools to integrate the `Spark computing framework <https://spark.apache.org/>`_
with the popular `scikit-learn machine learning library <https://scikit-learn.org/stable/>`_. Among other things, it can:
- train and evaluate multiple scikit-learn models in parallel. It is a distributed analog to the
  `multicore implementation <https://pythonhosted.org/joblib/parallel.html>`_ included by default in ``scikit-learn``
- convert Spark's DataFrames seamlessly into NumPy ``ndarray`` or sparse matrices
- (experimental) distribute SciPy's sparse matrices as a dataset of sparse vectors
It focuses on problems that have a small amount of data and that can be run in parallel.
For small datasets, it distributes the search for estimator parameters (``GridSearchCV`` in scikit-learn)
using Spark. For datasets that do not fit in memory, we recommend using the distributed implementation in
`Spark MLlib <https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html>`_.
This package distributes simple tasks like grid-search cross-validation.
It does not distribute individual learning algorithms (unlike Spark MLlib).
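
For example, the distributed ``GridSearchCV`` keeps scikit-learn's call shape;
a minimal sketch, assuming an existing ``SparkContext`` named ``sc``::

    from sklearn import svm, datasets
    from spark_sklearn import GridSearchCV

    iris = datasets.load_iris()
    parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
    # Same arguments as sklearn's GridSearchCV, prefixed by the
    # SparkContext, so candidate models are fit on the cluster.
    clf = GridSearchCV(sc, svm.SVC(gamma='auto'), parameters)
    clf.fit(iris.data, iris.target)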

%package -n python3-spark-sklearn
Summary:	Integration tools for running scikit-learn on Spark
Provides:	python-spark-sklearn
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-spark-sklearn
This package contains some tools to integrate the `Spark computing framework <https://spark.apache.org/>`_
with the popular `scikit-learn machine learning library <https://scikit-learn.org/stable/>`_. Among other things, it can:
- train and evaluate multiple scikit-learn models in parallel. It is a distributed analog to the
  `multicore implementation <https://pythonhosted.org/joblib/parallel.html>`_ included by default in ``scikit-learn``
- convert Spark's DataFrames seamlessly into NumPy ``ndarray`` or sparse matrices
- (experimental) distribute SciPy's sparse matrices as a dataset of sparse vectors
It focuses on problems that have a small amount of data and that can be run in parallel.
For small datasets, it distributes the search for estimator parameters (``GridSearchCV`` in scikit-learn)
using Spark. For datasets that do not fit in memory, we recommend using the distributed implementation in
`Spark MLlib <https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html>`_.
This package distributes simple tasks like grid-search cross-validation.
It does not distribute individual learning algorithms (unlike Spark MLlib).

%package help
Summary:	Development documents and examples for spark-sklearn
Provides:	python3-spark-sklearn-doc
%description help
This package contains some tools to integrate the `Spark computing framework <https://spark.apache.org/>`_
with the popular `scikit-learn machine learning library <https://scikit-learn.org/stable/>`_. Among other things, it can:
- train and evaluate multiple scikit-learn models in parallel. It is a distributed analog to the
  `multicore implementation <https://pythonhosted.org/joblib/parallel.html>`_ included by default in ``scikit-learn``
- convert Spark's DataFrames seamlessly into NumPy ``ndarray`` or sparse matrices
- (experimental) distribute SciPy's sparse matrices as a dataset of sparse vectors
It focuses on problems that have a small amount of data and that can be run in parallel.
For small datasets, it distributes the search for estimator parameters (``GridSearchCV`` in scikit-learn)
using Spark. For datasets that do not fit in memory, we recommend using the distributed implementation in
`Spark MLlib <https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html>`_.
This package distributes simple tasks like grid-search cross-validation.
It does not distribute individual learning algorithms (unlike Spark MLlib).

%prep
%autosetup -n spark-sklearn-0.3.0

%build
%py3_build

%install
%py3_install
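# Ship any doc/example directories upstream provides with the help subpackage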
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
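# Build file manifests from whatever landed in the buildroot: libraries and
# binaries go to filelist.lst, man pages to doclist.lst (with a .gz suffix,
# since the brp scripts compress them after %install)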
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-spark-sklearn -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri Apr 21 2023 Python_Bot <Python_Bot@openeuler.org> - 0.3.0-1
- Package Spec generated