%global _empty_manifest_terminate_build 0
Name: python-diego
Version: 0.2.7
Release: 1
Summary: Diego: Data IntElliGence Out.
License: MIT
URL: https://github.com/lai-bluejay/diego
Source0: https://mirrors.aliyun.com/pypi/web/packages/61/b7/9f7bff11aeec39d4af912194c650107660e4a5ddbc3dee7d7fc65a799207/diego-0.2.7.tar.gz
BuildArch: noarch

Requires: python3-numpy
Requires: python3-scipy
Requires: python3-scikit-learn
Requires: python3-deap
Requires: python3-update-checker
Requires: python3-tqdm
Requires: python3-stopit
Requires: python3-pandas
Requires: python3-xgboost
Requires: python3-pyrfr
Requires: python3-distributed
Requires: python3-dask
Requires: python3-smac
Requires: python3-ConfigSpace
Requires: python3-auto-sklearn
Requires: python3-liac-arff
Requires: python3-sklearn-contrib-lightning

%description
# Diego

Diego: Data in, IntElliGence Out.

[简体中文](README_zh_CN.md)

A fast framework for building automated machine learning tasks: create a study (`Study`), generate its trials (`Trial`), run the code, and get a machine learning model. Implemented on the scikit-learn API ([glossary](https://scikit-learn.org/stable/glossary.html)), using Bayesian optimization and genetic algorithms for automated machine learning. Inspired by [Fast.ai](https://github.com/fastai/fastai) and [Microsoft nni](https://github.com/Microsoft/nni).

[![Build Status](https://travis-ci.org/lai-bluejay/diego.svg?branch=master)](https://travis-ci.org/lai-bluejay/diego) ![PyPI](https://img.shields.io/pypi/v/diego.svg?style=flat) ![GitHub](https://img.shields.io/github/license/lai-bluejay/diego.svg) ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/lai-bluejay/diego.svg)

- [x] Get the classifier trained by a Study.
- [x] AutoML classifier supporting the scikit-learn API; models can be exported and used directly.
- [x] Hyperparameter optimization using Bayesian optimization and genetic algorithms.
- [x] Bucketing/binning and LUS sampling for preprocessing.
- [ ] Custom scikit-learn-API classifiers for parameter search and hyperparameter optimization.

## Installation

Install swig first, since some dependencies are compiled against C/C++ interfaces. Installation via conda is recommended:

```shell
conda install --yes pip gcc swig libgcc=5.2.0
pip install diego
```

After installation, six lines of code are enough to solve a machine learning classification problem.

## Usage

Each task is treated as a `Study`, and each Study consists of multiple `Trial`s. It is recommended to create a Study first and then generate Trials from it:

```python
from diego.study import create_study
import sklearn.datasets
import sklearn.model_selection

digits = sklearn.datasets.load_digits()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25)

s = create_study(X_train, y_train)
# can use the default trials in the Study, or generate one:
# s.generate_trials(mode='fast')
s.optimize(X_test, y_test)
# all_trials = s.get_all_trials()
# for t in all_trials:
#     print(t.__dict__)
#     print(t.clf.score(X_test, y_test))
```
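The commented lines above hint at how to reach the fitted models: each finished trial exposes its classifier as `clf`. Building on that, here is a minimal sketch, assuming the `clf` attribute behaves like a standard scikit-learn estimator, that selects the best trial on the test set and persists it with `joblib` (not a documented diego API):

```python
import joblib

# assumes each finished trial exposes a fitted scikit-learn classifier as `clf`
all_trials = s.get_all_trials()
best = max(all_trials, key=lambda t: t.clf.score(X_test, y_test))

# persist the winning model and reload it later like any sklearn estimator
joblib.dump(best.clf, 'best_model.joblib')
clf = joblib.load('best_model.joblib')
print(clf.score(X_test, y_test))
```

joblib is the persistence route the scikit-learn docs recommend for fitted estimators, which is why it is used here.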
## RoadMap

Ideas for future releases:

- [ ] Regression.
- [ ] Add documentation.
- [ ] Different Trial types: TPE, BayesOpt, RandomSearch.
- [ ] Custom Trials: trials built from a custom classifier (e.g. sklearn, xgboost).
- [ ] Model persistence.
- [ ] Model output.
- [ ] Basic Classifier.
- [ ] Fix macOS hang in the optimize pipeline.
- [ ] Add preprocessors.
- [ ] Add FeatureTools for automated feature engineering.

## Project Structure

### study, trials

Study:
Trial:

### If multiprocessing hangs/crashes/freezes on OS X or Linux

Runs with n_jobs > 1 may get stuck during parallelization; similar problems occur in [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux). In Python 3.4+, one solution is to configure `multiprocessing` to manage its process pool with `forkserver` or `spawn` instead of the default `fork`. For example, to enable `forkserver` globally in your code:

```python
import multiprocessing

# other imports, custom code, load data, define model...

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')

    # call scikit-learn utils with n_jobs > 1 here
```

More info: [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)

### core

#### storage

For each study, the data, parameters, and model are stored in a separate `Storage` object. This ensures that the Study only controls trials: each Trial writes its results to the storage when it finishes, and the best result is updated there.

#### update result

When creating a `Study`, you need to specify the direction of optimization, `maximize` or `minimize`. The metric to optimize is specified when creating `Trials`; the default is to maximize accuracy.
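The exact keyword names are not documented here, so the following is only an illustrative sketch: it assumes `create_study` accepts a hypothetical `direction` argument and `generate_trials` a hypothetical `metric` argument, mirroring the defaults described above.

```python
# hypothetical keyword names -- the text above only states that the
# direction is set on the Study and the metric on the Trials
s = create_study(X_train, y_train, direction='maximize')
s.generate_trials(mode='fast', metric='accuracy')
s.optimize(X_test, y_test)
```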
## auto ml catch-up plan

[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a)

### bayes opt

1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization)
2. [auto-sklearn](https://github.com/automl/auto-sklearn)

### grid search

1. H2O.ai

### tree parzen

1. hyperopt
2. mlbox

### metaheuristics grid search

1. pybrain

### generation

1. tpot

### dl

1. MS nni

## issues

## updates

### TODO

Update documentation.

%package -n python3-diego
Summary: Diego: Data IntElliGence Out.
Provides: python-diego
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-diego
# Diego

Diego: Data in, IntElliGence Out.

[简体中文](README_zh_CN.md)

A fast framework for building automated machine learning tasks: create a study (`Study`), generate its trials (`Trial`), run the code, and get a machine learning model. Implemented on the scikit-learn API ([glossary](https://scikit-learn.org/stable/glossary.html)), using Bayesian optimization and genetic algorithms for automated machine learning. Inspired by [Fast.ai](https://github.com/fastai/fastai) and [Microsoft nni](https://github.com/Microsoft/nni).

[![Build Status](https://travis-ci.org/lai-bluejay/diego.svg?branch=master)](https://travis-ci.org/lai-bluejay/diego) ![PyPI](https://img.shields.io/pypi/v/diego.svg?style=flat) ![GitHub](https://img.shields.io/github/license/lai-bluejay/diego.svg) ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/lai-bluejay/diego.svg)

- [x] Get the classifier trained by a Study.
- [x] AutoML classifier supporting the scikit-learn API; models can be exported and used directly.
- [x] Hyperparameter optimization using Bayesian optimization and genetic algorithms.
- [x] Bucketing/binning and LUS sampling for preprocessing.
- [ ] Custom scikit-learn-API classifiers for parameter search and hyperparameter optimization.

## Installation

Install swig first, since some dependencies are compiled against C/C++ interfaces. Installation via conda is recommended:

```shell
conda install --yes pip gcc swig libgcc=5.2.0
pip install diego
```

After installation, six lines of code are enough to solve a machine learning classification problem.

## Usage

Each task is treated as a `Study`, and each Study consists of multiple `Trial`s. It is recommended to create a Study first and then generate Trials from it:

```python
from diego.study import create_study
import sklearn.datasets
import sklearn.model_selection

digits = sklearn.datasets.load_digits()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25)

s = create_study(X_train, y_train)
# can use the default trials in the Study, or generate one:
# s.generate_trials(mode='fast')
s.optimize(X_test, y_test)
# all_trials = s.get_all_trials()
# for t in all_trials:
#     print(t.__dict__)
#     print(t.clf.score(X_test, y_test))
```

## RoadMap

Ideas for future releases:

- [ ] Regression.
- [ ] Add documentation.
- [ ] Different Trial types: TPE, BayesOpt, RandomSearch.
- [ ] Custom Trials: trials built from a custom classifier (e.g. sklearn, xgboost).
- [ ] Model persistence.
- [ ] Model output.
- [ ] Basic Classifier.
- [ ] Fix macOS hang in the optimize pipeline.
- [ ] Add preprocessors.
- [ ] Add FeatureTools for automated feature engineering.

## Project Structure

### study, trials

Study:
Trial:

### If multiprocessing hangs/crashes/freezes on OS X or Linux

Runs with n_jobs > 1 may get stuck during parallelization; similar problems occur in [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux). In Python 3.4+, one solution is to configure `multiprocessing` to manage its process pool with `forkserver` or `spawn` instead of the default `fork`. For example, to enable `forkserver` globally in your code:

```python
import multiprocessing

# other imports, custom code, load data, define model...

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')

    # call scikit-learn utils with n_jobs > 1 here
```

More info: [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)

### core

#### storage

For each study, the data, parameters, and model are stored in a separate `Storage` object. This ensures that the Study only controls trials: each Trial writes its results to the storage when it finishes, and the best result is updated there.

#### update result

When creating a `Study`, you need to specify the direction of optimization, `maximize` or `minimize`. The metric to optimize is specified when creating `Trials`; the default is to maximize accuracy.

## auto ml catch-up plan

[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a)

### bayes opt

1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization)
2. [auto-sklearn](https://github.com/automl/auto-sklearn)

### grid search

1. H2O.ai

### tree parzen

1. hyperopt
2. mlbox

### metaheuristics grid search

1. pybrain

### generation

1. tpot

### dl
1. MS nni

## issues

## updates

### TODO

Update documentation.

%package help
Summary: Development documents and examples for diego
Provides: python3-diego-doc

%description help
# Diego

Diego: Data in, IntElliGence Out.

[简体中文](README_zh_CN.md)

A fast framework for building automated machine learning tasks: create a study (`Study`), generate its trials (`Trial`), run the code, and get a machine learning model. Implemented on the scikit-learn API ([glossary](https://scikit-learn.org/stable/glossary.html)), using Bayesian optimization and genetic algorithms for automated machine learning. Inspired by [Fast.ai](https://github.com/fastai/fastai) and [Microsoft nni](https://github.com/Microsoft/nni).

[![Build Status](https://travis-ci.org/lai-bluejay/diego.svg?branch=master)](https://travis-ci.org/lai-bluejay/diego) ![PyPI](https://img.shields.io/pypi/v/diego.svg?style=flat) ![GitHub](https://img.shields.io/github/license/lai-bluejay/diego.svg) ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/lai-bluejay/diego.svg)

- [x] Get the classifier trained by a Study.
- [x] AutoML classifier supporting the scikit-learn API; models can be exported and used directly.
- [x] Hyperparameter optimization using Bayesian optimization and genetic algorithms.
- [x] Bucketing/binning and LUS sampling for preprocessing.
- [ ] Custom scikit-learn-API classifiers for parameter search and hyperparameter optimization.

## Installation

Install swig first, since some dependencies are compiled against C/C++ interfaces. Installation via conda is recommended:

```shell
conda install --yes pip gcc swig libgcc=5.2.0
pip install diego
```

After installation, six lines of code are enough to solve a machine learning classification problem.

## Usage

Each task is treated as a `Study`, and each Study consists of multiple `Trial`s. It is recommended to create a Study first and then generate Trials from it:

```python
from diego.study import create_study
import sklearn.datasets
import sklearn.model_selection

digits = sklearn.datasets.load_digits()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25)

s = create_study(X_train, y_train)
# can use the default trials in the Study, or generate one:
# s.generate_trials(mode='fast')
s.optimize(X_test, y_test)
# all_trials = s.get_all_trials()
# for t in all_trials:
#     print(t.__dict__)
#     print(t.clf.score(X_test, y_test))
```

## RoadMap

Ideas for future releases:

- [ ] Regression.
- [ ] Add documentation.
- [ ] Different Trial types: TPE, BayesOpt, RandomSearch.
- [ ] Custom Trials: trials built from a custom classifier (e.g. sklearn, xgboost).
- [ ] Model persistence.
- [ ] Model output.
- [ ] Basic Classifier.
- [ ] Fix macOS hang in the optimize pipeline.
- [ ] Add preprocessors.
- [ ] Add FeatureTools for automated feature engineering.

## Project Structure

### study, trials

Study:
Trial:

### If multiprocessing hangs/crashes/freezes on OS X or Linux

Runs with n_jobs > 1 may get stuck during parallelization; similar problems occur in [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux). In Python 3.4+, one solution is to configure `multiprocessing` to manage its process pool with `forkserver` or `spawn` instead of the default `fork`. For example, to enable `forkserver` globally in your code:
```python
import multiprocessing

# other imports, custom code, load data, define model...

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')

    # call scikit-learn utils with n_jobs > 1 here
```

More info: [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)

### core

#### storage

For each study, the data, parameters, and model are stored in a separate `Storage` object. This ensures that the Study only controls trials: each Trial writes its results to the storage when it finishes, and the best result is updated there.

#### update result

When creating a `Study`, you need to specify the direction of optimization, `maximize` or `minimize`. The metric to optimize is specified when creating `Trials`; the default is to maximize accuracy.

## auto ml catch-up plan

[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a)

### bayes opt

1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization)
2. [auto-sklearn](https://github.com/automl/auto-sklearn)

### grid search

1. H2O.ai

### tree parzen

1. hyperopt
2. mlbox

### metaheuristics grid search

1. pybrain

### generation

1. tpot

### dl

1. MS nni

## issues

## updates

### TODO

Update documentation.

%prep
%autosetup -n diego-0.2.7

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-diego -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.7-1
- Package Spec generated