author    CoprDistGit <infra@openeuler.org>    2023-06-20 05:40:51 +0000
committer CoprDistGit <infra@openeuler.org>    2023-06-20 05:40:51 +0000
commit    fe2dd91bfbd02d3fed9b54402de453fbc24fa9a4 (patch)
tree      482c3ede6a5bc6a27a1a1925b3c8caec16dc9156 /python-diego.spec
parent    9352ed59b42ee1b7451f49bd654be9e7bd2487b5 (diff)
automatic import of python-diego (openeuler20.03)
Diffstat (limited to 'python-diego.spec')
-rw-r--r-- python-diego.spec | 530
1 file changed, 530 insertions(+), 0 deletions(-)
diff --git a/python-diego.spec b/python-diego.spec
new file mode 100644
index 0000000..ad39a5b
--- /dev/null
+++ b/python-diego.spec
@@ -0,0 +1,530 @@
+%global _empty_manifest_terminate_build 0
+Name: python-diego
+Version: 0.2.7
+Release: 1
+Summary: Diego: Data IntElliGence Out.
+License: MIT
+URL: https://github.com/lai-bluejay/diego
+Source0: https://mirrors.aliyun.com/pypi/web/packages/61/b7/9f7bff11aeec39d4af912194c650107660e4a5ddbc3dee7d7fc65a799207/diego-0.2.7.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-deap
+Requires: python3-update-checker
+Requires: python3-tqdm
+Requires: python3-stopit
+Requires: python3-pandas
+Requires: python3-xgboost
+Requires: python3-pyrfr
+Requires: python3-distributed
+Requires: python3-dask
+Requires: python3-smac
+Requires: python3-ConfigSpace
+Requires: python3-auto-sklearn
+Requires: python3-liac-arff
+Requires: python3-sklearn-contrib-lightning
+
+%description
+
+
+# Diego
+
+Diego: Data in, IntElliGence Out.
+
+[简体中文](README_zh_CN.md)
+
+A fast framework for building automated machine learning tasks: create a study (`Study`), generate its associated trials (`Trial`), run the code, and get a trained machine learning model. Diego implements the scikit-learn API ([glossary](https://scikit-learn.org/stable/glossary.html)) and uses Bayesian optimization and genetic algorithms for automated machine learning.
+
+Inspired by [Fast.ai](https://github.com/fastai/fastai) and [Microsoft NNI](https://github.com/Microsoft/nni).
+
+[![Build Status](https://travis-ci.org/lai-bluejay/diego.svg?branch=master)](https://travis-ci.org/lai-bluejay/diego)
+![PyPI](https://img.shields.io/pypi/v/diego.svg?style=flat)
+![GitHub](https://img.shields.io/github/license/lai-bluejay/diego.svg)
+![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/lai-bluejay/diego.svg)
+
+- [x] Classifiers trained by a `Study`.
+- [x] AutoML classifier supporting the scikit-learn API; trained models can be exported and used directly.
+- [x] Hyperparameter optimization using Bayesian optimization and genetic algorithms.
+- [x] Bucketing/binning algorithms and LUS sampling for preprocessing.
+- [ ] Custom classifiers implementing the scikit-learn API, for parameter search and hyperparameter optimization.
+
+
+## Installation
+
+Install SWIG first, since some dependencies are compiled against C/C++ interfaces. Installing via conda is recommended:
+
+```shell
+conda install --yes pip gcc swig libgcc=5.2.0
+pip install diego
+```
+
+After installation, six lines of code are enough to solve a machine learning classification problem.
+
+## Usage
+
+Each task is treated as a `Study`, and each Study consists of multiple `Trial`s.
+It is recommended to create a Study first and then generate Trials from it:
+
+```python
+from diego.study import create_study
+import sklearn.datasets
+import sklearn.model_selection
+
+digits = sklearn.datasets.load_digits()
+X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
+    digits.data, digits.target, train_size=0.75, test_size=0.25)
+
+s = create_study(X_train, y_train)
+# can use default trials in Study
+
+# or generate one
+# s.generate_trials(mode='fast')
+s.optimize(X_test, y_test)
+# all_trials = s.get_all_trials()
+# for t in all_trials:
+# print(t.__dict__)
+# print(t.clf.score(X_test, y_test))
+
+```
+
+## RoadMap
+ideas for releases in the future
+- [ ] Regression.
+- [ ] Add documentation.
+- [ ] Different types of Trial: TPE, BayesOpt, RandomSearch.
+- [ ] Custom Trials: trials built on a custom classifier (e.g. sklearn, xgboost).
+- [ ] Model persistence.
+- [ ] Model output.
+- [ ] Basic Classifier.
+- [ ] Fix macOS hang in the optimize pipeline.
+- [ ] Add preprocessors.
+- [ ] Add FeatureTools for automated feature engineering.
+
+
+
+## Project Structure
+
+### study, trials
+Study:
+
+Trial:
+
+### If multiprocessing hangs/crashes/freezes on OS X or Linux
+
+Runs with `n_jobs > 1` can get stuck during parallelization. Similar problems occur in [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux).
+
+In Python 3.4+, one solution is to configure `multiprocessing` to start worker processes with `forkserver` or `spawn` instead of the default `fork`. For example, to enable the `forkserver` mode globally in your code:
+
+```python
+import multiprocessing
+# other imports, custom code, load data, define model...
+if __name__ == '__main__':
+ multiprocessing.set_start_method('forkserver')
+
+ # call scikit-learn utils with n_jobs > 1 here
+```
+
+More info: [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods).
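Setting the start method globally affects every library in the process. As a lighter-weight alternative (a standard-library sketch, not diego-specific), `multiprocessing.get_context` returns a context object bound to one start method without touching the global default:

```python
import multiprocessing

if __name__ == '__main__':
    # A context bound to the 'forkserver' start method; unlike
    # multiprocessing.set_start_method(), this does not change the
    # process-wide default, so other libraries are unaffected.
    ctx = multiprocessing.get_context('forkserver')
    with ctx.Pool(processes=2) as pool:
        # abs is a builtin, so workers can resolve it by reference
        print(pool.map(abs, [-3, -2, -1]))  # [3, 2, 1]
```

The context object exposes the same `Pool`, `Process`, and `Queue` constructors as the top-level module, so it can be passed around where a pool is created.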
+
+### core
+
+#### storage
+
+For each study, the data, parameters, and models are stored in a separate `Storage` object. This ensures that the Study only controls trials: each Trial writes its result to the storage when it finishes, and the best result is updated there.
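The flow just described can be sketched in plain Python. This is purely illustrative: the `Storage` class, the `report` method, and the attribute names below are invented for the sketch and are not diego's actual API.

```python
class Storage:
    """Holds per-trial results and tracks the best one (illustrative only)."""

    def __init__(self, direction='maximize'):
        self.direction = direction
        self.results = []        # every reported trial result
        self.best_result = None  # best score seen so far

    def report(self, score):
        # Each Trial pushes its result here after it finishes...
        self.results.append(score)
        # ...and the best result is updated in the same place, so the
        # Study itself only has to schedule trials.
        if self.best_result is None:
            self.best_result = score
        elif self.direction == 'maximize' and score > self.best_result:
            self.best_result = score
        elif self.direction == 'minimize' and score < self.best_result:
            self.best_result = score

storage = Storage(direction='maximize')
for score in (0.91, 0.95, 0.93):
    storage.report(score)
print(storage.best_result)  # 0.95
```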
+
+#### update result
+
+When creating a `Study`, you need to specify the direction of optimization, `maximize` or `minimize`. You also specify the metric to optimize when creating `Trials`; the default is to maximize accuracy.
+
+## AutoML completion plan
+
+[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a)
+
+### bayes opt
+
+1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization)
+2. [auto-sklearn](https://github.com/automl/auto-sklearn)
+
+### grid search
+
+1. H2O.ai
+
+### tree parzen
+
+1. hyperopt
+2. mlbox
+
+### metaheuristics grid search
+
+1. pybrain
+
+### generation
+
+1. tpot
+
+### dl
+
+1. Microsoft NNI
+
+## issues
+
+## updates
+
+### TODO: documentation updates
+
+
+
+
+
+%package -n python3-diego
+Summary: Diego: Data IntElliGence Out.
+Provides: python-diego
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-diego
+
+
+# Diego
+
+Diego: Data in, IntElliGence Out.
+
+[简体中文](README_zh_CN.md)
+
+A fast framework for building automated machine learning tasks: create a study (`Study`), generate its associated trials (`Trial`), run the code, and get a trained machine learning model. Diego implements the scikit-learn API ([glossary](https://scikit-learn.org/stable/glossary.html)) and uses Bayesian optimization and genetic algorithms for automated machine learning.
+
+Inspired by [Fast.ai](https://github.com/fastai/fastai) and [Microsoft NNI](https://github.com/Microsoft/nni).
+
+[![Build Status](https://travis-ci.org/lai-bluejay/diego.svg?branch=master)](https://travis-ci.org/lai-bluejay/diego)
+![PyPI](https://img.shields.io/pypi/v/diego.svg?style=flat)
+![GitHub](https://img.shields.io/github/license/lai-bluejay/diego.svg)
+![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/lai-bluejay/diego.svg)
+
+- [x] Classifiers trained by a `Study`.
+- [x] AutoML classifier supporting the scikit-learn API; trained models can be exported and used directly.
+- [x] Hyperparameter optimization using Bayesian optimization and genetic algorithms.
+- [x] Bucketing/binning algorithms and LUS sampling for preprocessing.
+- [ ] Custom classifiers implementing the scikit-learn API, for parameter search and hyperparameter optimization.
+
+
+## Installation
+
+Install SWIG first, since some dependencies are compiled against C/C++ interfaces. Installing via conda is recommended:
+
+```shell
+conda install --yes pip gcc swig libgcc=5.2.0
+pip install diego
+```
+
+After installation, six lines of code are enough to solve a machine learning classification problem.
+
+## Usage
+
+Each task is treated as a `Study`, and each Study consists of multiple `Trial`s.
+It is recommended to create a Study first and then generate Trials from it:
+
+```python
+from diego.study import create_study
+import sklearn.datasets
+import sklearn.model_selection
+
+digits = sklearn.datasets.load_digits()
+X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
+    digits.data, digits.target, train_size=0.75, test_size=0.25)
+
+s = create_study(X_train, y_train)
+# can use default trials in Study
+
+# or generate one
+# s.generate_trials(mode='fast')
+s.optimize(X_test, y_test)
+# all_trials = s.get_all_trials()
+# for t in all_trials:
+# print(t.__dict__)
+# print(t.clf.score(X_test, y_test))
+
+```
+
+## RoadMap
+ideas for releases in the future
+- [ ] Regression.
+- [ ] Add documentation.
+- [ ] Different types of Trial: TPE, BayesOpt, RandomSearch.
+- [ ] Custom Trials: trials built on a custom classifier (e.g. sklearn, xgboost).
+- [ ] Model persistence.
+- [ ] Model output.
+- [ ] Basic Classifier.
+- [ ] Fix macOS hang in the optimize pipeline.
+- [ ] Add preprocessors.
+- [ ] Add FeatureTools for automated feature engineering.
+
+
+
+## Project Structure
+
+### study, trials
+Study:
+
+Trial:
+
+### If multiprocessing hangs/crashes/freezes on OS X or Linux
+
+Runs with `n_jobs > 1` can get stuck during parallelization. Similar problems occur in [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux).
+
+In Python 3.4+, one solution is to configure `multiprocessing` to start worker processes with `forkserver` or `spawn` instead of the default `fork`. For example, to enable the `forkserver` mode globally in your code:
+
+```python
+import multiprocessing
+# other imports, custom code, load data, define model...
+if __name__ == '__main__':
+ multiprocessing.set_start_method('forkserver')
+
+ # call scikit-learn utils with n_jobs > 1 here
+```
+
+More info: [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods).
+
+### core
+
+#### storage
+
+For each study, the data, parameters, and models are stored in a separate `Storage` object. This ensures that the Study only controls trials: each Trial writes its result to the storage when it finishes, and the best result is updated there.
+
+#### update result
+
+When creating a `Study`, you need to specify the direction of optimization, `maximize` or `minimize`. You also specify the metric to optimize when creating `Trials`; the default is to maximize accuracy.
+
+## AutoML completion plan
+
+[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a)
+
+### bayes opt
+
+1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization)
+2. [auto-sklearn](https://github.com/automl/auto-sklearn)
+
+### grid search
+
+1. H2O.ai
+
+### tree parzen
+
+1. hyperopt
+2. mlbox
+
+### metaheuristics grid search
+
+1. pybrain
+
+### generation
+
+1. tpot
+
+### dl
+
+1. Microsoft NNI
+
+## issues
+
+## updates
+
+### TODO: documentation updates
+
+
+
+
+
+%package help
+Summary: Development documents and examples for diego
+Provides: python3-diego-doc
+%description help
+
+
+# Diego
+
+Diego: Data in, IntElliGence Out.
+
+[简体中文](README_zh_CN.md)
+
+A fast framework for building automated machine learning tasks: create a study (`Study`), generate its associated trials (`Trial`), run the code, and get a trained machine learning model. Diego implements the scikit-learn API ([glossary](https://scikit-learn.org/stable/glossary.html)) and uses Bayesian optimization and genetic algorithms for automated machine learning.
+
+Inspired by [Fast.ai](https://github.com/fastai/fastai) and [Microsoft NNI](https://github.com/Microsoft/nni).
+
+[![Build Status](https://travis-ci.org/lai-bluejay/diego.svg?branch=master)](https://travis-ci.org/lai-bluejay/diego)
+![PyPI](https://img.shields.io/pypi/v/diego.svg?style=flat)
+![GitHub](https://img.shields.io/github/license/lai-bluejay/diego.svg)
+![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/lai-bluejay/diego.svg)
+
+- [x] Classifiers trained by a `Study`.
+- [x] AutoML classifier supporting the scikit-learn API; trained models can be exported and used directly.
+- [x] Hyperparameter optimization using Bayesian optimization and genetic algorithms.
+- [x] Bucketing/binning algorithms and LUS sampling for preprocessing.
+- [ ] Custom classifiers implementing the scikit-learn API, for parameter search and hyperparameter optimization.
+
+
+## Installation
+
+Install SWIG first, since some dependencies are compiled against C/C++ interfaces. Installing via conda is recommended:
+
+```shell
+conda install --yes pip gcc swig libgcc=5.2.0
+pip install diego
+```
+
+After installation, six lines of code are enough to solve a machine learning classification problem.
+
+## Usage
+
+Each task is treated as a `Study`, and each Study consists of multiple `Trial`s.
+It is recommended to create a Study first and then generate Trials from it:
+
+```python
+from diego.study import create_study
+import sklearn.datasets
+import sklearn.model_selection
+
+digits = sklearn.datasets.load_digits()
+X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
+    digits.data, digits.target, train_size=0.75, test_size=0.25)
+
+s = create_study(X_train, y_train)
+# can use default trials in Study
+
+# or generate one
+# s.generate_trials(mode='fast')
+s.optimize(X_test, y_test)
+# all_trials = s.get_all_trials()
+# for t in all_trials:
+# print(t.__dict__)
+# print(t.clf.score(X_test, y_test))
+
+```
+
+## RoadMap
+ideas for releases in the future
+- [ ] Regression.
+- [ ] Add documentation.
+- [ ] Different types of Trial: TPE, BayesOpt, RandomSearch.
+- [ ] Custom Trials: trials built on a custom classifier (e.g. sklearn, xgboost).
+- [ ] Model persistence.
+- [ ] Model output.
+- [ ] Basic Classifier.
+- [ ] Fix macOS hang in the optimize pipeline.
+- [ ] Add preprocessors.
+- [ ] Add FeatureTools for automated feature engineering.
+
+
+
+## Project Structure
+
+### study, trials
+Study:
+
+Trial:
+
+### If multiprocessing hangs/crashes/freezes on OS X or Linux
+
+Runs with `n_jobs > 1` can get stuck during parallelization. Similar problems occur in [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux).
+
+In Python 3.4+, one solution is to configure `multiprocessing` to start worker processes with `forkserver` or `spawn` instead of the default `fork`. For example, to enable the `forkserver` mode globally in your code:
+
+```python
+import multiprocessing
+# other imports, custom code, load data, define model...
+if __name__ == '__main__':
+ multiprocessing.set_start_method('forkserver')
+
+ # call scikit-learn utils with n_jobs > 1 here
+```
+
+More info: [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods).
+
+### core
+
+#### storage
+
+For each study, the data, parameters, and models are stored in a separate `Storage` object. This ensures that the Study only controls trials: each Trial writes its result to the storage when it finishes, and the best result is updated there.
+
+#### update result
+
+When creating a `Study`, you need to specify the direction of optimization, `maximize` or `minimize`. You also specify the metric to optimize when creating `Trials`; the default is to maximize accuracy.
+
+## AutoML completion plan
+
+[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a)
+
+### bayes opt
+
+1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization)
+2. [auto-sklearn](https://github.com/automl/auto-sklearn)
+
+### grid search
+
+1. H2O.ai
+
+### tree parzen
+
+1. hyperopt
+2. mlbox
+
+### metaheuristics grid search
+
+1. pybrain
+
+### generation
+
+1. tpot
+
+### dl
+
+1. Microsoft NNI
+
+## issues
+
+## updates
+
+### TODO: documentation updates
+
+
+
+
+
+%prep
+%autosetup -n diego-0.2.7
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-diego -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.7-1
+- Package Spec generated