diff options
Diffstat (limited to 'python-diego.spec')
-rw-r--r-- | python-diego.spec | 530 |
1 files changed, 530 insertions, 0 deletions
diff --git a/python-diego.spec b/python-diego.spec new file mode 100644 index 0000000..ad39a5b --- /dev/null +++ b/python-diego.spec @@ -0,0 +1,530 @@ +%global _empty_manifest_terminate_build 0 +Name: python-diego +Version: 0.2.7 +Release: 1 +Summary: Diego: Data IntElliGence Out. +License: MIT +URL: https://github.com/lai-bluejay/diego +Source0: https://mirrors.aliyun.com/pypi/web/packages/61/b7/9f7bff11aeec39d4af912194c650107660e4a5ddbc3dee7d7fc65a799207/diego-0.2.7.tar.gz +BuildArch: noarch + +Requires: python3-numpy +Requires: python3-scipy +Requires: python3-scikit-learn +Requires: python3-deap +Requires: python3-update-checker +Requires: python3-tqdm +Requires: python3-stopit +Requires: python3-pandas +Requires: python3-xgboost +Requires: python3-pyrfr +Requires: python3-distributed +Requires: python3-dask +Requires: python3-smac +Requires: python3-ConfigSpace +Requires: python3-auto-sklearn +Requires: python3-liac-arff +Requires: python3-sklearn-contrib-lightning + +%description + + +# Diego + +Diego: Data in, IntElliGence Out. + +[简体中文](README_zh_CN.md) + +A fast framework that supports the rapid construction of automated learning tasks. Simply create an automated learning study (`Study`) and generate correlated trials (`Trial`). Then run the code and get a machine learning model. Implemented using Scikit-learn API [glossary](https://scikit-learn.org/stable/glossary.html), using Bayesian optimization and genetic algorithms for automated machine learning. + +Inspired by [Fast.ai](https://github.com/fastai/fastai) and [MicroSoft nni](https://github.com/Microsoft/nni). + +[](https://travis-ci.org/lai-bluejay/diego) + + + + +- [x] the classifier trained by a Study. +- [x] AutoML classifier with support for scikit-learn api. Support for exporting models and use them directly. +- [x] Hyperparametric optimization using Bayesian optimization and genetic algorithms +- [x] Supports bucketing/binning algorithm and LUS sampling method for preprocessing +- [ ] Supports scikit-learn api classifier custom classifier for parameter search and super parameter optimization + + +## Installation + +You need to install swig first, and some rely on C/C++ interface compilation. Recommended to use conda installation + +```shell +conda install --yes pip gcc swig libgcc=5.2.0 +pip install diego +``` + +After installation, start with 6 lines of code to solve a machine learning classification problem. + +## Usage + +Each task is considered to be a `Study`, and each Study consists of multiple `Trial`. +It is recommended to create a Study first and then generate a Trial from the Study: + +```python +from diego.study import create_study +import sklearn.datasets +digits = sklearn.datasets.load_digits() +X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(digits.data, digits.target,train_size=0.75, test_size=0.25) + +s = create_study(X_train, y_train) +# can use default trials in Study + +# or generate one +# s.generate_trials(mode='fast') +s.optimize(X_test, y_test) +# all_trials = s.get_all_trials() +# for t in all_trials: +# print(t.__dict__) +# print(t.clf.score(X_test, y_test)) + +``` + +## RoadMap +ideas for releases in the future +- [ ] 回归。 +- [ ] add documents. +- [ ] 不同类型的Trial。TPE, BayesOpt, RandomSearch +- [ ] 自定义的Trial。Trials by custom Classifier (like sklearn, xgboost) +- [ ] 模型保存。model persistence +- [ ] 模型输出。model output +- [ ] basic Classifier +- [ ] fix mac os hanged in optimize pipeline +- [ ] add preprocessor +- [ ] add FeatureTools for automated feature engineering + + +## + +## Project Structure + +### study, trials +Study: + +Trial: + +### 如果在OS X或者Linux多进程被 hang/crash/freeze + +Since n_jobs>1 may get stuck during parallelization. Similar problems may occur in [scikit-learn] (https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n -jobs-1-under-osx-or-linux) + +In Python 3.4+, one solution is to directly configure `multiprocessing` to use `forkserver` or `spawn` to start process pool management (instead of the default `fork`). For example, the `forkserver` mode is enabled globally directly in the code. + +```python +import multiprocessing +# other imports, custom code, load data, define model... +if __name__ == '__main__': + multiprocessing.set_start_method('forkserver') + + # call scikit-learn utils with n_jobs > 1 here +``` + +more info :[multiprocessing document](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) + +### core + +#### storage + +For each study, the data storage and parameters, and the model is additionally stored in the `Storage` object, which ensures that Study only controls trials, and each Trial updates the results in the storage after updating, and updates the best results. + +#### update result + +When creating `Study`, you need to specify the direction of optimization `maximize` or `minimize`. Also specify the metrics for optimization when creating `Trials`. The default is `maximize accuracy`. + +## auto ml 补完计划 + +[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a) + +### bayes opt + +1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization) +2. [auto-sklearn](https://github.com/automl/auto-sklearn) + +### grid search + +1. H2O.ai + +### tree parzen + +1. hyperopt +2. mlbox + +### metaheuristics grid search + +1. pybrain + +### generation + +1.tpot + +### dl + +1. ms nni + +## issues + +## updates + +### TODO 文档更新。 + + + + + +%package -n python3-diego +Summary: Diego: Data IntElliGence Out. +Provides: python-diego +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-diego + + +# Diego + +Diego: Data in, IntElliGence Out. + +[简体中文](README_zh_CN.md) + +A fast framework that supports the rapid construction of automated learning tasks. Simply create an automated learning study (`Study`) and generate correlated trials (`Trial`). Then run the code and get a machine learning model. Implemented using Scikit-learn API [glossary](https://scikit-learn.org/stable/glossary.html), using Bayesian optimization and genetic algorithms for automated machine learning. + +Inspired by [Fast.ai](https://github.com/fastai/fastai) and [MicroSoft nni](https://github.com/Microsoft/nni). + +[](https://travis-ci.org/lai-bluejay/diego) + + + + +- [x] the classifier trained by a Study. +- [x] AutoML classifier with support for scikit-learn api. Support for exporting models and use them directly. +- [x] Hyperparametric optimization using Bayesian optimization and genetic algorithms +- [x] Supports bucketing/binning algorithm and LUS sampling method for preprocessing +- [ ] Supports scikit-learn api classifier custom classifier for parameter search and super parameter optimization + + +## Installation + +You need to install swig first, and some rely on C/C++ interface compilation. Recommended to use conda installation + +```shell +conda install --yes pip gcc swig libgcc=5.2.0 +pip install diego +``` + +After installation, start with 6 lines of code to solve a machine learning classification problem. + +## Usage + +Each task is considered to be a `Study`, and each Study consists of multiple `Trial`. +It is recommended to create a Study first and then generate a Trial from the Study: + +```python +from diego.study import create_study +import sklearn.datasets +digits = sklearn.datasets.load_digits() +X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(digits.data, digits.target,train_size=0.75, test_size=0.25) + +s = create_study(X_train, y_train) +# can use default trials in Study + +# or generate one +# s.generate_trials(mode='fast') +s.optimize(X_test, y_test) +# all_trials = s.get_all_trials() +# for t in all_trials: +# print(t.__dict__) +# print(t.clf.score(X_test, y_test)) + +``` + +## RoadMap +ideas for releases in the future +- [ ] 回归。 +- [ ] add documents. +- [ ] 不同类型的Trial。TPE, BayesOpt, RandomSearch +- [ ] 自定义的Trial。Trials by custom Classifier (like sklearn, xgboost) +- [ ] 模型保存。model persistence +- [ ] 模型输出。model output +- [ ] basic Classifier +- [ ] fix mac os hanged in optimize pipeline +- [ ] add preprocessor +- [ ] add FeatureTools for automated feature engineering + + +## + +## Project Structure + +### study, trials +Study: + +Trial: + +### 如果在OS X或者Linux多进程被 hang/crash/freeze + +Since n_jobs>1 may get stuck during parallelization. Similar problems may occur in [scikit-learn] (https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n -jobs-1-under-osx-or-linux) + +In Python 3.4+, one solution is to directly configure `multiprocessing` to use `forkserver` or `spawn` to start process pool management (instead of the default `fork`). For example, the `forkserver` mode is enabled globally directly in the code. + +```python +import multiprocessing +# other imports, custom code, load data, define model... +if __name__ == '__main__': + multiprocessing.set_start_method('forkserver') + + # call scikit-learn utils with n_jobs > 1 here +``` + +more info :[multiprocessing document](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) + +### core + +#### storage + +For each study, the data storage and parameters, and the model is additionally stored in the `Storage` object, which ensures that Study only controls trials, and each Trial updates the results in the storage after updating, and updates the best results. + +#### update result + +When creating `Study`, you need to specify the direction of optimization `maximize` or `minimize`. Also specify the metrics for optimization when creating `Trials`. The default is `maximize accuracy`. + +## auto ml 补完计划 + +[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a) + +### bayes opt + +1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization) +2. [auto-sklearn](https://github.com/automl/auto-sklearn) + +### grid search + +1. H2O.ai + +### tree parzen + +1. hyperopt +2. mlbox + +### metaheuristics grid search + +1. pybrain + +### generation + +1.tpot + +### dl + +1. ms nni + +## issues + +## updates + +### TODO 文档更新。 + + + + + +%package help +Summary: Development documents and examples for diego +Provides: python3-diego-doc +%description help + + +# Diego + +Diego: Data in, IntElliGence Out. + +[简体中文](README_zh_CN.md) + +A fast framework that supports the rapid construction of automated learning tasks. Simply create an automated learning study (`Study`) and generate correlated trials (`Trial`). Then run the code and get a machine learning model. Implemented using Scikit-learn API [glossary](https://scikit-learn.org/stable/glossary.html), using Bayesian optimization and genetic algorithms for automated machine learning. + +Inspired by [Fast.ai](https://github.com/fastai/fastai) and [MicroSoft nni](https://github.com/Microsoft/nni). + +[](https://travis-ci.org/lai-bluejay/diego) + + + + +- [x] the classifier trained by a Study. +- [x] AutoML classifier with support for scikit-learn api. Support for exporting models and use them directly. +- [x] Hyperparametric optimization using Bayesian optimization and genetic algorithms +- [x] Supports bucketing/binning algorithm and LUS sampling method for preprocessing +- [ ] Supports scikit-learn api classifier custom classifier for parameter search and super parameter optimization + + +## Installation + +You need to install swig first, and some rely on C/C++ interface compilation. Recommended to use conda installation + +```shell +conda install --yes pip gcc swig libgcc=5.2.0 +pip install diego +``` + +After installation, start with 6 lines of code to solve a machine learning classification problem. + +## Usage + +Each task is considered to be a `Study`, and each Study consists of multiple `Trial`. +It is recommended to create a Study first and then generate a Trial from the Study: + +```python +from diego.study import create_study +import sklearn.datasets +digits = sklearn.datasets.load_digits() +X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(digits.data, digits.target,train_size=0.75, test_size=0.25) + +s = create_study(X_train, y_train) +# can use default trials in Study + +# or generate one +# s.generate_trials(mode='fast') +s.optimize(X_test, y_test) +# all_trials = s.get_all_trials() +# for t in all_trials: +# print(t.__dict__) +# print(t.clf.score(X_test, y_test)) + +``` + +## RoadMap +ideas for releases in the future +- [ ] 回归。 +- [ ] add documents. +- [ ] 不同类型的Trial。TPE, BayesOpt, RandomSearch +- [ ] 自定义的Trial。Trials by custom Classifier (like sklearn, xgboost) +- [ ] 模型保存。model persistence +- [ ] 模型输出。model output +- [ ] basic Classifier +- [ ] fix mac os hanged in optimize pipeline +- [ ] add preprocessor +- [ ] add FeatureTools for automated feature engineering + + +## + +## Project Structure + +### study, trials +Study: + +Trial: + +### 如果在OS X或者Linux多进程被 hang/crash/freeze + +Since n_jobs>1 may get stuck during parallelization. Similar problems may occur in [scikit-learn] (https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n -jobs-1-under-osx-or-linux) + +In Python 3.4+, one solution is to directly configure `multiprocessing` to use `forkserver` or `spawn` to start process pool management (instead of the default `fork`). For example, the `forkserver` mode is enabled globally directly in the code. + +```python +import multiprocessing +# other imports, custom code, load data, define model... +if __name__ == '__main__': + multiprocessing.set_start_method('forkserver') + + # call scikit-learn utils with n_jobs > 1 here +``` + +more info :[multiprocessing document](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) + +### core + +#### storage + +For each study, the data storage and parameters, and the model is additionally stored in the `Storage` object, which ensures that Study only controls trials, and each Trial updates the results in the storage after updating, and updates the best results. + +#### update result + +When creating `Study`, you need to specify the direction of optimization `maximize` or `minimize`. Also specify the metrics for optimization when creating `Trials`. The default is `maximize accuracy`. + +## auto ml 补完计划 + +[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a) + +### bayes opt + +1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization) +2. [auto-sklearn](https://github.com/automl/auto-sklearn) + +### grid search + +1. H2O.ai + +### tree parzen + +1. hyperopt +2. mlbox + +### metaheuristics grid search + +1. pybrain + +### generation + +1.tpot + +### dl + +1. ms nni + +## issues + +## updates + +### TODO 文档更新。 + + + + + +%prep +%autosetup -n diego-0.2.7 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-diego -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.7-1 +- Package Spec generated |