| author | CoprDistGit <infra@openeuler.org> | 2023-05-15 06:49:57 +0000 |
|---|---|---|
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-15 06:49:57 +0000 |
| commit | 5f0f7ca1eca7eab3cae516ffb49ca5379ae41ac4 (patch) | |
| tree | 5a8d0d3b4d35c96142733185020185c1bee14efa | |
| parent | a1181d47aaa81bd3d8a96437ecbe17cef2427631 (diff) | |
automatic import of python-lightautoml
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-lightautoml.spec | 862 |
| -rw-r--r-- | sources | 1 |
3 files changed, 864 insertions, 0 deletions
@@ -0,0 +1 @@ +/LightAutoML-0.3.7.3.tar.gz diff --git a/python-lightautoml.spec b/python-lightautoml.spec new file mode 100644 index 0000000..c9e9b50 --- /dev/null +++ b/python-lightautoml.spec @@ -0,0 +1,862 @@ +%global _empty_manifest_terminate_build 0 +Name: python-lightautoml +Version: 0.3.7.3 +Release: 1 +Summary: Fast and customizable framework for automatic ML model creation (AutoML) +License: Apache-2.0 +URL: https://lightautoml.readthedocs.io/en/latest/ +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b6/eb/fa7decd357f2a9a8fdff961a20262b5f75737fbeacc22e18611342df7fbd/LightAutoML-0.3.7.3.tar.gz +BuildArch: noarch + +Requires: python3-poetry-core +Requires: python3-pandas +Requires: python3-pandas +Requires: python3-pandas +Requires: python3-scikit-learn +Requires: python3-lightgbm +Requires: python3-catboost +Requires: python3-optuna +Requires: python3-torch +Requires: python3-torch +Requires: python3-dataclasses +Requires: python3-holidays +Requires: python3-networkx +Requires: python3-cmaes +Requires: python3-pyyaml +Requires: python3-tqdm +Requires: python3-joblib +Requires: python3-importlib-metadata +Requires: python3-autowoe +Requires: python3-jinja2 +Requires: python3-json2html +Requires: python3-seaborn +Requires: python3-gensim +Requires: python3-nltk +Requires: python3-transformers +Requires: python3-albumentations +Requires: python3-efficientnet-pytorch +Requires: python3-opencv-python +Requires: python3-PyWavelets +Requires: python3-torchvision +Requires: python3-torchvision +Requires: python3-featuretools +Requires: python3-weasyprint +Requires: python3-cffi + +%description +<img src=https://github.com/AILab-MLTools/LightAutoML/raw/master/imgs/LightAutoML_logo_big.png /> + +# LightAutoML - automatic model creation framework + +[](https://t.me/lightautoml) + + +[](https://github.com/psf/black) + +LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks: +- binary classification +- multiclass classification +- regression + +Current version of the package handles datasets that have independent samples in each row. I.e. **each row is an object with its specific features and target**. +Multitable datasets and sequences are a work in progress :) + +**Note**: we use [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models. + +**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets. + +**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/), you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it. + +# (New features) GPU and Spark pipelines +Full GPU and Spark pipelines for LightAutoML currently available for developers testing (still in progress). 
The code and tutorials for: +- GPU pipeline is [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU) +- Spark pipeline is [available here](https://github.com/sb-ai-lab/SLAMA) + +<a name="toc"></a> +# Table of Contents + +* [Installation LightAutoML from PyPI](#installation) +* [Quick tour](#quicktour) +* [Resources](#examples) +* [Contributing to LightAutoML](#contributing) +* [License](#apache) +* [For developers](#developers) +* [Support and feature requests](#support) + +<a name="installation"></a> +# Installation +To install LAMA framework on your machine from PyPI, execute following commands: +```bash + +# Install base functionality: + +pip install -U lightautoml + +# For partial installation use corresponding option. +# Extra dependecies: [nlp, cv, report] +# Or you can use 'all' to install everything + +pip install -U lightautoml[nlp] + +``` + +Additionaly, run following commands to enable pdf report generation: + +```bash +# MacOS +brew install cairo pango gdk-pixbuf libffi + +# Debian / Ubuntu +sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info + +# Fedora +sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2 + +# Windows +# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows +``` +[Back to top](#toc) + +<a name="quicktour"></a> +# Quick tour + +Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML: +* Use ready preset for tabular data: +```python +import pandas as pd +from sklearn.metrics import f1_score + +from lightautoml.automl.presets.tabular_presets import TabularAutoML +from lightautoml.tasks import Task + +df_train = pd.read_csv('../input/titanic/train.csv') +df_test = pd.read_csv('../input/titanic/test.csv') + +automl = TabularAutoML( + task = Task( + name = 'binary', + metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1)) +) +oof_pred = automl.fit_predict( + df_train, + roles = {'target': 'Survived', 'drop': ['PassengerId']} +) +test_pred = automl.predict(df_test) + +pd.DataFrame({ + 'PassengerId':df_test.PassengerId, + 'Survived': (test_pred.data[:, 0] > 0.5)*1 +}).to_csv('submit.csv', index = False) +``` + +LighAutoML framework has a lot of ready-to-use parts and extensive customization options, to learn more check out the [resources](#Resources) section. 
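
The preset above runs with its defaults; among the "extensive customization options" mentioned here are a wall-clock budget and a CPU cap passed straight to the preset constructor. A minimal sketch follows — the `timeout` and `cpu_limit` arguments reflect the TabularAutoML API of the 0.3.x line and should be treated as assumptions to verify against your installed version:

```python
# Hedged sketch: the same preset as above, but with an explicit time budget
# and CPU cap.  `timeout` (seconds) and `cpu_limit` are TabularAutoML
# constructor arguments in the 0.3.x releases; check the docs for your version.
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')

automl = TabularAutoML(
    task=Task(
        name='binary',
        metric=lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5) * 1),
    ),
    timeout=600,    # stop the model search after roughly 10 minutes
    cpu_limit=4,    # let the preset use at most 4 CPU cores
)
oof_pred = automl.fit_predict(
    df_train,
    roles={'target': 'Survived', 'drop': ['PassengerId']},
)
```

Everything else — reading the test set, thresholding predictions, writing `submit.csv` — stays exactly as in the example above.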
+ +[Back to top](#toc) + +<a name="examples"></a> +# Resources + +### Kaggle kernel examples of LightAutoML usage: + +- [Tabular Playground Series April 2021 competition solution](https://www.kaggle.com/alexryzhkov/n3-tps-april-21-lightautoml-starter) +- [Titanic competition solution (80% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-titanic-love) +- [Titanic **12-code-lines** competition solution (78% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-extreme-short-titanic-solution) +- [House prices competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-houseprices-love) +- [Natural Language Processing with Disaster Tweets solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-nlp) +- [Tabular Playground Series March 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-for-tabulardatamarch) +- [Tabular Playground Series February 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-tabulardata-love) +- [Interpretable WhiteBox solution](https://www.kaggle.com/simakov/lama-whitebox-preset-example) +- [Custom ML pipeline elements inside existing ones](https://www.kaggle.com/simakov/lama-custom-automl-pipeline-example) + +### Google Colab tutorials and [other examples](examples/): + +- [`Tutorial_1_basics.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_1_basics.ipynb) - get started with LightAutoML on tabular data. +- [`Tutorial_2_WhiteBox_AutoWoE.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb) - creating interpretable models. +- [`Tutorial_3_sql_data_source.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_3_sql_data_source.ipynb) - shows how to use LightAutoML presets (both standalone and time utilized variants) for solving ML tasks on tabular data from SQL data base instead of CSV. +- [`Tutorial_4_NLP_Interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_4_NLP_Interpretation.ipynb) - example of using TabularNLPAutoML preset, LimeTextExplainer. +- [`Tutorial_5_uplift.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_5_uplift.ipynb) - shows how to use LightAutoML for a uplift-modeling task. +- [`Tutorial_6_custom_pipeline.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_6_custom_pipeline.ipynb) - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization etc. +- [`Tutorial_7_ICE_and_PDP_interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb) - shows how to obtain local and global interpretation of model results using ICE and PDP approaches. +- [`Tutorial_8_CV_preset.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_8_CV_preset.ipynb) - example of using TabularCVAutoML preset in CV multi-class classification task. 
+ + +**Note 1**: for production you have no need to use profiler (which increase work time and memory consomption), so please do not turn it on - it is in off state by default + +**Note 2**: to take a look at this report after the run, please comment last line of demo with report deletion command. + +### Courses, videos and papers + +* **LightAutoML crash courses**: + - (Russian) [AutoML course for OpenDataScience community](https://ods.ai/tracks/automl-course-part1) + +* **Video guides**: + - (Russian) [LightAutoML webinar for Sberloga community](https://www.youtube.com/watch?v=ci8uqgWFJGg) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Dmitry Simakov](https://kaggle.com/simakov)) + - (Russian) [LightAutoML hands-on tutorial in Kaggle Kernels](https://www.youtube.com/watch?v=TYu1UG-E9e8) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [Automated Machine Learning with LightAutoML: theory and practice](https://www.youtube.com/watch?v=4pbO673B9Oo) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [LightAutoML framework general overview, benchmarks and advantages for business](https://vimeo.com/485383651) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [LightAutoML practical guide - ML pipeline presets overview](https://vimeo.com/487166940) ([Dmitry Simakov](https://kaggle.com/simakov)) + +* **Papers**: + - Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin ["LightAutoML: AutoML Solution for a Large Financial Services Ecosystem"](https://arxiv.org/pdf/2109.01528.pdf). arXiv:2109.01528, 2021. + +* **Articles about LightAutoML**: + - (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936) + - (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytic Indian Mag)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g) + +[Back to top](#toc) + +<a name="contributing"></a> +# Contributing to LightAutoML +If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started. + +[Back to top](#toc) + +<a name="apache"></a> +# License +This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details. 
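
The "Build your own custom pipeline" example in the For developers section below uses `PandasToPandasReader`, `BoostLGBM`, `LGBSimpleFeatures`, `ModelBasedImportanceEstimator`, `ImportanceCutoffSelector`, `OptunaTuner`, `MLPipeline` and `AutoML` without importing them, and never defines `N_THREADS`, `N_FOLDS` or `RANDOM_STATE`. A preamble along the following lines should make it runnable; the import paths mirror the upstream 0.3.x tutorials and the constant values are illustrative assumptions, not prescribed by this README:

```python
# Hedged preamble for the "Build your own custom pipeline" example below.
# Import paths follow the LightAutoML 0.3.x tutorials and may move between
# releases; adjust them to your installed version if needed.
from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader

# Constants the example references but never defines (values are illustrative).
N_THREADS = 4      # CPU threads available to LightGBM
N_FOLDS = 5        # cross-validation folds for PandasToPandasReader
RANDOM_STATE = 42  # seed for reproducible fold splits
```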
+ +[Back to top](#toc) + +<a name="developers"></a> +# For developers + +## Build your own custom pipeline: + +```python +import pandas as pd +from sklearn.metrics import f1_score + +from lightautoml.automl.presets.tabular_presets import TabularAutoML +from lightautoml.tasks import Task + +df_train = pd.read_csv('../input/titanic/train.csv') +df_test = pd.read_csv('../input/titanic/test.csv') + +# define that machine learning problem is binary classification +task = Task("binary") + +reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE) + +# create a feature selector +model0 = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 64, + 'seed': 42, 'num_threads': N_THREADS} +) +pipe0 = LGBSimpleFeatures() +mbie = ModelBasedImportanceEstimator() +selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0) + +# build first level pipeline for AutoML +pipe = LGBSimpleFeatures() +# stop after 20 iterations or after 30 seconds +params_tuner1 = OptunaTuner(n_trials=20, timeout=30) +model1 = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 128, + 'seed': 1, 'num_threads': N_THREADS} +) +model2 = BoostLGBM( + default_params={'learning_rate': 0.025, 'num_leaves': 64, + 'seed': 2, 'num_threads': N_THREADS} +) +pipeline_lvl1 = MLPipeline([ + (model1, params_tuner1), + model2 +], pre_selection=selector, features_pipeline=pipe, post_selection=None) + +# build second level pipeline for AutoML +pipe1 = LGBSimpleFeatures() +model = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 64, + 'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS}, + freeze_defaults=True +) +pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1, + post_selection=None) + +# build AutoML pipeline +automl = AutoML(reader, [ + [pipeline_lvl1], + [pipeline_lvl2], +], skip_conn=False) + +# train AutoML and get predictions +oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']}) +test_pred = automl.predict(df_test) + +pd.DataFrame({ + 'PassengerId':df_test.PassengerId, + 'Survived': (test_pred.data[:, 0] > 0.5)*1 +}).to_csv('submit.csv', index = False) +``` + +[Back to top](#toc) + +<a name="support"></a> +# Support and feature requests +Seek prompt advice at [Telegram group](https://t.me/lightautoml). + +Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues). + + +%package -n python3-lightautoml +Summary: Fast and customizable framework for automatic ML model creation (AutoML) +Provides: python-lightautoml +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-lightautoml +<img src=https://github.com/AILab-MLTools/LightAutoML/raw/master/imgs/LightAutoML_logo_big.png /> + +# LightAutoML - automatic model creation framework + +[](https://t.me/lightautoml) + + +[](https://github.com/psf/black) + +LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks: +- binary classification +- multiclass classification +- regression + +Current version of the package handles datasets that have independent samples in each row. I.e. **each row is an object with its specific features and target**. +Multitable datasets and sequences are a work in progress :) + +**Note**: we use [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models. 
+ +**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets. + +**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/), you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it. + +# (New features) GPU and Spark pipelines +Full GPU and Spark pipelines for LightAutoML currently available for developers testing (still in progress). The code and tutorials for: +- GPU pipeline is [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU) +- Spark pipeline is [available here](https://github.com/sb-ai-lab/SLAMA) + +<a name="toc"></a> +# Table of Contents + +* [Installation LightAutoML from PyPI](#installation) +* [Quick tour](#quicktour) +* [Resources](#examples) +* [Contributing to LightAutoML](#contributing) +* [License](#apache) +* [For developers](#developers) +* [Support and feature requests](#support) + +<a name="installation"></a> +# Installation +To install LAMA framework on your machine from PyPI, execute following commands: +```bash + +# Install base functionality: + +pip install -U lightautoml + +# For partial installation use corresponding option. +# Extra dependecies: [nlp, cv, report] +# Or you can use 'all' to install everything + +pip install -U lightautoml[nlp] + +``` + +Additionaly, run following commands to enable pdf report generation: + +```bash +# MacOS +brew install cairo pango gdk-pixbuf libffi + +# Debian / Ubuntu +sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info + +# Fedora +sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2 + +# Windows +# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows +``` +[Back to top](#toc) + +<a name="quicktour"></a> +# Quick tour + +Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML: +* Use ready preset for tabular data: +```python +import pandas as pd +from sklearn.metrics import f1_score + +from lightautoml.automl.presets.tabular_presets import TabularAutoML +from lightautoml.tasks import Task + +df_train = pd.read_csv('../input/titanic/train.csv') +df_test = pd.read_csv('../input/titanic/test.csv') + +automl = TabularAutoML( + task = Task( + name = 'binary', + metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1)) +) +oof_pred = automl.fit_predict( + df_train, + roles = {'target': 'Survived', 'drop': ['PassengerId']} +) +test_pred = automl.predict(df_test) + +pd.DataFrame({ + 'PassengerId':df_test.PassengerId, + 'Survived': (test_pred.data[:, 0] > 0.5)*1 +}).to_csv('submit.csv', index = False) +``` + +LighAutoML framework has a lot of ready-to-use parts and extensive customization options, to learn more check out the [resources](#Resources) section. 
+ +[Back to top](#toc) + +<a name="examples"></a> +# Resources + +### Kaggle kernel examples of LightAutoML usage: + +- [Tabular Playground Series April 2021 competition solution](https://www.kaggle.com/alexryzhkov/n3-tps-april-21-lightautoml-starter) +- [Titanic competition solution (80% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-titanic-love) +- [Titanic **12-code-lines** competition solution (78% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-extreme-short-titanic-solution) +- [House prices competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-houseprices-love) +- [Natural Language Processing with Disaster Tweets solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-nlp) +- [Tabular Playground Series March 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-for-tabulardatamarch) +- [Tabular Playground Series February 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-tabulardata-love) +- [Interpretable WhiteBox solution](https://www.kaggle.com/simakov/lama-whitebox-preset-example) +- [Custom ML pipeline elements inside existing ones](https://www.kaggle.com/simakov/lama-custom-automl-pipeline-example) + +### Google Colab tutorials and [other examples](examples/): + +- [`Tutorial_1_basics.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_1_basics.ipynb) - get started with LightAutoML on tabular data. +- [`Tutorial_2_WhiteBox_AutoWoE.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb) - creating interpretable models. +- [`Tutorial_3_sql_data_source.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_3_sql_data_source.ipynb) - shows how to use LightAutoML presets (both standalone and time utilized variants) for solving ML tasks on tabular data from SQL data base instead of CSV. +- [`Tutorial_4_NLP_Interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_4_NLP_Interpretation.ipynb) - example of using TabularNLPAutoML preset, LimeTextExplainer. +- [`Tutorial_5_uplift.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_5_uplift.ipynb) - shows how to use LightAutoML for a uplift-modeling task. +- [`Tutorial_6_custom_pipeline.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_6_custom_pipeline.ipynb) - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization etc. +- [`Tutorial_7_ICE_and_PDP_interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb) - shows how to obtain local and global interpretation of model results using ICE and PDP approaches. +- [`Tutorial_8_CV_preset.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_8_CV_preset.ipynb) - example of using TabularCVAutoML preset in CV multi-class classification task. 
+ + +**Note 1**: for production you have no need to use profiler (which increase work time and memory consomption), so please do not turn it on - it is in off state by default + +**Note 2**: to take a look at this report after the run, please comment last line of demo with report deletion command. + +### Courses, videos and papers + +* **LightAutoML crash courses**: + - (Russian) [AutoML course for OpenDataScience community](https://ods.ai/tracks/automl-course-part1) + +* **Video guides**: + - (Russian) [LightAutoML webinar for Sberloga community](https://www.youtube.com/watch?v=ci8uqgWFJGg) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Dmitry Simakov](https://kaggle.com/simakov)) + - (Russian) [LightAutoML hands-on tutorial in Kaggle Kernels](https://www.youtube.com/watch?v=TYu1UG-E9e8) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [Automated Machine Learning with LightAutoML: theory and practice](https://www.youtube.com/watch?v=4pbO673B9Oo) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [LightAutoML framework general overview, benchmarks and advantages for business](https://vimeo.com/485383651) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [LightAutoML practical guide - ML pipeline presets overview](https://vimeo.com/487166940) ([Dmitry Simakov](https://kaggle.com/simakov)) + +* **Papers**: + - Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin ["LightAutoML: AutoML Solution for a Large Financial Services Ecosystem"](https://arxiv.org/pdf/2109.01528.pdf). arXiv:2109.01528, 2021. + +* **Articles about LightAutoML**: + - (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936) + - (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytic Indian Mag)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g) + +[Back to top](#toc) + +<a name="contributing"></a> +# Contributing to LightAutoML +If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started. + +[Back to top](#toc) + +<a name="apache"></a> +# License +This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details. 
+ +[Back to top](#toc) + +<a name="developers"></a> +# For developers + +## Build your own custom pipeline: + +```python +import pandas as pd +from sklearn.metrics import f1_score + +from lightautoml.automl.presets.tabular_presets import TabularAutoML +from lightautoml.tasks import Task + +df_train = pd.read_csv('../input/titanic/train.csv') +df_test = pd.read_csv('../input/titanic/test.csv') + +# define that machine learning problem is binary classification +task = Task("binary") + +reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE) + +# create a feature selector +model0 = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 64, + 'seed': 42, 'num_threads': N_THREADS} +) +pipe0 = LGBSimpleFeatures() +mbie = ModelBasedImportanceEstimator() +selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0) + +# build first level pipeline for AutoML +pipe = LGBSimpleFeatures() +# stop after 20 iterations or after 30 seconds +params_tuner1 = OptunaTuner(n_trials=20, timeout=30) +model1 = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 128, + 'seed': 1, 'num_threads': N_THREADS} +) +model2 = BoostLGBM( + default_params={'learning_rate': 0.025, 'num_leaves': 64, + 'seed': 2, 'num_threads': N_THREADS} +) +pipeline_lvl1 = MLPipeline([ + (model1, params_tuner1), + model2 +], pre_selection=selector, features_pipeline=pipe, post_selection=None) + +# build second level pipeline for AutoML +pipe1 = LGBSimpleFeatures() +model = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 64, + 'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS}, + freeze_defaults=True +) +pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1, + post_selection=None) + +# build AutoML pipeline +automl = AutoML(reader, [ + [pipeline_lvl1], + [pipeline_lvl2], +], skip_conn=False) + +# train AutoML and get predictions +oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']}) +test_pred = automl.predict(df_test) + +pd.DataFrame({ + 'PassengerId':df_test.PassengerId, + 'Survived': (test_pred.data[:, 0] > 0.5)*1 +}).to_csv('submit.csv', index = False) +``` + +[Back to top](#toc) + +<a name="support"></a> +# Support and feature requests +Seek prompt advice at [Telegram group](https://t.me/lightautoml). + +Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues). + + +%package help +Summary: Development documents and examples for lightautoml +Provides: python3-lightautoml-doc +%description help +<img src=https://github.com/AILab-MLTools/LightAutoML/raw/master/imgs/LightAutoML_logo_big.png /> + +# LightAutoML - automatic model creation framework + +[](https://t.me/lightautoml) + + +[](https://github.com/psf/black) + +LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks: +- binary classification +- multiclass classification +- regression + +Current version of the package handles datasets that have independent samples in each row. I.e. **each row is an object with its specific features and target**. +Multitable datasets and sequences are a work in progress :) + +**Note**: we use [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models. 
+ +**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets. + +**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/), you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it. + +# (New features) GPU and Spark pipelines +Full GPU and Spark pipelines for LightAutoML currently available for developers testing (still in progress). The code and tutorials for: +- GPU pipeline is [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU) +- Spark pipeline is [available here](https://github.com/sb-ai-lab/SLAMA) + +<a name="toc"></a> +# Table of Contents + +* [Installation LightAutoML from PyPI](#installation) +* [Quick tour](#quicktour) +* [Resources](#examples) +* [Contributing to LightAutoML](#contributing) +* [License](#apache) +* [For developers](#developers) +* [Support and feature requests](#support) + +<a name="installation"></a> +# Installation +To install LAMA framework on your machine from PyPI, execute following commands: +```bash + +# Install base functionality: + +pip install -U lightautoml + +# For partial installation use corresponding option. +# Extra dependecies: [nlp, cv, report] +# Or you can use 'all' to install everything + +pip install -U lightautoml[nlp] + +``` + +Additionaly, run following commands to enable pdf report generation: + +```bash +# MacOS +brew install cairo pango gdk-pixbuf libffi + +# Debian / Ubuntu +sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info + +# Fedora +sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2 + +# Windows +# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows +``` +[Back to top](#toc) + +<a name="quicktour"></a> +# Quick tour + +Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML: +* Use ready preset for tabular data: +```python +import pandas as pd +from sklearn.metrics import f1_score + +from lightautoml.automl.presets.tabular_presets import TabularAutoML +from lightautoml.tasks import Task + +df_train = pd.read_csv('../input/titanic/train.csv') +df_test = pd.read_csv('../input/titanic/test.csv') + +automl = TabularAutoML( + task = Task( + name = 'binary', + metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1)) +) +oof_pred = automl.fit_predict( + df_train, + roles = {'target': 'Survived', 'drop': ['PassengerId']} +) +test_pred = automl.predict(df_test) + +pd.DataFrame({ + 'PassengerId':df_test.PassengerId, + 'Survived': (test_pred.data[:, 0] > 0.5)*1 +}).to_csv('submit.csv', index = False) +``` + +LighAutoML framework has a lot of ready-to-use parts and extensive customization options, to learn more check out the [resources](#Resources) section. 
+ +[Back to top](#toc) + +<a name="examples"></a> +# Resources + +### Kaggle kernel examples of LightAutoML usage: + +- [Tabular Playground Series April 2021 competition solution](https://www.kaggle.com/alexryzhkov/n3-tps-april-21-lightautoml-starter) +- [Titanic competition solution (80% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-titanic-love) +- [Titanic **12-code-lines** competition solution (78% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-extreme-short-titanic-solution) +- [House prices competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-houseprices-love) +- [Natural Language Processing with Disaster Tweets solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-nlp) +- [Tabular Playground Series March 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-for-tabulardatamarch) +- [Tabular Playground Series February 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-tabulardata-love) +- [Interpretable WhiteBox solution](https://www.kaggle.com/simakov/lama-whitebox-preset-example) +- [Custom ML pipeline elements inside existing ones](https://www.kaggle.com/simakov/lama-custom-automl-pipeline-example) + +### Google Colab tutorials and [other examples](examples/): + +- [`Tutorial_1_basics.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_1_basics.ipynb) - get started with LightAutoML on tabular data. +- [`Tutorial_2_WhiteBox_AutoWoE.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb) - creating interpretable models. +- [`Tutorial_3_sql_data_source.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_3_sql_data_source.ipynb) - shows how to use LightAutoML presets (both standalone and time utilized variants) for solving ML tasks on tabular data from SQL data base instead of CSV. +- [`Tutorial_4_NLP_Interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_4_NLP_Interpretation.ipynb) - example of using TabularNLPAutoML preset, LimeTextExplainer. +- [`Tutorial_5_uplift.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_5_uplift.ipynb) - shows how to use LightAutoML for a uplift-modeling task. +- [`Tutorial_6_custom_pipeline.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_6_custom_pipeline.ipynb) - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization etc. +- [`Tutorial_7_ICE_and_PDP_interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb) - shows how to obtain local and global interpretation of model results using ICE and PDP approaches. +- [`Tutorial_8_CV_preset.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_8_CV_preset.ipynb) - example of using TabularCVAutoML preset in CV multi-class classification task. 
+ + +**Note 1**: for production you have no need to use profiler (which increase work time and memory consomption), so please do not turn it on - it is in off state by default + +**Note 2**: to take a look at this report after the run, please comment last line of demo with report deletion command. + +### Courses, videos and papers + +* **LightAutoML crash courses**: + - (Russian) [AutoML course for OpenDataScience community](https://ods.ai/tracks/automl-course-part1) + +* **Video guides**: + - (Russian) [LightAutoML webinar for Sberloga community](https://www.youtube.com/watch?v=ci8uqgWFJGg) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Dmitry Simakov](https://kaggle.com/simakov)) + - (Russian) [LightAutoML hands-on tutorial in Kaggle Kernels](https://www.youtube.com/watch?v=TYu1UG-E9e8) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [Automated Machine Learning with LightAutoML: theory and practice](https://www.youtube.com/watch?v=4pbO673B9Oo) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [LightAutoML framework general overview, benchmarks and advantages for business](https://vimeo.com/485383651) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov)) + - (English) [LightAutoML practical guide - ML pipeline presets overview](https://vimeo.com/487166940) ([Dmitry Simakov](https://kaggle.com/simakov)) + +* **Papers**: + - Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin ["LightAutoML: AutoML Solution for a Large Financial Services Ecosystem"](https://arxiv.org/pdf/2109.01528.pdf). arXiv:2109.01528, 2021. + +* **Articles about LightAutoML**: + - (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936) + - (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytic Indian Mag)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g) + +[Back to top](#toc) + +<a name="contributing"></a> +# Contributing to LightAutoML +If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started. + +[Back to top](#toc) + +<a name="apache"></a> +# License +This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details. 
+ +[Back to top](#toc) + +<a name="developers"></a> +# For developers + +## Build your own custom pipeline: + +```python +import pandas as pd +from sklearn.metrics import f1_score + +from lightautoml.automl.presets.tabular_presets import TabularAutoML +from lightautoml.tasks import Task + +df_train = pd.read_csv('../input/titanic/train.csv') +df_test = pd.read_csv('../input/titanic/test.csv') + +# define that machine learning problem is binary classification +task = Task("binary") + +reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE) + +# create a feature selector +model0 = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 64, + 'seed': 42, 'num_threads': N_THREADS} +) +pipe0 = LGBSimpleFeatures() +mbie = ModelBasedImportanceEstimator() +selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0) + +# build first level pipeline for AutoML +pipe = LGBSimpleFeatures() +# stop after 20 iterations or after 30 seconds +params_tuner1 = OptunaTuner(n_trials=20, timeout=30) +model1 = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 128, + 'seed': 1, 'num_threads': N_THREADS} +) +model2 = BoostLGBM( + default_params={'learning_rate': 0.025, 'num_leaves': 64, + 'seed': 2, 'num_threads': N_THREADS} +) +pipeline_lvl1 = MLPipeline([ + (model1, params_tuner1), + model2 +], pre_selection=selector, features_pipeline=pipe, post_selection=None) + +# build second level pipeline for AutoML +pipe1 = LGBSimpleFeatures() +model = BoostLGBM( + default_params={'learning_rate': 0.05, 'num_leaves': 64, + 'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS}, + freeze_defaults=True +) +pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1, + post_selection=None) + +# build AutoML pipeline +automl = AutoML(reader, [ + [pipeline_lvl1], + [pipeline_lvl2], +], skip_conn=False) + +# train AutoML and get predictions +oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']}) +test_pred = automl.predict(df_test) + +pd.DataFrame({ + 'PassengerId':df_test.PassengerId, + 'Survived': (test_pred.data[:, 0] > 0.5)*1 +}).to_csv('submit.csv', index = False) +``` + +[Back to top](#toc) + +<a name="support"></a> +# Support and feature requests +Seek prompt advice at [Telegram group](https://t.me/lightautoml). + +Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues). + + +%prep +%autosetup -n lightautoml-0.3.7.3 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . 
+ +%files -n python3-lightautoml -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 0.3.7.3-1 +- Package Spec generated @@ -0,0 +1 @@ +ce0719dc7e6fcba0fe2ca16ef7fa679a LightAutoML-0.3.7.3.tar.gz |
