%global _empty_manifest_terminate_build 0
Name: python-lightautoml
Version: 0.3.7.3
Release: 1
Summary: Fast and customizable framework for automatic ML model creation (AutoML)
License: Apache-2.0
URL: https://lightautoml.readthedocs.io/en/latest/
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b6/eb/fa7decd357f2a9a8fdff961a20262b5f75737fbeacc22e18611342df7fbd/LightAutoML-0.3.7.3.tar.gz
BuildArch: noarch

Requires: python3-poetry-core
Requires: python3-pandas
Requires: python3-scikit-learn
Requires: python3-lightgbm
Requires: python3-catboost
Requires: python3-optuna
Requires: python3-torch
Requires: python3-dataclasses
Requires: python3-holidays
Requires: python3-networkx
Requires: python3-cmaes
Requires: python3-pyyaml
Requires: python3-tqdm
Requires: python3-joblib
Requires: python3-importlib-metadata
Requires: python3-autowoe
Requires: python3-jinja2
Requires: python3-json2html
Requires: python3-seaborn
Requires: python3-gensim
Requires: python3-nltk
Requires: python3-transformers
Requires: python3-albumentations
Requires: python3-efficientnet-pytorch
Requires: python3-opencv-python
Requires: python3-PyWavelets
Requires: python3-torchvision
Requires: python3-featuretools
Requires: python3-weasyprint
Requires: python3-cffi

%description
# LightAutoML - automatic model creation framework

[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml)
![PyPI - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=orange&style=plastic)
![Read the Docs](https://img.shields.io/readthedocs/lightautoml?style=plastic)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:
- binary classification
- multiclass classification
- regression

The current version of the package handles datasets that have independent samples in each row, i.e. **each row is an object with its specific features and target**.
Multitable datasets and sequences are a work in progress :)

**Note**: we use the [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models.

**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets.

**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/); you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it.

# (New features) GPU and Spark pipelines

Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). Code and tutorials:
- GPU pipeline: [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU)
- Spark pipeline: [available here](https://github.com/sb-ai-lab/SLAMA)

# Table of Contents

* [Installing LightAutoML from PyPI](#installation)
* [Quick tour](#quicktour)
* [Resources](#examples)
* [Contributing to LightAutoML](#contributing)
* [License](#apache)
* [For developers](#developers)
* [Support and feature requests](#support)

# Installation

To install the LAMA framework on your machine from PyPI, execute the following commands:
```bash

# Install base functionality:

pip install -U lightautoml

# For partial installation use corresponding option.
# Extra dependencies: [nlp, cv, report]
# Or you can use 'all' to install everything

pip install -U lightautoml[nlp]

```

Additionally, run the following commands to enable PDF report generation:

```bash
# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
```

[Back to top](#toc)

# Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML: with a ready-made preset, shown here, or with a custom pipeline, shown under [For developers](#developers).

* Use a ready-made preset for tabular data:
```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# binary task scored by F1 on probabilities thresholded at 0.5
automl = TabularAutoML(
    task = Task(
        name = 'binary',
        metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1))
)
# fit on the training data; the return value holds predictions for the training rows
oof_pred = automl.fit_predict(
    df_train,
    roles = {'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

# write the submission file with 0/1 labels
pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the [resources](#Resources) section.
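The custom `metric` in the preset example binarizes predicted probabilities at 0.5 before scoring them with `f1_score`. A minimal pure-Python sketch of that thresholding step, assuming 0/1 labels; `f1_at_threshold` is a hypothetical helper written for illustration, not part of LightAutoML or scikit-learn:

```python
def f1_at_threshold(y_true, y_prob, threshold=0.5):
    """Binarize probabilities at `threshold`, then compute F1 for class 1.

    Hypothetical helper for illustration only.
    """
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # no true positives: precision or recall is zero, so F1 is zero
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# predictions become [1, 1, 0, 1]: 2 true positives, 1 false positive, 1 false negative
score = f1_at_threshold([1, 0, 1, 1], [0.9, 0.6, 0.4, 0.8])
print(round(score, 3))  # 0.667
```

Moving the threshold trades precision against recall, so 0.5 is a convention here rather than a tuned value.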
[Back to top](#toc)

# Resources

### Kaggle kernel examples of LightAutoML usage:

- [Tabular Playground Series April 2021 competition solution](https://www.kaggle.com/alexryzhkov/n3-tps-april-21-lightautoml-starter)
- [Titanic competition solution (80% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-titanic-love)
- [Titanic **12-code-lines** competition solution (78% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-extreme-short-titanic-solution)
- [House prices competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-houseprices-love)
- [Natural Language Processing with Disaster Tweets solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-nlp)
- [Tabular Playground Series March 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-for-tabulardatamarch)
- [Tabular Playground Series February 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-tabulardata-love)
- [Interpretable WhiteBox solution](https://www.kaggle.com/simakov/lama-whitebox-preset-example)
- [Custom ML pipeline elements inside existing ones](https://www.kaggle.com/simakov/lama-custom-automl-pipeline-example)

### Google Colab tutorials and [other examples](examples/):

- [`Tutorial_1_basics.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_1_basics.ipynb) - get started with LightAutoML on tabular data.
- [`Tutorial_2_WhiteBox_AutoWoE.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb) - creating interpretable models.
- [`Tutorial_3_sql_data_source.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_3_sql_data_source.ipynb) - shows how to use LightAutoML presets (both standalone and time-utilized variants) for solving ML tasks on tabular data from an SQL database instead of CSV.
- [`Tutorial_4_NLP_Interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_4_NLP_Interpretation.ipynb) - example of using the TabularNLPAutoML preset and LimeTextExplainer.
- [`Tutorial_5_uplift.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_5_uplift.ipynb) - shows how to use LightAutoML for an uplift-modeling task.
- [`Tutorial_6_custom_pipeline.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_6_custom_pipeline.ipynb) - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization etc.
- [`Tutorial_7_ICE_and_PDP_interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb) - shows how to obtain local and global interpretation of model results using ICE and PDP approaches.
- [`Tutorial_8_CV_preset.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_8_CV_preset.ipynb) - example of using the TabularCVAutoML preset in a CV multi-class classification task.

**Note 1**: in production there is no need to use the profiler (which increases run time and memory consumption), so please do not turn it on; it is off by default.

**Note 2**: to take a look at this report after the run, please comment out the last line of the demo, which contains the report deletion command.
### Courses, videos and papers

* **LightAutoML crash courses**:
    - (Russian) [AutoML course for OpenDataScience community](https://ods.ai/tracks/automl-course-part1)

* **Video guides**:
    - (Russian) [LightAutoML webinar for Sberloga community](https://www.youtube.com/watch?v=ci8uqgWFJGg) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Dmitry Simakov](https://kaggle.com/simakov))
    - (Russian) [LightAutoML hands-on tutorial in Kaggle Kernels](https://www.youtube.com/watch?v=TYu1UG-E9e8) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [Automated Machine Learning with LightAutoML: theory and practice](https://www.youtube.com/watch?v=4pbO673B9Oo) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [LightAutoML framework general overview, benchmarks and advantages for business](https://vimeo.com/485383651) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [LightAutoML practical guide - ML pipeline presets overview](https://vimeo.com/487166940) ([Dmitry Simakov](https://kaggle.com/simakov))

* **Papers**:
    - Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin ["LightAutoML: AutoML Solution for a Large Financial Services Ecosystem"](https://arxiv.org/pdf/2109.01528.pdf). arXiv:2109.01528, 2021.

* **Articles about LightAutoML**:
    - (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
    - (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytics India Magazine)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)

[Back to top](#toc)

# Contributing to LightAutoML

If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started.
[Back to top](#toc)

# License

This project is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details.

[Back to top](#toc)

# For developers

## Build your own custom pipeline:

```python
import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import ImportanceCutoffSelector, ModelBasedImportanceEstimator
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

# illustrative settings (left undefined in the original snippet); adjust to your machine
N_THREADS = 4
N_FOLDS = 5
RANDOM_STATE = 42

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that machine learning problem is binary classification
task = Task("binary")

reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector
model0 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 128,
    'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
    default_params={'learning_rate': 0.025, 'num_leaves': 64,
    'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
    (model1, params_tuner1),
    model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
    freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1, post_selection=None)

# build AutoML pipeline
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

[Back to top](#toc)

# Support and feature requests

For quick advice, ask in the [Telegram group](https://t.me/lightautoml).

Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).

%package -n python3-lightautoml
Summary: Fast and customizable framework for automatic ML model creation (AutoML)
Provides: python-lightautoml
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-lightautoml
# LightAutoML - automatic model creation framework

[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml)
![PyPI - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=orange&style=plastic)
![Read the Docs](https://img.shields.io/readthedocs/lightautoml?style=plastic)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:
- binary classification
- multiclass classification
- regression

The current version of the package handles datasets that have independent samples in each row, i.e. **each row is an object with its specific features and target**.
Multitable datasets and sequences are a work in progress :)

**Note**: we use the [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models.

**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets.

**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/); you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it.

# (New features) GPU and Spark pipelines

Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). Code and tutorials:
- GPU pipeline: [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU)
- Spark pipeline: [available here](https://github.com/sb-ai-lab/SLAMA)

# Table of Contents

* [Installing LightAutoML from PyPI](#installation)
* [Quick tour](#quicktour)
* [Resources](#examples)
* [Contributing to LightAutoML](#contributing)
* [License](#apache)
* [For developers](#developers)
* [Support and feature requests](#support)

# Installation

To install the LAMA framework on your machine from PyPI, execute the following commands:
```bash

# Install base functionality:

pip install -U lightautoml

# For partial installation use corresponding option.
# Extra dependencies: [nlp, cv, report]
# Or you can use 'all' to install everything

pip install -U lightautoml[nlp]

```

Additionally, run the following commands to enable PDF report generation:

```bash
# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
```

[Back to top](#toc)

# Quick tour

Let's solve the popular Kaggle Titanic competition below.
There are two main ways to solve machine learning problems using LightAutoML: with a ready-made preset, shown here, or with a custom pipeline, shown under [For developers](#developers).

* Use a ready-made preset for tabular data:
```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# binary task scored by F1 on probabilities thresholded at 0.5
automl = TabularAutoML(
    task = Task(
        name = 'binary',
        metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1))
)
# fit on the training data; the return value holds predictions for the training rows
oof_pred = automl.fit_predict(
    df_train,
    roles = {'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

# write the submission file with 0/1 labels
pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the [resources](#Resources) section.

[Back to top](#toc)

# Resources

### Kaggle kernel examples of LightAutoML usage:

- [Tabular Playground Series April 2021 competition solution](https://www.kaggle.com/alexryzhkov/n3-tps-april-21-lightautoml-starter)
- [Titanic competition solution (80% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-titanic-love)
- [Titanic **12-code-lines** competition solution (78% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-extreme-short-titanic-solution)
- [House prices competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-houseprices-love)
- [Natural Language Processing with Disaster Tweets solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-nlp)
- [Tabular Playground Series March 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-for-tabulardatamarch)
- [Tabular Playground Series February 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-tabulardata-love)
- [Interpretable WhiteBox solution](https://www.kaggle.com/simakov/lama-whitebox-preset-example)
- [Custom ML pipeline elements inside existing ones](https://www.kaggle.com/simakov/lama-custom-automl-pipeline-example)

### Google Colab tutorials and [other examples](examples/):

- [`Tutorial_1_basics.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_1_basics.ipynb) - get started with LightAutoML on tabular data.
- [`Tutorial_2_WhiteBox_AutoWoE.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb) - creating interpretable models.
- [`Tutorial_3_sql_data_source.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_3_sql_data_source.ipynb) - shows how to use LightAutoML presets (both standalone and time-utilized variants) for solving ML tasks on tabular data from an SQL database instead of CSV.
- [`Tutorial_4_NLP_Interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_4_NLP_Interpretation.ipynb) - example of using the TabularNLPAutoML preset and LimeTextExplainer.
- [`Tutorial_5_uplift.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_5_uplift.ipynb) - shows how to use LightAutoML for an uplift-modeling task.
- [`Tutorial_6_custom_pipeline.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_6_custom_pipeline.ipynb) - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization etc.
- [`Tutorial_7_ICE_and_PDP_interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb) - shows how to obtain local and global interpretation of model results using ICE and PDP approaches.
- [`Tutorial_8_CV_preset.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_8_CV_preset.ipynb) - example of using the TabularCVAutoML preset in a CV multi-class classification task.

**Note 1**: in production there is no need to use the profiler (which increases run time and memory consumption), so please do not turn it on; it is off by default.

**Note 2**: to take a look at this report after the run, please comment out the last line of the demo, which contains the report deletion command.

### Courses, videos and papers

* **LightAutoML crash courses**:
    - (Russian) [AutoML course for OpenDataScience community](https://ods.ai/tracks/automl-course-part1)

* **Video guides**:
    - (Russian) [LightAutoML webinar for Sberloga community](https://www.youtube.com/watch?v=ci8uqgWFJGg) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Dmitry Simakov](https://kaggle.com/simakov))
    - (Russian) [LightAutoML hands-on tutorial in Kaggle Kernels](https://www.youtube.com/watch?v=TYu1UG-E9e8) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [Automated Machine Learning with LightAutoML: theory and practice](https://www.youtube.com/watch?v=4pbO673B9Oo) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [LightAutoML framework general overview, benchmarks and advantages for business](https://vimeo.com/485383651) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [LightAutoML practical guide - ML pipeline presets overview](https://vimeo.com/487166940) ([Dmitry Simakov](https://kaggle.com/simakov))

* **Papers**:
    - Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin ["LightAutoML: AutoML Solution for a Large Financial Services Ecosystem"](https://arxiv.org/pdf/2109.01528.pdf). arXiv:2109.01528, 2021.

* **Articles about LightAutoML**:
    - (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
    - (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytics India Magazine)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)

[Back to top](#toc)

# Contributing to LightAutoML

If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started.

[Back to top](#toc)

# License

This project is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details.

[Back to top](#toc)

# For developers

## Build your own custom pipeline:

```python
import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import ImportanceCutoffSelector, ModelBasedImportanceEstimator
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

# illustrative settings (left undefined in the original snippet); adjust to your machine
N_THREADS = 4
N_FOLDS = 5
RANDOM_STATE = 42

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that machine learning problem is binary classification
task = Task("binary")

reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector
model0 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 128,
    'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
    default_params={'learning_rate': 0.025, 'num_leaves': 64,
    'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
    (model1, params_tuner1),
    model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
    freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1, post_selection=None)

# build AutoML pipeline
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

[Back to top](#toc)

# Support and feature requests

For quick advice, ask in the [Telegram group](https://t.me/lightautoml).

Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).
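
Looking back at the custom pipeline under "For developers": `fit_predict` returns `oof_pred`, that is, out-of-fold predictions for the training rows, with the number of folds set through `cv=N_FOLDS` on the reader. The sketch below illustrates only the general out-of-fold idea with a toy mean predictor in plain Python. The helper names `kfold_indices` and `oof_mean_predictions` are hypothetical, and this is not a description of how LightAutoML builds its folds internally.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous validation folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def oof_mean_predictions(targets, n_folds):
    """Predict every sample using only the targets of the other folds."""
    oof = [None] * len(targets)
    for val_idx in kfold_indices(len(targets), n_folds):
        val_set = set(val_idx)
        train = [t for i, t in enumerate(targets) if i not in val_set]
        fold_model = sum(train) / len(train)  # toy "training": average the targets
        for i in val_idx:
            oof[i] = fold_model  # a sample never sees its own target
    return oof


print(oof_mean_predictions([0, 1, 1, 0, 1, 1], n_folds=3))
# [0.75, 0.75, 0.75, 0.75, 0.5, 0.5]
```

Because each prediction comes from a model that did not train on that row, out-of-fold predictions can be scored, or passed to a later model, without leaking the row's own target.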

%package help
Summary: Development documents and examples for lightautoml
Provides: python3-lightautoml-doc
%description help
# LightAutoML - automatic model creation framework

[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml)
![PyPI - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=orange&style=plastic)
![Read the Docs](https://img.shields.io/readthedocs/lightautoml?style=plastic)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:
- binary classification
- multiclass classification
- regression

The current version of the package handles datasets that have independent samples in each row, i.e. **each row is an object with its specific features and target**.
Multitable datasets and sequences are a work in progress :)

**Note**: we use the [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models.

**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets.

**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/); you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it.

# (New features) GPU and Spark pipelines

Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress).
Code and tutorials:
- GPU pipeline: [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU)
- Spark pipeline: [available here](https://github.com/sb-ai-lab/SLAMA)

# Table of Contents

* [Installing LightAutoML from PyPI](#installation)
* [Quick tour](#quicktour)
* [Resources](#examples)
* [Contributing to LightAutoML](#contributing)
* [License](#apache)
* [For developers](#developers)
* [Support and feature requests](#support)

# Installation

To install the LAMA framework on your machine from PyPI, execute the following commands:
```bash

# Install base functionality:

pip install -U lightautoml

# For partial installation use corresponding option.
# Extra dependencies: [nlp, cv, report]
# Or you can use 'all' to install everything

pip install -U lightautoml[nlp]

```

Additionally, run the following commands to enable PDF report generation:

```bash
# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
```

[Back to top](#toc)

# Quick tour

Let's solve the popular Kaggle Titanic competition below.
There are two main ways to solve machine learning problems using LightAutoML: with a ready-made preset, shown here, or with a custom pipeline, shown under [For developers](#developers).

* Use a ready-made preset for tabular data:
```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# binary task scored by F1 on probabilities thresholded at 0.5
automl = TabularAutoML(
    task = Task(
        name = 'binary',
        metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1))
)
# fit on the training data; the return value holds predictions for the training rows
oof_pred = automl.fit_predict(
    df_train,
    roles = {'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

# write the submission file with 0/1 labels
pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the [resources](#Resources) section.

[Back to top](#toc)

# Resources

### Kaggle kernel examples of LightAutoML usage:

- [Tabular Playground Series April 2021 competition solution](https://www.kaggle.com/alexryzhkov/n3-tps-april-21-lightautoml-starter)
- [Titanic competition solution (80% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-titanic-love)
- [Titanic **12-code-lines** competition solution (78% accuracy)](https://www.kaggle.com/alexryzhkov/lightautoml-extreme-short-titanic-solution)
- [House prices competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-houseprices-love)
- [Natural Language Processing with Disaster Tweets solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-nlp)
- [Tabular Playground Series March 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-starter-for-tabulardatamarch)
- [Tabular Playground Series February 2021 competition solution](https://www.kaggle.com/alexryzhkov/lightautoml-tabulardata-love)
- [Interpretable WhiteBox solution](https://www.kaggle.com/simakov/lama-whitebox-preset-example)
- [Custom ML pipeline elements inside existing ones](https://www.kaggle.com/simakov/lama-custom-automl-pipeline-example)

### Google Colab tutorials and [other examples](examples/):

- [`Tutorial_1_basics.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_1_basics.ipynb) - get started with LightAutoML on tabular data.
- [`Tutorial_2_WhiteBox_AutoWoE.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb) - creating interpretable models.
- [`Tutorial_3_sql_data_source.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_3_sql_data_source.ipynb) - shows how to use LightAutoML presets (both standalone and time-utilized variants) for solving ML tasks on tabular data from an SQL database instead of CSV.
- [`Tutorial_4_NLP_Interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_4_NLP_Interpretation.ipynb) - example of using the TabularNLPAutoML preset and LimeTextExplainer.
- [`Tutorial_5_uplift.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_5_uplift.ipynb) - shows how to use LightAutoML for an uplift-modeling task.
- [`Tutorial_6_custom_pipeline.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_6_custom_pipeline.ipynb) - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization etc.
- [`Tutorial_7_ICE_and_PDP_interpretation.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb) - shows how to obtain local and global interpretation of model results using ICE and PDP approaches.
- [`Tutorial_8_CV_preset.ipynb`](https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_8_CV_preset.ipynb) - example of using the TabularCVAutoML preset in a CV multi-class classification task.

**Note 1**: in production there is no need to use the profiler (which increases run time and memory consumption), so please do not turn it on. It is off by default.

**Note 2**: to look at this report after the run, comment out the last line of the demo, which contains the report deletion command.

### Courses, videos and papers

* **LightAutoML crash courses**:
    - (Russian) [AutoML course for OpenDataScience community](https://ods.ai/tracks/automl-course-part1)

* **Video guides**:
    - (Russian) [LightAutoML webinar for Sberloga community](https://www.youtube.com/watch?v=ci8uqgWFJGg) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Dmitry Simakov](https://kaggle.com/simakov))
    - (Russian) [LightAutoML hands-on tutorial in Kaggle Kernels](https://www.youtube.com/watch?v=TYu1UG-E9e8) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [Automated Machine Learning with LightAutoML: theory and practice](https://www.youtube.com/watch?v=4pbO673B9Oo) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [LightAutoML framework general overview, benchmarks and advantages for business](https://vimeo.com/485383651) ([Alexander Ryzhkov](https://kaggle.com/alexryzhkov))
    - (English) [LightAutoML practical guide - ML pipeline presets overview](https://vimeo.com/487166940) ([Dmitry Simakov](https://kaggle.com/simakov))

* **Papers**:
    - Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin ["LightAutoML: AutoML Solution for a Large Financial Services Ecosystem"](https://arxiv.org/pdf/2109.01528.pdf). arXiv:2109.01528, 2021.

* **Articles about LightAutoML**:
    - (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
    - (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytics India Magazine)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)

[Back to top](#toc)

# Contributing to LightAutoML

If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started.

[Back to top](#toc)

# License

This project is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details.

[Back to top](#toc)

# For developers

## Build your own custom pipeline:

```python
import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

# example settings (the original snippet left these undefined); adjust as needed
N_THREADS = 4
N_FOLDS = 5
RANDOM_STATE = 42

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that machine learning problem is binary classification
task = Task("binary")

reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector
model0 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
                    'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 128,
                    'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
    default_params={'learning_rate': 0.025, 'num_leaves': 64,
                    'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
    (model1, params_tuner1),
    model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
                    'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
    freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None,
                           features_pipeline=pipe1, post_selection=None)

# build AutoML pipeline
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(
    df_train, roles={'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1
}).to_csv('submit.csv', index=False)
```

[Back to top](#toc)

# Support and feature requests

For quick advice, ask in the [Telegram group](https://t.me/lightautoml).

Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).

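# Appendix: thresholding predicted probabilities

Both code examples above turn predicted probabilities into class labels with a fixed 0.5 cutoff, `(y_pred > 0.5)*1`, first inside the custom F1 metric and again when writing `submit.csv`. The sketch below reproduces that step in plain Python so the effect of the cutoff can be checked without installing LightAutoML. The helper names `to_labels` and `f1_at_threshold` are hypothetical and not part of the LightAutoML API, and the sample labels and probabilities are made up for illustration.

```python
from typing import List, Sequence


def to_labels(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return 1 for probabilities strictly above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


def f1_at_threshold(y_true: Sequence[int],
                    probabilities: Sequence[float],
                    threshold: float = 0.5) -> float:
    """Compute binary F1 after thresholding the predicted probabilities."""
    y_pred = to_labels(probabilities, threshold)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # no true positives: F1 is defined as 0 here
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# made-up labels and probabilities, for illustration only
y_true = [1, 0, 1, 1, 0]
probabilities = [0.9, 0.6, 0.4, 0.8, 0.1]

labels = to_labels(probabilities)
score = f1_at_threshold(y_true, probabilities)
print(labels)           # [1, 1, 0, 1, 0]
print(round(score, 3))  # 0.667
```

Because the comparison is strict, a probability of exactly 0.5 maps to class 0, matching `(y_pred > 0.5)*1` in the examples.
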
%prep
%autosetup -n lightautoml-0.3.7.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-lightautoml -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 0.3.7.3-1
- Package Spec generated