
| field | value | date |
|---|---|---|
| author | CoprDistGit <infra@openeuler.org> | 2023-04-11 12:40:16 +0000 |
| committer | CoprDistGit <infra@openeuler.org> | 2023-04-11 12:40:16 +0000 |
| commit | a5199dbe07dd85bbb6cdb2429621d582f580ca83 | |
| tree | c5477ef8852b83c65ae9a5dd458c9d72019c1880 | |
| parent | 149e8d8e0f1db89d0618c9a4c4694de980891459 | |

automatic import of python-amplo

| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-amplo.spec | 589 |
| -rw-r--r-- | sources | 1 |

3 files changed, 591 insertions, 0 deletions
@@ -0,0 +1 @@
+/Amplo-0.17.0.tar.gz
diff --git a/python-amplo.spec b/python-amplo.spec
new file mode 100644
index 0000000..76dc57c
--- /dev/null
+++ b/python-amplo.spec
@@ -0,0 +1,589 @@
+%global _empty_manifest_terminate_build 0
+Name: python-Amplo
+Version: 0.17.0
+Release: 1
+Summary: Fully automated end to end machine learning pipeline
+License: GNU General Public License v3 (GPLv3)
+URL: https://github.com/nielsuit227/AutoML
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/4a/40/f178bed9ff3276ccb073ca265efd1672b8901bcb6a16dedd489f8ebf1e84/Amplo-0.17.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-azure-core
+Requires: python3-azure-storage-blob
+Requires: python3-catboost
+Requires: python3-cleanlab
+Requires: python3-colorlog
+Requires: python3-joblib
+Requires: python3-lightgbm
+Requires: python3-numba
+Requires: python3-numpy
+Requires: python3-optuna
+Requires: python3-pandas
+Requires: python3-polars
+Requires: python3-pyarrow
+Requires: python3-pytest
+Requires: python3-pywavelets
+Requires: python3-requests
+Requires: python3-scikit-learn
+Requires: python3-scipy
+Requires: python3-setuptools
+Requires: python3-shap
+Requires: python3-tqdm
+Requires: python3-xgboost
+Requires: python3-flake8
+Requires: python3-mypy
+Requires: python3-types-chardet
+Requires: python3-types-colorama
+Requires: python3-types-decorator
+Requires: python3-types-psycopg2
+Requires: python3-types-Pygments
+Requires: python3-types-PyMySQL
+Requires: python3-types-python-dateutil
+Requires: python3-types-pytz
+Requires: python3-types-redis
+Requires: python3-types-requests
+Requires: python3-types-setuptools
+Requires: python3-types-six
+Requires: python3-types-urllib3
+
+%description
+# Amplo - AutoML (for Machine Data)
+
+[](https://pypi.python.org/pypi/amplo)
+[](https://opensource.org/licenses/MIT)
+
+
+
+
+Welcome to the Automated Machine Learning package `amplo`. Amplo's AutoML is designed specifically for machine data and
+works very well with tabular time series data (especially unbalanced classification!).
+
+Though this is a standalone Python package, Amplo's AutoML is also available on Amplo's Smart Maintenance Platform.
+With a graphical user interface and various data connectors, it is the ideal place for service engineers to get started
+on Predictive.
+
+Amplo's AutoML Pipeline contains the entire Machine Learning development cycle, including exploratory data analysis,
+data cleaning, feature extraction, feature selection, model selection, hyperparameter optimization, stacking,
+version control, production-ready models and documentation. It comes with additional tools such as interval analysers,
+drift detectors, data quality checks, etc.
+
+## 1. Downloading Amplo
+
+The easiest way is to install our Python package through [PyPi](https://pypi.org/project/amplo/):
+
+```bash
+pip install amplo
+```
+
+## 2. Usage
+
+Usage is very simple with Amplo's AutoML Pipeline.
+
+```python
+from amplo import Pipeline
+from sklearn.datasets import make_classification
+from sklearn.datasets import make_regression
+
+x, y = make_classification()
+pipeline = Pipeline()
+pipeline.fit(x, y)
+yp = pipeline.predict_proba(x)
+
+x, y = make_regression()
+pipeline = Pipeline()
+pipeline.fit(x, y)
+yp = pipeline.predict(x)
+```
+
+## 3. Amplo AutoML Features
+
+### Interval Analyser
+
+```python
+from amplo.automl import IntervalAnalyser
+```
+
+Interval Analyser for Log file classification. When log files have to be classified, and there is not enough
+data for time series methods (such as LSTMs, ROCKET or Weasel, Boss, etc.), one needs to fall back to classical
+machine learning models which work better with lower samples. This raises the problem of which samples to
+classify. You shouldn't just simply classify on every sample and accumulate, that may greatly disrupt
+classification performance. Therefore, we introduce this interval analyser. By using an approximate K-Nearest
+Neighbors algorithm, one can estimate the strength of correlation for every sample inside a log. Using this
+allows for better interval selection for classical machine learning models.
+
+To use this interval analyser, make sure that your logs are located in a folder of their class, with one parent folder with all classes, e.g.:
+
+```text
++-- Parent Folder
+| +-- Class_1
+| +-- Log_1.*
+| +-- Log_2.*
+| +-- Class_2
+| +-- Log_3.*
+```
+
+### Data Processing
+
+```python
+from amplo.automl import DataProcessor
+```
+
+Automated Data Cleaning:
+
+- Infers & converts data types (integer, floats, categorical, datetime)
+- Reformats column names
+- Removes duplicates columns and rows
+- Handles missing values by:
+ - Removing columns
+ - Removing rows
+ - Interpolating
+ - Filling with zero's
+- Removes outliers using:
+ - Clipping
+ - Z-score
+ - Quantiles
+- Removes constant columns
+
+### Feature Processing
+
+```python
+from amplo.automl import FeatureProcessor
+```
+
+Automatically extracts and selects features. Removes Co-Linear Features.
+Included Feature Extraction algorithms:
+
+- Multiplicative Features
+- Dividing Features
+- Additive Features
+- Subtractive Features
+- Trigonometric Features
+- K-Means Features
+- Lagged Features
+- Differencing Features
+- Inverse Features
+- Datetime Features
+
+Included Feature Selection algorithms:
+
+- Random Forest Feature Importance (Threshold and Increment)
+- Predictive Power Score
+
+### Sequencing
+
+```python
+from amplo.automl import Sequencer
+```
+
+For time series regression problems, it is often useful to include multiple previous samples instead of just the latest.
+This class sequences the data, based on which time steps you want included in the in- and output.
+This is also very useful when working with tensors, as a tensor can be returned which directly fits into a Recurrent Neural Network.
+
+### Modelling
+
+```python
+from amplo.automl import Modeller
+```
+
+Runs various regression or classification models.
+Includes:
+
+- Scikit's Linear Model
+- Scikit's Random Forest
+- Scikit's Bagging
+- Scikit's GradientBoosting
+- Scikit's HistGradientBoosting
+- DMLC's XGBoost
+- Catboost's Catboost
+- Microsoft's LightGBM
+- Stacking Models
+
+### Grid Search
+
+```python
+from amplo.grid_search import OptunaGridSearch
+```
+
+Contains three hyperparameter optimizers with extended predefined model parameters:
+
+- Optuna's Tree-Parzen-Estimator
+
+
+%package -n python3-Amplo
+Summary: Fully automated end to end machine learning pipeline
+Provides: python-Amplo
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-Amplo
+# Amplo - AutoML (for Machine Data)
+
+[](https://pypi.python.org/pypi/amplo)
+[](https://opensource.org/licenses/MIT)
+
+
+
+
+Welcome to the Automated Machine Learning package `amplo`. Amplo's AutoML is designed specifically for machine data and
+works very well with tabular time series data (especially unbalanced classification!).
+
+Though this is a standalone Python package, Amplo's AutoML is also available on Amplo's Smart Maintenance Platform.
+With a graphical user interface and various data connectors, it is the ideal place for service engineers to get started
+on Predictive.
+
+Amplo's AutoML Pipeline contains the entire Machine Learning development cycle, including exploratory data analysis,
+data cleaning, feature extraction, feature selection, model selection, hyperparameter optimization, stacking,
+version control, production-ready models and documentation. It comes with additional tools such as interval analysers,
+drift detectors, data quality checks, etc.
+
+## 1. Downloading Amplo
+
+The easiest way is to install our Python package through [PyPi](https://pypi.org/project/amplo/):
+
+```bash
+pip install amplo
+```
+
+## 2. Usage
+
+Usage is very simple with Amplo's AutoML Pipeline.
+
+```python
+from amplo import Pipeline
+from sklearn.datasets import make_classification
+from sklearn.datasets import make_regression
+
+x, y = make_classification()
+pipeline = Pipeline()
+pipeline.fit(x, y)
+yp = pipeline.predict_proba(x)
+
+x, y = make_regression()
+pipeline = Pipeline()
+pipeline.fit(x, y)
+yp = pipeline.predict(x)
+```
+
+## 3. Amplo AutoML Features
+
+### Interval Analyser
+
+```python
+from amplo.automl import IntervalAnalyser
+```
+
+Interval Analyser for Log file classification. When log files have to be classified, and there is not enough
+data for time series methods (such as LSTMs, ROCKET or Weasel, Boss, etc.), one needs to fall back to classical
+machine learning models which work better with lower samples. This raises the problem of which samples to
+classify. You shouldn't just simply classify on every sample and accumulate, that may greatly disrupt
+classification performance. Therefore, we introduce this interval analyser. By using an approximate K-Nearest
+Neighbors algorithm, one can estimate the strength of correlation for every sample inside a log. Using this
+allows for better interval selection for classical machine learning models.
+
+To use this interval analyser, make sure that your logs are located in a folder of their class, with one parent folder with all classes, e.g.:
+
+```text
++-- Parent Folder
+| +-- Class_1
+| +-- Log_1.*
+| +-- Log_2.*
+| +-- Class_2
+| +-- Log_3.*
+```
+
+### Data Processing
+
+```python
+from amplo.automl import DataProcessor
+```
+
+Automated Data Cleaning:
+
+- Infers & converts data types (integer, floats, categorical, datetime)
+- Reformats column names
+- Removes duplicates columns and rows
+- Handles missing values by:
+ - Removing columns
+ - Removing rows
+ - Interpolating
+ - Filling with zero's
+- Removes outliers using:
+ - Clipping
+ - Z-score
+ - Quantiles
+- Removes constant columns
+
+### Feature Processing
+
+```python
+from amplo.automl import FeatureProcessor
+```
+
+Automatically extracts and selects features. Removes Co-Linear Features.
+Included Feature Extraction algorithms:
+
+- Multiplicative Features
+- Dividing Features
+- Additive Features
+- Subtractive Features
+- Trigonometric Features
+- K-Means Features
+- Lagged Features
+- Differencing Features
+- Inverse Features
+- Datetime Features
+
+Included Feature Selection algorithms:
+
+- Random Forest Feature Importance (Threshold and Increment)
+- Predictive Power Score
+
+### Sequencing
+
+```python
+from amplo.automl import Sequencer
+```
+
+For time series regression problems, it is often useful to include multiple previous samples instead of just the latest.
+This class sequences the data, based on which time steps you want included in the in- and output.
+This is also very useful when working with tensors, as a tensor can be returned which directly fits into a Recurrent Neural Network.
+
+### Modelling
+
+```python
+from amplo.automl import Modeller
+```
+
+Runs various regression or classification models.
+Includes:
+
+- Scikit's Linear Model
+- Scikit's Random Forest
+- Scikit's Bagging
+- Scikit's GradientBoosting
+- Scikit's HistGradientBoosting
+- DMLC's XGBoost
+- Catboost's Catboost
+- Microsoft's LightGBM
+- Stacking Models
+
+### Grid Search
+
+```python
+from amplo.grid_search import OptunaGridSearch
+```
+
+Contains three hyperparameter optimizers with extended predefined model parameters:
+
+- Optuna's Tree-Parzen-Estimator
+
+
+%package help
+Summary: Development documents and examples for Amplo
+Provides: python3-Amplo-doc
+%description help
+# Amplo - AutoML (for Machine Data)
+
+[](https://pypi.python.org/pypi/amplo)
+[](https://opensource.org/licenses/MIT)
+
+
+
+
+Welcome to the Automated Machine Learning package `amplo`. Amplo's AutoML is designed specifically for machine data and
+works very well with tabular time series data (especially unbalanced classification!).
+
+Though this is a standalone Python package, Amplo's AutoML is also available on Amplo's Smart Maintenance Platform.
+With a graphical user interface and various data connectors, it is the ideal place for service engineers to get started
+on Predictive.
+
+Amplo's AutoML Pipeline contains the entire Machine Learning development cycle, including exploratory data analysis,
+data cleaning, feature extraction, feature selection, model selection, hyperparameter optimization, stacking,
+version control, production-ready models and documentation. It comes with additional tools such as interval analysers,
+drift detectors, data quality checks, etc.
+
+## 1. Downloading Amplo
+
+The easiest way is to install our Python package through [PyPi](https://pypi.org/project/amplo/):
+
+```bash
+pip install amplo
+```
+
+## 2. Usage
+
+Usage is very simple with Amplo's AutoML Pipeline.
+
+```python
+from amplo import Pipeline
+from sklearn.datasets import make_classification
+from sklearn.datasets import make_regression
+
+x, y = make_classification()
+pipeline = Pipeline()
+pipeline.fit(x, y)
+yp = pipeline.predict_proba(x)
+
+x, y = make_regression()
+pipeline = Pipeline()
+pipeline.fit(x, y)
+yp = pipeline.predict(x)
+```
+
+## 3. Amplo AutoML Features
+
+### Interval Analyser
+
+```python
+from amplo.automl import IntervalAnalyser
+```
+
+Interval Analyser for Log file classification. When log files have to be classified, and there is not enough
+data for time series methods (such as LSTMs, ROCKET or Weasel, Boss, etc.), one needs to fall back to classical
+machine learning models which work better with lower samples. This raises the problem of which samples to
+classify. You shouldn't just simply classify on every sample and accumulate, that may greatly disrupt
+classification performance. Therefore, we introduce this interval analyser. By using an approximate K-Nearest
+Neighbors algorithm, one can estimate the strength of correlation for every sample inside a log. Using this
+allows for better interval selection for classical machine learning models.
+
+To use this interval analyser, make sure that your logs are located in a folder of their class, with one parent folder with all classes, e.g.:
+
+```text
++-- Parent Folder
+| +-- Class_1
+| +-- Log_1.*
+| +-- Log_2.*
+| +-- Class_2
+| +-- Log_3.*
+```
+
+### Data Processing
+
+```python
+from amplo.automl import DataProcessor
+```
+
+Automated Data Cleaning:
+
+- Infers & converts data types (integer, floats, categorical, datetime)
+- Reformats column names
+- Removes duplicates columns and rows
+- Handles missing values by:
+ - Removing columns
+ - Removing rows
+ - Interpolating
+ - Filling with zero's
+- Removes outliers using:
+ - Clipping
+ - Z-score
+ - Quantiles
+- Removes constant columns
+
+### Feature Processing
+
+```python
+from amplo.automl import FeatureProcessor
+```
+
+Automatically extracts and selects features. Removes Co-Linear Features.
+Included Feature Extraction algorithms:
+
+- Multiplicative Features
+- Dividing Features
+- Additive Features
+- Subtractive Features
+- Trigonometric Features
+- K-Means Features
+- Lagged Features
+- Differencing Features
+- Inverse Features
+- Datetime Features
+
+Included Feature Selection algorithms:
+
+- Random Forest Feature Importance (Threshold and Increment)
+- Predictive Power Score
+
+### Sequencing
+
+```python
+from amplo.automl import Sequencer
+```
+
+For time series regression problems, it is often useful to include multiple previous samples instead of just the latest.
+This class sequences the data, based on which time steps you want included in the in- and output.
+This is also very useful when working with tensors, as a tensor can be returned which directly fits into a Recurrent Neural Network.
+
+### Modelling
+
+```python
+from amplo.automl import Modeller
+```
+
+Runs various regression or classification models.
+Includes:
+
+- Scikit's Linear Model
+- Scikit's Random Forest
+- Scikit's Bagging
+- Scikit's GradientBoosting
+- Scikit's HistGradientBoosting
+- DMLC's XGBoost
+- Catboost's Catboost
+- Microsoft's LightGBM
+- Stacking Models
+
+### Grid Search
+
+```python
+from amplo.grid_search import OptunaGridSearch
+```
+
+Contains three hyperparameter optimizers with extended predefined model parameters:
+
+- Optuna's Tree-Parzen-Estimator
+
+
+%prep
+%autosetup -n Amplo-0.17.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-Amplo -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.17.0-1
+- Package Spec generated
@@ -0,0 +1 @@
+534ea089ead20cc21926be044afcce44 Amplo-0.17.0.tar.gz
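
Review note: the final hunk adds a 32-character hexadecimal digest next to the tarball name in `sources`. A minimal sketch of how a reviewer might check a downloaded tarball against that entry follows. It assumes the digest is an MD5 (the length suggests this, but the commit does not say so) and that each `sources` line has the form `<digest> <filename>`. The helper names `parse_sources_line`, `md5_of_file`, and `verify` are hypothetical and are not part of any packaging tool.

```python
import hashlib
from pathlib import Path


def parse_sources_line(line: str) -> tuple[str, str]:
    """Split a '<hex digest> <filename>' line into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify(sources_line: str, directory: Path) -> bool:
    """Check the file named in a sources line against its digest (assumed MD5)."""
    digest, filename = parse_sources_line(sources_line)
    return md5_of_file(directory / filename) == digest


if __name__ == "__main__":
    # Assumes the tarball was downloaded into the current directory.
    line = "534ea089ead20cc21926be044afcce44 Amplo-0.17.0.tar.gz"
    tarball = Path(".") / parse_sources_line(line)[1]
    if tarball.exists():
        print("match" if verify(line, Path(".")) else "MISMATCH")
    else:
        print(f"{tarball} not found; download it first")
```

Reading in chunks keeps memory use flat for large archives. MD5 here only guards against a corrupted or mismatched download; it is not a defence against tampering.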
