%global _empty_manifest_terminate_build 0
Name:		python-mlprimitives
Version:	0.3.5
Release:	1
Summary:	Pipelines and primitives for machine learning and data science.
License:	MIT
URL:		https://github.com/MLBazaar/MLPrimitives
Source0:	https://mirrors.aliyun.com/pypi/web/packages/67/87/1ea0faf9e1314f1739a3e61781e411d230850f06532451ac3e2adf0df41c/mlprimitives-0.3.5.tar.gz
BuildArch:	noarch

Requires:	python3-Keras
Requires:	python3-featuretools
Requires:	python3-iso639
Requires:	python3-langdetect
Requires:	python3-lightfm
Requires:	python3-mlblocks
Requires:	python3-networkx
Requires:	python3-nltk
Requires:	python3-numpy
Requires:	python3-opencv-python
Requires:	python3-pandas
Requires:	python3-louvain
Requires:	python3-scikit-image
Requires:	python3-scikit-learn
Requires:	python3-scipy
Requires:	python3-statsmodels
Requires:	python3-tensorflow
Requires:	python3-xgboost
Requires:	python3-protobuf
Requires:	python3-pytest
Requires:	python3-pytest-cov
Requires:	python3-rundoc
Requires:	python3-bumpversion
Requires:	python3-pip
Requires:	python3-watchdog
Requires:	python3-m2r
Requires:	python3-Sphinx
Requires:	python3-sphinx-rtd-theme
Requires:	python3-docutils
Requires:	python3-ipython
Requires:	python3-mistune
Requires:	python3-Jinja2
Requires:	python3-flake8
Requires:	python3-isort
Requires:	python3-autoflake
Requires:	python3-autopep8
Requires:	python3-importlib-metadata
Requires:	python3-twine
Requires:	python3-wheel
Requires:	python3-coverage
Requires:	python3-tox

%description
<p align="left">
  <a href="https://dai.lids.mit.edu">
    <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
  </a>
  <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPi Shield](https://img.shields.io/pypi/v/mlprimitives.svg)](https://pypi.python.org/pypi/mlprimitives)
[![Tests](https://github.com/MLBazaar/MLPrimitives/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/MLPrimitives/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![Downloads](https://pepy.tech/badge/mlprimitives)](https://pepy.tech/project/mlprimitives)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials)

# MLPrimitives

Pipelines and primitives for machine learning and data science.

* Documentation: https://MLBazaar.github.io/MLPrimitives
* Github: https://github.com/MLBazaar/MLPrimitives
* License: [MIT](https://github.com/MLBazaar/MLPrimitives/blob/master/LICENSE)
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)

# Overview

This repository contains primitive annotations to be used by the MLBlocks library, as well as
the necessary Python code to make some of them fully compatible with the MLBlocks API requirements.

There is also a collection of custom primitives contributed directly to this library, which either
combine third party tools or implement new functionalities from scratch.

## Why did we create this library?

* Too many libraries in a fast growing field
* Huge societal need to build machine learning apps
* Domain expertise resides in several places (knowledge of math)
* No documented information about hyperparameters, behavior...

# Installation

## Requirements

**MLPrimitives** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/).

Also, although it is not strictly required, the usage of a
[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
interfering with other software installed in the system where **MLPrimitives** is run.

## Install with pip

The easiest and recommended way to install **MLPrimitives** is using [pip](https://pip.pypa.io/en/stable/):

```bash
pip install mlprimitives
```

This will pull and install the latest stable release from [PyPi](https://pypi.org/).

If you want to install from source or contribute to the project please read the
[Contributing Guide](https://MLBazaar.github.io/MLPrimitives/community/welcome.html).

# Quickstart

This section is a short series of tutorials to help you get started with MLPrimitives.

In the following steps you will learn how to load and run a primitive on some data.

Later on you will learn how to evaluate and improve the performance of a primitive by tuning
its hyperparameters.

## Running a Primitive

In this first tutorial, we will be executing a single primitive for data transformation.

### 1. Load a Primitive

The first step in order to run a primitive is to load it.

This will be done using the `mlprimitives.load_primitive` function, which will
load the indicated primitive as an [MLBlock Object from MLBlocks](https://MLBazaar.github.io/MLBlocks/api/mlblocks.html#mlblocks.MLBlock).

In this case, we will load the `mlprimitives.custom.feature_extraction.CategoricalEncoder`
primitive.

```python3
from mlprimitives import load_primitive

primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder')
```

### 2. Load some data

The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the
categorical columns of a `pandas.DataFrame`.

So, in order to be able to run our primitive, we will first load some data that contains
categorical columns.

This can be done with the `mlprimitives.datasets.load_census` function:

```python3
from mlprimitives.datasets import load_census

dataset = load_census()
```

This dataset object has an attribute `data` which contains a table with several categorical
columns.

We can have a look at this table by executing `dataset.data.head()`, which will return a
table like this:

```
                             0                    1                   2
age                         39                   50                  38
workclass            State-gov     Self-emp-not-inc             Private
fnlwgt                   77516                83311              215646
education            Bachelors            Bachelors             HS-grad
education-num               13                   13                   9
marital-status   Never-married   Married-civ-spouse            Divorced
occupation        Adm-clerical      Exec-managerial   Handlers-cleaners
relationship     Not-in-family              Husband       Not-in-family
race                     White                White               White
sex                       Male                 Male                Male
capital-gain              2174                    0                   0
capital-loss                 0                    0                   0
hours-per-week              40                   13                  40
native-country   United-States        United-States       United-States
```

### 3. Fit the primitive

In order to run our primitive, we first need to fit it.

This is the process where it analyzes the data to detect which columns are categorical.

This is done by calling its `fit` method and passing the `dataset.data` as `X`.

```python3
primitive.fit(X=dataset.data)
```

### 4. Produce results

Once the primitive is fitted, we can process the data by calling the `produce` method of the
primitive instance and passing the `data` again as `X`.

```python3
transformed = primitive.produce(X=dataset.data)
```

After this is done, we can see how the transformed data contains the newly generated
one-hot vectors:

```
                                                0      1       2       3       4
age                                            39     50      38      53      28
fnlwgt                                      77516  83311  215646  234721  338409
education-num                                  13     13       9       7      13
capital-gain                                 2174      0       0       0       0
capital-loss                                    0      0       0       0       0
hours-per-week                                 40     13      40      40      40
workclass= Private                              0      0       1       1       1
workclass= Self-emp-not-inc                     0      1       0       0       0
workclass= Local-gov                            0      0       0       0       0
workclass= ?                                    0      0       0       0       0
workclass= State-gov                            1      0       0       0       0
workclass= Self-emp-inc                         0      0       0       0       0
...                                             ...    ...     ...     ...     ...
```
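
To make the fit/produce split concrete, here is a minimal plain-Python sketch of one-hot encoding.
It is an illustration only, not the `CategoricalEncoder` implementation: the helper names
`fit_categories` and `one_hot` are hypothetical, and the sketch assumes that every string-valued
column is categorical.

```python
def fit_categories(rows):
    """'Fit' step: collect the distinct values of every string-valued column."""
    categories = {}
    for row in rows:
        for column, value in row.items():
            if isinstance(value, str):
                # dicts keep insertion order, so categories stay in first-seen order
                categories.setdefault(column, {})[value] = None
    return {column: list(values) for column, values in categories.items()}


def one_hot(rows, categories):
    """'Produce' step: replace each categorical column by one 0/1 column per category."""
    encoded = []
    for row in rows:
        # columns that were not detected as categorical are copied unchanged
        new_row = {k: v for k, v in row.items() if k not in categories}
        for column, values in categories.items():
            for value in values:
                new_row[f"{column}={value}"] = int(row[column] == value)
        encoded.append(new_row)
    return encoded


rows = [
    {"age": 39, "workclass": "State-gov"},
    {"age": 50, "workclass": "Self-emp-not-inc"},
    {"age": 38, "workclass": "Private"},
]
categories = fit_categories(rows)
encoded = one_hot(rows, categories)
print(encoded[0])
```

Keeping the two steps separate means the categories learned during fitting can be reused to
encode new rows with the same set of output columns.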

## Tuning a Primitive

In this short tutorial we will teach you how to evaluate the performance of a primitive
and improve its performance by modifying its hyperparameters.

To do so, we will load a primitive that can learn from the transformed data that we just
generated and later on make predictions based on new data.

### 1. Load another primitive

First of all, we will load the `xgboost.XGBClassifier` primitive that we will use afterwards.

```python3
primitive = load_primitive('xgboost.XGBClassifier')
```

### 2. Split the dataset

Before being able to evaluate the primitive performance, we need to split the data in two
parts: train, which will be used for the primitive to learn, and test, which will be used
to make the predictions that later on will be evaluated.

In order to do this, we will get the first 75% of rows from the transformed data that we
obtained above and call it `X_train`, and then set the next 25% of rows as `X_test`.

```python3
train_size = int(len(transformed) * 0.75)
X_train = transformed.iloc[:train_size]
X_test = transformed.iloc[train_size:]
```

Similarly, we need to obtain the `y_train` and `y_test` variables containing the corresponding
output values.

```python3
y_train = dataset.target[:train_size]
y_test = dataset.target[train_size:]
```
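
The slicing arithmetic used above can be checked on plain lists. The following is a small sketch
under the assumption that the rows are already in a suitable order; `sequential_split` is a
hypothetical helper, not part of MLPrimitives.

```python
def sequential_split(items, train_fraction=0.75):
    """Split a sequence into a leading train part and a trailing test part."""
    # int() truncates, so the train part never exceeds the requested fraction
    train_size = int(len(items) * train_fraction)
    return items[:train_size], items[train_size:]


X = list(range(8))
y = [0, 1, 0, 1, 1, 0, 1, 0]

X_train, X_test = sequential_split(X)
y_train, y_test = sequential_split(y)

print(len(X_train), len(X_test))
```

Note that a sequential split does not shuffle the rows, so if the data is sorted in some way the
test part may not be representative of the whole dataset.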

### 3. Fit the new primitive

Once we have split the data, we can fit the primitive by passing `X_train` and `y_train`
to its `fit` method.

```python3
primitive.fit(X=X_train, y=y_train)
```

### 4. Make predictions

Once the primitive has been fitted, we can produce predictions using the `X_test` data as input.

```python3
predictions = primitive.produce(X=X_test)
```

### 5. Evaluate the performance

We can now evaluate how good the predictions from our primitive are by using the `score`
method from the `dataset` object on both the expected output and the real output from the
primitive:

```python3
dataset.score(y_test, predictions)
```

This will output a float value between 0 and 1 indicating how good the predictions are, with
0 being the worst possible score and 1 the best.

In this case we will obtain a score around 0.866.
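
As an illustration of what such a score can look like, the sketch below computes plain accuracy,
that is, the fraction of predictions that match the expected labels. This is an assumption made
for the example: the metric actually used by `dataset.score` may differ, and both the `accuracy`
helper and the labels are hypothetical.

```python
def accuracy(expected, predicted):
    """Return the fraction of positions where both sequences agree."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("cannot score empty sequences")
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / len(expected)


# Hypothetical labels, used only to exercise the function
expected = ["yes", "no", "yes", "yes"]
predicted = ["yes", "yes", "yes", "yes"]

score = accuracy(expected, predicted)
print(score)
```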

### 6. Set new hyperparameter values

In order to improve the performance of our primitive we will try to modify a couple of its
hyperparameters.

First we will see which hyperparameter values the primitive has by calling its
`get_hyperparameters` method.

```python3
primitive.get_hyperparameters()
```

which will return a dictionary like this:

```python
{
    "n_jobs": -1,
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
    "gamma": 0,
    "min_child_weight": 1
}
```

Next, we will see which are the valid values for each one of those hyperparameters by calling its
`get_tunable_hyperparameters` method:

```python3
primitive.get_tunable_hyperparameters()
```

For example, we will see that the `max_depth` hyperparameter has the following specification:

```python
{
    "type": "int",
    "default": 3,
    "range": [
        3,
        10
    ]
}
```

Next, we will choose a valid value, for example 7, and set it into the primitive using the
`set_hyperparameters` method:

```python3
primitive.set_hyperparameters({'max_depth': 7})
```
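
Before setting a new value, it can be checked against the specification returned by
`get_tunable_hyperparameters`. The sketch below is not an MLBlocks feature: `check_hyperparameter`
is a hypothetical helper, and it assumes that both ends of `range` are inclusive.

```python
def check_hyperparameter(spec, value):
    """Return value if it satisfies the given spec, otherwise raise an error."""
    if spec["type"] == "int" and not isinstance(value, int):
        raise TypeError(f"expected an int, got {type(value).__name__}")
    low, high = spec["range"]
    # assumption: the range bounds are inclusive
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range [{low}, {high}]")
    return value


# Same specification as shown above for max_depth
max_depth_spec = {"type": "int", "default": 3, "range": [3, 10]}

new_values = {"max_depth": check_hyperparameter(max_depth_spec, 7)}
print(new_values)
```

The resulting dictionary has the same shape as the argument passed to `set_hyperparameters`.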

### 7. Re-evaluate the performance

Once the new hyperparameter value has been set, we repeat the fit/produce/score cycle to
evaluate the performance of this new hyperparameter value:

```python3
primitive.fit(X=X_train, y=y_train)
predictions = primitive.produce(X=X_test)
dataset.score(y_test, predictions)
```

This time we should obtain a value around 0.724. Compare it with the score obtained before
changing the hyperparameter to check whether the new value actually helps.
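
Repeating this cycle by hand for several values quickly becomes tedious, so it can be wrapped in
a loop. The sketch below shows only the selection logic. `best_candidate` and `fake_evaluate` are
hypothetical names, and the scores produced by `fake_evaluate` are made up for the example; they
are not real results.

```python
def best_candidate(candidates, evaluate):
    """Evaluate every candidate value and return the (value, score) pair with the highest score."""
    scored = [(value, evaluate(value)) for value in candidates]
    return max(scored, key=lambda pair: pair[1])


def fake_evaluate(max_depth):
    # Stand-in for the real cycle, which would call set_hyperparameters,
    # fit, produce and score as shown above. These scores are invented.
    return 1.0 - abs(max_depth - 5) / 10


# Candidate values taken from the max_depth range shown earlier
best_value, best_score = best_candidate(range(3, 11), fake_evaluate)
print(best_value, best_score)
```

When tuning this way on real data, it is advisable to select the value on a separate validation
split rather than on the test rows, so that the final score remains an unbiased estimate.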

## What's Next?

Do you want to [learn more about the project](https://MLBazaar.github.io/MLPrimitives/getting_started/concepts.html),
about [how to contribute to it](https://MLBazaar.github.io/MLPrimitives/community/contributing.html)
or browse the [API Reference](https://MLBazaar.github.io/MLPrimitives/api/mlprimitives.html)?
Please check the corresponding sections of the [documentation](https://MLBazaar.github.io/MLPrimitives/)!


# History

## 0.3.5 - 2023-04-14

### General Improvements

* Update `mlblocks` cap - [Issue #278](https://github.com/MLBazaar/MLPrimitives/issues/278) by @sarahmish

## 0.3.4 - 2023-01-24

### General Improvements

* Update `mlblocks` cap - [Issue #277](https://github.com/MLBazaar/MLPrimitives/issues/277) by @sarahmish

## 0.3.3 - 2023-01-20

### General Improvements

* Update dependencies - [Issue #276](https://github.com/MLBazaar/MLPrimitives/issues/276) by @sarahmish

### Adapter Improvements

* Building model within fit in keras adapter - [Issue #267](https://github.com/MLBazaar/MLPrimitives/issues/267) by @sarahmish

## 0.3.2 - 2021-11-09

### Adapter Improvements

* Inferring data shapes with single dimension for keras adapter - [Issue #265](https://github.com/MLBazaar/MLPrimitives/issues/265) by @sarahmish

## 0.3.1 - 2021-10-07

### Adapter Improvements

* Dynamic target_shape in keras adapter - [Issue #263](https://github.com/MLBazaar/MLPrimitives/issues/263) by @sarahmish
* Save keras primitives in Windows environment - [Issue #261](https://github.com/MLBazaar/MLPrimitives/issues/261) by @sarahmish

### General Improvements

* Update TensorFlow and NumPy dependency - [Issue #259](https://github.com/MLBazaar/MLPrimitives/issues/259) by @sarahmish

## 0.3.0 - 2021-01-09

### New Primitives

* Add primitive `sklearn.naive_bayes.GaussianNB` - [Issue #242](https://github.com/MLBazaar/MLPrimitives/issues/242) by @sarahmish
* Add primitive `sklearn.linear_model.SGDClassifier` - [Issue #241](https://github.com/MLBazaar/MLPrimitives/issues/241) by @sarahmish

### Primitive Improvements

* Add offset to rolling_window_sequence primitive - [Issue #251](https://github.com/MLBazaar/MLPrimitives/issues/251) by @skyeeiskowitz
* Rename the time_index column to time - [Issue #252](https://github.com/MLBazaar/MLPrimitives/issues/252) by @pvk-developer
* Update featuretools dependency - [Issue #250](https://github.com/MLBazaar/MLPrimitives/issues/250) by @pvk-developer

### General Improvements

* Update dependencies and add python3.8 - [Issue #246](https://github.com/MLBazaar/MLPrimitives/issues/246) by @csala
* Drop Python35 - [Issue #244](https://github.com/MLBazaar/MLPrimitives/issues/244) by @csala

## 0.2.5 - 2020-07-29

### Primitive Improvements

* Accept timedelta `window_size` in `cutoff_window_sequences` - [Issue #239](https://github.com/MLBazaar/MLPrimitives/issues/239) by @joanvaquer

### Bug Fixes

* ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via `pip install tensorflow` - [Issue #237](https://github.com/MLBazaar/MLPrimitives/issues/237) by @joanvaquer

### New Primitives

* Add `pandas.DataFrame.set_index` primitive - [Issue #222](https://github.com/MLBazaar/MLPrimitives/issues/222) by @JDTheRipperPC

## 0.2.4 - 2020-01-30

### New Primitives

* Add RangeScaler and RangeUnscaler primitives - [Issue #232](https://github.com/MLBazaar/MLPrimitives/issues/232) by @csala

### Primitive Improvements

* Extract input_shape from X in keras.Sequential - [Issue #223](https://github.com/MLBazaar/MLPrimitives/issues/223) by @csala

### Bug Fixes

* mlprimitives.custom.text.TextCleaner fails if text is empty - [Issue #228](https://github.com/MLBazaar/MLPrimitives/issues/228) by @csala
* Error when loading the reviews dataset - [Issue #230](https://github.com/MLBazaar/MLPrimitives/issues/230) by @csala
* Curate dependencies: specify an explicit prompt-toolkit version range - [Issue #224](https://github.com/MLBazaar/MLPrimitives/issues/224) by @csala

## 0.2.3 - 2019-11-14

### New Primitives

* Add primitive to make window_sequences based on cutoff times - [Issue #217](https://github.com/MLBazaar/MLPrimitives/issues/217) by @csala
* Create a keras LSTM based TimeSeriesClassifier primitive - [Issue #218](https://github.com/MLBazaar/MLPrimitives/issues/218) by @csala
* Add pandas DataFrame primitives - [Issue #214](https://github.com/MLBazaar/MLPrimitives/issues/214) by @csala
* Add featuretools.EntitySet.normalize_entity primitive - [Issue #209](https://github.com/MLBazaar/MLPrimitives/issues/209) by @csala

### Primitive Improvements

* Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - [Issue #208](https://github.com/MLBazaar/MLPrimitives/issues/208) by @csala

* Add text regression dataset - [Issue #206](https://github.com/MLBazaar/MLPrimitives/issues/206) by @csala

### Bug Fixes

* pandas.DataFrame.resample crash when grouping by integer columns - [Issue #211](https://github.com/MLBazaar/MLPrimitives/issues/211) by @csala

## 0.2.2 - 2019-10-08

### New Primitives

* Add primitives for GAN based time-series anomaly detection - [Issue #200](https://github.com/MLBazaar/MLPrimitives/issues/200) by @AlexanderGeiger
* Add `numpy.reshape` and `numpy.ravel` primitives - [Issue #197](https://github.com/MLBazaar/MLPrimitives/issues/197) by @AlexanderGeiger
* Add feature selection primitive based on Lasso - [Issue #194](https://github.com/MLBazaar/MLPrimitives/issues/194) by @csala

### Primitive Improvements

* `feature_extraction.CategoricalEncoder` support dtype category - [Issue #196](https://github.com/MLBazaar/MLPrimitives/issues/196) by @csala

## 0.2.1 - 2019-09-09

### New Primitives

* Timeseries Intervals to Mask Primitive - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger
* Add new primitive: Arima model - [Issue #168](https://github.com/MLBazaar/MLPrimitives/issues/168) by @AlexanderGeiger

### Primitive Improvements

* Curate PCA primitive hyperparameters - [Issue #190](https://github.com/MLBazaar/MLPrimitives/issues/190) by @AlexanderGeiger
* Add option to drop rolling window sequences - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger

### Bug Fixes

* scikit-image==0.14.3 crashes when installed on Mac - [Issue #188](https://github.com/MLBazaar/MLPrimitives/issues/188) by @csala

## 0.2.0

### New Features

* Publish the pipelines as an `entry_point`
[Issue #175](https://github.com/MLBazaar/MLPrimitives/issues/175) by @csala

### Primitive Improvements

* Improve pandas.DataFrame.resample primitive [Issue #177](https://github.com/MLBazaar/MLPrimitives/issues/177) by @csala
* Improve `feature_extractor` primitives [Issue #183](https://github.com/MLBazaar/MLPrimitives/issues/183) by @csala
* Improve `find_anomalies` primitive [Issue #180](https://github.com/MLBazaar/MLPrimitives/issues/180) by @AlexanderGeiger

### Bug Fixes

* Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor [Issue #176](https://github.com/MLBazaar/MLPrimitives/issues/176) by @DanielCalvoCerezo


## 0.1.10

### New Features

* Add function to run primitives without a pipeline [Issue #43](https://github.com/MLBazaar/MLPrimitives/issues/43) by @csala

### New Pipelines

* Add pipelines for all the MLBlocks examples [Issue #162](https://github.com/MLBazaar/MLPrimitives/issues/162) by @csala

### Primitive Improvements

* Add Early Stopping to `keras.Sequential.LSTMTimeSeriesRegressor` primitive [Issue #156](https://github.com/MLBazaar/MLPrimitives/issues/156) by @csala
* Make FeatureExtractor primitives accept Numpy arrays [Issue #165](https://github.com/MLBazaar/MLPrimitives/issues/165) by @csala
* Add window size and pruning to the `timeseries_anomalies.find_anomalies` primitive [Issue #160](https://github.com/MLBazaar/MLPrimitives/issues/160) by @csala


## 0.1.9

### New Features

* Add a single table binary classification dataset [Issue #141](https://github.com/MLBazaar/MLPrimitives/issues/141) by @csala

### New Primitives

* Add Multilayer Perceptron (MLP) primitive for binary classification [Issue #140](https://github.com/MLBazaar/MLPrimitives/issues/140) by @Hector-hedb12
* Add primitive for Sequence classification with LSTM [Issue #150](https://github.com/MLBazaar/MLPrimitives/issues/150) by @Hector-hedb12
* Add VGG-like convnet primitive [Issue #149](https://github.com/MLBazaar/MLPrimitives/issues/149) by @Hector-hedb12
* Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification [Issue #139](https://github.com/MLBazaar/MLPrimitives/issues/139) by @Hector-hedb12
* Add primitive to count feature matrix columns [Issue #146](https://github.com/MLBazaar/MLPrimitives/issues/146) by @csala

### Primitive Improvements

* Add additional fit and predict arguments to keras.Sequential [Issue #161](https://github.com/MLBazaar/MLPrimitives/issues/161) by @csala
* Add support for keras.Sequential Callbacks [Issue #159](https://github.com/MLBazaar/MLPrimitives/issues/159) by @csala
* Add fixed hyperparam to control keras.Sequential verbosity [Issue #143](https://github.com/MLBazaar/MLPrimitives/issues/143) by @csala

## 0.1.8

### New Primitives

* mlprimitives.custom.timeseries_preprocessing.time_segments_average - [Issue #137](https://github.com/MLBazaar/MLPrimitives/issues/137)

### New Features

* Add target_index output in timeseries_preprocessing.rolling_window_sequences - [Issue #136](https://github.com/MLBazaar/MLPrimitives/issues/136)

## 0.1.7

### General Improvements

* Validate JSON format in `make lint` - [Issue #133](https://github.com/MLBazaar/MLPrimitives/issues/133)
* Add demo datasets - [Issue #131](https://github.com/MLBazaar/MLPrimitives/issues/131)
* Improve featuretools.dfs primitive - [Issue #127](https://github.com/MLBazaar/MLPrimitives/issues/127)

### New Primitives

* pandas.DataFrame.resample - [Issue #123](https://github.com/MLBazaar/MLPrimitives/issues/123)
* pandas.DataFrame.unstack - [Issue #124](https://github.com/MLBazaar/MLPrimitives/issues/124)
* featuretools.EntitySet.add_relationship - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126)
* featuretools.EntitySet.entity_from_dataframe - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126)

### Bug Fixes

* Bug in timeseries_anomalies.py - [Issue #119](https://github.com/MLBazaar/MLPrimitives/issues/119)

## 0.1.6

### General Improvements

* Add Contributing Documentation
* Remove upper bound in pandas version given new release of `featuretools` v0.6.1
* Improve LSTMTimeSeriesRegressor hyperparameters

### New Primitives

* mlprimitives.candidates.dsp.SpectralMask
* mlprimitives.custom.timeseries_anomalies.find_anomalies
* mlprimitives.custom.timeseries_anomalies.regression_errors
* mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences
* mlprimitives.custom.timeseries_preprocessing.time_segments_average
* sklearn.linear_model.ElasticNet
* sklearn.linear_model.Lars
* sklearn.linear_model.Lasso
* sklearn.linear_model.MultiTaskLasso
* sklearn.linear_model.Ridge

## 0.1.5

### New Primitives

* sklearn.impute.SimpleImputer
* sklearn.preprocessing.MinMaxScaler
* sklearn.preprocessing.MaxAbsScaler
* sklearn.preprocessing.RobustScaler
* sklearn.linear_model.LinearRegression

### General Improvements

* Separate curated from candidate primitives
* Setup `entry_points` in setup.py to improve compatibility with MLBlocks
* Add a test-pipelines command to test all the existing pipelines
* Clean sklearn example pipelines
* Change the `author` entry to a `contributors` list
* Change the name of `mlblocks_primitives` folder
* Pip install `requirements_dev.txt` fail documentation

### Bug Fixes

* Fix LSTMTimeSeriesRegressor primitive. Issue #90
* Fix timeseries primitives. Issue #91
* Negative index anomalies in `timeseries_errors`. Issue #89
* Keep pandas version below 0.24.0. Issue #87

## 0.1.4

### New Primitives

* mlprimitives.timeseries primitives for timeseries data preprocessing
* mlprimitives.timeseres_error primitives for timeseries anomaly detection
* keras.Sequential.LSTMTimeSeriesRegressor
* sklearn.neighbors.KNeighbors Classifier and Regressor
* several sklearn.decomposition primitives
* several sklearn.ensemble primitives

### Bug Fixes

* Fix typo in mlprimitives.text.TextCleaner primitive
* Fix bug in index handling in featuretools.dfs primitive
* Fix bug in SingleLayerCNNImageClassifier annotation
* Remove old validation tags from JSON annotations

## 0.1.3

### New Features

* Fix and re-enable featuretools.dfs primitive.

## 0.1.2

### New Features

* Add pipeline specification language and Evaluation utilities.
* Add pipelines for graph, text and tabular problems.
* New primitives ClassEncoder and ClassDecoder
* New primitives UniqueCounter and VocabularyCounter

### Bug Fixes

* Fix TrivialPredictor bug when working with numpy arrays
* Change XGB default learning rate and number of estimators


## 0.1.1

### New Features

* Add more keras.applications primitives.
* Add a Text Cleanup primitive.

### Bug Fixes

* Add keywords to `keras.preprocessing` primitives.
* Fix the `image_transform` method.
* Add `epoch` as a fixed hyperparameter for `keras.Sequential` primitives.

## 0.1.0

* First release on PyPI.


%package -n python3-mlprimitives
Summary:	Pipelines and primitives for machine learning and data science.
Provides:	python-mlprimitives
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-mlprimitives
<p align="left">
  <a href="https://dai.lids.mit.edu">
    <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
  </a>
  <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPi Shield](https://img.shields.io/pypi/v/mlprimitives.svg)](https://pypi.python.org/pypi/mlprimitives)
[![Tests](https://github.com/MLBazaar/MLPrimitives/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/MLPrimitives/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![Downloads](https://pepy.tech/badge/mlprimitives)](https://pepy.tech/project/mlprimitives)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials)

# MLPrimitives

Pipelines and primitives for machine learning and data science.

* Documentation: https://MLBazaar.github.io/MLPrimitives
* Github: https://github.com/MLBazaar/MLPrimitives
* License: [MIT](https://github.com/MLBazaar/MLPrimitives/blob/master/LICENSE)
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)

# Overview

This repository contains primitive annotations to be used by the MLBlocks library, as well as
the necessary Python code to make some of them fully compatible with the MLBlocks API requirements.

There is also a collection of custom primitives contributed directly to this library, which either
combine third party tools or implement new functionalities from scratch.

## Why did we create this library?

* Too many libraries in a fast growing field
* Huge societal need to build machine learning apps
* Domain expertise resides in several places (knowledge of math)
* No documented information about hyperparameters, behavior...

# Installation

## Requirements

**MLPrimitives** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/).

Also, although it is not strictly required, the usage of a
[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
interfering with other software installed in the system where **MLPrimitives** is run.

## Install with pip

The easiest and recommended way to install **MLPrimitives** is using [pip](https://pip.pypa.io/en/stable/):

```bash
pip install mlprimitives
```

This will pull and install the latest stable release from [PyPi](https://pypi.org/).

If you want to install from source or contribute to the project please read the
[Contributing Guide](https://MLBazaar.github.io/MLPrimitives/community/welcome.html).

# Quickstart

This section is a short series of tutorials to help you get started with MLPrimitives.

In the following steps you will learn how to load and run a primitive on some data.

Later on you will learn how to evaluate and improve the performance of a primitive by tuning
its hyperparameters.

## Running a Primitive

In this first tutorial, we will be executing a single primitive for data transformation.

### 1. Load a Primitive

The first step in order to run a primitive is to load it.

This will be done using the `mlprimitives.load_primitive` function, which will
load the indicated primitive as an [MLBlock Object from MLBlocks](https://MLBazaar.github.io/MLBlocks/api/mlblocks.html#mlblocks.MLBlock).

In this case, we will load the `mlprimitives.custom.feature_extraction.CategoricalEncoder`
primitive.

```python3
from mlprimitives import load_primitive

primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder')
```

### 2. Load some data

The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the
categorical columns of a `pandas.DataFrame`.

So, in order to be able to run our primitive, we will first load some data that contains
categorical columns.

This can be done with the `mlprimitives.datasets.load_census` function:

```python3
from mlprimitives.datasets import load_census

dataset = load_census()
```

This dataset object has an attribute `data` which contains a table with several categorical
columns.

We can have a look at this table by executing `dataset.data.head()`, which will return a
table like this:

```
                             0                    1                   2
age                         39                   50                  38
workclass            State-gov     Self-emp-not-inc             Private
fnlwgt                   77516                83311              215646
education            Bachelors            Bachelors             HS-grad
education-num               13                   13                   9
marital-status   Never-married   Married-civ-spouse            Divorced
occupation        Adm-clerical      Exec-managerial   Handlers-cleaners
relationship     Not-in-family              Husband       Not-in-family
race                     White                White               White
sex                       Male                 Male                Male
capital-gain              2174                    0                   0
capital-loss                 0                    0                   0
hours-per-week              40                   13                  40
native-country   United-States        United-States       United-States
```

### 3. Fit the primitive

In order to run our primitive, we first need to fit it.

This is the process where it analyzes the data to detect which columns are categorical.

This is done by calling its `fit` method and passing the `dataset.data` as `X`.

```python3
primitive.fit(X=dataset.data)
```

### 4. Produce results

Once the primitive is fitted, we can process the data by calling the `produce` method of the
primitive instance and passing the `data` again as `X`.

```python3
transformed = primitive.produce(X=dataset.data)
```

After this is done, we can see how the transformed data contains the newly generated
one-hot vectors:

```
                                                0      1       2       3       4
age                                            39     50      38      53      28
fnlwgt                                      77516  83311  215646  234721  338409
education-num                                  13     13       9       7      13
capital-gain                                 2174      0       0       0       0
capital-loss                                    0      0       0       0       0
hours-per-week                                 40     13      40      40      40
workclass= Private                              0      0       1       1       1
workclass= Self-emp-not-inc                     0      1       0       0       0
workclass= Local-gov                            0      0       0       0       0
workclass= ?                                    0      0       0       0       0
workclass= State-gov                            1      0       0       0       0
workclass= Self-emp-inc                         0      0       0       0       0
...                                             ...    ...     ...     ...     ...
```
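To make the fit/produce split more concrete, here is a minimal pure-Python sketch of one-hot encoding. The `TinyOneHotEncoder` class is a hypothetical stand-in written for this illustration; it is not the `CategoricalEncoder` implementation, and it assumes that a column is categorical whenever it holds string values.

```python
class TinyOneHotEncoder:
    """Hypothetical stand-in that mimics a fit/produce workflow."""

    def fit(self, rows):
        # Assumption: a column is categorical when its values are strings.
        self.categories = {}
        for row in rows:
            for column, value in row.items():
                if isinstance(value, str):
                    values = self.categories.setdefault(column, [])
                    if value not in values:
                        values.append(value)
        return self

    def produce(self, rows):
        encoded = []
        for row in rows:
            # Non-categorical columns are copied unchanged.
            new_row = {c: v for c, v in row.items() if c not in self.categories}
            # Each categorical column becomes one 0/1 column per value seen in fit.
            for column, values in self.categories.items():
                for value in values:
                    new_row[f"{column}={value}"] = int(row.get(column) == value)
            encoded.append(new_row)
        return encoded


rows = [
    {"age": 39, "workclass": "State-gov"},
    {"age": 50, "workclass": "Self-emp-not-inc"},
    {"age": 38, "workclass": "Private"},
]
encoder = TinyOneHotEncoder().fit(rows)
encoded = encoder.produce(rows)
print(encoded[0])
```

The categories are learned once in `fit`, so `produce` always emits the same set of columns, even for rows that contain only some of the values.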

## Tuning a Primitive

In this short tutorial we will teach you how to evaluate the performance of a primitive
and improve its performance by modifying its hyperparameters.

To do so, we will load a primitive that can learn from the transformed data that we just
generated and later on make predictions based on new data.

### 1. Load another primitive

First of all, we will load the `xgboost.XGBClassifier` primitive that we will use afterwards.

```python3
primitive = load_primitive('xgboost.XGBClassifier')
```

### 2. Split the dataset

Before we can evaluate the primitive's performance, we need to split the data into two
parts: train, which will be used for the primitive to learn, and test, which will be used
to make the predictions that will later be evaluated.

In order to do this, we will get the first 75% of rows from the transformed data that we
obtained above and call it `X_train`, and then set the next 25% of rows as `X_test`.

```python3
train_size = int(len(transformed) * 0.75)
X_train = transformed.iloc[:train_size]
X_test = transformed.iloc[train_size:]
```

Similarly, we need to obtain the `y_train` and `y_test` variables containing the corresponding
output values.

```python3
y_train = dataset.target[:train_size]
y_test = dataset.target[train_size:]
```
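Note that `int()` truncates, so the train part can be slightly smaller than 75% when the number of rows is not a multiple of four. The following sketch reproduces the same arithmetic on a plain list that stands in for the table; the 10-row list is an assumption made only for illustration.

```python
rows = list(range(10))               # stand-in for a table with 10 rows
train_size = int(len(rows) * 0.75)   # int() truncates 7.5 down to 7

train = rows[:train_size]            # first 7 rows
test = rows[train_size:]             # remaining 3 rows

print(train_size, len(train), len(test))
```

Slicing with the same `train_size` for both the features and the target keeps each row aligned with its output value.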

### 3. Fit the new primitive

Once we have split the data, we can fit the primitive by passing `X_train` and `y_train`
to its `fit` method.

```python3
primitive.fit(X=X_train, y=y_train)
```

### 4. Make predictions

Once the primitive has been fitted, we can produce predictions using the `X_test` data as input.

```python3
predictions = primitive.produce(X=X_test)
```

### 5. Evaluate the performance

We can now evaluate how good the predictions from our primitive are by using the `score`
method from the `dataset` object on both the expected output and the real output from the
primitive:

```python3
dataset.score(y_test, predictions)
```

This will output a float value between 0 and 1 indicating how good the predictions are, with
0 being the worst possible score and 1 the best.

In this case we will obtain a score around 0.866.
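As an illustration of what such a score can mean, the sketch below computes plain classification accuracy in pure Python. That the dataset's `score` method uses accuracy is an assumption made here, and the `accuracy` helper and its labels are hypothetical.

```python
def accuracy(expected, predicted):
    """Fraction of positions where the prediction equals the expected value."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("cannot score an empty set of predictions")
    hits = sum(1 for e, p in zip(expected, predicted) if e == p)
    return hits / len(expected)


# Hypothetical labels, used only to exercise the helper.
expected = ["low", "high", "low", "low"]
predicted = ["low", "high", "high", "low"]
print(accuracy(expected, predicted))
```

Here three of the four predictions match, so the helper returns 0.75.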

### 6. Set new hyperparameter values

In order to improve the performance of our primitive we will try to modify a couple of its
hyperparameters.

First we will see which hyperparameter values the primitive has by calling its
`get_hyperparameters` method.

```python3
primitive.get_hyperparameters()
```

which will return a dictionary like this:

```python
{
    "n_jobs": -1,
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
    "gamma": 0,
    "min_child_weight": 1
}
```

Next, we will see which are the valid values for each one of those hyperparameters by calling its
`get_tunable_hyperparameters` method:

```python3
primitive.get_tunable_hyperparameters()
```

For example, we will see that the `max_depth` hyperparameter has the following specification:

```python
{
    "type": "int",
    "default": 3,
    "range": [
        3,
        10
    ]
}
```

Next, we will choose a valid value, for example 7, and set it in the primitive using the
`set_hyperparameters` method:

```python3
primitive.set_hyperparameters({'max_depth': 7})
```
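Before setting a value, it can be handy to check it against the specification shown above. The helper below is a hypothetical sketch, not part of the MLBlocks API, and it assumes that both ends of `range` are inclusive.

```python
def is_valid_int(spec, value):
    """Check an integer hyperparameter value against a [low, high] range spec."""
    if spec["type"] != "int":
        raise ValueError("this sketch only handles 'int' hyperparameters")
    low, high = spec["range"]
    # bool is a subclass of int in Python, so it is excluded explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    # Assumption: both bounds of the range are inclusive.
    return is_int and low <= value <= high


max_depth_spec = {"type": "int", "default": 3, "range": [3, 10]}
print(is_valid_int(max_depth_spec, 7))
print(is_valid_int(max_depth_spec, 12))
```

With the `max_depth` specification above, 7 is accepted and 12 is rejected.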

### 7. Re-evaluate the performance

Once the new hyperparameter value has been set, we repeat the fit/produce/score cycle to
evaluate the performance of this new hyperparameter value:

```python3
primitive.fit(X=X_train, y=y_train)
predictions = primitive.produce(X=X_test)
dataset.score(y_test, predictions)
```

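This time we obtain a score around 0.724, which should be compared with the value of around
0.866 obtained before in order to decide whether the new hyperparameter value is worth keeping.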
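The same cycle can be repeated for several candidate values, keeping the one with the highest score. The sketch below shows only the selection logic: `best_value` is a hypothetical helper, and the scores are made-up numbers rather than measurements. In practice, `evaluate` would set the hyperparameter, fit, produce and score as shown above.

```python
def best_value(candidates, evaluate):
    """Return (value, score) for the candidate with the highest score."""
    best = None
    for value in candidates:
        score = evaluate(value)
        if best is None or score > best[1]:
            best = (value, score)
    return best


# Made-up scores for illustration only; they are not real results.
made_up_scores = {3: 0.80, 5: 0.83, 7: 0.81}
value, score = best_value([3, 5, 7], made_up_scores.get)
print(value, score)
```

Comparing every candidate against the best score so far guards against keeping a value that makes the primitive perform worse.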

## What's Next?

Do you want to [learn more about how the project works](https://MLBazaar.github.io/MLPrimitives/getting_started/concepts.html),
about [how to contribute to it](https://MLBazaar.github.io/MLPrimitives/community/contributing.html)
or browse the [API Reference](https://MLBazaar.github.io/MLPrimitives/api/mlprimitives.html)?
Please check the corresponding sections of the [documentation](https://MLBazaar.github.io/MLPrimitives/)!


# History

## 0.3.5 - 2023-04-14

### General Improvements

* Update `mlblocks` cap - [Issue #278](https://github.com/MLBazaar/MLPrimitives/issues/278) by @sarahmish

## 0.3.4 - 2023-01-24

### General Improvements

* Update `mlblocks` cap - [Issue #277](https://github.com/MLBazaar/MLPrimitives/issues/277) by @sarahmish

## 0.3.3 - 2023-01-20

### General Improvements

* Update dependencies - [Issue #276](https://github.com/MLBazaar/MLPrimitives/issues/276) by @sarahmish

### Adapter Improvements

* Building model within fit in keras adapter - [Issue #267](https://github.com/MLBazaar/MLPrimitives/issues/267) by @sarahmish

## 0.3.2 - 2021-11-09

### Adapter Improvements

* Inferring data shapes with single dimension for keras adapter - [Issue #265](https://github.com/MLBazaar/MLPrimitives/issues/265) by @sarahmish

## 0.3.1 - 2021-10-07

### Adapter Improvements

* Dynamic target_shape in keras adapter - [Issue #263](https://github.com/MLBazaar/MLPrimitives/issues/263) by @sarahmish
* Save keras primitives in Windows environment - [Issue #261](https://github.com/MLBazaar/MLPrimitives/issues/261) by @sarahmish

### General Improvements

* Update TensorFlow and NumPy dependency - [Issue #259](https://github.com/MLBazaar/MLPrimitives/issues/259) by @sarahmish

## 0.3.0 - 2021-01-09

### New Primitives

* Add primitive `sklearn.naive_bayes.GaussianNB` - [Issue #242](https://github.com/MLBazaar/MLPrimitives/issues/242) by @sarahmish
* Add primitive `sklearn.linear_model.SGDClassifier` - [Issue #241](https://github.com/MLBazaar/MLPrimitives/issues/241) by @sarahmish

### Primitive Improvements

* Add offset to rolling_window_sequence primitive - [Issue #251](https://github.com/MLBazaar/MLPrimitives/issues/251) by @skyeeiskowitz
* Rename the time_index column to time - [Issue #252](https://github.com/MLBazaar/MLPrimitives/issues/252) by @pvk-developer
* Update featuretools dependency - [Issue #250](https://github.com/MLBazaar/MLPrimitives/issues/250) by @pvk-developer

### General Improvements

* Update dependencies and add python3.8 - [Issue #246](https://github.com/MLBazaar/MLPrimitives/issues/246) by @csala
* Drop Python35 - [Issue #244](https://github.com/MLBazaar/MLPrimitives/issues/244) by @csala

## 0.2.5 - 2020-07-29

### Primitive Improvements

* Accept timedelta `window_size` in `cutoff_window_sequences` - [Issue #239](https://github.com/MLBazaar/MLPrimitives/issues/239) by @joanvaquer

### Bug Fixes

* ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via `pip install tensorflow` - [Issue #237](https://github.com/MLBazaar/MLPrimitives/issues/237) by @joanvaquer

### New Primitives

* Add `pandas.DataFrame.set_index` primitive - [Issue #222](https://github.com/MLBazaar/MLPrimitives/issues/222) by @JDTheRipperPC

## 0.2.4 - 2020-01-30

### New Primitives

* Add RangeScaler and RangeUnscaler primitives - [Issue #232](https://github.com/MLBazaar/MLPrimitives/issues/232) by @csala

### Primitive Improvements

* Extract input_shape from X in keras.Sequential - [Issue #223](https://github.com/MLBazaar/MLPrimitives/issues/223) by @csala

### Bug Fixes

* mlprimitives.custom.text.TextCleaner fails if text is empty - [Issue #228](https://github.com/MLBazaar/MLPrimitives/issues/228) by @csala
* Error when loading the reviews dataset - [Issue #230](https://github.com/MLBazaar/MLPrimitives/issues/230) by @csala
* Curate dependencies: specify an explicit prompt-toolkit version range - [Issue #224](https://github.com/MLBazaar/MLPrimitives/issues/224) by @csala

## 0.2.3 - 2019-11-14

### New Primitives

* Add primitive to make window_sequences based on cutoff times - [Issue #217](https://github.com/MLBazaar/MLPrimitives/issues/217) by @csala
* Create a keras LSTM based TimeSeriesClassifier primitive - [Issue #218](https://github.com/MLBazaar/MLPrimitives/issues/218) by @csala
* Add pandas DataFrame primitives - [Issue #214](https://github.com/MLBazaar/MLPrimitives/issues/214) by @csala
* Add featuretools.EntitySet.normalize_entity primitive - [Issue #209](https://github.com/MLBazaar/MLPrimitives/issues/209) by @csala

### Primitive Improvements

* Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - [Issue #208](https://github.com/MLBazaar/MLPrimitives/issues/208) by @csala

* Add text regression dataset - [Issue #206](https://github.com/MLBazaar/MLPrimitives/issues/206) by @csala

### Bug Fixes

* pandas.DataFrame.resample crash when grouping by integer columns - [Issue #211](https://github.com/MLBazaar/MLPrimitives/issues/211) by @csala

## 0.2.2 - 2019-10-08

### New Primitives

* Add primitives for GAN based time-series anomaly detection - [Issue #200](https://github.com/MLBazaar/MLPrimitives/issues/200) by @AlexanderGeiger
* Add `numpy.reshape` and `numpy.ravel` primitives - [Issue #197](https://github.com/MLBazaar/MLPrimitives/issues/197) by @AlexanderGeiger
* Add feature selection primitive based on Lasso - [Issue #194](https://github.com/MLBazaar/MLPrimitives/issues/194) by @csala

### Primitive Improvements

* `feature_extraction.CategoricalEncoder` support dtype category - [Issue #196](https://github.com/MLBazaar/MLPrimitives/issues/196) by @csala

## 0.2.1 - 2019-09-09

### New Primitives

* Timeseries Intervals to Mask Primitive - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger
* Add new primitive: Arima model - [Issue #168](https://github.com/MLBazaar/MLPrimitives/issues/168) by @AlexanderGeiger

### Primitive Improvements

* Curate PCA primitive hyperparameters - [Issue #190](https://github.com/MLBazaar/MLPrimitives/issues/190) by @AlexanderGeiger
* Add option to drop rolling window sequences - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger

### Bug Fixes

* scikit-image==0.14.3 crashes when installed on Mac - [Issue #188](https://github.com/MLBazaar/MLPrimitives/issues/188) by @csala

## 0.2.0

### New Features

* Publish the pipelines as an `entry_point`
[Issue #175](https://github.com/MLBazaar/MLPrimitives/issues/175) by @csala

### Primitive Improvements

* Improve pandas.DataFrame.resample primitive [Issue #177](https://github.com/MLBazaar/MLPrimitives/issues/177) by @csala
* Improve `feature_extractor` primitives [Issue #183](https://github.com/MLBazaar/MLPrimitives/issues/183) by @csala
* Improve `find_anomalies` primitive [Issue #180](https://github.com/MLBazaar/MLPrimitives/issues/180) by @AlexanderGeiger

### Bug Fixes

* Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor [Issue #176](https://github.com/MLBazaar/MLPrimitives/issues/176) by @DanielCalvoCerezo


## 0.1.10

### New Features

* Add function to run primitives without a pipeline [Issue #43](https://github.com/MLBazaar/MLPrimitives/issues/43) by @csala

### New Pipelines

* Add pipelines for all the MLBlocks examples [Issue #162](https://github.com/MLBazaar/MLPrimitives/issues/162) by @csala

### Primitive Improvements

* Add Early Stopping to `keras.Sequential.LSTMTimeSeriesRegressor` primitive [Issue #156](https://github.com/MLBazaar/MLPrimitives/issues/156) by @csala
* Make FeatureExtractor primitives accept Numpy arrays [Issue #165](https://github.com/MLBazaar/MLPrimitives/issues/165) by @csala
* Add window size and pruning to the `timeseries_anomalies.find_anomalies` primitive [Issue #160](https://github.com/MLBazaar/MLPrimitives/issues/160) by @csala


## 0.1.9

### New Features

* Add a single table binary classification dataset [Issue #141](https://github.com/MLBazaar/MLPrimitives/issues/141) by @csala

### New Primitives

* Add Multilayer Perceptron (MLP) primitive for binary classification [Issue #140](https://github.com/MLBazaar/MLPrimitives/issues/140) by @Hector-hedb12
* Add primitive for Sequence classification with LSTM [Issue #150](https://github.com/MLBazaar/MLPrimitives/issues/150) by @Hector-hedb12
* Add VGG-like convnet primitive [Issue #149](https://github.com/MLBazaar/MLPrimitives/issues/149) by @Hector-hedb12
* Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification [Issue #139](https://github.com/MLBazaar/MLPrimitives/issues/139) by @Hector-hedb12
* Add primitive to count feature matrix columns [Issue #146](https://github.com/MLBazaar/MLPrimitives/issues/146) by @csala

### Primitive Improvements

* Add additional fit and predict arguments to keras.Sequential [Issue #161](https://github.com/MLBazaar/MLPrimitives/issues/161) by @csala
* Add support for keras.Sequential Callbacks [Issue #159](https://github.com/MLBazaar/MLPrimitives/issues/159) by @csala
* Add fixed hyperparam to control keras.Sequential verbosity [Issue #143](https://github.com/MLBazaar/MLPrimitives/issues/143) by @csala

## 0.1.8

### New Primitives

* mlprimitives.custom.timeseries_preprocessing.time_segments_average - [Issue #137](https://github.com/MLBazaar/MLPrimitives/issues/137)

### New Features

* Add target_index output in timeseries_preprocessing.rolling_window_sequences - [Issue #136](https://github.com/MLBazaar/MLPrimitives/issues/136)

## 0.1.7

### General Improvements

* Validate JSON format in `make lint` - [Issue #133](https://github.com/MLBazaar/MLPrimitives/issues/133)
* Add demo datasets - [Issue #131](https://github.com/MLBazaar/MLPrimitives/issues/131)
* Improve featuretools.dfs primitive - [Issue #127](https://github.com/MLBazaar/MLPrimitives/issues/127)

### New Primitives

* pandas.DataFrame.resample - [Issue #123](https://github.com/MLBazaar/MLPrimitives/issues/123)
* pandas.DataFrame.unstack - [Issue #124](https://github.com/MLBazaar/MLPrimitives/issues/124)
* featuretools.EntitySet.add_relationship - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126)
* featuretools.EntitySet.entity_from_dataframe - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126)

### Bug Fixes

* Bug in timeseries_anomalies.py - [Issue #119](https://github.com/MLBazaar/MLPrimitives/issues/119)

## 0.1.6

### General Improvements

* Add Contributing Documentation
* Remove upper bound in pandas version given new release of `featuretools` v0.6.1
* Improve LSTMTimeSeriesRegressor hyperparameters

### New Primitives

* mlprimitives.candidates.dsp.SpectralMask
* mlprimitives.custom.timeseries_anomalies.find_anomalies
* mlprimitives.custom.timeseries_anomalies.regression_errors
* mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences
* mlprimitives.custom.timeseries_preprocessing.time_segments_average
* sklearn.linear_model.ElasticNet
* sklearn.linear_model.Lars
* sklearn.linear_model.Lasso
* sklearn.linear_model.MultiTaskLasso
* sklearn.linear_model.Ridge

## 0.1.5

### New Primitives

* sklearn.impute.SimpleImputer
* sklearn.preprocessing.MinMaxScaler
* sklearn.preprocessing.MaxAbsScaler
* sklearn.preprocessing.RobustScaler
* sklearn.linear_model.LinearRegression

### General Improvements

* Separate curated from candidate primitives
* Setup `entry_points` in setup.py to improve compatibility with MLBlocks
* Add a test-pipelines command to test all the existing pipelines
* Clean sklearn example pipelines
* Change the `author` entry to a `contributors` list
* Change the name of `mlblocks_primitives` folder
* Pip install `requirements_dev.txt` fail documentation

### Bug Fixes

* Fix LSTMTimeSeriesRegressor primitive. Issue #90
* Fix timeseries primitives. Issue #91
* Negative index anomalies in `timeseries_errors`. Issue #89
* Keep pandas version below 0.24.0. Issue #87

## 0.1.4

### New Primitives

* mlprimitives.timeseries primitives for timeseries data preprocessing
* mlprimitives.timeseries_error primitives for timeseries anomaly detection
* keras.Sequential.LSTMTimeSeriesRegressor
* sklearn.neighbors.KNeighbors Classifier and Regressor
* several sklearn.decomposition primitives
* several sklearn.ensemble primitives

### Bug Fixes

* Fix typo in mlprimitives.text.TextCleaner primitive
* Fix bug in index handling in featuretools.dfs primitive
* Fix bug in SingleLayerCNNImageClassifier annotation
* Remove old validation tags from JSON annotations

## 0.1.3

### New Features

* Fix and re-enable featuretools.dfs primitive.

## 0.1.2

### New Features

* Add pipeline specification language and Evaluation utilities.
* Add pipelines for graph, text and tabular problems.
* New primitives ClassEncoder and ClassDecoder
* New primitives UniqueCounter and VocabularyCounter

### Bug Fixes

* Fix TrivialPredictor bug when working with numpy arrays
* Change XGB default learning rate and number of estimators


## 0.1.1

### New Features

* Add more keras.applications primitives.
* Add a Text Cleanup primitive.

### Bug Fixes

* Add keywords to `keras.preprocessing` primitives.
* Fix the `image_transform` method.
* Add `epoch` as a fixed hyperparameter for `keras.Sequential` primitives.

## 0.1.0

* First release on PyPI.


%package help
Summary:	Development documents and examples for mlprimitives
Provides:	python3-mlprimitives-doc
%description help
<p align="left">
  <a href="https://dai.lids.mit.edu">
    <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
  </a>
  <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPi Shield](https://img.shields.io/pypi/v/mlprimitives.svg)](https://pypi.python.org/pypi/mlprimitives)
[![Tests](https://github.com/MLBazaar/MLPrimitives/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/MLPrimitives/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![Downloads](https://pepy.tech/badge/mlprimitives)](https://pepy.tech/project/mlprimitives)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials)

# MLPrimitives

Pipelines and primitives for machine learning and data science.

* Documentation: https://MLBazaar.github.io/MLPrimitives
* Github: https://github.com/MLBazaar/MLPrimitives
* License: [MIT](https://github.com/MLBazaar/MLPrimitives/blob/master/LICENSE)
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)

# Overview

This repository contains primitive annotations to be used by the MLBlocks library, as well as
the necessary Python code to make some of them fully compatible with the MLBlocks API requirements.

There is also a collection of custom primitives contributed directly to this library, which either
combine third party tools or implement new functionalities from scratch.

## Why did we create this library?

* Too many libraries in a fast growing field
* Huge societal need to build machine learning apps
* Domain expertise resides at several places (knowledge of math)
* No documented information about hyperparameters, behavior...

# Installation

## Requirements

**MLPrimitives** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/).

Also, although it is not strictly required, the usage of a
[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
interfering with other software installed in the system where **MLPrimitives** is run.

## Install with pip

The easiest and recommended way to install **MLPrimitives** is using [pip](https://pip.pypa.io/en/stable/):

```bash
pip install mlprimitives
```

This will pull and install the latest stable release from [PyPi](https://pypi.org/).

If you want to install from source or contribute to the project please read the
[Contributing Guide](https://MLBazaar.github.io/MLPrimitives/community/welcome.html).

# Quickstart

This section is a short series of tutorials to help you get started with MLPrimitives.

In the following steps you will learn how to load and run a primitive on some data.

Later on you will learn how to evaluate and improve the performance of a primitive by tuning
its hyperparameters.

## Running a Primitive

In this first tutorial, we will be executing a single primitive for data transformation.

### 1. Load a Primitive

The first step in order to run a primitive is to load it.

This will be done using the `mlprimitives.load_primitive` function, which will
load the indicated primitive as an [MLBlock Object from MLBlocks](https://MLBazaar.github.io/MLBlocks/api/mlblocks.html#mlblocks.MLBlock).

In this case, we will load the `mlprimitives.custom.feature_extraction.CategoricalEncoder`
primitive.

```python3
from mlprimitives import load_primitive

primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder')
```

### 2. Load some data

The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the
categorical columns of a `pandas.DataFrame`.

So, in order to be able to run our primitive, we will first load some data that contains
categorical columns.

This can be done with the `mlprimitives.datasets.load_census` function:

```python3
from mlprimitives.datasets import load_census

dataset = load_census()
```

This dataset object has an attribute `data` which contains a table with several categorical
columns.

We can have a look at this table by executing `dataset.data.head()`, which will return a
table like this:

```
                             0                    1                   2
age                         39                   50                  38
workclass            State-gov     Self-emp-not-inc             Private
fnlwgt                   77516                83311              215646
education            Bachelors            Bachelors             HS-grad
education-num               13                   13                   9
marital-status   Never-married   Married-civ-spouse            Divorced
occupation        Adm-clerical      Exec-managerial   Handlers-cleaners
relationship     Not-in-family              Husband       Not-in-family
race                     White                White               White
sex                       Male                 Male                Male
capital-gain              2174                    0                   0
capital-loss                 0                    0                   0
hours-per-week              40                   13                  40
native-country   United-States        United-States       United-States
```

### 3. Fit the primitive

In order to run our primitive, we first need to fit it.

This is the process where it analyzes the data to detect which columns are categorical.

This is done by calling its `fit` method and passing the `dataset.data` as `X`.

```python3
primitive.fit(X=dataset.data)
```

### 4. Produce results

Once the primitive is fit, we can process the data by calling the `produce` method of the
primitive instance and passing the `data` again as `X`.

```python3
transformed = primitive.produce(X=dataset.data)
```

After this is done, we can see how the transformed data contains the newly generated
one-hot vectors:

```
                                                0      1       2       3       4
age                                            39     50      38      53      28
fnlwgt                                      77516  83311  215646  234721  338409
education-num                                  13     13       9       7      13
capital-gain                                 2174      0       0       0       0
capital-loss                                    0      0       0       0       0
hours-per-week                                 40     13      40      40      40
workclass= Private                              0      0       1       1       1
workclass= Self-emp-not-inc                     0      1       0       0       0
workclass= Local-gov                            0      0       0       0       0
workclass= ?                                    0      0       0       0       0
workclass= State-gov                            1      0       0       0       0
workclass= Self-emp-inc                         0      0       0       0       0
...                                             ...    ...     ...     ...     ...
```

## Tuning a Primitive

In this short tutorial we will teach you how to evaluate the performance of a primitive
and improve its performance by modifying its hyperparameters.

To do so, we will load a primitive that can learn from the transformed data that we just
generated and later on make predictions based on new data.

### 1. Load another primitive

First of all, we will load the `xgboost.XGBClassifier` primitive that we will use afterwards.

```python3
primitive = load_primitive('xgboost.XGBClassifier')
```

### 2. Split the dataset

Before we can evaluate the primitive's performance, we need to split the data into two
parts: train, which will be used for the primitive to learn, and test, which will be used
to make the predictions that will later be evaluated.

In order to do this, we will get the first 75% of rows from the transformed data that we
obtained above and call it `X_train`, and then set the next 25% of rows as `X_test`.

```python3
train_size = int(len(transformed) * 0.75)
X_train = transformed.iloc[:train_size]
X_test = transformed.iloc[train_size:]
```

Similarly, we need to obtain the `y_train` and `y_test` variables containing the corresponding
output values.

```python3
y_train = dataset.target[:train_size]
y_test = dataset.target[train_size:]
```

### 3. Fit the new primitive

Once we have split the data, we can fit the primitive by passing `X_train` and `y_train`
to its `fit` method.

```python3
primitive.fit(X=X_train, y=y_train)
```

### 4. Make predictions

Once the primitive has been fitted, we can produce predictions using the `X_test` data as input.

```python3
predictions = primitive.produce(X=X_test)
```

### 5. Evaluate the performance

We can now evaluate how good the predictions from our primitive are by using the `score`
method from the `dataset` object on both the expected output and the real output from the
primitive:

```python3
dataset.score(y_test, predictions)
```

This will output a float value between 0 and 1 indicating how good the predictions are, with
0 being the worst possible score and 1 the best.

In this case we will obtain a score around 0.866.

### 6. Set new hyperparameter values

In order to improve the performance of our primitive we will try to modify a couple of its
hyperparameters.

First we will see which hyperparameter values the primitive has by calling its
`get_hyperparameters` method.

```python3
primitive.get_hyperparameters()
```

which will return a dictionary like this:

```python
{
    "n_jobs": -1,
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
    "gamma": 0,
    "min_child_weight": 1
}
```

Next, we will see which are the valid values for each one of those hyperparameters by calling its
`get_tunable_hyperparameters` method:

```python3
primitive.get_tunable_hyperparameters()
```

For example, we will see that the `max_depth` hyperparameter has the following specification:

```python
{
    "type": "int",
    "default": 3,
    "range": [
        3,
        10
    ]
}
```

Next, we will choose a valid value, for example 7, and set it in the primitive using the
`set_hyperparameters` method:

```python3
primitive.set_hyperparameters({'max_depth': 7})
```

### 7. Re-evaluate the performance

Once the new hyperparameter value has been set, we repeat the fit/produce/score cycle to
evaluate the performance of this new hyperparameter value:

```python3
primitive.fit(X=X_train, y=y_train)
predictions = primitive.produce(X=X_test)
dataset.score(y_test, predictions)
```

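This time we obtain a score around 0.724, which should be compared with the value of around
0.866 obtained before in order to decide whether the new hyperparameter value is worth keeping.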

## What's Next?

Do you want to [learn more about how the project works](https://MLBazaar.github.io/MLPrimitives/getting_started/concepts.html),
about [how to contribute to it](https://MLBazaar.github.io/MLPrimitives/community/contributing.html)
or browse the [API Reference](https://MLBazaar.github.io/MLPrimitives/api/mlprimitives.html)?
Please check the corresponding sections of the [documentation](https://MLBazaar.github.io/MLPrimitives/)!


# History

## 0.3.5 - 2023-04-14

### General Improvements

* Update `mlblocks` cap - [Issue #278](https://github.com/MLBazaar/MLPrimitives/issues/278) by @sarahmish

## 0.3.4 - 2023-01-24

### General Improvements

* Update `mlblocks` cap - [Issue #277](https://github.com/MLBazaar/MLPrimitives/issues/277) by @sarahmish

## 0.3.3 - 2023-01-20

### General Improvements

* Update dependencies - [Issue #276](https://github.com/MLBazaar/MLPrimitives/issues/276) by @sarahmish

### Adapter Improvements

* Building model within fit in keras adapter - [Issue #267](https://github.com/MLBazaar/MLPrimitives/issues/267) by @sarahmish

## 0.3.2 - 2021-11-09

### Adapter Improvements

* Inferring data shapes with single dimension for keras adapter - [Issue #265](https://github.com/MLBazaar/MLPrimitives/issues/265) by @sarahmish

## 0.3.1 - 2021-10-07

### Adapter Improvements

* Dynamic target_shape in keras adapter - [Issue #263](https://github.com/MLBazaar/MLPrimitives/issues/263) by @sarahmish
* Save keras primitives in Windows environment - [Issue #261](https://github.com/MLBazaar/MLPrimitives/issues/261) by @sarahmish

### General Improvements

* Update TensorFlow and NumPy dependency - [Issue #259](https://github.com/MLBazaar/MLPrimitives/issues/259) by @sarahmish

## 0.3.0 - 2021-01-09

### New Primitives

* Add primitive `sklearn.naive_bayes.GaussianNB` - [Issue #242](https://github.com/MLBazaar/MLPrimitives/issues/242) by @sarahmish
* Add primitive `sklearn.linear_model.SGDClassifier` - [Issue #241](https://github.com/MLBazaar/MLPrimitives/issues/241) by @sarahmish

### Primitive Improvements

* Add offset to rolling_window_sequence primitive - [Issue #251](https://github.com/MLBazaar/MLPrimitives/issues/251) by @skyeeiskowitz
* Rename the time_index column to time - [Issue #252](https://github.com/MLBazaar/MLPrimitives/issues/252) by @pvk-developer
* Update featuretools dependency - [Issue #250](https://github.com/MLBazaar/MLPrimitives/issues/250) by @pvk-developer

### General Improvements

* Update dependencies and add python3.8 - [Issue #246](https://github.com/MLBazaar/MLPrimitives/issues/246) by @csala
* Drop Python35 - [Issue #244](https://github.com/MLBazaar/MLPrimitives/issues/244) by @csala

## 0.2.5 - 2020-07-29

### Primitive Improvements

* Accept timedelta `window_size` in `cutoff_window_sequences` - [Issue #239](https://github.com/MLBazaar/MLPrimitives/issues/239) by @joanvaquer

### Bug Fixes

* ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via `pip install tensorflow` - [Issue #237](https://github.com/MLBazaar/MLPrimitives/issues/237) by @joanvaquer

### New Primitives

* Add `pandas.DataFrame.set_index` primitive - [Issue #222](https://github.com/MLBazaar/MLPrimitives/issues/222) by @JDTheRipperPC

## 0.2.4 - 2020-01-30

### New Primitives

* Add RangeScaler and RangeUnscaler primitives - [Issue #232](https://github.com/MLBazaar/MLPrimitives/issues/232) by @csala

### Primitive Improvements

* Extract input_shape from X in keras.Sequential - [Issue #223](https://github.com/MLBazaar/MLPrimitives/issues/223) by @csala

### Bug Fixes

* mlprimitives.custom.text.TextCleaner fails if text is empty - [Issue #228](https://github.com/MLBazaar/MLPrimitives/issues/228) by @csala
* Error when loading the reviews dataset - [Issue #230](https://github.com/MLBazaar/MLPrimitives/issues/230) by @csala
* Curate dependencies: specify an explicit prompt-toolkit version range - [Issue #224](https://github.com/MLBazaar/MLPrimitives/issues/224) by @csala

## 0.2.3 - 2019-11-14

### New Primitives

* Add primitive to make window_sequences based on cutoff times - [Issue #217](https://github.com/MLBazaar/MLPrimitives/issues/217) by @csala
* Create a keras LSTM based TimeSeriesClassifier primitive - [Issue #218](https://github.com/MLBazaar/MLPrimitives/issues/218) by @csala
* Add pandas DataFrame primitives - [Issue #214](https://github.com/MLBazaar/MLPrimitives/issues/214) by @csala
* Add featuretools.EntitySet.normalize_entity primitive - [Issue #209](https://github.com/MLBazaar/MLPrimitives/issues/209) by @csala

### Primitive Improvements

* Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - [Issue #208](https://github.com/MLBazaar/MLPrimitives/issues/208) by @csala

* Add text regression dataset - [Issue #206](https://github.com/MLBazaar/MLPrimitives/issues/206) by @csala

### Bug Fixes

* pandas.DataFrame.resample crash when grouping by integer columns - [Issue #211](https://github.com/MLBazaar/MLPrimitives/issues/211) by @csala

## 0.2.2 - 2019-10-08

### New Primitives

* Add primitives for GAN based time-series anomaly detection - [Issue #200](https://github.com/MLBazaar/MLPrimitives/issues/200) by @AlexanderGeiger
* Add `numpy.reshape` and `numpy.ravel` primitives - [Issue #197](https://github.com/MLBazaar/MLPrimitives/issues/197) by @AlexanderGeiger
* Add feature selection primitive based on Lasso - [Issue #194](https://github.com/MLBazaar/MLPrimitives/issues/194) by @csala

### Primitive Improvements

* `feature_extraction.CategoricalEncoder` support dtype category - [Issue #196](https://github.com/MLBazaar/MLPrimitives/issues/196) by @csala

## 0.2.1 - 2019-09-09

### New Primitives

* Timeseries Intervals to Mask Primitive - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger
* Add new primitive: Arima model - [Issue #168](https://github.com/MLBazaar/MLPrimitives/issues/168) by @AlexanderGeiger

### Primitive Improvements

* Curate PCA primitive hyperparameters - [Issue #190](https://github.com/MLBazaar/MLPrimitives/issues/190) by @AlexanderGeiger
* Add option to drop rolling window sequences - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger

### Bug Fixes

* scikit-image==0.14.3 crashes when installed on Mac - [Issue #188](https://github.com/MLBazaar/MLPrimitives/issues/188) by @csala

## 0.2.0

### New Features

* Publish the pipelines as an `entry_point` [Issue #175](https://github.com/MLBazaar/MLPrimitives/issues/175) by @csala

### Primitive Improvements

* Improve pandas.DataFrame.resample primitive [Issue #177](https://github.com/MLBazaar/MLPrimitives/issues/177) by @csala
* Improve `feature_extractor` primitives [Issue #183](https://github.com/MLBazaar/MLPrimitives/issues/183) by @csala
* Improve `find_anomalies` primitive [Issue #180](https://github.com/MLBazaar/MLPrimitives/issues/180) by @AlexanderGeiger

### Bug Fixes

* Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor [Issue #176](https://github.com/MLBazaar/MLPrimitives/issues/176) by @DanielCalvoCerezo


## 0.1.10

### New Features

* Add function to run primitives without a pipeline [Issue #43](https://github.com/MLBazaar/MLPrimitives/issues/43) by @csala

### New Pipelines

* Add pipelines for all the MLBlocks examples [Issue #162](https://github.com/MLBazaar/MLPrimitives/issues/162) by @csala

### Primitive Improvements

* Add Early Stopping to `keras.Sequential.LSTMTimeSeriesRegressor` primitive [Issue #156](https://github.com/MLBazaar/MLPrimitives/issues/156) by @csala
* Make FeatureExtractor primitives accept Numpy arrays [Issue #165](https://github.com/MLBazaar/MLPrimitives/issues/165) by @csala
* Add window size and pruning to the `timeseries_anomalies.find_anomalies` primitive [Issue #160](https://github.com/MLBazaar/MLPrimitives/issues/160) by @csala


## 0.1.9

### New Features

* Add a single table binary classification dataset [Issue #141](https://github.com/MLBazaar/MLPrimitives/issues/141) by @csala

### New Primitives

* Add Multilayer Perceptron (MLP) primitive for binary classification [Issue #140](https://github.com/MLBazaar/MLPrimitives/issues/140) by @Hector-hedb12
* Add primitive for Sequence classification with LSTM [Issue #150](https://github.com/MLBazaar/MLPrimitives/issues/150) by @Hector-hedb12
* Add VGG-like convnet primitive [Issue #149](https://github.com/MLBazaar/MLPrimitives/issues/149) by @Hector-hedb12
* Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification [Issue #139](https://github.com/MLBazaar/MLPrimitives/issues/139) by @Hector-hedb12
* Add primitive to count feature matrix columns [Issue #146](https://github.com/MLBazaar/MLPrimitives/issues/146) by @csala

### Primitive Improvements

* Add additional fit and predict arguments to keras.Sequential [Issue #161](https://github.com/MLBazaar/MLPrimitives/issues/161) by @csala
* Add support for keras.Sequential Callbacks [Issue #159](https://github.com/MLBazaar/MLPrimitives/issues/159) by @csala
* Add fixed hyperparam to control keras.Sequential verbosity [Issue #143](https://github.com/MLBazaar/MLPrimitives/issues/143) by @csala

## 0.1.8

### New Primitives

* mlprimitives.custom.timeseries_preprocessing.time_segments_average - [Issue #137](https://github.com/MLBazaar/MLPrimitives/issues/137)

### New Features

* Add target_index output in timeseries_preprocessing.rolling_window_sequences - [Issue #136](https://github.com/MLBazaar/MLPrimitives/issues/136)

## 0.1.7

### General Improvements

* Validate JSON format in `make lint` - [Issue #133](https://github.com/MLBazaar/MLPrimitives/issues/133)
* Add demo datasets - [Issue #131](https://github.com/MLBazaar/MLPrimitives/issues/131)
* Improve featuretools.dfs primitive - [Issue #127](https://github.com/MLBazaar/MLPrimitives/issues/127)

### New Primitives

* pandas.DataFrame.resample - [Issue #123](https://github.com/MLBazaar/MLPrimitives/issues/123)
* pandas.DataFrame.unstack - [Issue #124](https://github.com/MLBazaar/MLPrimitives/issues/124)
* featuretools.EntitySet.add_relationship - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126)
* featuretools.EntitySet.entity_from_dataframe - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126)

### Bug Fixes

* Bug in timeseries_anomalies.py - [Issue #119](https://github.com/MLBazaar/MLPrimitives/issues/119)

## 0.1.6

### General Improvements

* Add Contributing Documentation
* Remove upper bound in pandas version given new release of `featuretools` v0.6.1
* Improve LSTMTimeSeriesRegressor hyperparameters

### New Primitives

* mlprimitives.candidates.dsp.SpectralMask
* mlprimitives.custom.timeseries_anomalies.find_anomalies
* mlprimitives.custom.timeseries_anomalies.regression_errors
* mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences
* mlprimitives.custom.timeseries_preprocessing.time_segments_average
* sklearn.linear_model.ElasticNet
* sklearn.linear_model.Lars
* sklearn.linear_model.Lasso
* sklearn.linear_model.MultiTaskLasso
* sklearn.linear_model.Ridge

## 0.1.5

### New Primitives

* sklearn.impute.SimpleImputer
* sklearn.preprocessing.MinMaxScaler
* sklearn.preprocessing.MaxAbsScaler
* sklearn.preprocessing.RobustScaler
* sklearn.linear_model.LinearRegression

### General Improvements

* Separate curated from candidate primitives
* Set up `entry_points` in setup.py to improve compatibility with MLBlocks
* Add a test-pipelines command to test all the existing pipelines
* Clean sklearn example pipelines
* Change the `author` entry to a `contributors` list
* Change the name of `mlblocks_primitives` folder
* Document the `pip install` failure of `requirements_dev.txt`

### Bug Fixes

* Fix LSTMTimeSeriesRegressor primitive. Issue #90
* Fix timeseries primitives. Issue #91
* Negative index anomalies in `timeseries_errors`. Issue #89
* Keep pandas version below 0.24.0. Issue #87

## 0.1.4

### New Primitives

* mlprimitives.timeseries primitives for timeseries data preprocessing
* mlprimitives.timeseries_error primitives for timeseries anomaly detection
* keras.Sequential.LSTMTimeSeriesRegressor
* sklearn.neighbors.KNeighbors Classifier and Regressor
* several sklearn.decomposition primitives
* several sklearn.ensemble primitives

### Bug Fixes

* Fix typo in mlprimitives.text.TextCleaner primitive
* Fix bug in index handling in featuretools.dfs primitive
* Fix bug in SingleLayerCNNImageClassifier annotation
* Remove old validation tags from JSON annotations

## 0.1.3

### New Features

* Fix and re-enable featuretools.dfs primitive.

## 0.1.2

### New Features

* Add pipeline specification language and Evaluation utilities.
* Add pipelines for graph, text and tabular problems.
* New primitives ClassEncoder and ClassDecoder
* New primitives UniqueCounter and VocabularyCounter

### Bug Fixes

* Fix TrivialPredictor bug when working with numpy arrays
* Change XGB default learning rate and number of estimators


## 0.1.1

### New Features

* Add more keras.applications primitives.
* Add a Text Cleanup primitive.

### Bug Fixes

* Add keywords to `keras.preprocessing` primitives.
* Fix the `image_transform` method.
* Add `epoch` as a fixed hyperparameter for `keras.Sequential` primitives.

## 0.1.0

* First release on PyPI.


%prep
%autosetup -n mlprimitives-0.3.5

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-mlprimitives -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Thu Jun 08 2023 Python_Bot <Python_Bot@openeuler.org> - 0.3.5-1
- Package Spec generated