| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-mlprimitives.spec | 2027 |
| -rw-r--r-- | sources | 1 |
3 files changed, 2029 insertions, 0 deletions
@@ -0,0 +1 @@ +/mlprimitives-0.3.5.tar.gz diff --git a/python-mlprimitives.spec b/python-mlprimitives.spec new file mode 100644 index 0000000..759fea6 --- /dev/null +++ b/python-mlprimitives.spec @@ -0,0 +1,2027 @@ +%global _empty_manifest_terminate_build 0 +Name: python-mlprimitives +Version: 0.3.5 +Release: 1 +Summary: Pipelines and primitives for machine learning and data science. +License: MIT license +URL: https://github.com/MLBazaar/MLPrimitives +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/67/87/1ea0faf9e1314f1739a3e61781e411d230850f06532451ac3e2adf0df41c/mlprimitives-0.3.5.tar.gz +BuildArch: noarch + +Requires: python3-Keras +Requires: python3-featuretools +Requires: python3-iso639 +Requires: python3-langdetect +Requires: python3-lightfm +Requires: python3-mlblocks +Requires: python3-networkx +Requires: python3-nltk +Requires: python3-numpy +Requires: python3-opencv-python +Requires: python3-pandas +Requires: python3-louvain +Requires: python3-scikit-image +Requires: python3-scikit-learn +Requires: python3-scipy +Requires: python3-statsmodels +Requires: python3-tensorflow +Requires: python3-xgboost +Requires: python3-protobuf +Requires: python3-pytest +Requires: python3-pytest-cov +Requires: python3-rundoc +Requires: python3-bumpversion +Requires: python3-pip +Requires: python3-watchdog +Requires: python3-m2r +Requires: python3-Sphinx +Requires: python3-sphinx-rtd-theme +Requires: python3-docutils +Requires: python3-ipython +Requires: python3-mistune +Requires: python3-Jinja2 +Requires: python3-flake8 +Requires: python3-isort +Requires: python3-autoflake +Requires: python3-autopep8 +Requires: python3-importlib-metadata +Requires: python3-twine +Requires: python3-wheel +Requires: python3-coverage +Requires: python3-tox +Requires: python3-pytest +Requires: python3-pytest-cov +Requires: python3-rundoc + +%description +<p align="left"> + <a href="https://dai.lids.mit.edu"> + <img width=15% 
src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" /> + </a> + <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i> +</p> + +[](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) +[](https://pypi.python.org/pypi/mlprimitives) +[](https://github.com/MLBazaar/MLPrimitives/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster) +[](https://pepy.tech/project/mlprimitives) +[](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials) + +# MLPrimitives + +Pipelines and primitives for machine learning and data science. + +* Documentation: https://MLBazaar.github.io/MLPrimitives +* Github: https://github.com/MLBazaar/MLPrimitives +* License: [MIT](https://github.com/MLBazaar/MLPrimitives/blob/master/LICENSE) +* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) + +# Overview + +This repository contains primitive annotations to be used by the MLBlocks library, as well as +the necessary Python code to make some of them fully compatible with the MLBlocks API requirements. + +There is also a collection of custom primitives contributed directly to this library, which either +combine third party tools or implement new functionalities from scratch. + +## Why did we create this library? + +* Too many libraries in a fast growing field +* Huge societal need to build machine learning apps +* Domain expertise resides at several places (knowledge of math) +* No documented information about hyperparameters, behavior... + +# Installation + +## Requirements + +**MLPrimitives** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/) + +Also, although it is not strictly required, the usage of a +[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid +interfering with other software installed in the system where **MLPrimitives** is run. 
+ +## Install with pip + +The easiest and recommended way to install **MLPrimitives** is using [pip](https://pip.pypa.io/en/stable/): + +```bash +pip install mlprimitives +``` + +This will pull and install the latest stable release from [PyPi](https://pypi.org/). + +If you want to install from source or contribute to the project please read the +[Contributing Guide](https://MLBazaar.github.io/MLPrimitives/community/welcome.html). + +# Quickstart + +This section is a short series of tutorials to help you getting started with MLPrimitives. + +In the following steps you will learn how to load and run a primitive on some data. + +Later on you will learn how to evaluate and improve the performance of a primitive by tuning +its hyperparameters. + +## Running a Primitive + +In this first tutorial, we will be executing a single primitive for data transformation. + +### 1. Load a Primitive + +The first step in order to run a primitive is to load it. + +This will be done using the `mlprimitives.load_primitive` function, which will +load the indicated primitive as an [MLBlock Object from MLBlocks](https://MLBazaar.github.io/MLBlocks/api/mlblocks.html#mlblocks.MLBlock) + +In this case, we will load the `mlprimitives.custom.feature_extraction.CategoricalEncoder` +primitive. + +```python3 +from mlprimitives import load_primitive + +primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder') +``` + +### 2. Load some data + +The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the +categorical columns of a `pandas.DataFrame`. + +So, in order to be able to run our primitive, we will first load some data that contains +categorical columns. + +This can be done with the `mlprimitives.datasets.load_census` function: + +```python3 +from mlprimitives.datasets import load_census + +dataset = load_census() +``` + +This dataset object has an attribute `data` which contains a table with several categorical +columns. 
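The one-hot encoding described above can be sketched in plain Python. This is only an illustration of the technique under stated assumptions, not the `CategoricalEncoder` implementation: the `one_hot_encode` helper and the sample rows are hypothetical, and the column naming is only loosely modelled on the output shown in this tutorial.

```python
# Illustrative sketch only: plain-Python one-hot encoding of categorical
# columns. This is NOT how CategoricalEncoder is implemented.


def one_hot_encode(rows, categorical_columns):
    """Replace each categorical column with one 0/1 column per observed value."""
    # "Fit" step: collect the distinct values seen in each categorical column.
    categories = {
        column: sorted({row[column] for row in rows})
        for column in categorical_columns
    }

    # "Produce" step: rebuild every row with the new indicator columns.
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in categories}
        for column, values in categories.items():
            for value in values:
                new_row[f"{column}={value}"] = int(row[column] == value)
        encoded.append(new_row)
    return encoded


# Hypothetical rows using values from the census table in this tutorial.
rows = [
    {"age": 39, "workclass": "State-gov"},
    {"age": 50, "workclass": "Self-emp-not-inc"},
    {"age": 38, "workclass": "Private"},
]
encoded = one_hot_encode(rows, ["workclass"])
print(encoded[0])
```

The helper separates a learning step (discovering the categories) from a transformation step (emitting the indicator columns), which mirrors the fit/produce split used by the primitive.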
+ +We can have a look at this table by executing `dataset.data.head()`, which will return a +table like this: + +``` + 0 1 2 +age 39 50 38 +workclass State-gov Self-emp-not-inc Private +fnlwgt 77516 83311 215646 +education Bachelors Bachelors HS-grad +education-num 13 13 9 +marital-status Never-married Married-civ-spouse Divorced +occupation Adm-clerical Exec-managerial Handlers-cleaners +relationship Not-in-family Husband Not-in-family +race White White White +sex Male Male Male +capital-gain 2174 0 0 +capital-loss 0 0 0 +hours-per-week 40 13 40 +native-country United-States United-States United-States +``` + +### 3. Fit the primitive + +In order to run our primitive, we first need to fit it. + +This is the process where it analyzes the data to detect which columns are categorical. + +This is done by calling its `fit` method and passing the `dataset.data` as `X`. + +```python3 +primitive.fit(X=dataset.data) +``` + +### 4. Produce results + +Once the primitive is fit, we can process the data by calling the `produce` method of the +primitive instance and passing the `data` again as `X`. + +```python3 +transformed = primitive.produce(X=dataset.data) +``` + +After this is done, we can see how the transformed data contains the newly generated +one-hot vectors: + +``` + 0 1 2 3 4 +age 39 50 38 53 28 +fnlwgt 77516 83311 215646 234721 338409 +education-num 13 13 9 7 13 +capital-gain 2174 0 0 0 0 +capital-loss 0 0 0 0 0 +hours-per-week 40 13 40 40 40 +workclass= Private 0 0 1 1 1 +workclass= Self-emp-not-inc 0 1 0 0 0 +workclass= Local-gov 0 0 0 0 0 +workclass= ? 0 0 0 0 0 +workclass= State-gov 1 0 0 0 0 +workclass= Self-emp-inc 0 0 0 0 0 +... ... ... ... ... ... +``` + +## Tuning a Primitive + +In this short tutorial we will teach you how to evaluate the performance of a primitive +and improve its performance by modifying its hyperparameters. 
+ +To do so, we will load a primitive that can learn from the transformed data that we just +generated and later on make predictions based on new data. + +### 1. Load another primitive + +First of all, we will load the `xgboost.XGBClassifier` primitive that we will use afterwards. + +```python3 +primitive = load_primitive('xgboost.XGBClassifier') +``` + +### 2. Split the dataset + +Before being able to evaluate the primitive's performance, we need to split the data into two +parts: train, which will be used for the primitive to learn, and test, which will be used +to make the predictions that later on will be evaluated. + +In order to do this, we will get the first 75% of rows from the transformed data that we +obtained above and call it `X_train`, and then set the remaining 25% of rows as `X_test`. + +```python3 +train_size = int(len(transformed) * 0.75) +X_train = transformed.iloc[:train_size] +X_test = transformed.iloc[train_size:] +``` + +Similarly, we need to obtain the `y_train` and `y_test` variables containing the corresponding +output values. + +```python3 +y_train = dataset.target[:train_size] +y_test = dataset.target[train_size:] +``` + +### 3. Fit the new primitive + +Once we have split the data, we can fit the primitive by passing `X_train` and `y_train` +to its `fit` method. + +```python3 +primitive.fit(X=X_train, y=y_train) +``` + +### 4. Make predictions + +Once the primitive has been fitted, we can produce predictions using the `X_test` data as input. + +```python3 +predictions = primitive.produce(X=X_test) +``` + +### 5. Evaluate the performance + +We can now evaluate how good the predictions from our primitive are by using the `score` +method from the `dataset` object on both the expected output and the real output from the +primitive: + +```python3 +dataset.score(y_test, predictions) +``` + +This will output a float value between 0 and 1 indicating how good the predictions are, with +0 being the worst possible score and 1 the best. 
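The positional split and the 0-to-1 score described above can be sketched in plain Python. This is a minimal illustration that assumes the score is plain accuracy, which the text here does not confirm; the `accuracy` helper, the labels and the predictions are all hypothetical.

```python
# Illustrative sketch only: assumes the score is plain accuracy, which the
# tutorial text does not confirm. All data below is made up.


def accuracy(expected, predicted):
    """Fraction of positions where the prediction matches the expected value."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("cannot score an empty set of predictions")
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / len(expected)


# Hypothetical labels, split 75% / 25% by position as in the tutorial.
labels = ["<=50K", ">50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"]
train_size = int(len(labels) * 0.75)
y_train, y_test = labels[:train_size], labels[train_size:]

# Hypothetical predictions for the held-out rows.
predictions = [">50K", ">50K"]
score = accuracy(y_test, predictions)
print(score)
```

Because the split is purely positional, it only gives a fair estimate when the rows are not ordered by target; shuffling first is a common precaution.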
+ +In this case we will obtain a score around 0.866 + +### 6. Set new hyperparameter values + +In order to improve the performance of our primitive we will try to modify a couple of its +hyperparameters. + +First we will see which hyperparameter values the primitive has by calling its +`get_hyperparameters` method. + +```python3 +primitive.get_hyperparameters() +``` + +which will return a dictionary like this: + +```python +{ + "n_jobs": -1, + "n_estimators": 100, + "max_depth": 3, + "learning_rate": 0.1, + "gamma": 0, + "min_child_weight": 1 +} +``` + +Next, we will see which are the valid values for each one of those hyperparameters by calling its +`get_tunable_hyperparameters` method: + +```python3 +primitive.get_tunable_hyperparameters() +``` + +For example, we will see that the `max_depth` hyperparameter has the following specification: + +```python +{ + "type": "int", + "default": 3, + "range": [ + 3, + 10 + ] +} +``` + +Next, we will choose a valid value, for example 7, and set it into the pipeline using the +`set_hyperparameters` method: + +```python3 +primitive.set_hyperparameters({'max_depth': 7}) +``` + +### 7. Re-evaluate the performance + +Once the new hyperparameter value has been set, we repeat the fit/train/score cycle to +evaluate the performance of this new hyperparameter value: + +```python3 +primitive.fit(X=X_train, y=y_train) +predictions = primitive.produce(X=X_test) +dataset.score(y_test, predictions) +``` + +This time we should see that the performance has improved to a value around 0.724 + +## What's Next? + +Do you want to [learn more about how the project](https://MLBazaar.github.io/MLPrimitives/getting_started/concepts.html), +about [how to contribute to it](https://MLBazaar.github.io/MLPrimitives/community/contributing.html) +or browse the [API Reference](https://MLBazaar.github.io/MLPrimitives/api/mlprimitives.html)? +Please check the corresponding sections of the [documentation](https://MLBazaar.github.io/MLPrimitives/)! 
+ + +# History + +## 0.3.5 - 2023-04-14 + +### General Improvements + +* Update `mlblocks` cap - [Issue #278](https://github.com/MLBazaar/MLPrimitives/issues/278) by @sarahmish + +## 0.3.4 - 2023-01-24 + +### General Improvements + +* Update `mlblocks` cap - [Issue #277](https://github.com/MLBazaar/MLPrimitives/issues/277) by @sarahmish + +## 0.3.3 - 2023-01-20 + +### General Improvements + +* Update dependencies - [Issue #276](https://github.com/MLBazaar/MLPrimitives/issues/276) by @sarahmish + +### Adapter Improvements + +* Building model within fit in keras adapter - [Issue #267](https://github.com/MLBazaar/MLPrimitives/issues/267) by @sarahmish + +## 0.3.2 - 2021-11-09 + +### Adapter Improvements + +* Inferring data shapes with single dimension for keras adapter - [Issue #265](https://github.com/MLBazaar/MLPrimitives/issues/265) by @sarahmish + +## 0.3.1 - 2021-10-07 + +### Adapter Improvements + +* Dynamic target_shape in keras adapter - [Issue #263](https://github.com/MLBazaar/MLPrimitives/issues/263) by @sarahmish +* Save keras primitives in Windows environment - [Issue #261](https://github.com/MLBazaar/MLPrimitives/issues/261) by @sarahmish + +### General Improvements + +* Update TensorFlow and NumPy dependency - [Issue #259](https://github.com/MLBazaar/MLPrimitives/issues/259) by @sarahmish + +## 0.3.0 - 2021-01-09 + +### New Primitives + +* Add primitive `sklearn.naive_bayes.GaussianNB` - [Issue #242](https://github.com/MLBazaar/MLPrimitives/issues/242) by @sarahmish +* Add primitive `sklearn.linear_model.SGDClassifier` - [Issue #241](https://github.com/MLBazaar/MLPrimitives/issues/241) by @sarahmish + +### Primitive Improvements + +* Add offset to rolling_window_sequence primitive - [Issue #251](https://github.com/MLBazaar/MLPrimitives/issues/251) by @skyeeiskowitz +* Rename the time_index column to time - [Issue #252](https://github.com/MLBazaar/MLPrimitives/issues/252) by @pvk-developer +* Update featuretools dependency - [Issue 
#250](https://github.com/MLBazaar/MLPrimitives/issues/250) by @pvk-developer + +### General Improvements + +* Update dependencies and add python3.8 - [Issue #246](https://github.com/MLBazaar/MLPrimitives/issues/246) by @csala +* Drop Python35 - [Issue #244](https://github.com/MLBazaar/MLPrimitives/issues/244) by @csala + +## 0.2.5 - 2020-07-29 + +### Primitive Improvements + +* Accept timedelta `window_size` in `cutoff_window_sequences` - [Issue #239](https://github.com/MLBazaar/MLPrimitives/issues/239) by @joanvaquer + +### Bug Fixes + +* ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via `pip install tensorflow` - [Issue #237](https://github.com/MLBazaar/MLPrimitives/issues/237) by @joanvaquer + +### New Primitives + +* Add `pandas.DataFrame.set_index` primitive - [Issue #222](https://github.com/MLBazaar/MLPrimitives/issues/222) by @JDTheRipperPC + +## 0.2.4 - 2020-01-30 + +### New Primitives + +* Add RangeScaler and RangeUnscaler primitives - [Issue #232](https://github.com/MLBazaar/MLPrimitives/issues/232) by @csala + +### Primitive Improvements + +* Extract input_shape from X in keras.Sequential - [Issue #223](https://github.com/MLBazaar/MLPrimitives/issues/223) by @csala + +### Bug Fixes + +* mlprimitives.custom.text.TextCleaner fails if text is empty - [Issue #228](https://github.com/MLBazaar/MLPrimitives/issues/228) by @csala +* Error when loading the reviews dataset - [Issue #230](https://github.com/MLBazaar/MLPrimitives/issues/230) by @csala +* Curate dependencies: specify an explicit prompt-toolkit version range - [Issue #224](https://github.com/MLBazaar/MLPrimitives/issues/224) by @csala + +## 0.2.3 - 2019-11-14 + +### New Primitives + +* Add primitive to make window_sequences based on cutoff times - [Issue #217](https://github.com/MLBazaar/MLPrimitives/issues/217) by @csala +* Create a keras LSTM based TimeSeriesClassifier primitive - [Issue #218](https://github.com/MLBazaar/MLPrimitives/issues/218) by @csala +* Add pandas 
DataFrame primitives - [Issue #214](https://github.com/MLBazaar/MLPrimitives/issues/214) by @csala +* Add featuretools.EntitySet.normalize_entity primitive - [Issue #209](https://github.com/MLBazaar/MLPrimitives/issues/209) by @csala + +### Primitive Improvements + +* Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - [Issue #208](https://github.com/MLBazaar/MLPrimitives/issues/208) by @csala + +* Add text regression dataset - [Issue #206](https://github.com/MLBazaar/MLPrimitives/issues/206) by @csala + +### Bug Fixes + +* pandas.DataFrame.resample crash when grouping by integer columns - [Issue #211](https://github.com/MLBazaar/MLPrimitives/issues/211) by @csala + +## 0.2.2 - 2019-10-08 + +### New Primitives + +* Add primitives for GAN based time-series anomaly detection - [Issue #200](https://github.com/MLBazaar/MLPrimitives/issues/200) by @AlexanderGeiger +* Add `numpy.reshape` and `numpy.ravel` primitives - [Issue #197](https://github.com/MLBazaar/MLPrimitives/issues/197) by @AlexanderGeiger +* Add feature selection primitive based on Lasso - [Issue #194](https://github.com/MLBazaar/MLPrimitives/issues/194) by @csala + +### Primitive Improvements + +* `feature_extraction.CategoricalEncoder` support dtype category - [Issue #196](https://github.com/MLBazaar/MLPrimitives/issues/196) by @csala + +## 0.2.1 - 2019-09-09 + +### New Primitives + +* Timeseries Intervals to Mask Primitive - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger +* Add new primitive: Arima model - [Issue #168](https://github.com/MLBazaar/MLPrimitives/issues/168) by @AlexanderGeiger + +### Primitive Improvements + +* Curate PCA primitive hyperparameters - [Issue #190](https://github.com/MLBazaar/MLPrimitives/issues/190) by @AlexanderGeiger +* Add option to drop rolling window sequences - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger + +### Bug Fixes + +* scikit-image==0.14.3 crashes when 
installed on Mac - [Issue #188](https://github.com/MLBazaar/MLPrimitives/issues/188) by @csala + +## 0.2.0 + +### New Features + +* Publish the pipelines as an `entry_point` +[Issue #175](https://github.com/MLBazaar/MLPrimitives/issues/175) by @csala + +### Primitive Improvements + +* Improve pandas.DataFrame.resample primitive [Issue #177](https://github.com/MLBazaar/MLPrimitives/issues/177) by @csala +* Improve `feature_extractor` primitives [Issue #183](https://github.com/MLBazaar/MLPrimitives/issues/183) by @csala +* Improve `find_anomalies` primitive [Issue #180](https://github.com/MLBazaar/MLPrimitives/issues/180) by @AlexanderGeiger + +### Bug Fixes + +* Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor [Issue #176](https://github.com/MLBazaar/MLPrimitives/issues/176) by @DanielCalvoCerezo + + +## 0.1.10 + +### New Features + +* Add function to run primitives without a pipeline [Issue #43](https://github.com/MLBazaar/MLPrimitives/issues/43) by @csala + +### New Pipelines + +* Add pipelines for all the MLBlocks examples [Issue #162](https://github.com/MLBazaar/MLPrimitives/issues/162) by @csala + +### Primitive Improvements + +* Add Early Stopping to `keras.Sequential.LSTMTimeSeriesRegressor` primitive [Issue #156](https://github.com/MLBazaar/MLPrimitives/issues/156) by @csala +* Make FeatureExtractor primitives accept Numpy arrays [Issue #165](https://github.com/MLBazaar/MLPrimitives/issues/165) by @csala +* Add window size and pruning to the `timeseries_anomalies.find_anomalies` primitive [Issue #160](https://github.com/MLBazaar/MLPrimitives/issues/160) by @csala + + +## 0.1.9 + +### New Features + +* Add a single table binary classification dataset [Issue #141](https://github.com/MLBazaar/MLPrimitives/issues/141) by @csala + +### New Primitives + +* Add Multilayer Perceptron (MLP) primitive for binary classification [Issue #140](https://github.com/MLBazaar/MLPrimitives/issues/140) by @Hector-hedb12 +* Add primitive for Sequence classification 
with LSTM [Issue #150](https://github.com/MLBazaar/MLPrimitives/issues/150) by @Hector-hedb12 +* Add VGG-like convnet primitive [Issue #149](https://github.com/MLBazaar/MLPrimitives/issues/149) by @Hector-hedb12 +* Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification [Issue #139](https://github.com/MLBazaar/MLPrimitives/issues/139) by @Hector-hedb12 +* Add primitive to count feature matrix columns [Issue #146](https://github.com/MLBazaar/MLPrimitives/issues/146) by @csala + +### Primitive Improvements + +* Add additional fit and predict arguments to keras.Sequential [Issue #161](https://github.com/MLBazaar/MLPrimitives/issues/161) by @csala +* Add support for keras.Sequential Callbacks [Issue #159](https://github.com/MLBazaar/MLPrimitives/issues/159) by @csala +* Add fixed hyperparam to control keras.Sequential verbosity [Issue #143](https://github.com/MLBazaar/MLPrimitives/issues/143) by @csala + +## 0.1.8 + +### New Primitives + +* mlprimitives.custom.timeseries_preprocessing.time_segments_average - [Issue #137](https://github.com/MLBazaar/MLPrimitives/issues/137) + +### New Features + +* Add target_index output in timeseries_preprocessing.rolling_window_sequences - [Issue #136](https://github.com/MLBazaar/MLPrimitives/issues/136) + +## 0.1.7 + +### General Improvements + +* Validate JSON format in `make lint` - [Issue #133](https://github.com/MLBazaar/MLPrimitives/issues/133) +* Add demo datasets - [Issue #131](https://github.com/MLBazaar/MLPrimitives/issues/131) +* Improve featuretools.dfs primitive - [Issue #127](https://github.com/MLBazaar/MLPrimitives/issues/127) + +### New Primitives + +* pandas.DataFrame.resample - [Issue #123](https://github.com/MLBazaar/MLPrimitives/issues/123) +* pandas.DataFrame.unstack - [Issue #124](https://github.com/MLBazaar/MLPrimitives/issues/124) +* featuretools.EntitySet.add_relationship - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126) +* featuretools.EntitySet.entity_from_dataframe 
- [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126) + +### Bug Fixes + +* Bug in timeseries_anomalies.py - [Issue #119](https://github.com/MLBazaar/MLPrimitives/issues/119) + +## 0.1.6 + +### General Improvements + +* Add Contributing Documentation +* Remove upper bound in pandas version given new release of `featuretools` v0.6.1 +* Improve LSTMTimeSeriesRegressor hyperparameters + +### New Primitives + +* mlprimitives.candidates.dsp.SpectralMask +* mlprimitives.custom.timeseries_anomalies.find_anomalies +* mlprimitives.custom.timeseries_anomalies.regression_errors +* mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences +* mlprimitives.custom.timeseries_preprocessing.time_segments_average +* sklearn.linear_model.ElasticNet +* sklearn.linear_model.Lars +* sklearn.linear_model.Lasso +* sklearn.linear_model.MultiTaskLasso +* sklearn.linear_model.Ridge + +## 0.1.5 + +### New Primitives + +* sklearn.impute.SimpleImputer +* sklearn.preprocessing.MinMaxScaler +* sklearn.preprocessing.MaxAbsScaler +* sklearn.preprocessing.RobustScaler +* sklearn.linear_model.LinearRegression + +### General Improvements + +* Separate curated from candidate primitives +* Setup `entry_points` in setup.py to improve compatibility with MLBlocks +* Add a test-pipelines command to test all the existing pipelines +* Clean sklearn example pipelines +* Change the `author` entry to a `contributors` list +* Change the name of `mlblocks_primitives` folder +* Pip install `requirements_dev.txt` fail documentation + +### Bug Fixes + +* Fix LSTMTimeSeriesRegressor primitive. Issue #90 +* Fix timeseries primitives. Issue #91 +* Negative index anomalies in `timeseries_errors`. Issue #89 +* Keep pandas version below 0.24.0. 
Issue #87 + +## 0.1.4 + +### New Primitives + +* mlprimitives.timeseries primitives for timeseries data preprocessing +* mlprimitives.timeseres_error primitives for timeseries anomaly detection +* keras.Sequential.LSTMTimeSeriesRegressor +* sklearn.neighbors.KNeighbors Classifier and Regressor +* several sklearn.decomposition primitives +* several sklearn.ensemble primitives + +### Bug Fixes + +* Fix typo in mlprimitives.text.TextCleaner primitive +* Fix bug in index handling in featuretools.dfs primitive +* Fix bug in SingleLayerCNNImageClassifier annotation +* Remove old validation tags from JSON annotations + +## 0.1.3 + +### New Features + +* Fix and re-enable featuretools.dfs primitive. + +## 0.1.2 + +### New Features + +* Add pipeline specification language and Evaluation utilities. +* Add pipelines for graph, text and tabular problems. +* New primitives ClassEncoder and ClassDecoder +* New primitives UniqueCounter and VocabularyCounter + +### Bug Fixes + +* Fix TrivialPredictor bug when working with numpy arrays +* Change XGB default learning rate and number of estimators + + +## 0.1.1 + +### New Features + +* Add more keras.applications primitives. +* Add a Text Cleanup primitive. + +### Bug Fixes + +* Add keywords to `keras.preprocessing` primitives. +* Fix the `image_transform` method. +* Add `epoch` as a fixed hyperparameter for `keras.Sequential` primitives. + +## 0.1.0 + +* First release on PyPI. + + +%package -n python3-mlprimitives +Summary: Pipelines and primitives for machine learning and data science. 
+Provides: python-mlprimitives +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-mlprimitives +<p align="left"> + <a href="https://dai.lids.mit.edu"> + <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" /> + </a> + <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i> +</p> + +[](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) +[](https://pypi.python.org/pypi/mlprimitives) +[](https://github.com/MLBazaar/MLPrimitives/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster) +[](https://pepy.tech/project/mlprimitives) +[](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials) + +# MLPrimitives + +Pipelines and primitives for machine learning and data science. + +* Documentation: https://MLBazaar.github.io/MLPrimitives +* Github: https://github.com/MLBazaar/MLPrimitives +* License: [MIT](https://github.com/MLBazaar/MLPrimitives/blob/master/LICENSE) +* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) + +# Overview + +This repository contains primitive annotations to be used by the MLBlocks library, as well as +the necessary Python code to make some of them fully compatible with the MLBlocks API requirements. + +There is also a collection of custom primitives contributed directly to this library, which either +combine third party tools or implement new functionalities from scratch. + +## Why did we create this library? + +* Too many libraries in a fast growing field +* Huge societal need to build machine learning apps +* Domain expertise resides at several places (knowledge of math) +* No documented information about hyperparameters, behavior... 
+ +# Installation + +## Requirements + +**MLPrimitives** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/) + +Also, although it is not strictly required, the usage of a +[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid +interfering with other software installed in the system where **MLPrimitives** is run. + +## Install with pip + +The easiest and recommended way to install **MLPrimitives** is using [pip](https://pip.pypa.io/en/stable/): + +```bash +pip install mlprimitives +``` + +This will pull and install the latest stable release from [PyPi](https://pypi.org/). + +If you want to install from source or contribute to the project please read the +[Contributing Guide](https://MLBazaar.github.io/MLPrimitives/community/welcome.html). + +# Quickstart + +This section is a short series of tutorials to help you getting started with MLPrimitives. + +In the following steps you will learn how to load and run a primitive on some data. + +Later on you will learn how to evaluate and improve the performance of a primitive by tuning +its hyperparameters. + +## Running a Primitive + +In this first tutorial, we will be executing a single primitive for data transformation. + +### 1. Load a Primitive + +The first step in order to run a primitive is to load it. + +This will be done using the `mlprimitives.load_primitive` function, which will +load the indicated primitive as an [MLBlock Object from MLBlocks](https://MLBazaar.github.io/MLBlocks/api/mlblocks.html#mlblocks.MLBlock) + +In this case, we will load the `mlprimitives.custom.feature_extraction.CategoricalEncoder` +primitive. + +```python3 +from mlprimitives import load_primitive + +primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder') +``` + +### 2. Load some data + +The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the +categorical columns of a `pandas.DataFrame`. 
+ +So, in order to be able to run our primitive, we will first load some data that contains +categorical columns. + +This can be done with the `mlprimitives.datasets.load_census` function: + +```python3 +from mlprimitives.datasets import load_census + +dataset = load_census() +``` + +This dataset object has an attribute `data` which contains a table with several categorical +columns. + +We can have a look at this table by executing `dataset.data.head()`, which will return a +table like this: + +``` + 0 1 2 +age 39 50 38 +workclass State-gov Self-emp-not-inc Private +fnlwgt 77516 83311 215646 +education Bachelors Bachelors HS-grad +education-num 13 13 9 +marital-status Never-married Married-civ-spouse Divorced +occupation Adm-clerical Exec-managerial Handlers-cleaners +relationship Not-in-family Husband Not-in-family +race White White White +sex Male Male Male +capital-gain 2174 0 0 +capital-loss 0 0 0 +hours-per-week 40 13 40 +native-country United-States United-States United-States +``` + +### 3. Fit the primitive + +In order to run our primitive, we first need to fit it. + +This is the process where it analyzes the data to detect which columns are categorical. + +This is done by calling its `fit` method and passing the `dataset.data` as `X`. + +```python3 +primitive.fit(X=dataset.data) +``` + +### 4. Produce results + +Once the primitive is fit, we can process the data by calling the `produce` method of the +primitive instance and passing the `data` again as `X`. + +```python3 +transformed = primitive.produce(X=dataset.data) +``` + +After this is done, we can see how the transformed data contains the newly generated +one-hot vectors: + +``` + 0 1 2 3 4 +age 39 50 38 53 28 +fnlwgt 77516 83311 215646 234721 338409 +education-num 13 13 9 7 13 +capital-gain 2174 0 0 0 0 +capital-loss 0 0 0 0 0 +hours-per-week 40 13 40 40 40 +workclass= Private 0 0 1 1 1 +workclass= Self-emp-not-inc 0 1 0 0 0 +workclass= Local-gov 0 0 0 0 0 +workclass= ? 
0 0 0 0 0 +workclass= State-gov 1 0 0 0 0 +workclass= Self-emp-inc 0 0 0 0 0 +... ... ... ... ... ... +``` + +## Tuning a Primitive + +In this short tutorial we will teach you how to evaluate the performance of a primitive +and improve its performance by modifying its hyperparameters. + +To do so, we will load a primitive that can learn from the transformed data that we just +generated and later on make predictions based on new data. + +### 1. Load another primitive + +First of all, we will load the `xgboost.XGBClassifier` primitive that we will use afterwards. + +```python3 +primitive = load_primitive('xgboost.XGBClassifier') +``` + +### 2. Split the dataset + +Before being able to evaluate the primitive's performance, we need to split the data into two +parts: train, which will be used for the primitive to learn, and test, which will be used +to make the predictions that later on will be evaluated. + +In order to do this, we will get the first 75% of rows from the transformed data that we +obtained above and call it `X_train`, and then set the remaining 25% of rows as `X_test`. + +```python3 +train_size = int(len(transformed) * 0.75) +X_train = transformed.iloc[:train_size] +X_test = transformed.iloc[train_size:] +``` + +Similarly, we need to obtain the `y_train` and `y_test` variables containing the corresponding +output values. + +```python3 +y_train = dataset.target[:train_size] +y_test = dataset.target[train_size:] +``` + +### 3. Fit the new primitive + +Once we have split the data, we can fit the primitive by passing `X_train` and `y_train` +to its `fit` method. + +```python3 +primitive.fit(X=X_train, y=y_train) +``` + +### 4. Make predictions + +Once the primitive has been fitted, we can produce predictions using the `X_test` data as input. + +```python3 +predictions = primitive.produce(X=X_test) +``` + +### 5. 
Evaluate the performance + +We can now evaluate how good the predictions from our primitive are by using the `score` +method from the `dataset` object on both the expected output and the real output from the +primitive: + +```python3 +dataset.score(y_test, predictions) +``` + +This will output a float value between 0 and 1 indicating how good the predictions are, with +0 being the worst possible score and 1 the best. + +In this case we will obtain a score around 0.866. + +### 6. Set new hyperparameter values + +In order to improve the performance of our primitive we will try to modify a couple of its +hyperparameters. + +First we will see which hyperparameter values the primitive has by calling its +`get_hyperparameters` method. + +```python3 +primitive.get_hyperparameters() +``` + +which will return a dictionary like this: + +```python +{ + "n_jobs": -1, + "n_estimators": 100, + "max_depth": 3, + "learning_rate": 0.1, + "gamma": 0, + "min_child_weight": 1 +} +``` + +Next, we will see which values are valid for each of those hyperparameters by calling its +`get_tunable_hyperparameters` method: + +```python3 +primitive.get_tunable_hyperparameters() +``` + +For example, we will see that the `max_depth` hyperparameter has the following specification: + +```python +{ + "type": "int", + "default": 3, + "range": [ + 3, + 10 + ] +} +``` + +Next, we will choose a valid value, for example 7, and set it on the primitive using the +`set_hyperparameters` method: + +```python3 +primitive.set_hyperparameters({'max_depth': 7}) +``` + +### 7. Re-evaluate the performance + +Once the new hyperparameter value has been set, we repeat the fit/produce/score cycle to +evaluate the performance of this new hyperparameter value: + +```python3 +primitive.fit(X=X_train, y=y_train) +predictions = primitive.produce(X=X_test) +dataset.score(y_test, predictions) +``` + +This time we should see that the score has changed to a value around 0.724. + +## What's Next? 
+ +Do you want to [learn more about the project](https://MLBazaar.github.io/MLPrimitives/getting_started/concepts.html), +about [how to contribute to it](https://MLBazaar.github.io/MLPrimitives/community/contributing.html) +or browse the [API Reference](https://MLBazaar.github.io/MLPrimitives/api/mlprimitives.html)? +Please check the corresponding sections of the [documentation](https://MLBazaar.github.io/MLPrimitives/)! + + +# History + +## 0.3.5 - 2023-04-14 + +### General Improvements + +* Update `mlblocks` cap - [Issue #278](https://github.com/MLBazaar/MLPrimitives/issues/278) by @sarahmish + +## 0.3.4 - 2023-01-24 + +### General Improvements + +* Update `mlblocks` cap - [Issue #277](https://github.com/MLBazaar/MLPrimitives/issues/277) by @sarahmish + +## 0.3.3 - 2023-01-20 + +### General Improvements + +* Update dependencies - [Issue #276](https://github.com/MLBazaar/MLPrimitives/issues/276) by @sarahmish + +### Adapter Improvements + +* Building model within fit in keras adapter - [Issue #267](https://github.com/MLBazaar/MLPrimitives/issues/267) by @sarahmish + +## 0.3.2 - 2021-11-09 + +### Adapter Improvements + +* Inferring data shapes with single dimension for keras adapter - [Issue #265](https://github.com/MLBazaar/MLPrimitives/issues/265) by @sarahmish + +## 0.3.1 - 2021-10-07 + +### Adapter Improvements + +* Dynamic target_shape in keras adapter - [Issue #263](https://github.com/MLBazaar/MLPrimitives/issues/263) by @sarahmish +* Save keras primitives in Windows environment - [Issue #261](https://github.com/MLBazaar/MLPrimitives/issues/261) by @sarahmish + +### General Improvements + +* Update TensorFlow and NumPy dependency - [Issue #259](https://github.com/MLBazaar/MLPrimitives/issues/259) by @sarahmish + +## 0.3.0 - 2021-01-09 + +### New Primitives + +* Add primitive `sklearn.naive_bayes.GaussianNB` - [Issue #242](https://github.com/MLBazaar/MLPrimitives/issues/242) by @sarahmish +* Add primitive `sklearn.linear_model.SGDClassifier` - [Issue 
#241](https://github.com/MLBazaar/MLPrimitives/issues/241) by @sarahmish + +### Primitive Improvements + +* Add offset to rolling_window_sequence primitive - [Issue #251](https://github.com/MLBazaar/MLPrimitives/issues/251) by @skyeeiskowitz +* Rename the time_index column to time - [Issue #252](https://github.com/MLBazaar/MLPrimitives/issues/252) by @pvk-developer +* Update featuretools dependency - [Issue #250](https://github.com/MLBazaar/MLPrimitives/issues/250) by @pvk-developer + +### General Improvements + +* Update dependencies and add python3.8 - [Issue #246](https://github.com/MLBazaar/MLPrimitives/issues/246) by @csala +* Drop Python35 - [Issue #244](https://github.com/MLBazaar/MLPrimitives/issues/244) by @csala + +## 0.2.5 - 2020-07-29 + +### Primitive Improvements + +* Accept timedelta `window_size` in `cutoff_window_sequences` - [Issue #239](https://github.com/MLBazaar/MLPrimitives/issues/239) by @joanvaquer + +### Bug Fixes + +* ImportError: Keras requires TensorFlow 2.2 or higher. 
Install TensorFlow via `pip install tensorflow` - [Issue #237](https://github.com/MLBazaar/MLPrimitives/issues/237) by @joanvaquer + +### New Primitives + +* Add `pandas.DataFrame.set_index` primitive - [Issue #222](https://github.com/MLBazaar/MLPrimitives/issues/222) by @JDTheRipperPC + +## 0.2.4 - 2020-01-30 + +### New Primitives + +* Add RangeScaler and RangeUnscaler primitives - [Issue #232](https://github.com/MLBazaar/MLPrimitives/issues/232) by @csala + +### Primitive Improvements + +* Extract input_shape from X in keras.Sequential - [Issue #223](https://github.com/MLBazaar/MLPrimitives/issues/223) by @csala + +### Bug Fixes + +* mlprimitives.custom.text.TextCleaner fails if text is empty - [Issue #228](https://github.com/MLBazaar/MLPrimitives/issues/228) by @csala +* Error when loading the reviews dataset - [Issue #230](https://github.com/MLBazaar/MLPrimitives/issues/230) by @csala +* Curate dependencies: specify an explicit prompt-toolkit version range - [Issue #224](https://github.com/MLBazaar/MLPrimitives/issues/224) by @csala + +## 0.2.3 - 2019-11-14 + +### New Primitives + +* Add primitive to make window_sequences based on cutoff times - [Issue #217](https://github.com/MLBazaar/MLPrimitives/issues/217) by @csala +* Create a keras LSTM based TimeSeriesClassifier primitive - [Issue #218](https://github.com/MLBazaar/MLPrimitives/issues/218) by @csala +* Add pandas DataFrame primitives - [Issue #214](https://github.com/MLBazaar/MLPrimitives/issues/214) by @csala +* Add featuretools.EntitySet.normalize_entity primitive - [Issue #209](https://github.com/MLBazaar/MLPrimitives/issues/209) by @csala + +### Primitive Improvements + +* Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - [Issue #208](https://github.com/MLBazaar/MLPrimitives/issues/208) by @csala + +* Add text regression dataset - [Issue #206](https://github.com/MLBazaar/MLPrimitives/issues/206) by @csala + +### Bug Fixes + +* pandas.DataFrame.resample crash when grouping by 
integer columns - [Issue #211](https://github.com/MLBazaar/MLPrimitives/issues/211) by @csala + +## 0.2.2 - 2019-10-08 + +### New Primitives + +* Add primitives for GAN based time-series anomaly detection - [Issue #200](https://github.com/MLBazaar/MLPrimitives/issues/200) by @AlexanderGeiger +* Add `numpy.reshape` and `numpy.ravel` primitives - [Issue #197](https://github.com/MLBazaar/MLPrimitives/issues/197) by @AlexanderGeiger +* Add feature selection primitive based on Lasso - [Issue #194](https://github.com/MLBazaar/MLPrimitives/issues/194) by @csala + +### Primitive Improvements + +* `feature_extraction.CategoricalEncoder` support dtype category - [Issue #196](https://github.com/MLBazaar/MLPrimitives/issues/196) by @csala + +## 0.2.1 - 2019-09-09 + +### New Primitives + +* Timeseries Intervals to Mask Primitive - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger +* Add new primitive: Arima model - [Issue #168](https://github.com/MLBazaar/MLPrimitives/issues/168) by @AlexanderGeiger + +### Primitive Improvements + +* Curate PCA primitive hyperparameters - [Issue #190](https://github.com/MLBazaar/MLPrimitives/issues/190) by @AlexanderGeiger +* Add option to drop rolling window sequences - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger + +### Bug Fixes + +* scikit-image==0.14.3 crashes when installed on Mac - [Issue #188](https://github.com/MLBazaar/MLPrimitives/issues/188) by @csala + +## 0.2.0 + +### New Features + +* Publish the pipelines as an `entry_point` +[Issue #175](https://github.com/MLBazaar/MLPrimitives/issues/175) by @csala + +### Primitive Improvements + +* Improve pandas.DataFrame.resample primitive [Issue #177](https://github.com/MLBazaar/MLPrimitives/issues/177) by @csala +* Improve `feature_extractor` primitives [Issue #183](https://github.com/MLBazaar/MLPrimitives/issues/183) by @csala +* Improve `find_anomalies` primitive [Issue 
#180](https://github.com/MLBazaar/MLPrimitives/issues/180) by @AlexanderGeiger + +### Bug Fixes + +* Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor [Issue #176](https://github.com/MLBazaar/MLPrimitives/issues/176) by @DanielCalvoCerezo + + +## 0.1.10 + +### New Features + +* Add function to run primitives without a pipeline [Issue #43](https://github.com/MLBazaar/MLPrimitives/issues/43) by @csala + +### New Pipelines + +* Add pipelines for all the MLBlocks examples [Issue #162](https://github.com/MLBazaar/MLPrimitives/issues/162) by @csala + +### Primitive Improvements + +* Add Early Stopping to `keras.Sequential.LSTMTimeSeriesRegressor` primitive [Issue #156](https://github.com/MLBazaar/MLPrimitives/issues/156) by @csala +* Make FeatureExtractor primitives accept Numpy arrays [Issue #165](https://github.com/MLBazaar/MLPrimitives/issues/165) by @csala +* Add window size and pruning to the `timeseries_anomalies.find_anomalies` primitive [Issue #160](https://github.com/MLBazaar/MLPrimitives/issues/160) by @csala + + +## 0.1.9 + +### New Features + +* Add a single table binary classification dataset [Issue #141](https://github.com/MLBazaar/MLPrimitives/issues/141) by @csala + +### New Primitives + +* Add Multilayer Perceptron (MLP) primitive for binary classification [Issue #140](https://github.com/MLBazaar/MLPrimitives/issues/140) by @Hector-hedb12 +* Add primitive for Sequence classification with LSTM [Issue #150](https://github.com/MLBazaar/MLPrimitives/issues/150) by @Hector-hedb12 +* Add VGG-like convnet primitive [Issue #149](https://github.com/MLBazaar/MLPrimitives/issues/149) by @Hector-hedb12 +* Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification [Issue #139](https://github.com/MLBazaar/MLPrimitives/issues/139) by @Hector-hedb12 +* Add primitive to count feature matrix columns [Issue #146](https://github.com/MLBazaar/MLPrimitives/issues/146) by @csala + +### Primitive Improvements + +* Add additional fit and 
predict arguments to keras.Sequential [Issue #161](https://github.com/MLBazaar/MLPrimitives/issues/161) by @csala +* Add support for keras.Sequential Callbacks [Issue #159](https://github.com/MLBazaar/MLPrimitives/issues/159) by @csala +* Add fixed hyperparam to control keras.Sequential verbosity [Issue #143](https://github.com/MLBazaar/MLPrimitives/issues/143) by @csala + +## 0.1.8 + +### New Primitives + +* mlprimitives.custom.timeseries_preprocessing.time_segments_average - [Issue #137](https://github.com/MLBazaar/MLPrimitives/issues/137) + +### New Features + +* Add target_index output in timeseries_preprocessing.rolling_window_sequences - [Issue #136](https://github.com/MLBazaar/MLPrimitives/issues/136) + +## 0.1.7 + +### General Improvements + +* Validate JSON format in `make lint` - [Issue #133](https://github.com/MLBazaar/MLPrimitives/issues/133) +* Add demo datasets - [Issue #131](https://github.com/MLBazaar/MLPrimitives/issues/131) +* Improve featuretools.dfs primitive - [Issue #127](https://github.com/MLBazaar/MLPrimitives/issues/127) + +### New Primitives + +* pandas.DataFrame.resample - [Issue #123](https://github.com/MLBazaar/MLPrimitives/issues/123) +* pandas.DataFrame.unstack - [Issue #124](https://github.com/MLBazaar/MLPrimitives/issues/124) +* featuretools.EntitySet.add_relationship - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126) +* featuretools.EntitySet.entity_from_dataframe - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126) + +### Bug Fixes + +* Bug in timeseries_anomalies.py - [Issue #119](https://github.com/MLBazaar/MLPrimitives/issues/119) + +## 0.1.6 + +### General Improvements + +* Add Contributing Documentation +* Remove upper bound in pandas version given new release of `featuretools` v0.6.1 +* Improve LSTMTimeSeriesRegressor hyperparameters + +### New Primitives + +* mlprimitives.candidates.dsp.SpectralMask +* mlprimitives.custom.timeseries_anomalies.find_anomalies +* 
mlprimitives.custom.timeseries_anomalies.regression_errors +* mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences +* mlprimitives.custom.timeseries_preprocessing.time_segments_average +* sklearn.linear_model.ElasticNet +* sklearn.linear_model.Lars +* sklearn.linear_model.Lasso +* sklearn.linear_model.MultiTaskLasso +* sklearn.linear_model.Ridge + +## 0.1.5 + +### New Primitives + +* sklearn.impute.SimpleImputer +* sklearn.preprocessing.MinMaxScaler +* sklearn.preprocessing.MaxAbsScaler +* sklearn.preprocessing.RobustScaler +* sklearn.linear_model.LinearRegression + +### General Improvements + +* Separate curated from candidate primitives +* Setup `entry_points` in setup.py to improve compatibility with MLBlocks +* Add a test-pipelines command to test all the existing pipelines +* Clean sklearn example pipelines +* Change the `author` entry to a `contributors` list +* Change the name of `mlblocks_primitives` folder +* Pip install `requirements_dev.txt` fail documentation + +### Bug Fixes + +* Fix LSTMTimeSeriesRegressor primitive. Issue #90 +* Fix timeseries primitives. Issue #91 +* Negative index anomalies in `timeseries_errors`. Issue #89 +* Keep pandas version below 0.24.0. Issue #87 + +## 0.1.4 + +### New Primitives + +* mlprimitives.timeseries primitives for timeseries data preprocessing +* mlprimitives.timeseres_error primitives for timeseries anomaly detection +* keras.Sequential.LSTMTimeSeriesRegressor +* sklearn.neighbors.KNeighbors Classifier and Regressor +* several sklearn.decomposition primitives +* several sklearn.ensemble primitives + +### Bug Fixes + +* Fix typo in mlprimitives.text.TextCleaner primitive +* Fix bug in index handling in featuretools.dfs primitive +* Fix bug in SingleLayerCNNImageClassifier annotation +* Remove old validation tags from JSON annotations + +## 0.1.3 + +### New Features + +* Fix and re-enable featuretools.dfs primitive. 
+ +## 0.1.2 + +### New Features + +* Add pipeline specification language and Evaluation utilities. +* Add pipelines for graph, text and tabular problems. +* New primitives ClassEncoder and ClassDecoder +* New primitives UniqueCounter and VocabularyCounter + +### Bug Fixes + +* Fix TrivialPredictor bug when working with numpy arrays +* Change XGB default learning rate and number of estimators + + +## 0.1.1 + +### New Features + +* Add more keras.applications primitives. +* Add a Text Cleanup primitive. + +### Bug Fixes + +* Add keywords to `keras.preprocessing` primitives. +* Fix the `image_transform` method. +* Add `epoch` as a fixed hyperparameter for `keras.Sequential` primitives. + +## 0.1.0 + +* First release on PyPI. + + +%package help +Summary: Development documents and examples for mlprimitives +Provides: python3-mlprimitives-doc +%description help +<p align="left"> + <a href="https://dai.lids.mit.edu"> + <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" /> + </a> + <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i> +</p> + +[](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) +[](https://pypi.python.org/pypi/mlprimitives) +[](https://github.com/MLBazaar/MLPrimitives/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster) +[](https://pepy.tech/project/mlprimitives) +[](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials) + +# MLPrimitives + +Pipelines and primitives for machine learning and data science. 
+ +* Documentation: https://MLBazaar.github.io/MLPrimitives +* Github: https://github.com/MLBazaar/MLPrimitives +* License: [MIT](https://github.com/MLBazaar/MLPrimitives/blob/master/LICENSE) +* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) + +# Overview + +This repository contains primitive annotations to be used by the MLBlocks library, as well as +the necessary Python code to make some of them fully compatible with the MLBlocks API requirements. + +There is also a collection of custom primitives contributed directly to this library, which either +combine third party tools or implement new functionalities from scratch. + +## Why did we create this library? + +* Too many libraries in a fast growing field +* Huge societal need to build machine learning apps +* Domain expertise resides at several places (knowledge of math) +* No documented information about hyperparameters, behavior... + +# Installation + +## Requirements + +**MLPrimitives** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/). + +Also, although it is not strictly required, the usage of a +[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid +interfering with other software installed in the system where **MLPrimitives** is run. + +## Install with pip + +The easiest and recommended way to install **MLPrimitives** is using [pip](https://pip.pypa.io/en/stable/): + +```bash +pip install mlprimitives +``` + +This will pull and install the latest stable release from [PyPI](https://pypi.org/). + +If you want to install from source or contribute to the project please read the +[Contributing Guide](https://MLBazaar.github.io/MLPrimitives/community/welcome.html). + +# Quickstart + +This section is a short series of tutorials to help you get started with MLPrimitives. + +In the following steps you will learn how to load and run a primitive on some data. 
+ +Later on you will learn how to evaluate and improve the performance of a primitive by tuning +its hyperparameters. + +## Running a Primitive + +In this first tutorial, we will be executing a single primitive for data transformation. + +### 1. Load a Primitive + +The first step in order to run a primitive is to load it. + +This will be done using the `mlprimitives.load_primitive` function, which will +load the indicated primitive as an [MLBlock Object from MLBlocks](https://MLBazaar.github.io/MLBlocks/api/mlblocks.html#mlblocks.MLBlock) + +In this case, we will load the `mlprimitives.custom.feature_extraction.CategoricalEncoder` +primitive. + +```python3 +from mlprimitives import load_primitive + +primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder') +``` + +### 2. Load some data + +The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the +categorical columns of a `pandas.DataFrame`. + +So, in order to be able to run our primitive, we will first load some data that contains +categorical columns. + +This can be done with the `mlprimitives.datasets.load_census` function: + +```python3 +from mlprimitives.datasets import load_census + +dataset = load_census() +``` + +This dataset object has an attribute `data` which contains a table with several categorical +columns. + +We can have a look at this table by executing `dataset.data.head()`, which will return a +table like this: + +``` + 0 1 2 +age 39 50 38 +workclass State-gov Self-emp-not-inc Private +fnlwgt 77516 83311 215646 +education Bachelors Bachelors HS-grad +education-num 13 13 9 +marital-status Never-married Married-civ-spouse Divorced +occupation Adm-clerical Exec-managerial Handlers-cleaners +relationship Not-in-family Husband Not-in-family +race White White White +sex Male Male Male +capital-gain 2174 0 0 +capital-loss 0 0 0 +hours-per-week 40 13 40 +native-country United-States United-States United-States +``` + +### 3. 
Fit the primitive + +In order to run our primitive, we first need to fit it. + +This is the process where it analyzes the data to detect which columns are categorical. + +This is done by calling its `fit` method and passing the `dataset.data` as `X`. + +```python3 +primitive.fit(X=dataset.data) +``` + +### 4. Produce results + +Once the primitive is fit, we can process the data by calling the `produce` method of the +primitive instance and passing the `data` again as `X`. + +```python3 +transformed = primitive.produce(X=dataset.data) +``` + +After this is done, we can see how the transformed data contains the newly generated +one-hot vectors: + +``` + 0 1 2 3 4 +age 39 50 38 53 28 +fnlwgt 77516 83311 215646 234721 338409 +education-num 13 13 9 7 13 +capital-gain 2174 0 0 0 0 +capital-loss 0 0 0 0 0 +hours-per-week 40 13 40 40 40 +workclass= Private 0 0 1 1 1 +workclass= Self-emp-not-inc 0 1 0 0 0 +workclass= Local-gov 0 0 0 0 0 +workclass= ? 0 0 0 0 0 +workclass= State-gov 1 0 0 0 0 +workclass= Self-emp-inc 0 0 0 0 0 +... ... ... ... ... ... +``` + +## Tuning a Primitive + +In this short tutorial we will teach you how to evaluate the performance of a primitive +and improve its performance by modifying its hyperparameters. + +To do so, we will load a primitive that can learn from the transformed data that we just +generated and later on make predictions based on new data. + +### 1. Load another primitive + +First of all, we will load the `xgboost.XGBClassifier` primitive that we will use afterwards. + +```python3 +primitive = load_primitive('xgboost.XGBClassifier') +``` + +### 2. Split the dataset + +Before being able to evaluate the primitive's performance, we need to split the data into two +parts: train, which will be used for the primitive to learn, and test, which will be used +to make the predictions that later on will be evaluated. 
+ +In order to do this, we will get the first 75% of rows from the transformed data that we +obtained above and call it `X_train`, and then set the remaining 25% of rows as `X_test`. + +```python3 +train_size = int(len(transformed) * 0.75) +X_train = transformed.iloc[:train_size] +X_test = transformed.iloc[train_size:] +``` + +Similarly, we need to obtain the `y_train` and `y_test` variables containing the corresponding +output values. + +```python3 +y_train = dataset.target[:train_size] +y_test = dataset.target[train_size:] +``` + +### 3. Fit the new primitive + +Once we have split the data, we can fit the primitive by passing `X_train` and `y_train` +to its `fit` method. + +```python3 +primitive.fit(X=X_train, y=y_train) +``` + +### 4. Make predictions + +Once the primitive has been fitted, we can produce predictions using the `X_test` data as input. + +```python3 +predictions = primitive.produce(X=X_test) +``` + +### 5. Evaluate the performance + +We can now evaluate how good the predictions from our primitive are by using the `score` +method from the `dataset` object on both the expected output and the real output from the +primitive: + +```python3 +dataset.score(y_test, predictions) +``` + +This will output a float value between 0 and 1 indicating how good the predictions are, with +0 being the worst possible score and 1 the best. + +In this case we will obtain a score around 0.866. + +### 6. Set new hyperparameter values + +In order to improve the performance of our primitive we will try to modify a couple of its +hyperparameters. + +First we will see which hyperparameter values the primitive has by calling its +`get_hyperparameters` method. 
+ +```python3 +primitive.get_hyperparameters() +``` + +which will return a dictionary like this: + +```python +{ + "n_jobs": -1, + "n_estimators": 100, + "max_depth": 3, + "learning_rate": 0.1, + "gamma": 0, + "min_child_weight": 1 +} +``` + +Next, we will see which values are valid for each of those hyperparameters by calling its +`get_tunable_hyperparameters` method: + +```python3 +primitive.get_tunable_hyperparameters() +``` + +For example, we will see that the `max_depth` hyperparameter has the following specification: + +```python +{ + "type": "int", + "default": 3, + "range": [ + 3, + 10 + ] +} +``` + +Next, we will choose a valid value, for example 7, and set it on the primitive using the +`set_hyperparameters` method: + +```python3 +primitive.set_hyperparameters({'max_depth': 7}) +``` + +### 7. Re-evaluate the performance + +Once the new hyperparameter value has been set, we repeat the fit/produce/score cycle to +evaluate the performance of this new hyperparameter value: + +```python3 +primitive.fit(X=X_train, y=y_train) +predictions = primitive.produce(X=X_test) +dataset.score(y_test, predictions) +``` + +This time we should see that the score has changed to a value around 0.724. + +## What's Next? + +Do you want to [learn more about the project](https://MLBazaar.github.io/MLPrimitives/getting_started/concepts.html), +about [how to contribute to it](https://MLBazaar.github.io/MLPrimitives/community/contributing.html) +or browse the [API Reference](https://MLBazaar.github.io/MLPrimitives/api/mlprimitives.html)? +Please check the corresponding sections of the [documentation](https://MLBazaar.github.io/MLPrimitives/)! 
+ + +# History + +## 0.3.5 - 2023-04-14 + +### General Improvements + +* Update `mlblocks` cap - [Issue #278](https://github.com/MLBazaar/MLPrimitives/issues/278) by @sarahmish + +## 0.3.4 - 2023-01-24 + +### General Improvements + +* Update `mlblocks` cap - [Issue #277](https://github.com/MLBazaar/MLPrimitives/issues/277) by @sarahmish + +## 0.3.3 - 2023-01-20 + +### General Improvements + +* Update dependencies - [Issue #276](https://github.com/MLBazaar/MLPrimitives/issues/276) by @sarahmish + +### Adapter Improvements + +* Building model within fit in keras adapter - [Issue #267](https://github.com/MLBazaar/MLPrimitives/issues/267) by @sarahmish + +## 0.3.2 - 2021-11-09 + +### Adapter Improvements + +* Inferring data shapes with single dimension for keras adapter - [Issue #265](https://github.com/MLBazaar/MLPrimitives/issues/265) by @sarahmish + +## 0.3.1 - 2021-10-07 + +### Adapter Improvements + +* Dynamic target_shape in keras adapter - [Issue #263](https://github.com/MLBazaar/MLPrimitives/issues/263) by @sarahmish +* Save keras primitives in Windows environment - [Issue #261](https://github.com/MLBazaar/MLPrimitives/issues/261) by @sarahmish + +### General Improvements + +* Update TensorFlow and NumPy dependency - [Issue #259](https://github.com/MLBazaar/MLPrimitives/issues/259) by @sarahmish + +## 0.3.0 - 2021-01-09 + +### New Primitives + +* Add primitive `sklearn.naive_bayes.GaussianNB` - [Issue #242](https://github.com/MLBazaar/MLPrimitives/issues/242) by @sarahmish +* Add primitive `sklearn.linear_model.SGDClassifier` - [Issue #241](https://github.com/MLBazaar/MLPrimitives/issues/241) by @sarahmish + +### Primitive Improvements + +* Add offset to rolling_window_sequence primitive - [Issue #251](https://github.com/MLBazaar/MLPrimitives/issues/251) by @skyeeiskowitz +* Rename the time_index column to time - [Issue #252](https://github.com/MLBazaar/MLPrimitives/issues/252) by @pvk-developer +* Update featuretools dependency - [Issue 
#250](https://github.com/MLBazaar/MLPrimitives/issues/250) by @pvk-developer + +### General Improvements + +* Update dependencies and add python3.8 - [Issue #246](https://github.com/MLBazaar/MLPrimitives/issues/246) by @csala +* Drop Python35 - [Issue #244](https://github.com/MLBazaar/MLPrimitives/issues/244) by @csala + +## 0.2.5 - 2020-07-29 + +### Primitive Improvements + +* Accept timedelta `window_size` in `cutoff_window_sequences` - [Issue #239](https://github.com/MLBazaar/MLPrimitives/issues/239) by @joanvaquer + +### Bug Fixes + +* ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via `pip install tensorflow` - [Issue #237](https://github.com/MLBazaar/MLPrimitives/issues/237) by @joanvaquer + +### New Primitives + +* Add `pandas.DataFrame.set_index` primitive - [Issue #222](https://github.com/MLBazaar/MLPrimitives/issues/222) by @JDTheRipperPC + +## 0.2.4 - 2020-01-30 + +### New Primitives + +* Add RangeScaler and RangeUnscaler primitives - [Issue #232](https://github.com/MLBazaar/MLPrimitives/issues/232) by @csala + +### Primitive Improvements + +* Extract input_shape from X in keras.Sequential - [Issue #223](https://github.com/MLBazaar/MLPrimitives/issues/223) by @csala + +### Bug Fixes + +* mlprimitives.custom.text.TextCleaner fails if text is empty - [Issue #228](https://github.com/MLBazaar/MLPrimitives/issues/228) by @csala +* Error when loading the reviews dataset - [Issue #230](https://github.com/MLBazaar/MLPrimitives/issues/230) by @csala +* Curate dependencies: specify an explicit prompt-toolkit version range - [Issue #224](https://github.com/MLBazaar/MLPrimitives/issues/224) by @csala + +## 0.2.3 - 2019-11-14 + +### New Primitives + +* Add primitive to make window_sequences based on cutoff times - [Issue #217](https://github.com/MLBazaar/MLPrimitives/issues/217) by @csala +* Create a keras LSTM based TimeSeriesClassifier primitive - [Issue #218](https://github.com/MLBazaar/MLPrimitives/issues/218) by @csala +* Add pandas 
DataFrame primitives - [Issue #214](https://github.com/MLBazaar/MLPrimitives/issues/214) by @csala +* Add featuretools.EntitySet.normalize_entity primitive - [Issue #209](https://github.com/MLBazaar/MLPrimitives/issues/209) by @csala + +### Primitive Improvements + +* Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - [Issue #208](https://github.com/MLBazaar/MLPrimitives/issues/208) by @csala + +* Add text regression dataset - [Issue #206](https://github.com/MLBazaar/MLPrimitives/issues/206) by @csala + +### Bug Fixes + +* pandas.DataFrame.resample crash when grouping by integer columns - [Issue #211](https://github.com/MLBazaar/MLPrimitives/issues/211) by @csala + +## 0.2.2 - 2019-10-08 + +### New Primitives + +* Add primitives for GAN based time-series anomaly detection - [Issue #200](https://github.com/MLBazaar/MLPrimitives/issues/200) by @AlexanderGeiger +* Add `numpy.reshape` and `numpy.ravel` primitives - [Issue #197](https://github.com/MLBazaar/MLPrimitives/issues/197) by @AlexanderGeiger +* Add feature selection primitive based on Lasso - [Issue #194](https://github.com/MLBazaar/MLPrimitives/issues/194) by @csala + +### Primitive Improvements + +* `feature_extraction.CategoricalEncoder` support dtype category - [Issue #196](https://github.com/MLBazaar/MLPrimitives/issues/196) by @csala + +## 0.2.1 - 2019-09-09 + +### New Primitives + +* Timeseries Intervals to Mask Primitive - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger +* Add new primitive: Arima model - [Issue #168](https://github.com/MLBazaar/MLPrimitives/issues/168) by @AlexanderGeiger + +### Primitive Improvements + +* Curate PCA primitive hyperparameters - [Issue #190](https://github.com/MLBazaar/MLPrimitives/issues/190) by @AlexanderGeiger +* Add option to drop rolling window sequences - [Issue #186](https://github.com/MLBazaar/MLPrimitives/issues/186) by @AlexanderGeiger + +### Bug Fixes + +* scikit-image==0.14.3 crashes when 
installed on Mac - [Issue #188](https://github.com/MLBazaar/MLPrimitives/issues/188) by @csala + +## 0.2.0 + +### New Features + +* Publish the pipelines as an `entry_point` +[Issue #175](https://github.com/MLBazaar/MLPrimitives/issues/175) by @csala + +### Primitive Improvements + +* Improve pandas.DataFrame.resample primitive [Issue #177](https://github.com/MLBazaar/MLPrimitives/issues/177) by @csala +* Improve `feature_extractor` primitives [Issue #183](https://github.com/MLBazaar/MLPrimitives/issues/183) by @csala +* Improve `find_anomalies` primitive [Issue #180](https://github.com/MLBazaar/MLPrimitives/issues/180) by @AlexanderGeiger + +### Bug Fixes + +* Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor [Issue #176](https://github.com/MLBazaar/MLPrimitives/issues/176) by @DanielCalvoCerezo + + +## 0.1.10 + +### New Features + +* Add function to run primitives without a pipeline [Issue #43](https://github.com/MLBazaar/MLPrimitives/issues/43) by @csala + +### New Pipelines + +* Add pipelines for all the MLBlocks examples [Issue #162](https://github.com/MLBazaar/MLPrimitives/issues/162) by @csala + +### Primitive Improvements + +* Add Early Stopping to `keras.Sequential.LSTMTimeSeriesRegressor` primitive [Issue #156](https://github.com/MLBazaar/MLPrimitives/issues/156) by @csala +* Make FeatureExtractor primitives accept Numpy arrays [Issue #165](https://github.com/MLBazaar/MLPrimitives/issues/165) by @csala +* Add window size and pruning to the `timeseries_anomalies.find_anomalies` primitive [Issue #160](https://github.com/MLBazaar/MLPrimitives/issues/160) by @csala + + +## 0.1.9 + +### New Features + +* Add a single table binary classification dataset [Issue #141](https://github.com/MLBazaar/MLPrimitives/issues/141) by @csala + +### New Primitives + +* Add Multilayer Perceptron (MLP) primitive for binary classification [Issue #140](https://github.com/MLBazaar/MLPrimitives/issues/140) by @Hector-hedb12 +* Add primitive for Sequence classification 
with LSTM [Issue #150](https://github.com/MLBazaar/MLPrimitives/issues/150) by @Hector-hedb12 +* Add VGG-like convnet primitive [Issue #149](https://github.com/MLBazaar/MLPrimitives/issues/149) by @Hector-hedb12 +* Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification [Issue #139](https://github.com/MLBazaar/MLPrimitives/issues/139) by @Hector-hedb12 +* Add primitive to count feature matrix columns [Issue #146](https://github.com/MLBazaar/MLPrimitives/issues/146) by @csala + +### Primitive Improvements + +* Add additional fit and predict arguments to keras.Sequential [Issue #161](https://github.com/MLBazaar/MLPrimitives/issues/161) by @csala +* Add support for keras.Sequential Callbacks [Issue #159](https://github.com/MLBazaar/MLPrimitives/issues/159) by @csala +* Add fixed hyperparam to control keras.Sequential verbosity [Issue #143](https://github.com/MLBazaar/MLPrimitives/issues/143) by @csala + +## 0.1.8 + +### New Primitives + +* mlprimitives.custom.timeseries_preprocessing.time_segments_average - [Issue #137](https://github.com/MLBazaar/MLPrimitives/issues/137) + +### New Features + +* Add target_index output in timeseries_preprocessing.rolling_window_sequences - [Issue #136](https://github.com/MLBazaar/MLPrimitives/issues/136) + +## 0.1.7 + +### General Improvements + +* Validate JSON format in `make lint` - [Issue #133](https://github.com/MLBazaar/MLPrimitives/issues/133) +* Add demo datasets - [Issue #131](https://github.com/MLBazaar/MLPrimitives/issues/131) +* Improve featuretools.dfs primitive - [Issue #127](https://github.com/MLBazaar/MLPrimitives/issues/127) + +### New Primitives + +* pandas.DataFrame.resample - [Issue #123](https://github.com/MLBazaar/MLPrimitives/issues/123) +* pandas.DataFrame.unstack - [Issue #124](https://github.com/MLBazaar/MLPrimitives/issues/124) +* featuretools.EntitySet.add_relationship - [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126) +* featuretools.EntitySet.entity_from_dataframe 
- [Issue #126](https://github.com/MLBazaar/MLPrimitives/issues/126) + +### Bug Fixes + +* Bug in timeseries_anomalies.py - [Issue #119](https://github.com/MLBazaar/MLPrimitives/issues/119) + +## 0.1.6 + +### General Improvements + +* Add Contributing Documentation +* Remove upper bound in pandas version given new release of `featuretools` v0.6.1 +* Improve LSTMTimeSeriesRegressor hyperparameters + +### New Primitives + +* mlprimitives.candidates.dsp.SpectralMask +* mlprimitives.custom.timeseries_anomalies.find_anomalies +* mlprimitives.custom.timeseries_anomalies.regression_errors +* mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences +* mlprimitives.custom.timeseries_preprocessing.time_segments_average +* sklearn.linear_model.ElasticNet +* sklearn.linear_model.Lars +* sklearn.linear_model.Lasso +* sklearn.linear_model.MultiTaskLasso +* sklearn.linear_model.Ridge + +## 0.1.5 + +### New Primitives + +* sklearn.impute.SimpleImputer +* sklearn.preprocessing.MinMaxScaler +* sklearn.preprocessing.MaxAbsScaler +* sklearn.preprocessing.RobustScaler +* sklearn.linear_model.LinearRegression + +### General Improvements + +* Separate curated from candidate primitives +* Setup `entry_points` in setup.py to improve compatibility with MLBlocks +* Add a test-pipelines command to test all the existing pipelines +* Clean sklearn example pipelines +* Change the `author` entry to a `contributors` list +* Change the name of `mlblocks_primitives` folder +* Pip install `requirements_dev.txt` fail documentation + +### Bug Fixes + +* Fix LSTMTimeSeriesRegressor primitive. Issue #90 +* Fix timeseries primitives. Issue #91 +* Negative index anomalies in `timeseries_errors`. Issue #89 +* Keep pandas version below 0.24.0. 
Issue #87 + +## 0.1.4 + +### New Primitives + +* mlprimitives.timeseries primitives for timeseries data preprocessing +* mlprimitives.timeseries_error primitives for timeseries anomaly detection +* keras.Sequential.LSTMTimeSeriesRegressor +* sklearn.neighbors.KNeighbors Classifier and Regressor +* several sklearn.decomposition primitives +* several sklearn.ensemble primitives + +### Bug Fixes + +* Fix typo in mlprimitives.text.TextCleaner primitive +* Fix bug in index handling in featuretools.dfs primitive +* Fix bug in SingleLayerCNNImageClassifier annotation +* Remove old validation tags from JSON annotations + +## 0.1.3 + +### New Features + +* Fix and re-enable featuretools.dfs primitive. + +## 0.1.2 + +### New Features + +* Add pipeline specification language and Evaluation utilities. +* Add pipelines for graph, text and tabular problems. +* New primitives ClassEncoder and ClassDecoder +* New primitives UniqueCounter and VocabularyCounter + +### Bug Fixes + +* Fix TrivialPredictor bug when working with numpy arrays +* Change XGB default learning rate and number of estimators + + +## 0.1.1 + +### New Features + +* Add more keras.applications primitives. +* Add a Text Cleanup primitive. + +### Bug Fixes + +* Add keywords to `keras.preprocessing` primitives. +* Fix the `image_transform` method. +* Add `epoch` as a fixed hyperparameter for `keras.Sequential` primitives. + +## 0.1.0 + +* First release on PyPI. 
+ + +%prep +%autosetup -n mlprimitives-0.3.5 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-mlprimitives -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 0.3.5-1 +- Package Spec generated @@ -0,0 +1 @@ +12cc78066c9be45919cc3e8f489c6045 mlprimitives-0.3.5.tar.gz |
