| author | CoprDistGit <infra@openeuler.org> | 2023-05-10 05:49:32 +0000 |
|---|---|---|
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-10 05:49:32 +0000 |
| commit | dc308ce3579ce93aab307acd3a5042b55bf6d01b (patch) | |
| tree | c8a7773245a3eafb8e1ad9cb4cf9e6afd98e3d87 | |
| parent | 5e0148ceb79b0aec3ad957435a400b853dd0f146 (diff) | |
automatic import of python-ballet
| -rw-r--r-- | .gitignore | 1 | ||||
| -rw-r--r-- | python-ballet.spec | 1120 | ||||
| -rw-r--r-- | sources | 1 |
3 files changed, 1122 insertions, 0 deletions
@@ -0,0 +1 @@ +/ballet-0.19.5.tar.gz diff --git a/python-ballet.spec b/python-ballet.spec new file mode 100644 index 0000000..7130f45 --- /dev/null +++ b/python-ballet.spec @@ -0,0 +1,1120 @@ +%global _empty_manifest_terminate_build 0 +Name: python-ballet +Version: 0.19.5 +Release: 1 +Summary: Core functionality for lightweight, collaborative data science projects +License: MIT license +URL: https://github.com/ballet/ballet +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/a6/00/405612a825efa4dac5524a14501cdbff0426ad22bc24e98bfb0e945876f5/ballet-0.19.5.tar.gz +BuildArch: noarch + +Requires: python3-black +Requires: python3-cookiecutter +Requires: python3-Click +Requires: python3-dill +Requires: python3-dynaconf +Requires: python3-funcy +Requires: python3-gitpython +Requires: python3-h5py +Requires: python3-numpy +Requires: python3-packaging +Requires: python3-pandas +Requires: python3-pygithub +Requires: python3-slugify +Requires: python3-pyyaml +Requires: python3-requests +Requires: python3-scipy +Requires: python3-sklearn-pandas +Requires: python3-stacklog +Requires: python3-tqdm +Requires: python3-dataclasses +Requires: python3-scikit-learn +Requires: python3-scikit-learn +Requires: python3-category-encoders +Requires: python3-feature-engine +Requires: python3-featuretools-sklearn-transformer +Requires: python3-skits +Requires: python3-tsfresh +Requires: python3-category-encoders +Requires: python3-types-pkg-resources +Requires: python3-types-requests +Requires: python3-types-python-slugify +Requires: python3-bump2version +Requires: python3-pip +Requires: python3-watchdog[watchmedo] +Requires: python3-invoke +Requires: python3-mypy +Requires: python3-m2r2 +Requires: python3-sphinx +Requires: python3-sphinx-rtd-theme +Requires: python3-sphinx-click +Requires: python3-sphinx-autodoc-typehints +Requires: python3-sphinx-copybutton +Requires: python3-rstcheck +Requires: python3-flake8 +Requires: python3-isort +Requires: python3-autopep8 +Requires: python3-twine 
+Requires: python3-wheel +Requires: python3-coverage +Requires: python3-pytest +Requires: python3-pytest-cov +Requires: python3-pytest-virtualenv +Requires: python3-tox +Requires: python3-responses +Requires: python3-feature-engine +Requires: python3-featuretools-sklearn-transformer +Requires: python3-skits +Requires: python3-coverage +Requires: python3-pytest +Requires: python3-pytest-cov +Requires: python3-pytest-virtualenv +Requires: python3-tox +Requires: python3-responses +Requires: python3-tsfresh + +%description +[](https://pypi.org/project/ballet) +[](https://github.com/ballet/ballet/actions?query=workflow%3A%22Tests%22) +[](https://codecov.io/gh/ballet/ballet) + + +# ballet + +A **light**weight framework for collaborative, open-source data science +projects through **feat**ure engineering. + +- Free software: MIT license +- Documentation: https://ballet.github.io/ballet +- Homepage: https://github.com/ballet/ballet + +## Overview + +Do you develop machine learning models? Do you work by yourself or on a team? +Do you share notebooks or are you committing code to a shared repository? In +contrast to successful, massively collaborative, open-source projects like +the Linux kernel, the Rails framework, Firefox, GNU, or Tensorflow, most +data science projects are developed by just a handful of people. But think if +the open-source community could leverage its ingenuity and determination to +collaboratively develop data science projects to predict the incidence of +disease in a population, to predict whether vulnerable children will be evicted +from their homes, or to predict whether learners will drop out of online +courses. + +Our vision is to make collaborative data science possible by making it more +like open-source software development. 
Our approach is based on decomposing the +data science process into modular patches +that can then be intelligently combined, representing objects like "feature definition", +"labeling function", or "prediction task definition". Collaborators work in +parallel to write patches and submit them to a repo. The core Ballet framework +provides the underlying functionality to merge high-quality contributions, +collect modules from the file system, and compose the accepted contributions +into a single product. It also provides [Assemblé](https://github.com/ballet/ballet-assemble), a familiar notebook-based development +experience that is friendly to data scientists and other inexperienced +open-source contributors. We don't require any computing infrastructure beyond +that which is commonly used in open-source software development. + +Currently, Ballet focuses on supporting collaboratively developing +*feature engineering pipelines*, an important part of many data science +projects. Individual feature definitions are represented as separate Python modules, +declaring the subset of a dataframe that they operate on and a +scikit-learn-style learned transformer that extracts feature values from the +raw data. Ballet collects individual feature definitions and composes them into a +feature engineering pipeline. At any point, a project built on Ballet can be +installed for end-to-end feature engineering on new data instances for the +same problem. How do we ensure the feature engineering pipeline is always +useful? Ballet thoroughly validates proposed feature definitions for correctness and +machine learning performance, using an extensive test suite and a novel +streaming feature definition selection algorithm. Accepted feature definitions can be +automatically merged by the [Ballet Bot](https://github.com/ballet/ballet-bot) into projects. 
+ +<img src="./docs/_static/feature_lifecycle.png" alt="Ballet Feature Lifecycle" width="400" /> + +## Next steps + +*Are you a data owner or project maintainer that wants to organize a +collaboration?* + +👉 Check out the [Ballet Maintainer Guide](https://ballet.github.io/ballet/maintainer_guide.html) + +*Are you a data scientist or enthusiast that wants to join a collaboration?* + +👉 Check out the [Ballet Contributor Guide](https://ballet.github.io/ballet/contributor_guide.html) + +*Want to learn about how Ballet enables Better Feature Engineering™️?* + +👉 Check out the [Feature Engineering Guide](https://ballet.github.io/ballet/feature_engineering_guide.html) + +*Want to see a demo collaboration in progress and maybe even participate yourself?* + +👉 Check out the [ballet-predict-house-prices](https://github.com/HDI-Project/ballet-predict-house-prices) project + +## Source code organization + +This is a quick overview to the Ballet core source code organization. For more information about contributing to Ballet core itself, see [here](https://ballet.github.io/ballet/contributing.html). 
+ +| path | description | +| ---- | ----------- | +| [`cli.py`](ballet/cli.py) | the `ballet` command line utility | +| [`client.py`](ballet/client.py) | the interactive client for users | +| [`contrib.py`](ballet/contrib.py) | collecting feature definitions from individual modules in source files in the file system | +| [`eng/base.py`](ballet/eng/base.py) | abstractions for transformers used in feature definitions, such as `BaseTransformer` | +| [`eng/{misc,missing,ts}.py`](ballet/eng/) | custom transformers for missing data, time series problems, and more | +| [`eng/external.py`](ballet/eng/external.py) | re-export of transformers from external libraries such as scikit-learn and feature_engine | +| [`feature.py`](ballet/feature.py) | the `Feature` abstraction | +| [`pipeline.py`](ballet/pipeline.py) | the `FeatureEngineeringPipeline` abstraction | +| [`project.py`](ballet/project.py) | the interface between a specific Ballet project and the core Ballet library, such as utilities to load project-specific information and the `Project` abstraction | +| [`templates/`](ballet/templates/) | cookiecutter templates for creating a new Ballet project or creating a new feature definition | +| [`templating.py`](ballet/templating.py) | user-facing functionality on top of the templates | +| [`transformer.py`](ballet/transformer.py) | wrappers for transformers that make them play nicely together in a pipeline | +| [`update.py`](ballet/update.py) | functionality to update the project template from a new upstream release | +| [`util/`](ballet/util/) | various utilities | +| [`validation/main.py`](ballet/validation/main.py) | entry point for all validation routines | +| [`validation/base.py`](ballet/validation/base.py) | abstractions used in validation such as the `FeaturePerformanceEvaluator` | +| [`validation/common.py`](ballet/validation/common.py) | common functionality used in validation, such as the ability to collect relevant changes between a current environment and a 
reference environment (such as a pull request vs a default branch) | +| [`validation/entropy.py`](ballet/validation/entropy.py) | statistical estimation routines used in feature definition selection algorithms, such as estimators for entropy, mutual information, and conditional mutual information | +| [`validation/feature_acceptance/`](ballet/validation/feature_acceptance/) | validation routines for feature acceptance +| [`validation/feature_pruning/`](ballet/validation/feature_pruning/) | validation routines for feature pruning | +| [`validation/feature_api/`](ballet/validation/feature_api/) | validation routines for feature APIs | +| [`validation/project_structure/`](ballet/validation/project_structure/) | validation routines for project structure | + + +# History + +## 0.19.5 (2021-07-17) + +* Fix bug with deepcopying `ballet.pipeline.FeatureEngineeringPipeline` + +## 0.19.4 (2021-07-17) + +* Fix bug with deepcopying `ballet.eng.base.SubsetTransformer` ([#90](https://github.com/ballet/ballet/issues/90)) +* Add `ballet.drop_missing_targets` primitive + +## 0.19.3 (2021-06-28) + +* Support missing targets in discovery and feature performance evaluation ([#89](https://github.com/ballet/ballet/pull/89)) +* Add `ninputs` to summary statistics in `ballet.discovery.discover` + +## 0.19.2 (2021-06-21) + +* Improve discrete column detection in the case of many repeated values +* Add `ncontinuous` and `ndiscrete` to summary statistics in `ballet.discovery.discover` + +## 0.19.1 (2021-06-20) + +* Defer computation of some expensive summary statistics in `ballet.discovery.discover` + +## 0.19.0 (2021-06-16) + +* Support callable as feature input ([#88](https://github.com/ballet/ballet/pull/88)) + +## 0.18.0 (2021-06-06) + +* Added [Consumer Guide](https://ballet.github.io/ballet/consumer_guide.html) +* Can use Ballet together with MLBlocks to engineer features and then use additional preprocessing and ML components ([#86](https://github.com/ballet/ballet/pull/86)) +* Can 
wrap the extracted feature matrix in a data frame with named columns derived from ``feature.output`` or ``feature.name`` +* Implemented `ballet.encoder.EncoderPipeline` to (mostly) mirror `ballet.pipeline.FeatureEngineeringPipeline` +* Can specify the dataset used for fitting the pipeline in the engineer-features CLI via `--train-dir path/to/train/dir` + +## 0.17.0 (2021-05-24) + +* Support nested transformers, both with nested features and with input/transformer tuples wrapped with SubsetTransformers ([#82](https://github.com/ballet/ballet/pull/82)) +* Allow `Client.discover` to skip summary statistics if development dataset cannot be loaded or if features produce errors + +## 0.16.0 (2021-05-22) + +* Add `Client.discover` functionality ([#80](https://github.com/ballet/ballet/pull/80)) +* Switch the order of `NullFiller` parameters to more closely resemble `fillna` signature + +## 0.15.2 (2021-05-14) + +* Operate columnwise in `VarianceThresholdAccepter`, rather than computing the variance of + the entire feature group. + +## 0.15.1 (2021-05-12) + +* Add debug logging for new accepters + +## 0.15.0 (2021-05-12) + +* Add `VarianceThresholdAccepter`, `MutualInformationAccepter`, and `CompoundAccepter` ([#76](https://github.com/ballet/ballet/pull/76)) + +## 0.14.0 (2021-05-11) + +* Support using holdout data splits in validation ([#75](https://github.com/ballet/ballet/pull/75)) +* Fix CLI program name in projects ([#74](https://github.com/ballet/ballet/pull/74)) +* Fix bug with `load_config` usage in python REPL ([#73](https://github.com/ballet/ballet/pull/73)) +* Reorganize external feature engineering primitives to `ballet/eng/external/**.py`. Imports like `from ballet.eng.external import MyPrimitive` are unaffected. + +## 0.13.1 (2021-04-02) + +* Fix upgrade check in `ballet update-project-template` to migrate away from deprecated PyPI XML-RPC API. 
+ +## 0.13.0 (2021-03-30) + +* Fix links in project template + +## 0.12.0 (2021-03-10) + +* Automate creation of GitHub repository in quickstart + +## 0.11.0 (2021-03-04) + +* Allow validation to be run from topic branches locally + +## 0.10.0 (2021-02-23) + +* Add `Project.version` property + +## 0.9.0 (2021-02-16) + +* Add support for managed branching via `ballet start-new-feature --branching` (defaults to enabled) +* Remove confusing `ballet.project.config` attribute +* Implement `ballet.project.load_config` as a better alternative, and use this in the project template's `load_data` + +## 0.8.2 (2021-02-16) + +* Fix bug with `str(t)` or `repr(t)` for `DelegatingRobustTransformer` + +## 0.8.1 (2021-02-16) + +* Fix bug with `str(t)` or `repr(t)` for `SimpleFunctionTransformer` + +## 0.8.0 (2021-02-02) + +* Fix bug with detecting updates to Ballet due to PyPI API outage +* Fix some dependency conflicts +* Reference ballet-assemble in project template +* Bump feature_engine to 1.0 + +## 0.7.11 (2020-09-16) + +* Reduce verbosity of conversion approach logging by moving some messages to TRACE level +* Implement "else" transformer for `ConditionalTransformer` +* Improve GFSSF iteration logging + +## 0.7.10 (2020-09-08) + +* Fix bug with different treatment of y_df and y; now, y_df is passed to the feature engineering pipeline, and y is passed to the feature validation routines as applicable. 
+* Switch back to using Gitter + +## 0.7.9 (2020-08-15) + +* Add give_advice feature for FeatureAPICheck and other checks to log message on how to fix failure +* Improve logging of GFSSFAccepter and GFSSFPruner +* Improve `__str__` for DelegatingRobustTransformer and consequently consumers +* Change default log format to SIMPLE_LOG_FORMAT +* Various bug fixes and improvements + +## 0.7.8 (2020-08-13) + +* Add CanTransformNewRowsCheck to feature API checks + +## 0.7.7 (2020-08-12) + +* Support `None` as the transformer in a `Feature`, it will be automatically converted to an `IdentityTransformer` +* Implement `ColumnSelector` +* Update docs +* Various bug fixes and improvements + +## 0.7.6 (2020-08-12) + +* Re-export feature engineering primitives from various libraries +* Show type annotations in docs +* Update guides +* Various bug fixes and improvements + +## 0.7.5 (2020-08-03) + +* Make validator parameters configurable in ballet.yml file (e.g. λ_1 and λ_2 for GFSSF algorithms) +* Support dynaconf 3.x + +## 0.7.4 (2020-07-22) + +* Accept logger names, as well as logger instances, in `ballet.util.log.enable` +* Updated docs + +## 0.7.3 (2020-07-21) + +* Add `load_data` method with built-in caching to project API +* Fix bug in GFSSF accepter +* Always use encoded target during validation +* Various bug fixes and improvements + +## 0.7.2 (2020-07-21) + +* Add sample analysis notebook to project template +* Add binder url/badge to project template +* Fix bug with enabling logging with multiple loggers + +## 0.7.1 (2020-07-20) + +* Add client for easy interactive usage (`ballet.b`) +* Add binder setup to project template + +## 0.7 (2020-07-17) + +* Revamp project template: update project structure, create single API via FeatureEngineeringProject, use and add support for pyinvoke, revamp build into engineer_features, support repolockr bot +* Improve ballet.project.Project: can create by ascending from given path, can create from current working directory, can resolve 
arbitrary project symbol, exposes project's API +* Check for and notify of new release of ballet during project update (`ballet update-project-template`) +* Add ComputedValueTransformer to ballet.eng +* Move stacklog to separate project and install it +* Add validators that {never,always} accept submissions +* Add feature API checks to ensure that the feature can fit and transform a single row +* Add feature engineering guide to documentation and significantly expand contributor guide +* Add bot installation instructions to maintainer guide +* Add type annotations throughout +* Drop support for py35, add support for py38 +* Deprecate modeling code +* Various bug fixes and improvements + +## 0.6 (2019-11-12) + +* Implement GFSSF validators and random validators +* Improve validators and allow validators to be configured in ballet.yml +* Improve project template +* Create ballet CLI +* Bug fixes and performance improvements + +## 0.5 (2018-10-14) + +* Add project template and ballet-quickstart command +* Add project structure checks and feature API checks +* Implement multi-stage validation routine driver + +## 0.4 (2018-09-21) + +* Implement `Modeler` for versatile modeling and evaluation +* Change project name + +## 0.3 (2018-04-28) + +* Implement `PullRequestFeatureValidator` +* Add `util.travis`, `util.modutil`, `util.git` util modules + +## 0.2 (2018-04-11) + +* Implement `ArrayLikeEqualityTestingMixin` +* Implement `collect_contrib_features` + +## 0.1 (2018-04-08) + +* First release on PyPI + + + + +%package -n python3-ballet +Summary: Core functionality for lightweight, collaborative data science projects +Provides: python-ballet +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-ballet +[](https://pypi.org/project/ballet) +[](https://github.com/ballet/ballet/actions?query=workflow%3A%22Tests%22) +[](https://codecov.io/gh/ballet/ballet) + + +# ballet + +A **light**weight framework for 
collaborative, open-source data science +projects through **feat**ure engineering. + +- Free software: MIT license +- Documentation: https://ballet.github.io/ballet +- Homepage: https://github.com/ballet/ballet + +## Overview + +Do you develop machine learning models? Do you work by yourself or on a team? +Do you share notebooks or are you committing code to a shared repository? In +contrast to successful, massively collaborative, open-source projects like +the Linux kernel, the Rails framework, Firefox, GNU, or Tensorflow, most +data science projects are developed by just a handful of people. But think if +the open-source community could leverage its ingenuity and determination to +collaboratively develop data science projects to predict the incidence of +disease in a population, to predict whether vulnerable children will be evicted +from their homes, or to predict whether learners will drop out of online +courses. + +Our vision is to make collaborative data science possible by making it more +like open-source software development. Our approach is based on decomposing the +data science process into modular patches +that can then be intelligently combined, representing objects like "feature definition", +"labeling function", or "prediction task definition". Collaborators work in +parallel to write patches and submit them to a repo. The core Ballet framework +provides the underlying functionality to merge high-quality contributions, +collect modules from the file system, and compose the accepted contributions +into a single product. It also provides [Assemblé](https://github.com/ballet/ballet-assemble), a familiar notebook-based development +experience that is friendly to data scientists and other inexperienced +open-source contributors. We don't require any computing infrastructure beyond +that which is commonly used in open-source software development. 
+ +Currently, Ballet focuses on supporting collaboratively developing +*feature engineering pipelines*, an important part of many data science +projects. Individual feature definitions are represented as separate Python modules, +declaring the subset of a dataframe that they operate on and a +scikit-learn-style learned transformer that extracts feature values from the +raw data. Ballet collects individual feature definitions and composes them into a +feature engineering pipeline. At any point, a project built on Ballet can be +installed for end-to-end feature engineering on new data instances for the +same problem. How do we ensure the feature engineering pipeline is always +useful? Ballet thoroughly validates proposed feature definitions for correctness and +machine learning performance, using an extensive test suite and a novel +streaming feature definition selection algorithm. Accepted feature definitions can be +automatically merged by the [Ballet Bot](https://github.com/ballet/ballet-bot) into projects. 
+ +<img src="./docs/_static/feature_lifecycle.png" alt="Ballet Feature Lifecycle" width="400" /> + +## Next steps + +*Are you a data owner or project maintainer that wants to organize a +collaboration?* + +👉 Check out the [Ballet Maintainer Guide](https://ballet.github.io/ballet/maintainer_guide.html) + +*Are you a data scientist or enthusiast that wants to join a collaboration?* + +👉 Check out the [Ballet Contributor Guide](https://ballet.github.io/ballet/contributor_guide.html) + +*Want to learn about how Ballet enables Better Feature Engineering™️?* + +👉 Check out the [Feature Engineering Guide](https://ballet.github.io/ballet/feature_engineering_guide.html) + +*Want to see a demo collaboration in progress and maybe even participate yourself?* + +👉 Check out the [ballet-predict-house-prices](https://github.com/HDI-Project/ballet-predict-house-prices) project + +## Source code organization + +This is a quick overview to the Ballet core source code organization. For more information about contributing to Ballet core itself, see [here](https://ballet.github.io/ballet/contributing.html). 
+ +| path | description | +| ---- | ----------- | +| [`cli.py`](ballet/cli.py) | the `ballet` command line utility | +| [`client.py`](ballet/client.py) | the interactive client for users | +| [`contrib.py`](ballet/contrib.py) | collecting feature definitions from individual modules in source files in the file system | +| [`eng/base.py`](ballet/eng/base.py) | abstractions for transformers used in feature definitions, such as `BaseTransformer` | +| [`eng/{misc,missing,ts}.py`](ballet/eng/) | custom transformers for missing data, time series problems, and more | +| [`eng/external.py`](ballet/eng/external.py) | re-export of transformers from external libraries such as scikit-learn and feature_engine | +| [`feature.py`](ballet/feature.py) | the `Feature` abstraction | +| [`pipeline.py`](ballet/pipeline.py) | the `FeatureEngineeringPipeline` abstraction | +| [`project.py`](ballet/project.py) | the interface between a specific Ballet project and the core Ballet library, such as utilities to load project-specific information and the `Project` abstraction | +| [`templates/`](ballet/templates/) | cookiecutter templates for creating a new Ballet project or creating a new feature definition | +| [`templating.py`](ballet/templating.py) | user-facing functionality on top of the templates | +| [`transformer.py`](ballet/transformer.py) | wrappers for transformers that make them play nicely together in a pipeline | +| [`update.py`](ballet/update.py) | functionality to update the project template from a new upstream release | +| [`util/`](ballet/util/) | various utilities | +| [`validation/main.py`](ballet/validation/main.py) | entry point for all validation routines | +| [`validation/base.py`](ballet/validation/base.py) | abstractions used in validation such as the `FeaturePerformanceEvaluator` | +| [`validation/common.py`](ballet/validation/common.py) | common functionality used in validation, such as the ability to collect relevant changes between a current environment and a 
reference environment (such as a pull request vs a default branch) | +| [`validation/entropy.py`](ballet/validation/entropy.py) | statistical estimation routines used in feature definition selection algorithms, such as estimators for entropy, mutual information, and conditional mutual information | +| [`validation/feature_acceptance/`](ballet/validation/feature_acceptance/) | validation routines for feature acceptance +| [`validation/feature_pruning/`](ballet/validation/feature_pruning/) | validation routines for feature pruning | +| [`validation/feature_api/`](ballet/validation/feature_api/) | validation routines for feature APIs | +| [`validation/project_structure/`](ballet/validation/project_structure/) | validation routines for project structure | + + +# History + +## 0.19.5 (2021-07-17) + +* Fix bug with deepcopying `ballet.pipeline.FeatureEngineeringPipeline` + +## 0.19.4 (2021-07-17) + +* Fix bug with deepcopying `ballet.eng.base.SubsetTransformer` ([#90](https://github.com/ballet/ballet/issues/90)) +* Add `ballet.drop_missing_targets` primitive + +## 0.19.3 (2021-06-28) + +* Support missing targets in discovery and feature performance evaluation ([#89](https://github.com/ballet/ballet/pull/89)) +* Add `ninputs` to summary statistics in `ballet.discovery.discover` + +## 0.19.2 (2021-06-21) + +* Improve discrete column detection in the case of many repeated values +* Add `ncontinuous` and `ndiscrete` to summary statistics in `ballet.discovery.discover` + +## 0.19.1 (2021-06-20) + +* Defer computation of some expensive summary statistics in `ballet.discovery.discover` + +## 0.19.0 (2021-06-16) + +* Support callable as feature input ([#88](https://github.com/ballet/ballet/pull/88)) + +## 0.18.0 (2021-06-06) + +* Added [Consumer Guide](https://ballet.github.io/ballet/consumer_guide.html) +* Can use Ballet together with MLBlocks to engineer features and then use additional preprocessing and ML components ([#86](https://github.com/ballet/ballet/pull/86)) +* Can 
wrap the extracted feature matrix in a data frame with named columns derived from ``feature.output`` or ``feature.name`` +* Implemented `ballet.encoder.EncoderPipeline` to (mostly) mirror `ballet.pipeline.FeatureEngineeringPipeline` +* Can specify the dataset used for fitting the pipeline in the engineer-features CLI via `--train-dir path/to/train/dir` + +## 0.17.0 (2021-05-24) + +* Support nested transformers, both with nested features and with input/transformer tuples wrapped with SubsetTransformers ([#82](https://github.com/ballet/ballet/pull/82)) +* Allow `Client.discover` to skip summary statistics if development dataset cannot be loaded or if features produce errors + +## 0.16.0 (2021-05-22) + +* Add `Client.discover` functionality ([#80](https://github.com/ballet/ballet/pull/80)) +* Switch the order of `NullFiller` parameters to more closely resemble `fillna` signature + +## 0.15.2 (2021-05-14) + +* Operate columnwise in `VarianceThresholdAccepter`, rather than computing the variance of + the entire feature group. + +## 0.15.1 (2021-05-12) + +* Add debug logging for new accepters + +## 0.15.0 (2021-05-12) + +* Add `VarianceThresholdAccepter`, `MutualInformationAccepter`, and `CompoundAccepter` ([#76](https://github.com/ballet/ballet/pull/76)) + +## 0.14.0 (2021-05-11) + +* Support using holdout data splits in validation ([#75](https://github.com/ballet/ballet/pull/75)) +* Fix CLI program name in projects ([#74](https://github.com/ballet/ballet/pull/74)) +* Fix bug with `load_config` usage in python REPL ([#73](https://github.com/ballet/ballet/pull/73)) +* Reorganize external feature engineering primitives to `ballet/eng/external/**.py`. Imports like `from ballet.eng.external import MyPrimitive` are unaffected. + +## 0.13.1 (2021-04-02) + +* Fix upgrade check in `ballet update-project-template` to migrate away from deprecated PyPI XML-RPC API. 
+ +## 0.13.0 (2021-03-30) + +* Fix links in project template + +## 0.12.0 (2021-03-10) + +* Automate creation of GitHub repository in quickstart + +## 0.11.0 (2021-03-04) + +* Allow validation to be run from topic branches locally + +## 0.10.0 (2021-02-23) + +* Add `Project.version` property + +## 0.9.0 (2021-02-16) + +* Add support for managed branching via `ballet start-new-feature --branching` (defaults to enabled) +* Remove confusing `ballet.project.config` attribute +* Implement `ballet.project.load_config` as a better alternative, and use this in the project template's `load_data` + +## 0.8.2 (2021-02-16) + +* Fix bug with `str(t)` or `repr(t)` for `DelegatingRobustTransformer` + +## 0.8.1 (2021-02-16) + +* Fix bug with `str(t)` or `repr(t)` for `SimpleFunctionTransformer` + +## 0.8.0 (2021-02-02) + +* Fix bug with detecting updates to Ballet due to PyPI API outage +* Fix some dependency conflicts +* Reference ballet-assemble in project template +* Bump feature_engine to 1.0 + +## 0.7.11 (2020-09-16) + +* Reduce verbosity of conversion approach logging by moving some messages to TRACE level +* Implement "else" transformer for `ConditionalTransformer` +* Improve GFSSF iteration logging + +## 0.7.10 (2020-09-08) + +* Fix bug with different treatment of y_df and y; now, y_df is passed to the feature engineering pipeline, and y is passed to the feature validation routines as applicable. 
+* Switch back to using Gitter + +## 0.7.9 (2020-08-15) + +* Add give_advice feature for FeatureAPICheck and other checks to log message on how to fix failure +* Improve logging of GFSSFAccepter and GFSSFPruner +* Improve `__str__` for DelegatingRobustTransformer and consequently consumers +* Change default log format to SIMPLE_LOG_FORMAT +* Various bug fixes and improvements + +## 0.7.8 (2020-08-13) + +* Add CanTransformNewRowsCheck to feature API checks + +## 0.7.7 (2020-08-12) + +* Support `None` as the transformer in a `Feature`, it will be automatically converted to an `IdentityTransformer` +* Implement `ColumnSelector` +* Update docs +* Various bug fixes and improvements + +## 0.7.6 (2020-08-12) + +* Re-export feature engineering primitives from various libraries +* Show type annotations in docs +* Update guides +* Various bug fixes and improvements + +## 0.7.5 (2020-08-03) + +* Make validator parameters configurable in ballet.yml file (e.g. λ_1 and λ_2 for GFSSF algorithms) +* Support dynaconf 3.x + +## 0.7.4 (2020-07-22) + +* Accept logger names, as well as logger instances, in `ballet.util.log.enable` +* Updated docs + +## 0.7.3 (2020-07-21) + +* Add `load_data` method with built-in caching to project API +* Fix bug in GFSSF accepter +* Always use encoded target during validation +* Various bug fixes and improvements + +## 0.7.2 (2020-07-21) + +* Add sample analysis notebook to project template +* Add binder url/badge to project template +* Fix bug with enabling logging with multiple loggers + +## 0.7.1 (2020-07-20) + +* Add client for easy interactive usage (`ballet.b`) +* Add binder setup to project template + +## 0.7 (2020-07-17) + +* Revamp project template: update project structure, create single API via FeatureEngineeringProject, use and add support for pyinvoke, revamp build into engineer_features, support repolockr bot +* Improve ballet.project.Project: can create by ascending from given path, can create from current working directory, can resolve 
arbitrary project symbol, exposes project's API +* Check for and notify of new release of ballet during project update (`ballet update-project-template`) +* Add ComputedValueTransformer to ballet.eng +* Move stacklog to separate project and install it +* Add validators that {never,always} accept submissions +* Add feature API checks to ensure that the feature can fit and transform a single row +* Add feature engineering guide to documentation and significantly expand contributor guide +* Add bot installation instructions to maintainer guide +* Add type annotations throughout +* Drop support for py35, add support for py38 +* Deprecate modeling code +* Various bug fixes and improvements + +## 0.6 (2019-11-12) + +* Implement GFSSF validators and random validators +* Improve validators and allow validators to be configured in ballet.yml +* Improve project template +* Create ballet CLI +* Bug fixes and performance improvements + +## 0.5 (2018-10-14) + +* Add project template and ballet-quickstart command +* Add project structure checks and feature API checks +* Implement multi-stage validation routine driver + +## 0.4 (2018-09-21) + +* Implement `Modeler` for versatile modeling and evaluation +* Change project name + +## 0.3 (2018-04-28) + +* Implement `PullRequestFeatureValidator` +* Add `util.travis`, `util.modutil`, `util.git` util modules + +## 0.2 (2018-04-11) + +* Implement `ArrayLikeEqualityTestingMixin` +* Implement `collect_contrib_features` + +## 0.1 (2018-04-08) + +* First release on PyPI + + + + +%package help +Summary: Development documents and examples for ballet +Provides: python3-ballet-doc +%description help +[](https://pypi.org/project/ballet) +[](https://github.com/ballet/ballet/actions?query=workflow%3A%22Tests%22) +[](https://codecov.io/gh/ballet/ballet) + + +# ballet + +A **light**weight framework for collaborative, open-source data science +projects through **feat**ure engineering. 
+ +- Free software: MIT license +- Documentation: https://ballet.github.io/ballet +- Homepage: https://github.com/ballet/ballet + +## Overview + +Do you develop machine learning models? Do you work by yourself or on a team? +Do you share notebooks or are you committing code to a shared repository? In +contrast to successful, massively collaborative, open-source projects like +the Linux kernel, the Rails framework, Firefox, GNU, or Tensorflow, most +data science projects are developed by just a handful of people. But think if +the open-source community could leverage its ingenuity and determination to +collaboratively develop data science projects to predict the incidence of +disease in a population, to predict whether vulnerable children will be evicted +from their homes, or to predict whether learners will drop out of online +courses. + +Our vision is to make collaborative data science possible by making it more +like open-source software development. Our approach is based on decomposing the +data science process into modular patches +that can then be intelligently combined, representing objects like "feature definition", +"labeling function", or "prediction task definition". Collaborators work in +parallel to write patches and submit them to a repo. The core Ballet framework +provides the underlying functionality to merge high-quality contributions, +collect modules from the file system, and compose the accepted contributions +into a single product. It also provides [Assemblé](https://github.com/ballet/ballet-assemble), a familiar notebook-based development +experience that is friendly to data scientists and other inexperienced +open-source contributors. We don't require any computing infrastructure beyond +that which is commonly used in open-source software development. + +Currently, Ballet focuses on supporting collaboratively developing +*feature engineering pipelines*, an important part of many data science +projects. 
Individual feature definitions are represented as separate Python modules, +declaring the subset of a dataframe that they operate on and a +scikit-learn-style learned transformer that extracts feature values from the +raw data. Ballet collects individual feature definitions and composes them into a +feature engineering pipeline. At any point, a project built on Ballet can be +installed for end-to-end feature engineering on new data instances for the +same problem. How do we ensure the feature engineering pipeline is always +useful? Ballet thoroughly validates proposed feature definitions for correctness and +machine learning performance, using an extensive test suite and a novel +streaming feature definition selection algorithm. Accepted feature definitions can be +automatically merged by the [Ballet Bot](https://github.com/ballet/ballet-bot) into projects. + +<img src="./docs/_static/feature_lifecycle.png" alt="Ballet Feature Lifecycle" width="400" /> + +## Next steps + +*Are you a data owner or project maintainer that wants to organize a +collaboration?* + +👉 Check out the [Ballet Maintainer Guide](https://ballet.github.io/ballet/maintainer_guide.html) + +*Are you a data scientist or enthusiast that wants to join a collaboration?* + +👉 Check out the [Ballet Contributor Guide](https://ballet.github.io/ballet/contributor_guide.html) + +*Want to learn about how Ballet enables Better Feature Engineering™️?* + +👉 Check out the [Feature Engineering Guide](https://ballet.github.io/ballet/feature_engineering_guide.html) + +*Want to see a demo collaboration in progress and maybe even participate yourself?* + +👉 Check out the [ballet-predict-house-prices](https://github.com/HDI-Project/ballet-predict-house-prices) project + +## Source code organization + +This is a quick overview of the Ballet core source code organization. For more information about contributing to Ballet core itself, see [here](https://ballet.github.io/ballet/contributing.html). 
+ +| path | description | +| ---- | ----------- | +| [`cli.py`](ballet/cli.py) | the `ballet` command line utility | +| [`client.py`](ballet/client.py) | the interactive client for users | +| [`contrib.py`](ballet/contrib.py) | collecting feature definitions from individual modules in source files in the file system | +| [`eng/base.py`](ballet/eng/base.py) | abstractions for transformers used in feature definitions, such as `BaseTransformer` | +| [`eng/{misc,missing,ts}.py`](ballet/eng/) | custom transformers for missing data, time series problems, and more | +| [`eng/external.py`](ballet/eng/external.py) | re-export of transformers from external libraries such as scikit-learn and feature_engine | +| [`feature.py`](ballet/feature.py) | the `Feature` abstraction | +| [`pipeline.py`](ballet/pipeline.py) | the `FeatureEngineeringPipeline` abstraction | +| [`project.py`](ballet/project.py) | the interface between a specific Ballet project and the core Ballet library, such as utilities to load project-specific information and the `Project` abstraction | +| [`templates/`](ballet/templates/) | cookiecutter templates for creating a new Ballet project or creating a new feature definition | +| [`templating.py`](ballet/templating.py) | user-facing functionality on top of the templates | +| [`transformer.py`](ballet/transformer.py) | wrappers for transformers that make them play nicely together in a pipeline | +| [`update.py`](ballet/update.py) | functionality to update the project template from a new upstream release | +| [`util/`](ballet/util/) | various utilities | +| [`validation/main.py`](ballet/validation/main.py) | entry point for all validation routines | +| [`validation/base.py`](ballet/validation/base.py) | abstractions used in validation such as the `FeaturePerformanceEvaluator` | +| [`validation/common.py`](ballet/validation/common.py) | common functionality used in validation, such as the ability to collect relevant changes between a current environment and a 
reference environment (such as a pull request vs a default branch) | +| [`validation/entropy.py`](ballet/validation/entropy.py) | statistical estimation routines used in feature definition selection algorithms, such as estimators for entropy, mutual information, and conditional mutual information | +| [`validation/feature_acceptance/`](ballet/validation/feature_acceptance/) | validation routines for feature acceptance | +| [`validation/feature_pruning/`](ballet/validation/feature_pruning/) | validation routines for feature pruning | +| [`validation/feature_api/`](ballet/validation/feature_api/) | validation routines for feature APIs | +| [`validation/project_structure/`](ballet/validation/project_structure/) | validation routines for project structure | + + +# History + +## 0.19.5 (2021-07-17) + +* Fix bug with deepcopying `ballet.pipeline.FeatureEngineeringPipeline` + +## 0.19.4 (2021-07-17) + +* Fix bug with deepcopying `ballet.eng.base.SubsetTransformer` ([#90](https://github.com/ballet/ballet/issues/90)) +* Add `ballet.drop_missing_targets` primitive + +## 0.19.3 (2021-06-28) + +* Support missing targets in discovery and feature performance evaluation ([#89](https://github.com/ballet/ballet/pull/89)) +* Add `ninputs` to summary statistics in `ballet.discovery.discover` + +## 0.19.2 (2021-06-21) + +* Improve discrete column detection in the case of many repeated values +* Add `ncontinuous` and `ndiscrete` to summary statistics in `ballet.discovery.discover` + +## 0.19.1 (2021-06-20) + +* Defer computation of some expensive summary statistics in `ballet.discovery.discover` + +## 0.19.0 (2021-06-16) + +* Support callable as feature input ([#88](https://github.com/ballet/ballet/pull/88)) + +## 0.18.0 (2021-06-06) + +* Added [Consumer Guide](https://ballet.github.io/ballet/consumer_guide.html) +* Can use Ballet together with MLBlocks to engineer features and then use additional preprocessing and ML components ([#86](https://github.com/ballet/ballet/pull/86)) +* Can 
wrap the extracted feature matrix in a data frame with named columns derived from ``feature.output`` or ``feature.name`` +* Implemented `ballet.encoder.EncoderPipeline` to (mostly) mirror `ballet.pipeline.FeatureEngineeringPipeline` +* Can specify the dataset used for fitting the pipeline in the engineer-features CLI via `--train-dir path/to/train/dir` + +## 0.17.0 (2021-05-24) + +* Support nested transformers, both with nested features and with input/transformer tuples wrapped with SubsetTransformers ([#82](https://github.com/ballet/ballet/pull/82)) +* Allow `Client.discover` to skip summary statistics if development dataset cannot be loaded or if features produce errors + +## 0.16.0 (2021-05-22) + +* Add `Client.discover` functionality ([#80](https://github.com/ballet/ballet/pull/80)) +* Switch the order of `NullFiller` parameters to more closely resemble `fillna` signature + +## 0.15.2 (2021-05-14) + +* Operate columnwise in `VarianceThresholdAccepter`, rather than computing the variance of + the entire feature group. + +## 0.15.1 (2021-05-12) + +* Add debug logging for new accepters + +## 0.15.0 (2021-05-12) + +* Add `VarianceThresholdAccepter`, `MutualInformationAccepter`, and `CompoundAccepter` ([#76](https://github.com/ballet/ballet/pull/76)) + +## 0.14.0 (2021-05-11) + +* Support using holdout data splits in validation ([#75](https://github.com/ballet/ballet/pull/75)) +* Fix CLI program name in projects ([#74](https://github.com/ballet/ballet/pull/74)) +* Fix bug with `load_config` usage in python REPL ([#73](https://github.com/ballet/ballet/pull/73)) +* Reorganize external feature engineering primitives to `ballet/eng/external/**.py`. Imports like `from ballet.eng.external import MyPrimitive` are unaffected. + +## 0.13.1 (2021-04-02) + +* Fix upgrade check in `ballet update-project-template` to migrate away from deprecated PyPI XML-RPC API. 
+ +## 0.13.0 (2021-03-30) + +* Fix links in project template + +## 0.12.0 (2021-03-10) + +* Automate creation of GitHub repository in quickstart + +## 0.11.0 (2021-03-04) + +* Allow validation to be run from topic branches locally + +## 0.10.0 (2021-02-23) + +* Add `Project.version` property + +## 0.9.0 (2021-02-16) + +* Add support for managed branching via `ballet start-new-feature --branching` (defaults to enabled) +* Remove confusing `ballet.project.config` attribute +* Implement `ballet.project.load_config` as a better alternative, and use this in the project template's `load_data` + +## 0.8.2 (2021-02-16) + +* Fix bug with `str(t)` or `repr(t)` for `DelegatingRobustTransformer` + +## 0.8.1 (2021-02-16) + +* Fix bug with `str(t)` or `repr(t)` for `SimpleFunctionTransformer` + +## 0.8.0 (2021-02-02) + +* Fix bug with detecting updates to Ballet due to PyPI API outage +* Fix some dependency conflicts +* Reference ballet-assemble in project template +* Bump feature_engine to 1.0 + +## 0.7.11 (2020-09-16) + +* Reduce verbosity of conversion approach logging by moving some messages to TRACE level +* Implement "else" transformer for `ConditionalTransformer` +* Improve GFSSF iteration logging + +## 0.7.10 (2020-09-08) + +* Fix bug with different treatment of y_df and y; now, y_df is passed to the feature engineering pipeline, and y is passed to the feature validation routines as applicable. 
+* Switch back to using Gitter + +## 0.7.9 (2020-08-15) + +* Add give_advice feature for FeatureAPICheck and other checks to log message on how to fix failure +* Improve logging of GFSSFAccepter and GFSSFPruner +* Improve `__str__` for DelegatingRobustTransformer and consequently consumers +* Change default log format to SIMPLE_LOG_FORMAT +* Various bug fixes and improvements + +## 0.7.8 (2020-08-13) + +* Add CanTransformNewRowsCheck to feature API checks + +## 0.7.7 (2020-08-12) + +* Support `None` as the transformer in a `Feature`, it will be automatically converted to an `IdentityTransformer` +* Implement `ColumnSelector` +* Update docs +* Various bug fixes and improvements + +## 0.7.6 (2020-08-12) + +* Re-export feature engineering primitives from various libraries +* Show type annotations in docs +* Update guides +* Various bug fixes and improvements + +## 0.7.5 (2020-08-03) + +* Make validator parameters configurable in ballet.yml file (e.g. λ_1 and λ_2 for GFSSF algorithms) +* Support dynaconf 3.x + +## 0.7.4 (2020-07-22) + +* Accept logger names, as well as logger instances, in `ballet.util.log.enable` +* Updated docs + +## 0.7.3 (2020-07-21) + +* Add `load_data` method with built-in caching to project API +* Fix bug in GFSSF accepter +* Always use encoded target during validation +* Various bug fixes and improvements + +## 0.7.2 (2020-07-21) + +* Add sample analysis notebook to project template +* Add binder url/badge to project template +* Fix bug with enabling logging with multiple loggers + +## 0.7.1 (2020-07-20) + +* Add client for easy interactive usage (`ballet.b`) +* Add binder setup to project template + +## 0.7 (2020-07-17) + +* Revamp project template: update project structure, create single API via FeatureEngineeringProject, use and add support for pyinvoke, revamp build into engineer_features, support repolockr bot +* Improve ballet.project.Project: can create by ascending from given path, can create from current working directory, can resolve 
arbitrary project symbol, exposes project's API +* Check for and notify of new release of ballet during project update (`ballet update-project-template`) +* Add ComputedValueTransformer to ballet.eng +* Move stacklog to separate project and install it +* Add validators that {never,always} accept submissions +* Add feature API checks to ensure that the feature can fit and transform a single row +* Add feature engineering guide to documentation and significantly expand contributor guide +* Add bot installation instructions to maintainer guide +* Add type annotations throughout +* Drop support for py35, add support for py38 +* Deprecate modeling code +* Various bug fixes and improvements + +## 0.6 (2019-11-12) + +* Implement GFSSF validators and random validators +* Improve validators and allow validators to be configured in ballet.yml +* Improve project template +* Create ballet CLI +* Bug fixes and performance improvements + +## 0.5 (2018-10-14) + +* Add project template and ballet-quickstart command +* Add project structure checks and feature API checks +* Implement multi-stage validation routine driver + +## 0.4 (2018-09-21) + +* Implement `Modeler` for versatile modeling and evaluation +* Change project name + +## 0.3 (2018-04-28) + +* Implement `PullRequestFeatureValidator` +* Add `util.travis`, `util.modutil`, `util.git` util modules + +## 0.2 (2018-04-11) + +* Implement `ArrayLikeEqualityTestingMixin` +* Implement `collect_contrib_features` + +## 0.1 (2018-04-08) + +* First release on PyPI + + + + +%prep +%autosetup -n ballet-0.19.5 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type 
f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-ballet -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.19.5-1 +- Package Spec generated @@ -0,0 +1 @@ +2bc0a06d446c20236173f12e68061f58 ballet-0.19.5.tar.gz |
