%global _empty_manifest_terminate_build 0
Name:           python-dataenforce
Version:        0.1.2
Release:        1
Summary:        Enforce column names & data types of pandas DataFrames
License:        Apache Software License
URL:            https://github.com/CedricFR/dataenforce
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/e0/88/ecaec8b4c615c9368028ee1369e9251cb9278b16d691a941ae1f39bc9af6/dataenforce-0.1.2.tar.gz
BuildArch:      noarch

%description
# Overview

`dataenforce` is a Python package used to enforce column names & types of pandas DataFrames using Python 3 type hinting.

It is a common issue in data analysis to pass dataframes into functions without a clear idea of which columns they contain, and as columns are added to or removed from input data, code can break in unexpected ways. With `dataenforce`, you can provide a clear interface to your functions and ensure that the input dataframes have the right format when your code is used.

# How to install

Install with pip:
```
pip install dataenforce
```
You can also pip install it from the sources, or just import the `dataenforce` folder.

# How to use

There are two parts in `dataenforce`: the type-hinting part and the validation part. Use type-hinting with the provided class to indicate what shape the input dataframes should have, and the validation decorator to additionally ensure the format is respected in every function call.

## Type-hinting: `Dataset`

The `Dataset` type indicates that we expect a `pandas.DataFrame`.

### Column name checking
```
from dataenforce import Dataset

def process_data(data: Dataset["id", "name", "location"]):
    pass
```
The code above specifies that `data` must be a DataFrame with exactly the 3 mentioned columns.
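The exact-match rule can be pictured with a small stand-alone sketch. It does not call `dataenforce`; `has_exact_columns` is a hypothetical helper written for illustration, and treating column order as irrelevant is an assumption rather than documented behaviour.

```python
def has_exact_columns(actual_columns, expected_columns):
    """Return True when both collections name the same columns.

    Assumption: column order does not matter, so the names are compared as sets.
    """
    return set(actual_columns) == set(expected_columns)


expected = ["id", "name", "location"]

# Same three columns: accepted.
print(has_exact_columns(["location", "id", "name"], expected))

# A missing column or an extra column: both rejected under an exact match.
print(has_exact_columns(["id", "name"], expected))
print(has_exact_columns(["id", "name", "location", "extra"], expected))
```

Under this reading, a dataframe with an additional column fails the check just as one with a missing column does.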
If you want to specify only a subset of required columns, you can use an ellipsis:
```
def process_data(data: Dataset["id", "name", "location", ...]):
    pass
```

### dtype checking
```
def process_data(data: Dataset["id": int, "name": object, "latitude": float, "longitude": float]):
    pass
```
The code above specifies the column names which must be there, with associated types. Columns given by name only can be combined with typed columns: `Dataset["id": int, "name"]`.

### Reusing dataframe formats

As you're likely to use the same column subsets several times in your code, you can define them once, then reuse and combine them later:
```
DName = Dataset["id", "name"]
DLocation = Dataset["id", "latitude", "longitude"]

# Expects columns id, name
def process1(data: DName):
    pass

# Expects columns id, name, latitude, longitude, timestamp
def process2(data: Dataset[DName, DLocation, "timestamp"]):
    pass
```

## Enforcing: `@validate`

The `@validate` decorator ensures that input `Dataset`s have the right format when the function is called, and raises a `TypeError` otherwise.
```
from dataenforce import Dataset, validate
import pandas as pd

@validate
def process_data(data: Dataset["id", "name"]):
    pass

process_data(pd.DataFrame(dict(id=[1,2], name=["Alice", "Bob"]))) # Works
process_data(pd.DataFrame(dict(id=[1,2]))) # Raises a TypeError, column name missing
```

# How to test

`dataenforce` uses `pytest` as a testing library. If you have `pytest` installed, just run `PYTHONPATH="." pytest` on the command line from the root folder.

# Notes

* You can use `dataenforce` to type-hint the return value of a function, but it is not currently possible to `validate` it (it is not included in the checks)
* You can't use `@validate` on a function where you use non-base class type-hints as strings (like `def f() -> "MyClass"`). Issue related to PEP 563
* This work is at an experimental stage. It is not production-ready.
  Please raise issues & send pull requests if you find or fix bugs
* `dataenforce` is released under the Apache License 2.0, meaning you can freely use the library and redistribute it, provided Copyright is kept
* Dependencies: Pandas & Numpy
* Tested with Python 3.6, 3.7, 3.8

%package -n python3-dataenforce
Summary:        Enforce column names & data types of pandas DataFrames
Provides:       python-dataenforce
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-dataenforce
# Overview

`dataenforce` is a Python package used to enforce column names & types of pandas DataFrames using Python 3 type hinting.

It is a common issue in data analysis to pass dataframes into functions without a clear idea of which columns they contain, and as columns are added to or removed from input data, code can break in unexpected ways. With `dataenforce`, you can provide a clear interface to your functions and ensure that the input dataframes have the right format when your code is used.

# How to install

Install with pip:
```
pip install dataenforce
```
You can also pip install it from the sources, or just import the `dataenforce` folder.

# How to use

There are two parts in `dataenforce`: the type-hinting part and the validation part. Use type-hinting with the provided class to indicate what shape the input dataframes should have, and the validation decorator to additionally ensure the format is respected in every function call.

## Type-hinting: `Dataset`

The `Dataset` type indicates that we expect a `pandas.DataFrame`.

### Column name checking
```
from dataenforce import Dataset

def process_data(data: Dataset["id", "name", "location"]):
    pass
```
The code above specifies that `data` must be a DataFrame with exactly the 3 mentioned columns.
If you want to specify only a subset of required columns, you can use an ellipsis:
```
def process_data(data: Dataset["id", "name", "location", ...]):
    pass
```

### dtype checking
```
def process_data(data: Dataset["id": int, "name": object, "latitude": float, "longitude": float]):
    pass
```
The code above specifies the column names which must be there, with associated types. Columns given by name only can be combined with typed columns: `Dataset["id": int, "name"]`.

### Reusing dataframe formats

As you're likely to use the same column subsets several times in your code, you can define them once, then reuse and combine them later:
```
DName = Dataset["id", "name"]
DLocation = Dataset["id", "latitude", "longitude"]

# Expects columns id, name
def process1(data: DName):
    pass

# Expects columns id, name, latitude, longitude, timestamp
def process2(data: Dataset[DName, DLocation, "timestamp"]):
    pass
```

## Enforcing: `@validate`

The `@validate` decorator ensures that input `Dataset`s have the right format when the function is called, and raises a `TypeError` otherwise.
```
from dataenforce import Dataset, validate
import pandas as pd

@validate
def process_data(data: Dataset["id", "name"]):
    pass

process_data(pd.DataFrame(dict(id=[1,2], name=["Alice", "Bob"]))) # Works
process_data(pd.DataFrame(dict(id=[1,2]))) # Raises a TypeError, column name missing
```

# How to test

`dataenforce` uses `pytest` as a testing library. If you have `pytest` installed, just run `PYTHONPATH="." pytest` on the command line from the root folder.

# Notes

* You can use `dataenforce` to type-hint the return value of a function, but it is not currently possible to `validate` it (it is not included in the checks)
* You can't use `@validate` on a function where you use non-base class type-hints as strings (like `def f() -> "MyClass"`). Issue related to PEP 563
* This work is at an experimental stage. It is not production-ready.
  Please raise issues & send pull requests if you find or fix bugs
* `dataenforce` is released under the Apache License 2.0, meaning you can freely use the library and redistribute it, provided Copyright is kept
* Dependencies: Pandas & Numpy
* Tested with Python 3.6, 3.7, 3.8

%package help
Summary:        Development documents and examples for dataenforce
Provides:       python3-dataenforce-doc

%description help
# Overview

`dataenforce` is a Python package used to enforce column names & types of pandas DataFrames using Python 3 type hinting.

It is a common issue in data analysis to pass dataframes into functions without a clear idea of which columns they contain, and as columns are added to or removed from input data, code can break in unexpected ways. With `dataenforce`, you can provide a clear interface to your functions and ensure that the input dataframes have the right format when your code is used.

# How to install

Install with pip:
```
pip install dataenforce
```
You can also pip install it from the sources, or just import the `dataenforce` folder.

# How to use

There are two parts in `dataenforce`: the type-hinting part and the validation part. Use type-hinting with the provided class to indicate what shape the input dataframes should have, and the validation decorator to additionally ensure the format is respected in every function call.

## Type-hinting: `Dataset`

The `Dataset` type indicates that we expect a `pandas.DataFrame`.

### Column name checking
```
from dataenforce import Dataset

def process_data(data: Dataset["id", "name", "location"]):
    pass
```
The code above specifies that `data` must be a DataFrame with exactly the 3 mentioned columns.
If you want to specify only a subset of required columns, you can use an ellipsis:
```
def process_data(data: Dataset["id", "name", "location", ...]):
    pass
```

### dtype checking
```
def process_data(data: Dataset["id": int, "name": object, "latitude": float, "longitude": float]):
    pass
```
The code above specifies the column names which must be there, with associated types. Columns given by name only can be combined with typed columns: `Dataset["id": int, "name"]`.

### Reusing dataframe formats

As you're likely to use the same column subsets several times in your code, you can define them once, then reuse and combine them later:
```
DName = Dataset["id", "name"]
DLocation = Dataset["id", "latitude", "longitude"]

# Expects columns id, name
def process1(data: DName):
    pass

# Expects columns id, name, latitude, longitude, timestamp
def process2(data: Dataset[DName, DLocation, "timestamp"]):
    pass
```

## Enforcing: `@validate`

The `@validate` decorator ensures that input `Dataset`s have the right format when the function is called, and raises a `TypeError` otherwise.
```
from dataenforce import Dataset, validate
import pandas as pd

@validate
def process_data(data: Dataset["id", "name"]):
    pass

process_data(pd.DataFrame(dict(id=[1,2], name=["Alice", "Bob"]))) # Works
process_data(pd.DataFrame(dict(id=[1,2]))) # Raises a TypeError, column name missing
```

# How to test

`dataenforce` uses `pytest` as a testing library. If you have `pytest` installed, just run `PYTHONPATH="." pytest` on the command line from the root folder.

# Notes

* You can use `dataenforce` to type-hint the return value of a function, but it is not currently possible to `validate` it (it is not included in the checks)
* You can't use `@validate` on a function where you use non-base class type-hints as strings (like `def f() -> "MyClass"`). Issue related to PEP 563
* This work is at an experimental stage. It is not production-ready.
  Please raise issues & send pull requests if you find or fix bugs
* `dataenforce` is released under the Apache License 2.0, meaning you can freely use the library and redistribute it, provided Copyright is kept
* Dependencies: Pandas & Numpy
* Tested with Python 3.6, 3.7, 3.8

%prep
%autosetup -n dataenforce-0.1.2

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-dataenforce -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 10 2023 Python_Bot - 0.1.2-1
- Package Spec generated