%global _empty_manifest_terminate_build 0
Name:		python-dataenforce
Version:	0.1.2
Release:	1
Summary:	Enforce column names & data types of pandas DataFrames
License:	Apache-2.0
URL:		https://github.com/CedricFR/dataenforce
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/e0/88/ecaec8b4c615c9368028ee1369e9251cb9278b16d691a941ae1f39bc9af6/dataenforce-0.1.2.tar.gz
BuildArch:	noarch


%description
# Overview

`dataenforce` is a Python package used to enforce column names & types of pandas DataFrames using Python 3 type hinting.

It is a common issue in Data Analysis to pass dataframes into functions without a clear idea of which columns are included or not, and as columns are added to or removed from input data, code can break in unexpected ways. With `dataenforce`, you can provide a clear interface to your functions and ensure that the input dataframes will have the right format when your code is used.

# How to install

Install with pip:
```
pip install dataenforce
```

You can also install it from source with pip, or copy the `dataenforce` folder into your project and import it directly.

# How to use

`dataenforce` has two parts: type hinting and validation. Use the provided `Dataset` class in type hints to document the shape that input dataframes should have, and add the validation decorator to check that shape on every function call.

## Type-hinting: `Dataset`

The `Dataset` type indicates that we expect a `pandas.DataFrame`.

### Column name checking

```
from dataenforce import Dataset

def process_data(data: Dataset["id", "name", "location"]):
  pass
```

The code above specifies that `data` must be a DataFrame with exactly the three listed columns. To require only a subset of columns and allow others, end the list with an ellipsis:
```
def process_data(data: Dataset["id", "name", "location", ...]):
  pass
```
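
The difference between the exact form and the ellipsis form can be pictured with plain Python sets. The sketch below only illustrates the semantics described above and is not how `dataenforce` is implemented; `columns_match` and its `allow_extra` parameter are hypothetical names, and ignoring column order is an assumption of this sketch.

```
def columns_match(actual, required, allow_extra=False):
    """Return True if the `actual` column names satisfy `required`.

    Hypothetical helper: allow_extra=True plays the role of the trailing
    ellipsis, so additional columns are tolerated.
    """
    actual, required = set(actual), set(required)
    if allow_extra:
        # Subset test: every required column must be present.
        return required <= actual
    # Exact test: no missing and no extra columns.
    return actual == required


wanted = ["id", "name", "location"]
print(columns_match(["id", "name", "location"], wanted))                         # True
print(columns_match(["id", "name", "location", "age"], wanted))                  # False
print(columns_match(["id", "name", "location", "age"], wanted, allow_extra=True))  # True
```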

### dtype checking

```
def process_data(data: Dataset["id": int, "name": object, "latitude": float, "longitude": float]):
  pass
```

The code above specifies the required column names together with their dtypes. Typed and untyped columns can be mixed: `Dataset["id": int, "name"]`.
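
The `"id": int` notation is ordinary Python subscript syntax: each `name: type` pair reaches the class as a `slice` object, while a bare name arrives as a plain string. The sketch below shows that mechanism with a hypothetical `Spec` class (Python 3.7+); it is not the `Dataset` implementation, only an illustration of how such items can be told apart.

```
class Spec:
    """Hypothetical class showing how subscript items arrive in Python."""

    def __class_getitem__(cls, items):
        # A single item is passed as-is, several items as a tuple.
        if not isinstance(items, tuple):
            items = (items,)
        columns = {}
        for item in items:
            if isinstance(item, slice):
                # "id": int is received as slice("id", int, None).
                columns[item.start] = item.stop
            else:
                # A bare name carries no dtype constraint.
                columns[item] = None
        return columns


parsed = Spec["id": int, "name"]
print(parsed)  # {'id': <class 'int'>, 'name': None}
```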

### Reusing dataframe formats

As you're likely to use the same column subsets several times in your code, you can define them to reuse & combine them later:
```
DName = Dataset["id", "name"]
DLocation = Dataset["id", "latitude", "longitude"]

# Expects columns id, name
def process1(data: DName):
  pass

# Expects columns id, name, latitude, longitude, timestamp
def process2(data: Dataset[DName, DLocation, "timestamp"]):
  pass
```

## Enforcing: `@validate`

The `@validate` decorator checks that every `Dataset` argument has the right format when the function is called, and raises a `TypeError` otherwise.

```
from dataenforce import Dataset, validate
import pandas as pd

@validate
def process_data(data: Dataset["id", "name"]):
  pass

process_data(pd.DataFrame(dict(id=[1,2], name=["Alice", "Bob"]))) # Works
process_data(pd.DataFrame(dict(id=[1,2]))) # Raises a TypeError, column name missing
```
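
To make the idea of call-time checking concrete, here is a minimal standard-library sketch of a decorator that reads parameter annotations with `inspect.signature` and raises `TypeError` when columns are missing. All names in it (`RequiredColumns`, `check_columns`, `FakeFrame`) are hypothetical stand-ins so that the example runs without pandas; it is not the code behind `@validate`.

```
import functools
import inspect


class RequiredColumns:
    """Hypothetical stand-in for a Dataset[...] annotation."""

    def __init__(self, *names):
        self.names = set(names)


def check_columns(func):
    """Hypothetical decorator: verify annotated arguments at call time."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            # Only arguments annotated with RequiredColumns are checked.
            if isinstance(expected, RequiredColumns):
                missing = expected.names - set(value.columns)
                if missing:
                    raise TypeError(f"{name}: missing columns {sorted(missing)}")
        return func(*args, **kwargs)

    return wrapper


class FakeFrame:
    """Minimal object exposing `.columns`, used here instead of pandas."""

    def __init__(self, columns):
        self.columns = list(columns)


@check_columns
def process_data(data: RequiredColumns("id", "name")):
    return "ok"


print(process_data(FakeFrame(["id", "name"])))  # ok
try:
    process_data(FakeFrame(["id"]))
except TypeError as error:
    print(error)  # data: missing columns ['name']
```

Because the check relies on the annotation being a real object at call time, a sketch like this would also skip annotations that are stored as plain strings.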

# How to test

`dataenforce` uses `pytest` for testing. If you have `pytest` installed, run `PYTHONPATH="." pytest` from the root folder of the repository.

# Notes

* You can use `dataenforce` to type-hint the return value of a function, but `@validate` does not currently check it
* You can't use `@validate` on a function where non-base class type hints are written as strings (like `def f() -> "MyClass"`). This issue is related to PEP 563
* This work is experimental and not production-ready. Please raise issues and send pull requests if you find or fix bugs
* `dataenforce` is released under the Apache License 2.0, meaning you can freely use and redistribute the library, provided the copyright notice is kept
* Dependencies: Pandas & Numpy
* Tested with Python 3.6, 3.7, 3.8


%package -n python3-dataenforce
Summary:	Enforce column names & data types of pandas DataFrames
Provides:	python-dataenforce
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-dataenforce
# Overview

`dataenforce` is a Python package used to enforce column names & types of pandas DataFrames using Python 3 type hinting.

It is a common issue in Data Analysis to pass dataframes into functions without a clear idea of which columns are included or not, and as columns are added to or removed from input data, code can break in unexpected ways. With `dataenforce`, you can provide a clear interface to your functions and ensure that the input dataframes will have the right format when your code is used.

# How to install

Install with pip:
```
pip install dataenforce
```

You can also install it from source with pip, or copy the `dataenforce` folder into your project and import it directly.

# How to use

`dataenforce` has two parts: type hinting and validation. Use the provided `Dataset` class in type hints to document the shape that input dataframes should have, and add the validation decorator to check that shape on every function call.

## Type-hinting: `Dataset`

The `Dataset` type indicates that we expect a `pandas.DataFrame`.

### Column name checking

```
from dataenforce import Dataset

def process_data(data: Dataset["id", "name", "location"]):
  pass
```

The code above specifies that `data` must be a DataFrame with exactly the three listed columns. To require only a subset of columns and allow others, end the list with an ellipsis:
```
def process_data(data: Dataset["id", "name", "location", ...]):
  pass
```

### dtype checking

```
def process_data(data: Dataset["id": int, "name": object, "latitude": float, "longitude": float]):
  pass
```

The code above specifies the required column names together with their dtypes. Typed and untyped columns can be mixed: `Dataset["id": int, "name"]`.

### Reusing dataframe formats

As you're likely to use the same column subsets several times in your code, you can define them to reuse & combine them later:
```
DName = Dataset["id", "name"]
DLocation = Dataset["id", "latitude", "longitude"]

# Expects columns id, name
def process1(data: DName):
  pass

# Expects columns id, name, latitude, longitude, timestamp
def process2(data: Dataset[DName, DLocation, "timestamp"]):
  pass
```

## Enforcing: `@validate`

The `@validate` decorator checks that every `Dataset` argument has the right format when the function is called, and raises a `TypeError` otherwise.

```
from dataenforce import Dataset, validate
import pandas as pd

@validate
def process_data(data: Dataset["id", "name"]):
  pass

process_data(pd.DataFrame(dict(id=[1,2], name=["Alice", "Bob"]))) # Works
process_data(pd.DataFrame(dict(id=[1,2]))) # Raises a TypeError, column name missing
```

# How to test

`dataenforce` uses `pytest` for testing. If you have `pytest` installed, run `PYTHONPATH="." pytest` from the root folder of the repository.

# Notes

* You can use `dataenforce` to type-hint the return value of a function, but `@validate` does not currently check it
* You can't use `@validate` on a function where non-base class type hints are written as strings (like `def f() -> "MyClass"`). This issue is related to PEP 563
* This work is experimental and not production-ready. Please raise issues and send pull requests if you find or fix bugs
* `dataenforce` is released under the Apache License 2.0, meaning you can freely use and redistribute the library, provided the copyright notice is kept
* Dependencies: Pandas & Numpy
* Tested with Python 3.6, 3.7, 3.8


%package help
Summary:	Development documents and examples for dataenforce
Provides:	python3-dataenforce-doc
%description help
# Overview

`dataenforce` is a Python package used to enforce column names & types of pandas DataFrames using Python 3 type hinting.

It is a common issue in Data Analysis to pass dataframes into functions without a clear idea of which columns are included or not, and as columns are added to or removed from input data, code can break in unexpected ways. With `dataenforce`, you can provide a clear interface to your functions and ensure that the input dataframes will have the right format when your code is used.

# How to install

Install with pip:
```
pip install dataenforce
```

You can also install it from source with pip, or copy the `dataenforce` folder into your project and import it directly.

# How to use

`dataenforce` has two parts: type hinting and validation. Use the provided `Dataset` class in type hints to document the shape that input dataframes should have, and add the validation decorator to check that shape on every function call.

## Type-hinting: `Dataset`

The `Dataset` type indicates that we expect a `pandas.DataFrame`.

### Column name checking

```
from dataenforce import Dataset

def process_data(data: Dataset["id", "name", "location"]):
  pass
```

The code above specifies that `data` must be a DataFrame with exactly the three listed columns. To require only a subset of columns and allow others, end the list with an ellipsis:
```
def process_data(data: Dataset["id", "name", "location", ...]):
  pass
```

### dtype checking

```
def process_data(data: Dataset["id": int, "name": object, "latitude": float, "longitude": float]):
  pass
```

The code above specifies the required column names together with their dtypes. Typed and untyped columns can be mixed: `Dataset["id": int, "name"]`.

### Reusing dataframe formats

As you're likely to use the same column subsets several times in your code, you can define them to reuse & combine them later:
```
DName = Dataset["id", "name"]
DLocation = Dataset["id", "latitude", "longitude"]

# Expects columns id, name
def process1(data: DName):
  pass

# Expects columns id, name, latitude, longitude, timestamp
def process2(data: Dataset[DName, DLocation, "timestamp"]):
  pass
```

## Enforcing: `@validate`

The `@validate` decorator checks that every `Dataset` argument has the right format when the function is called, and raises a `TypeError` otherwise.

```
from dataenforce import Dataset, validate
import pandas as pd

@validate
def process_data(data: Dataset["id", "name"]):
  pass

process_data(pd.DataFrame(dict(id=[1,2], name=["Alice", "Bob"]))) # Works
process_data(pd.DataFrame(dict(id=[1,2]))) # Raises a TypeError, column name missing
```

# How to test

`dataenforce` uses `pytest` for testing. If you have `pytest` installed, run `PYTHONPATH="." pytest` from the root folder of the repository.

# Notes

* You can use `dataenforce` to type-hint the return value of a function, but `@validate` does not currently check it
* You can't use `@validate` on a function where non-base class type hints are written as strings (like `def f() -> "MyClass"`). This issue is related to PEP 563
* This work is experimental and not production-ready. Please raise issues and send pull requests if you find or fix bugs
* `dataenforce` is released under the Apache License 2.0, meaning you can freely use and redistribute the library, provided the copyright notice is kept
* Dependencies: Pandas & Numpy
* Tested with Python 3.6, 3.7, 3.8


%prep
%autosetup -n dataenforce-0.1.2

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-dataenforce -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.2-1
- Package Spec generated