%global _empty_manifest_terminate_build 0
Name:		python-pandera
Version:	0.14.5
Release:	1
Summary:	A light-weight and flexible data validation and testing tool for statistical data objects.
License:	MIT
URL:		https://github.com/pandera-dev/pandera
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/7b/09/ce690eb6248a37a773e975998fd291e3094c2410649a61ac0c3378814e50/pandera-0.14.5.tar.gz
BuildArch:	noarch

Requires:	python3-multimethod
Requires:	python3-numpy
Requires:	python3-packaging
Requires:	python3-pandas
Requires:	python3-pydantic
Requires:	python3-typing-inspect
Requires:	python3-wrapt
Requires:	python3-typing-extensions
Requires:	python3-black
Requires:	python3-pandas-stubs
Requires:	python3-fastapi
Requires:	python3-ray
Requires:	python3-dask
Requires:	python3-geopandas
Requires:	python3-pyspark
Requires:	python3-scipy
Requires:	python3-pyyaml
Requires:	python3-shapely
Requires:	python3-modin
Requires:	python3-frictionless
Requires:	python3-hypothesis
Requires:	python3-dask
Requires:	python3-fastapi
Requires:	python3-geopandas
Requires:	python3-shapely
Requires:	python3-scipy
Requires:	python3-pyyaml
Requires:	python3-black
Requires:	python3-frictionless
Requires:	python3-modin
Requires:	python3-ray
Requires:	python3-dask
Requires:	python3-modin
Requires:	python3-dask
Requires:	python3-modin
Requires:	python3-ray
Requires:	python3-pandas-stubs
Requires:	python3-pyspark
Requires:	python3-hypothesis

%description

# A Statistical Data Testing Toolkit

*A data validation library for scientists, engineers, and analysts seeking correctness.*

[![CI Build](https://github.com/pandera-dev/pandera/workflows/CI%20Tests/badge.svg?branch=main)](https://github.com/pandera-dev/pandera/actions?query=workflow%3A%22CI+Tests%22+branch%3Amain) [![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=stable)](https://pandera.readthedocs.io/en/stable/?badge=stable) [![PyPI version shields.io](https://img.shields.io/pypi/v/pandera.svg)](https://pypi.org/project/pandera/) [![PyPI license](https://img.shields.io/pypi/l/pandera.svg)](https://pypi.python.org/pypi/) [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/12) [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=latest)](https://pandera.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/unionai-oss/pandera/branch/main/graph/badge.svg)](https://codecov.io/gh/pandera-dev/pandera) [![PyPI pyversions](https://img.shields.io/pypi/pyversions/pandera.svg)](https://pypi.python.org/pypi/pandera/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3385265.svg)](https://doi.org/10.5281/zenodo.3385265) [![asv](http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat)](https://pandera-dev.github.io/pandera-asv-logs/) [![Downloads](https://pepy.tech/badge/pandera/month)](https://pepy.tech/project/pandera) [![Downloads](https://pepy.tech/badge/pandera)](https://pepy.tech/project/pandera) [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandera?label=conda%20downloads)](https://anaconda.org/conda-forge/pandera) [![Discord](https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord)](https://discord.gg/vyanhWuaKB) `pandera` provides a flexible and expressive API for performing data validation on 
dataframe-like objects to make data processing pipelines more readable and robust.

Dataframes contain information that `pandera` explicitly validates at runtime. This is useful in production-critical or reproducible research settings. With `pandera`, you can:

1. Define a schema once and use it to validate [different dataframe types](https://pandera.readthedocs.io/en/stable/supported_libraries.html) including [pandas](http://pandas.pydata.org), [dask](https://dask.org), [modin](https://modin.readthedocs.io/), and [pyspark](https://spark.apache.org/docs/3.2.0/api/python/user_guide/pandas_on_spark/index.html).
1. [Check](https://pandera.readthedocs.io/en/stable/checks.html) the types and properties of columns in a `DataFrame` or values in a `Series`.
1. Perform more complex statistical validation like [hypothesis testing](https://pandera.readthedocs.io/en/stable/hypothesis.html#hypothesis).
1. Seamlessly integrate with existing data analysis/processing pipelines via [function decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators).
1. Define dataframe models with the [class-based API](https://pandera.readthedocs.io/en/stable/dataframe_models.html#dataframe-models) with pydantic-style syntax and validate dataframes using the typing syntax.
1. [Synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#data-synthesis-strategies) from schema objects for property-based testing with pandas data structures.
1. [Lazily Validate](https://pandera.readthedocs.io/en/stable/lazy_validation.html) dataframes so that all validation checks are executed before raising an error.
1. [Integrate](https://pandera.readthedocs.io/en/stable/integrations.html) with a rich ecosystem of python tools like [pydantic](https://pydantic-docs.helpmanual.io), [fastapi](https://fastapi.tiangolo.com/), and [mypy](http://mypy-lang.org/).

## Documentation

The official documentation is hosted on ReadTheDocs: https://pandera.readthedocs.io

## Install

Using pip:

```
pip install pandera
```

Using conda:

```
conda install -c conda-forge pandera
```

### Extras

Installing additional functionality:

pip

```bash
pip install pandera[hypotheses]  # hypothesis checks
pip install pandera[io]          # yaml/script schema io utilities
pip install pandera[strategies]  # data synthesis strategies
pip install pandera[mypy]        # enable static type-linting of pandas
pip install pandera[fastapi]     # fastapi integration
pip install pandera[dask]        # validate dask dataframes
pip install pandera[pyspark]     # validate pyspark dataframes
pip install pandera[modin]       # validate modin dataframes
pip install pandera[modin-ray]   # validate modin dataframes with ray
pip install pandera[modin-dask]  # validate modin dataframes with dask
pip install pandera[geopandas]   # validate geopandas geodataframes
```

conda

```bash
conda install -c conda-forge pandera-hypotheses  # hypothesis checks
conda install -c conda-forge pandera-io          # yaml/script schema io utilities
conda install -c conda-forge pandera-strategies  # data synthesis strategies
conda install -c conda-forge pandera-mypy        # enable static type-linting of pandas
conda install -c conda-forge pandera-fastapi     # fastapi integration
conda install -c conda-forge pandera-dask        # validate dask dataframes
conda install -c conda-forge pandera-pyspark     # validate pyspark dataframes
conda install -c conda-forge pandera-modin       # validate modin dataframes
conda install -c conda-forge pandera-modin-ray   # validate modin dataframes with ray
conda install -c conda-forge pandera-modin-dask  # validate modin dataframes with dask
conda install -c conda-forge pandera-geopandas   # validate geopandas geodataframes
```

## Quick Start

```python
import pandas as pd
import pandera as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"]
})

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a series as input and
        # output a boolean or boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

validated_df = schema(df)
print(validated_df)

#    column1  column2  column3
# 0        1     -1.3  value_1
# 1        4     -1.4  value_2
# 2        0     -2.9  value_3
# 3       10    -10.1  value_2
# 4        9    -20.4  value_1
```

## DataFrame Model

`pandera` also provides an alternative API for expressing schemas inspired by [dataclasses](https://docs.python.org/3/library/dataclasses.html) and [pydantic](https://pydantic-docs.helpmanual.io/). The equivalent `DataFrameModel` for the above `DataFrameSchema` would be:

```python
from pandera.typing import Series

class Schema(pa.DataFrameModel):

    column1: Series[int] = pa.Field(le=10)
    column2: Series[float] = pa.Field(lt=-1.2)
    column3: Series[str] = pa.Field(str_startswith="value_")

    @pa.check("column3")
    def column_3_check(cls, series: Series[str]) -> Series[bool]:
        """Check that values have two elements after being split with '_'"""
        return series.str.split("_", expand=True).shape[1] == 2

Schema.validate(df)
```

## Development Installation

```
git clone https://github.com/pandera-dev/pandera.git
cd pandera
pip install -r requirements-dev.txt
pip install -e .
```

## Tests

```
pip install pytest
pytest tests
```

## Contributing to pandera [![GitHub contributors](https://img.shields.io/github/contributors/pandera-dev/pandera.svg)](https://github.com/pandera-dev/pandera/graphs/contributors)

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview on how to contribute can be found in the [contributing guide](https://github.com/pandera-dev/pandera/blob/main/.github/CONTRIBUTING.md) on GitHub.

## Issues

Go [here](https://github.com/pandera-dev/pandera/issues) to submit feature requests or bug reports.

## Need Help?

There are many ways of getting help with your questions. You can ask a question on the [GitHub Discussions](https://github.com/pandera-dev/pandera/discussions/categories/q-a) page or reach out to the maintainers and pandera community on [Discord](https://discord.gg/vyanhWuaKB).

## Why `pandera`?

- [dataframe-centric data types](https://pandera.readthedocs.io/en/stable/dtypes.html), [column nullability](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#null-values-in-columns), and [uniqueness](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#validating-the-joint-uniqueness-of-columns) are first-class concepts.
- Define [dataframe models](https://pandera.readthedocs.io/en/stable/schema_models.html) with the class-based API with [pydantic](https://pydantic-docs.helpmanual.io/)-style syntax and validate dataframes using the typing syntax.
- `check_input` and `check_output` [decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators-for-pipeline-integration) enable seamless integration with existing code.
- [`Check`s](https://pandera.readthedocs.io/en/stable/checks.html) provide flexibility and performance by providing access to the `pandas` API by design and offer built-in checks for common data tests.
- [`Hypothesis`](https://pandera.readthedocs.io/en/stable/hypothesis.html) class provides a tidy-first interface for statistical hypothesis testing.
- `Check`s and `Hypothesis` objects support both [tidy and wide data validation](https://pandera.readthedocs.io/en/stable/checks.html#wide-checks).
- Use schemas as generative contracts to [synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html) for unit testing.
- [Schema inference](https://pandera.readthedocs.io/en/stable/schema_inference.html) allows you to bootstrap schemas from data.

## Alternative Data Validation Libraries

Here are a few other alternatives for validating Python data structures.

**Generic Python object data validation**

- [voluptuous](https://github.com/alecthomas/voluptuous)
- [schema](https://github.com/keleshev/schema)

**`pandas`-specific data validation**

- [opulent-pandas](https://github.com/danielvdende/opulent-pandas)
- [PandasSchema](https://github.com/TMiguelT/PandasSchema)
- [pandas-validator](https://github.com/c-data/pandas-validator)
- [table_enforcer](https://github.com/xguse/table_enforcer)
- [dataenforce](https://github.com/CedricFR/dataenforce)
- [strictly typed pandas](https://github.com/nanne-aben/strictly_typed_pandas)
- [marshmallow-dataframe](https://github.com/facultyai/marshmallow-dataframe)

**Other tools for data validation**

- [great_expectations](https://github.com/great-expectations/great_expectations)
- [frictionless schema](https://framework.frictionlessdata.io/docs/guides/framework/schema-guide/)

## How to Cite

If you use `pandera` in the context of academic or industry research, please consider citing the **paper** and/or **software package**.

### [Paper](https://conference.scipy.org/proceedings/scipy2020/niels_bantilan.html)

```
@InProceedings{ niels_bantilan-proc-scipy-2020,
  author    = { {N}iels {B}antilan },
  title     = { pandera: {S}tatistical {D}ata {V}alidation of {P}andas {D}ataframes },
  booktitle = { {P}roceedings of the 19th {P}ython in {S}cience {C}onference },
  pages     = { 116 - 124 },
  year      = { 2020 },
  editor    = { {M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe },
  doi       = { 10.25080/Majora-342d178e-010 }
}
```

### Software Package

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3385265.svg)](https://doi.org/10.5281/zenodo.3385265)

## License and Credits

`pandera` is licensed under the [MIT license](license.txt) and is written and maintained by Niels Bantilan (niels@pandera.ci)

%package -n python3-pandera
Summary:	A light-weight and flexible data validation and testing tool for statistical data objects.
Provides:	python-pandera
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-pandera

# A Statistical Data Testing Toolkit

*A data validation library for scientists, engineers, and analysts seeking correctness.*

[![CI Build](https://github.com/pandera-dev/pandera/workflows/CI%20Tests/badge.svg?branch=main)](https://github.com/pandera-dev/pandera/actions?query=workflow%3A%22CI+Tests%22+branch%3Amain) [![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=stable)](https://pandera.readthedocs.io/en/stable/?badge=stable) [![PyPI version shields.io](https://img.shields.io/pypi/v/pandera.svg)](https://pypi.org/project/pandera/) [![PyPI license](https://img.shields.io/pypi/l/pandera.svg)](https://pypi.python.org/pypi/) [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/12) [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=latest)](https://pandera.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/unionai-oss/pandera/branch/main/graph/badge.svg)](https://codecov.io/gh/pandera-dev/pandera) [![PyPI pyversions](https://img.shields.io/pypi/pyversions/pandera.svg)](https://pypi.python.org/pypi/pandera/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3385265.svg)](https://doi.org/10.5281/zenodo.3385265) [![asv](http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat)](https://pandera-dev.github.io/pandera-asv-logs/) [![Downloads](https://pepy.tech/badge/pandera/month)](https://pepy.tech/project/pandera) [![Downloads](https://pepy.tech/badge/pandera)](https://pepy.tech/project/pandera) [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandera?label=conda%20downloads)](https://anaconda.org/conda-forge/pandera) [![Discord](https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord)](https://discord.gg/vyanhWuaKB) `pandera` provides a flexible and expressive API for performing data validation on 
dataframe-like objects to make data processing pipelines more readable and robust.

Dataframes contain information that `pandera` explicitly validates at runtime. This is useful in production-critical or reproducible research settings. With `pandera`, you can:

1. Define a schema once and use it to validate [different dataframe types](https://pandera.readthedocs.io/en/stable/supported_libraries.html) including [pandas](http://pandas.pydata.org), [dask](https://dask.org), [modin](https://modin.readthedocs.io/), and [pyspark](https://spark.apache.org/docs/3.2.0/api/python/user_guide/pandas_on_spark/index.html).
1. [Check](https://pandera.readthedocs.io/en/stable/checks.html) the types and properties of columns in a `DataFrame` or values in a `Series`.
1. Perform more complex statistical validation like [hypothesis testing](https://pandera.readthedocs.io/en/stable/hypothesis.html#hypothesis).
1. Seamlessly integrate with existing data analysis/processing pipelines via [function decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators).
1. Define dataframe models with the [class-based API](https://pandera.readthedocs.io/en/stable/dataframe_models.html#dataframe-models) with pydantic-style syntax and validate dataframes using the typing syntax.
1. [Synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#data-synthesis-strategies) from schema objects for property-based testing with pandas data structures.
1. [Lazily Validate](https://pandera.readthedocs.io/en/stable/lazy_validation.html) dataframes so that all validation checks are executed before raising an error.
1. [Integrate](https://pandera.readthedocs.io/en/stable/integrations.html) with a rich ecosystem of python tools like [pydantic](https://pydantic-docs.helpmanual.io), [fastapi](https://fastapi.tiangolo.com/), and [mypy](http://mypy-lang.org/).

## Documentation

The official documentation is hosted on ReadTheDocs: https://pandera.readthedocs.io

## Install

Using pip:

```
pip install pandera
```

Using conda:

```
conda install -c conda-forge pandera
```

### Extras

Installing additional functionality:

pip

```bash
pip install pandera[hypotheses]  # hypothesis checks
pip install pandera[io]          # yaml/script schema io utilities
pip install pandera[strategies]  # data synthesis strategies
pip install pandera[mypy]        # enable static type-linting of pandas
pip install pandera[fastapi]     # fastapi integration
pip install pandera[dask]        # validate dask dataframes
pip install pandera[pyspark]     # validate pyspark dataframes
pip install pandera[modin]       # validate modin dataframes
pip install pandera[modin-ray]   # validate modin dataframes with ray
pip install pandera[modin-dask]  # validate modin dataframes with dask
pip install pandera[geopandas]   # validate geopandas geodataframes
```

conda

```bash
conda install -c conda-forge pandera-hypotheses  # hypothesis checks
conda install -c conda-forge pandera-io          # yaml/script schema io utilities
conda install -c conda-forge pandera-strategies  # data synthesis strategies
conda install -c conda-forge pandera-mypy        # enable static type-linting of pandas
conda install -c conda-forge pandera-fastapi     # fastapi integration
conda install -c conda-forge pandera-dask        # validate dask dataframes
conda install -c conda-forge pandera-pyspark     # validate pyspark dataframes
conda install -c conda-forge pandera-modin       # validate modin dataframes
conda install -c conda-forge pandera-modin-ray   # validate modin dataframes with ray
conda install -c conda-forge pandera-modin-dask  # validate modin dataframes with dask
conda install -c conda-forge pandera-geopandas   # validate geopandas geodataframes
```

## Quick Start

```python
import pandas as pd
import pandera as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"]
})

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a series as input and
        # output a boolean or boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

validated_df = schema(df)
print(validated_df)

#    column1  column2  column3
# 0        1     -1.3  value_1
# 1        4     -1.4  value_2
# 2        0     -2.9  value_3
# 3       10    -10.1  value_2
# 4        9    -20.4  value_1
```

## DataFrame Model

`pandera` also provides an alternative API for expressing schemas inspired by [dataclasses](https://docs.python.org/3/library/dataclasses.html) and [pydantic](https://pydantic-docs.helpmanual.io/). The equivalent `DataFrameModel` for the above `DataFrameSchema` would be:

```python
from pandera.typing import Series

class Schema(pa.DataFrameModel):

    column1: Series[int] = pa.Field(le=10)
    column2: Series[float] = pa.Field(lt=-1.2)
    column3: Series[str] = pa.Field(str_startswith="value_")

    @pa.check("column3")
    def column_3_check(cls, series: Series[str]) -> Series[bool]:
        """Check that values have two elements after being split with '_'"""
        return series.str.split("_", expand=True).shape[1] == 2

Schema.validate(df)
```

## Development Installation

```
git clone https://github.com/pandera-dev/pandera.git
cd pandera
pip install -r requirements-dev.txt
pip install -e .
```

## Tests

```
pip install pytest
pytest tests
```

## Contributing to pandera [![GitHub contributors](https://img.shields.io/github/contributors/pandera-dev/pandera.svg)](https://github.com/pandera-dev/pandera/graphs/contributors)

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview on how to contribute can be found in the [contributing guide](https://github.com/pandera-dev/pandera/blob/main/.github/CONTRIBUTING.md) on GitHub.

## Issues

Go [here](https://github.com/pandera-dev/pandera/issues) to submit feature requests or bug reports.

## Need Help?

There are many ways of getting help with your questions. You can ask a question on the [GitHub Discussions](https://github.com/pandera-dev/pandera/discussions/categories/q-a) page or reach out to the maintainers and pandera community on [Discord](https://discord.gg/vyanhWuaKB).

## Why `pandera`?

- [dataframe-centric data types](https://pandera.readthedocs.io/en/stable/dtypes.html), [column nullability](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#null-values-in-columns), and [uniqueness](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#validating-the-joint-uniqueness-of-columns) are first-class concepts.
- Define [dataframe models](https://pandera.readthedocs.io/en/stable/schema_models.html) with the class-based API with [pydantic](https://pydantic-docs.helpmanual.io/)-style syntax and validate dataframes using the typing syntax.
- `check_input` and `check_output` [decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators-for-pipeline-integration) enable seamless integration with existing code.
- [`Check`s](https://pandera.readthedocs.io/en/stable/checks.html) provide flexibility and performance by providing access to the `pandas` API by design and offer built-in checks for common data tests.
- [`Hypothesis`](https://pandera.readthedocs.io/en/stable/hypothesis.html) class provides a tidy-first interface for statistical hypothesis testing.
- `Check`s and `Hypothesis` objects support both [tidy and wide data validation](https://pandera.readthedocs.io/en/stable/checks.html#wide-checks).
- Use schemas as generative contracts to [synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html) for unit testing.
- [Schema inference](https://pandera.readthedocs.io/en/stable/schema_inference.html) allows you to bootstrap schemas from data.

## Alternative Data Validation Libraries

Here are a few other alternatives for validating Python data structures.

**Generic Python object data validation**

- [voluptuous](https://github.com/alecthomas/voluptuous)
- [schema](https://github.com/keleshev/schema)

**`pandas`-specific data validation**

- [opulent-pandas](https://github.com/danielvdende/opulent-pandas)
- [PandasSchema](https://github.com/TMiguelT/PandasSchema)
- [pandas-validator](https://github.com/c-data/pandas-validator)
- [table_enforcer](https://github.com/xguse/table_enforcer)
- [dataenforce](https://github.com/CedricFR/dataenforce)
- [strictly typed pandas](https://github.com/nanne-aben/strictly_typed_pandas)
- [marshmallow-dataframe](https://github.com/facultyai/marshmallow-dataframe)

**Other tools for data validation**

- [great_expectations](https://github.com/great-expectations/great_expectations)
- [frictionless schema](https://framework.frictionlessdata.io/docs/guides/framework/schema-guide/)

## How to Cite

If you use `pandera` in the context of academic or industry research, please consider citing the **paper** and/or **software package**.

### [Paper](https://conference.scipy.org/proceedings/scipy2020/niels_bantilan.html)

```
@InProceedings{ niels_bantilan-proc-scipy-2020,
  author    = { {N}iels {B}antilan },
  title     = { pandera: {S}tatistical {D}ata {V}alidation of {P}andas {D}ataframes },
  booktitle = { {P}roceedings of the 19th {P}ython in {S}cience {C}onference },
  pages     = { 116 - 124 },
  year      = { 2020 },
  editor    = { {M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe },
  doi       = { 10.25080/Majora-342d178e-010 }
}
```

### Software Package

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3385265.svg)](https://doi.org/10.5281/zenodo.3385265)

## License and Credits

`pandera` is licensed under the [MIT license](license.txt) and is written and maintained by Niels Bantilan (niels@pandera.ci)

%package help
Summary:	Development documents and examples for pandera
Provides:	python3-pandera-doc
%description help

# A Statistical Data Testing Toolkit

*A data validation library for scientists, engineers, and analysts seeking correctness.*

[![CI Build](https://github.com/pandera-dev/pandera/workflows/CI%20Tests/badge.svg?branch=main)](https://github.com/pandera-dev/pandera/actions?query=workflow%3A%22CI+Tests%22+branch%3Amain) [![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=stable)](https://pandera.readthedocs.io/en/stable/?badge=stable) [![PyPI version shields.io](https://img.shields.io/pypi/v/pandera.svg)](https://pypi.org/project/pandera/) [![PyPI license](https://img.shields.io/pypi/l/pandera.svg)](https://pypi.python.org/pypi/) [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/12) [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=latest)](https://pandera.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/unionai-oss/pandera/branch/main/graph/badge.svg)](https://codecov.io/gh/pandera-dev/pandera) [![PyPI pyversions](https://img.shields.io/pypi/pyversions/pandera.svg)](https://pypi.python.org/pypi/pandera/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3385265.svg)](https://doi.org/10.5281/zenodo.3385265) [![asv](http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat)](https://pandera-dev.github.io/pandera-asv-logs/) [![Downloads](https://pepy.tech/badge/pandera/month)](https://pepy.tech/project/pandera) [![Downloads](https://pepy.tech/badge/pandera)](https://pepy.tech/project/pandera) [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandera?label=conda%20downloads)](https://anaconda.org/conda-forge/pandera) [![Discord](https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord)](https://discord.gg/vyanhWuaKB) `pandera` provides a flexible and expressive API for performing data validation on 
dataframe-like objects to make data processing pipelines more readable and robust.

Dataframes contain information that `pandera` explicitly validates at runtime. This is useful in production-critical or reproducible research settings. With `pandera`, you can:

1. Define a schema once and use it to validate [different dataframe types](https://pandera.readthedocs.io/en/stable/supported_libraries.html) including [pandas](http://pandas.pydata.org), [dask](https://dask.org), [modin](https://modin.readthedocs.io/), and [pyspark](https://spark.apache.org/docs/3.2.0/api/python/user_guide/pandas_on_spark/index.html).
1. [Check](https://pandera.readthedocs.io/en/stable/checks.html) the types and properties of columns in a `DataFrame` or values in a `Series`.
1. Perform more complex statistical validation like [hypothesis testing](https://pandera.readthedocs.io/en/stable/hypothesis.html#hypothesis).
1. Seamlessly integrate with existing data analysis/processing pipelines via [function decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators).
1. Define dataframe models with the [class-based API](https://pandera.readthedocs.io/en/stable/dataframe_models.html#dataframe-models) with pydantic-style syntax and validate dataframes using the typing syntax.
1. [Synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#data-synthesis-strategies) from schema objects for property-based testing with pandas data structures.
1. [Lazily Validate](https://pandera.readthedocs.io/en/stable/lazy_validation.html) dataframes so that all validation checks are executed before raising an error.
1. [Integrate](https://pandera.readthedocs.io/en/stable/integrations.html) with a rich ecosystem of python tools like [pydantic](https://pydantic-docs.helpmanual.io), [fastapi](https://fastapi.tiangolo.com/), and [mypy](http://mypy-lang.org/).

## Documentation

The official documentation is hosted on ReadTheDocs: https://pandera.readthedocs.io

## Install

Using pip:

```
pip install pandera
```

Using conda:

```
conda install -c conda-forge pandera
```

### Extras

Installing additional functionality:

pip

```bash
pip install pandera[hypotheses]  # hypothesis checks
pip install pandera[io]          # yaml/script schema io utilities
pip install pandera[strategies]  # data synthesis strategies
pip install pandera[mypy]        # enable static type-linting of pandas
pip install pandera[fastapi]     # fastapi integration
pip install pandera[dask]        # validate dask dataframes
pip install pandera[pyspark]     # validate pyspark dataframes
pip install pandera[modin]       # validate modin dataframes
pip install pandera[modin-ray]   # validate modin dataframes with ray
pip install pandera[modin-dask]  # validate modin dataframes with dask
pip install pandera[geopandas]   # validate geopandas geodataframes
```

conda

```bash
conda install -c conda-forge pandera-hypotheses  # hypothesis checks
conda install -c conda-forge pandera-io          # yaml/script schema io utilities
conda install -c conda-forge pandera-strategies  # data synthesis strategies
conda install -c conda-forge pandera-mypy        # enable static type-linting of pandas
conda install -c conda-forge pandera-fastapi     # fastapi integration
conda install -c conda-forge pandera-dask        # validate dask dataframes
conda install -c conda-forge pandera-pyspark     # validate pyspark dataframes
conda install -c conda-forge pandera-modin       # validate modin dataframes
conda install -c conda-forge pandera-modin-ray   # validate modin dataframes with ray
conda install -c conda-forge pandera-modin-dask  # validate modin dataframes with dask
conda install -c conda-forge pandera-geopandas   # validate geopandas geodataframes
```

## Quick Start ```python import pandas as pd import pandera as pa # data to validate df = pd.DataFrame({ "column1": [1, 4, 0, 10, 9], "column2": [-1.3, -1.4, -2.9, -10.1, -20.4], "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"] }) # define schema schema = pa.DataFrameSchema({ "column1": pa.Column(int, checks=pa.Check.le(10)), "column2": pa.Column(float, checks=pa.Check.lt(-1.2)), "column3": pa.Column(str, checks=[ pa.Check.str_startswith("value_"), # define custom checks as functions that take a series as input and # outputs a boolean or boolean Series pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2) ]), }) validated_df = schema(df) print(validated_df) # column1 column2 column3 # 0 1 -1.3 value_1 # 1 4 -1.4 value_2 # 2 0 -2.9 value_3 # 3 10 -10.1 value_2 # 4 9 -20.4 value_1 ``` ## DataFrame Model `pandera` also provides an alternative API for expressing schemas inspired by [dataclasses](https://docs.python.org/3/library/dataclasses.html) and [pydantic](https://pydantic-docs.helpmanual.io/). The equivalent `DataFrameModel` for the above `DataFrameSchema` would be: ```python from pandera.typing import Series class Schema(pa.DataFrameModel): column1: Series[int] = pa.Field(le=10) column2: Series[float] = pa.Field(lt=-1.2) column3: Series[str] = pa.Field(str_startswith="value_") @pa.check("column3") def column_3_check(cls, series: Series[str]) -> Series[bool]: """Check that values have two elements after being split with '_'""" return series.str.split("_", expand=True).shape[1] == 2 Schema.validate(df) ``` ## Development Installation ``` git clone https://github.com/pandera-dev/pandera.git cd pandera pip install -r requirements-dev.txt pip install -e . 
```

## Tests

```
pip install pytest
pytest tests
```

## Contributing to pandera

[![GitHub contributors](https://img.shields.io/github/contributors/pandera-dev/pandera.svg)](https://github.com/pandera-dev/pandera/graphs/contributors)

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview of how to contribute can be found in the [contributing guide](https://github.com/pandera-dev/pandera/blob/main/.github/CONTRIBUTING.md) on GitHub.

## Issues

Go [here](https://github.com/pandera-dev/pandera/issues) to submit feature requests or bug reports.

## Need Help?

There are many ways of getting help with your questions. You can ask a question on the [GitHub Discussions](https://github.com/pandera-dev/pandera/discussions/categories/q-a) page or reach out to the maintainers and pandera community on [Discord](https://discord.gg/vyanhWuaKB).

## Why `pandera`?

- [dataframe-centric data types](https://pandera.readthedocs.io/en/stable/dtypes.html), [column nullability](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#null-values-in-columns), and [uniqueness](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#validating-the-joint-uniqueness-of-columns) are first-class concepts.
- Define [dataframe models](https://pandera.readthedocs.io/en/stable/schema_models.html) with the class-based API with [pydantic](https://pydantic-docs.helpmanual.io/)-style syntax and validate dataframes using the typing syntax.
- `check_input` and `check_output` [decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators-for-pipeline-integration) enable seamless integration with existing code.
- [`Check`s](https://pandera.readthedocs.io/en/stable/checks.html) provide flexibility and performance by giving access to the `pandas` API by design, and offer built-in checks for common data tests.
- [`Hypothesis`](https://pandera.readthedocs.io/en/stable/hypothesis.html) class provides a tidy-first interface for statistical hypothesis testing.
- `Check`s and `Hypothesis` objects support both [tidy and wide data validation](https://pandera.readthedocs.io/en/stable/checks.html#wide-checks).
- Use schemas as generative contracts to [synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html) for unit testing.
- [Schema inference](https://pandera.readthedocs.io/en/stable/schema_inference.html) allows you to bootstrap schemas from data.

## Alternative Data Validation Libraries

Here are a few alternatives for validating Python data structures.

**Generic Python object data validation**

- [voluptuous](https://github.com/alecthomas/voluptuous)
- [schema](https://github.com/keleshev/schema)

**`pandas`-specific data validation**

- [opulent-pandas](https://github.com/danielvdende/opulent-pandas)
- [PandasSchema](https://github.com/TMiguelT/PandasSchema)
- [pandas-validator](https://github.com/c-data/pandas-validator)
- [table_enforcer](https://github.com/xguse/table_enforcer)
- [dataenforce](https://github.com/CedricFR/dataenforce)
- [strictly typed pandas](https://github.com/nanne-aben/strictly_typed_pandas)
- [marshmallow-dataframe](https://github.com/facultyai/marshmallow-dataframe)

**Other tools for data validation**

- [great_expectations](https://github.com/great-expectations/great_expectations)
- [frictionless schema](https://framework.frictionlessdata.io/docs/guides/framework/schema-guide/)

## How to Cite

If you use `pandera` in the context of academic or industry research, please consider citing the **paper** and/or **software package**.

### [Paper](https://conference.scipy.org/proceedings/scipy2020/niels_bantilan.html)

```
@InProceedings{ niels_bantilan-proc-scipy-2020,
  author    = { {N}iels {B}antilan },
  title     = { pandera: {S}tatistical {D}ata {V}alidation of {P}andas {D}ataframes },
  booktitle = { {P}roceedings of the 19th {P}ython in {S}cience {C}onference },
  pages     = { 116 - 124 },
  year      = { 2020 },
  editor    = { {M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe },
  doi       = { 10.25080/Majora-342d178e-010 }
}
```

### Software Package

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3385265.svg)](https://doi.org/10.5281/zenodo.3385265)

## License and Credits

`pandera` is licensed under the [MIT license](license.txt) and is written and maintained by Niels Bantilan (niels@pandera.ci)

%prep
%autosetup -n pandera-0.14.5

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pandera -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri Apr 21 2023 Python_Bot - 0.14.5-1
- Package Spec generated