author     CoprDistGit <infra@openeuler.org>  2023-04-10 10:52:22 +0000
committer  CoprDistGit <infra@openeuler.org>  2023-04-10 10:52:22 +0000
commit  2879dd214574784cd15ac0ce69d86d0f0d7c12bb (patch)
tree    c4b533cd68b499d561d1176fa030c52342ea4bf5
parent  764b4b9e15be344e2773170cba0c17f15be4b80c (diff)
automatic import of python-pydeequ
-rw-r--r--  .gitignore             1
-rw-r--r--  python-pydeequ.spec  783
-rw-r--r--  sources                1
3 files changed, 785 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..cd4ad12 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/pydeequ-1.0.1.tar.gz
diff --git a/python-pydeequ.spec b/python-pydeequ.spec
new file mode 100644
index 0000000..1bad0e1
--- /dev/null
+++ b/python-pydeequ.spec
@@ -0,0 +1,783 @@
+%global _empty_manifest_terminate_build 0
+Name: python-pydeequ
+Version: 1.0.1
+Release: 1
+Summary: PyDeequ - Unit Tests for Data
+License: Apache-2.0
+URL: https://pydeequ.readthedocs.io
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/4e/80/904c35897312359b1f4119234de5979cd3772ed7e2a01bc9b5b059b68425/pydeequ-1.0.1.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-pandas
+Requires: python3-pyspark
+
+%description
+# PyDeequ
+
+PyDeequ is a Python API for [Deequ](https://github.com/awslabs/deequ), a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. PyDeequ is written to support usage of Deequ in Python.
+
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) ![Coverage](https://img.shields.io/badge/coverage-90%25-green)
+
+Deequ has four main components:
+- Metrics Computation:
+  - `Profiles` leverages Analyzers to analyze each column of a dataset.
+  - `Analyzers` serve as the foundational module that computes metrics for data profiling and validation at scale.
+- Constraint Suggestion:
+  - Specify rules for groups of Analyzers to run over a dataset; they return a collection of suggested constraints to use in a Verification Suite.
+- Constraint Verification:
+  - Perform data validation on a dataset with respect to the constraints you define.
+- Metrics Repository:
+  - Allows persistence and tracking of Deequ runs over time.
+
+![PyDeequ architecture](imgs/pydeequ_architecture.jpg)
+
+## 🎉 Announcements 🎉
+- As of v0.1.8, PyDeequ officially supports Spark 3! Just make sure the environment variable `SPARK_VERSION` is set to your Spark version!
+- We've released a blog post on integrating PyDeequ with AWS, leveraging services such as AWS Glue, Athena, and SageMaker! Check it out: [Monitor data quality in your data lake using PyDeequ and AWS Glue](https://aws.amazon.com/blogs/big-data/monitor-data-quality-in-your-data-lake-using-pydeequ-and-aws-glue/).
+- Check out the [PyDeequ Release Announcement Blogpost](https://aws.amazon.com/blogs/big-data/testing-data-quality-at-scale-with-pydeequ/) with a tutorial walking through the Amazon Reviews dataset!
+- Join the PyDeequ community on [PyDeequ Slack](https://join.slack.com/t/pydeequ/shared_invite/zt-te6bntpu-yaqPy7bhiN8Lu0NxpZs47Q) to chat with the devs!
+
+## Quickstart
+
+The following will get you started with basic usage. For more in-depth examples, see the executable Jupyter notebooks for each module in the [`tutorials/`](tutorials/) directory. For documentation on the supported interfaces, view the [documentation](https://pydeequ.readthedocs.io/).
+
+### Installation
+
+You can install [PyDeequ via pip](https://pypi.org/project/pydeequ/).
+
+```bash
+pip install pydeequ
+```
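+
+As noted in the announcements above, PyDeequ uses the `SPARK_VERSION` environment variable to resolve the matching Deequ Maven coordinates. A minimal sketch, assuming a Spark 3.0 installation (adjust the value to your environment):
+
+```python
+import os
+
+# Assumption: Spark 3.0 is installed. Set this before importing pydeequ so it
+# can pick the matching Deequ Maven coordinates.
+os.environ["SPARK_VERSION"] = "3.0"
+```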
+
+### Set up a PySpark session
+```python
+from pyspark.sql import SparkSession, Row
+import pydeequ
+
+spark = (SparkSession
+ .builder
+ .config("spark.jars.packages", pydeequ.deequ_maven_coord)
+ .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
+ .getOrCreate())
+
+df = spark.sparkContext.parallelize([
+ Row(a="foo", b=1, c=5),
+ Row(a="bar", b=2, c=6),
+ Row(a="baz", b=3, c=None)]).toDF()
+```
+
+### Analyzers
+
+```python
+from pydeequ.analyzers import *
+
+analysisResult = AnalysisRunner(spark) \
+ .onData(df) \
+ .addAnalyzer(Size()) \
+ .addAnalyzer(Completeness("b")) \
+ .run()
+
+analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
+analysisResult_df.show()
+```
+
+### Profile
+
+```python
+from pydeequ.profiles import *
+
+result = ColumnProfilerRunner(spark) \
+ .onData(df) \
+ .run()
+
+for col, profile in result.profiles.items():
+ print(profile)
+```
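+
+Each value in `result.profiles` is a column profile object carrying per-column statistics. A hedged sketch of inspecting one profile, assuming the `completeness` and `approximateNumDistinctValues` attribute names (modeled on Deequ's standard column profile):
+
+```python
+# Assumption: attribute names mirror Deequ's StandardColumnProfile.
+b_profile = result.profiles["b"]
+print(b_profile.completeness)                  # fraction of non-null values
+print(b_profile.approximateNumDistinctValues)  # approximate distinct count
+```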
+
+### Constraint Suggestions
+
+```python
+from pydeequ.suggestions import *
+
+suggestionResult = ConstraintSuggestionRunner(spark) \
+ .onData(df) \
+ .addConstraintRule(DEFAULT()) \
+ .run()
+
+# Constraint Suggestions in JSON format
+print(suggestionResult)
+```
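+
+The runner returns the suggestions as a plain JSON-style dictionary, so individual suggestions can be inspected directly. A sketch, assuming the `constraint_suggestions`, `description`, and `code_for_constraint` field names:
+
+```python
+# Assumption: field names match the JSON layout printed above.
+for suggestion in suggestionResult["constraint_suggestions"]:
+    print(f"Constraint: {suggestion['description']}")
+    print(f"Code:       {suggestion['code_for_constraint']}")
+```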
+
+### Constraint Verification
+
+```python
+from pydeequ.checks import *
+from pydeequ.verification import *
+
+check = Check(spark, CheckLevel.Warning, "Review Check")
+
+checkResult = VerificationSuite(spark) \
+ .onData(df) \
+ .addCheck(
+ check.hasSize(lambda x: x >= 3) \
+ .hasMin("b", lambda x: x == 0) \
+ .isComplete("c") \
+ .isUnique("a") \
+ .isContainedIn("a", ["foo", "bar", "baz"]) \
+ .isNonNegative("b")) \
+ .run()
+
+checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
+checkResult_df.show()
+```
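+
+Because the results come back as a regular Spark DataFrame, failed constraints can be isolated with ordinary DataFrame operations, for example:
+
+```python
+# Keep only the constraints that did not pass.
+checkResult_df.filter(checkResult_df.constraint_status != "Success") \
+    .show(truncate=False)
+```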
+
+### Repository
+
+Save to a Metrics Repository by adding the `useRepository()` and `saveOrAppendResult()` calls to your Analysis Runner.
+```python
+from pydeequ.repository import *
+from pydeequ.analyzers import *
+
+metrics_file = FileSystemMetricsRepository.helper_metrics_file(spark, 'metrics.json')
+repository = FileSystemMetricsRepository(spark, metrics_file)
+key_tags = {'tag': 'pydeequ hello world'}
+resultKey = ResultKey(spark, ResultKey.current_milli_time(), key_tags)
+
+analysisResult = AnalysisRunner(spark) \
+ .onData(df) \
+ .addAnalyzer(ApproxCountDistinct('b')) \
+ .useRepository(repository) \
+ .saveOrAppendResult(resultKey) \
+ .run()
+```
+
+To load previous runs, use the `repository` object to read the stored results back in.
+
+```python
+result_metrep_df = repository.load() \
+ .before(ResultKey.current_milli_time()) \
+ .forAnalyzers([ApproxCountDistinct('b')]) \
+ .getSuccessMetricsAsDataFrame()
+```
+
+## [Contributing](https://github.com/awslabs/python-deequ/blob/master/CONTRIBUTING.md)
+Please refer to the [contributing doc](https://github.com/awslabs/python-deequ/blob/master/CONTRIBUTING.md) for how to contribute to PyDeequ.
+
+## [License](https://github.com/awslabs/python-deequ/blob/master/LICENSE)
+
+This library is licensed under the Apache 2.0 License.
+
+******
+
+## Contributing Developer Setup
+
+1. Setup [SDKMAN](#setup-sdkman)
+1. Setup [Java](#setup-java)
+1. Setup [Apache Spark](#setup-apache-spark)
+1. Install [Poetry](#poetry)
+1. Run [tests locally](#running-tests-locally)
+
+### Setup SDKMAN
+
+SDKMAN! is a tool for managing parallel versions of multiple Software Development Kits on any Unix-based
+system. It provides a convenient command-line interface for installing, switching, removing, and listing
+candidates. SDKMAN! installs smoothly on macOS, Linux, WSL, Cygwin, and more, and supports both Bash and
+Zsh shells. See the documentation on the [SDKMAN! website](https://sdkman.io).
+
+Open your favourite terminal and enter the following:
+
+```bash
+# Install SDKMAN!; if the environment needs tweaking, the installer
+# will prompt you accordingly and ask you to restart.
+$ curl -s https://get.sdkman.io | bash
+
+# Next, open a new terminal or enter:
+$ source "$HOME/.sdkman/bin/sdkman-init.sh"
+
+# Lastly, verify that the installation succeeded:
+$ sdk version
+```
+
+### Setup Java
+
+To install Java, open your favourite terminal and enter the following:
+
+```bash
+# List the AdoptOpenJDK versions:
+$ sdk list java
+
+# To install Java 11:
+$ sdk install java 11.0.10.hs-adpt
+
+# To install Java 8:
+$ sdk install java 8.0.292.hs-adpt
+```
+
+### Setup Apache Spark
+
+To install Apache Spark, open your favourite terminal and enter the following:
+
+```bash
+# List the Apache Spark versions:
+$ sdk list spark
+
+# To install Spark 3:
+$ sdk install spark 3.0.2
+```
+
+### Poetry
+
+Useful Poetry [commands](https://python-poetry.org/docs/cli/#search):
+
+```bash
+poetry install
+
+poetry update
+
+# --tree: List the dependencies as a tree.
+# --latest (-l): Show the latest version.
+# --outdated (-o): Show the latest version but only for packages that are outdated.
+poetry show -o
+```
+
+## Running Tests Locally
+
+Take a look at the tests in `tests/dataquality` and `tests/jobs`.
+
+```bash
+$ poetry run pytest
+```
+
+%package -n python3-pydeequ
+Summary: PyDeequ - Unit Tests for Data
+Provides: python-pydeequ
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-pydeequ
+# PyDeequ
+
+PyDeequ is a Python API for [Deequ](https://github.com/awslabs/deequ), a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. PyDeequ is written to support usage of Deequ in Python.
+
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) ![Coverage](https://img.shields.io/badge/coverage-90%25-green)
+
+Deequ has four main components:
+- Metrics Computation:
+  - `Profiles` leverages Analyzers to analyze each column of a dataset.
+  - `Analyzers` serve as the foundational module that computes metrics for data profiling and validation at scale.
+- Constraint Suggestion:
+  - Specify rules for groups of Analyzers to run over a dataset; they return a collection of suggested constraints to use in a Verification Suite.
+- Constraint Verification:
+  - Perform data validation on a dataset with respect to the constraints you define.
+- Metrics Repository:
+  - Allows persistence and tracking of Deequ runs over time.
+
+![PyDeequ architecture](imgs/pydeequ_architecture.jpg)
+
+## 🎉 Announcements 🎉
+- As of v0.1.8, PyDeequ officially supports Spark 3! Just make sure the environment variable `SPARK_VERSION` is set to your Spark version!
+- We've released a blog post on integrating PyDeequ with AWS, leveraging services such as AWS Glue, Athena, and SageMaker! Check it out: [Monitor data quality in your data lake using PyDeequ and AWS Glue](https://aws.amazon.com/blogs/big-data/monitor-data-quality-in-your-data-lake-using-pydeequ-and-aws-glue/).
+- Check out the [PyDeequ Release Announcement Blogpost](https://aws.amazon.com/blogs/big-data/testing-data-quality-at-scale-with-pydeequ/) with a tutorial walking through the Amazon Reviews dataset!
+- Join the PyDeequ community on [PyDeequ Slack](https://join.slack.com/t/pydeequ/shared_invite/zt-te6bntpu-yaqPy7bhiN8Lu0NxpZs47Q) to chat with the devs!
+
+## Quickstart
+
+The following will get you started with basic usage. For more in-depth examples, see the executable Jupyter notebooks for each module in the [`tutorials/`](tutorials/) directory. For documentation on the supported interfaces, view the [documentation](https://pydeequ.readthedocs.io/).
+
+### Installation
+
+You can install [PyDeequ via pip](https://pypi.org/project/pydeequ/).
+
+```bash
+pip install pydeequ
+```
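+
+As noted in the announcements above, PyDeequ uses the `SPARK_VERSION` environment variable to resolve the matching Deequ Maven coordinates. A minimal sketch, assuming a Spark 3.0 installation (adjust the value to your environment):
+
+```python
+import os
+
+# Assumption: Spark 3.0 is installed. Set this before importing pydeequ so it
+# can pick the matching Deequ Maven coordinates.
+os.environ["SPARK_VERSION"] = "3.0"
+```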
+
+### Set up a PySpark session
+```python
+from pyspark.sql import SparkSession, Row
+import pydeequ
+
+spark = (SparkSession
+ .builder
+ .config("spark.jars.packages", pydeequ.deequ_maven_coord)
+ .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
+ .getOrCreate())
+
+df = spark.sparkContext.parallelize([
+ Row(a="foo", b=1, c=5),
+ Row(a="bar", b=2, c=6),
+ Row(a="baz", b=3, c=None)]).toDF()
+```
+
+### Analyzers
+
+```python
+from pydeequ.analyzers import *
+
+analysisResult = AnalysisRunner(spark) \
+ .onData(df) \
+ .addAnalyzer(Size()) \
+ .addAnalyzer(Completeness("b")) \
+ .run()
+
+analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
+analysisResult_df.show()
+```
+
+### Profile
+
+```python
+from pydeequ.profiles import *
+
+result = ColumnProfilerRunner(spark) \
+ .onData(df) \
+ .run()
+
+for col, profile in result.profiles.items():
+ print(profile)
+```
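+
+Each value in `result.profiles` is a column profile object carrying per-column statistics. A hedged sketch of inspecting one profile, assuming the `completeness` and `approximateNumDistinctValues` attribute names (modeled on Deequ's standard column profile):
+
+```python
+# Assumption: attribute names mirror Deequ's StandardColumnProfile.
+b_profile = result.profiles["b"]
+print(b_profile.completeness)                  # fraction of non-null values
+print(b_profile.approximateNumDistinctValues)  # approximate distinct count
+```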
+
+### Constraint Suggestions
+
+```python
+from pydeequ.suggestions import *
+
+suggestionResult = ConstraintSuggestionRunner(spark) \
+ .onData(df) \
+ .addConstraintRule(DEFAULT()) \
+ .run()
+
+# Constraint Suggestions in JSON format
+print(suggestionResult)
+```
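+
+The runner returns the suggestions as a plain JSON-style dictionary, so individual suggestions can be inspected directly. A sketch, assuming the `constraint_suggestions`, `description`, and `code_for_constraint` field names:
+
+```python
+# Assumption: field names match the JSON layout printed above.
+for suggestion in suggestionResult["constraint_suggestions"]:
+    print(f"Constraint: {suggestion['description']}")
+    print(f"Code:       {suggestion['code_for_constraint']}")
+```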
+
+### Constraint Verification
+
+```python
+from pydeequ.checks import *
+from pydeequ.verification import *
+
+check = Check(spark, CheckLevel.Warning, "Review Check")
+
+checkResult = VerificationSuite(spark) \
+ .onData(df) \
+ .addCheck(
+ check.hasSize(lambda x: x >= 3) \
+ .hasMin("b", lambda x: x == 0) \
+ .isComplete("c") \
+ .isUnique("a") \
+ .isContainedIn("a", ["foo", "bar", "baz"]) \
+ .isNonNegative("b")) \
+ .run()
+
+checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
+checkResult_df.show()
+```
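+
+Because the results come back as a regular Spark DataFrame, failed constraints can be isolated with ordinary DataFrame operations, for example:
+
+```python
+# Keep only the constraints that did not pass.
+checkResult_df.filter(checkResult_df.constraint_status != "Success") \
+    .show(truncate=False)
+```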
+
+### Repository
+
+Save to a Metrics Repository by adding the `useRepository()` and `saveOrAppendResult()` calls to your Analysis Runner.
+```python
+from pydeequ.repository import *
+from pydeequ.analyzers import *
+
+metrics_file = FileSystemMetricsRepository.helper_metrics_file(spark, 'metrics.json')
+repository = FileSystemMetricsRepository(spark, metrics_file)
+key_tags = {'tag': 'pydeequ hello world'}
+resultKey = ResultKey(spark, ResultKey.current_milli_time(), key_tags)
+
+analysisResult = AnalysisRunner(spark) \
+ .onData(df) \
+ .addAnalyzer(ApproxCountDistinct('b')) \
+ .useRepository(repository) \
+ .saveOrAppendResult(resultKey) \
+ .run()
+```
+
+To load previous runs, use the `repository` object to read the stored results back in.
+
+```python
+result_metrep_df = repository.load() \
+ .before(ResultKey.current_milli_time()) \
+ .forAnalyzers([ApproxCountDistinct('b')]) \
+ .getSuccessMetricsAsDataFrame()
+```
+
+## [Contributing](https://github.com/awslabs/python-deequ/blob/master/CONTRIBUTING.md)
+Please refer to the [contributing doc](https://github.com/awslabs/python-deequ/blob/master/CONTRIBUTING.md) for how to contribute to PyDeequ.
+
+## [License](https://github.com/awslabs/python-deequ/blob/master/LICENSE)
+
+This library is licensed under the Apache 2.0 License.
+
+******
+
+## Contributing Developer Setup
+
+1. Setup [SDKMAN](#setup-sdkman)
+1. Setup [Java](#setup-java)
+1. Setup [Apache Spark](#setup-apache-spark)
+1. Install [Poetry](#poetry)
+1. Run [tests locally](#running-tests-locally)
+
+### Setup SDKMAN
+
+SDKMAN! is a tool for managing parallel versions of multiple Software Development Kits on any Unix-based
+system. It provides a convenient command-line interface for installing, switching, removing, and listing
+candidates. SDKMAN! installs smoothly on macOS, Linux, WSL, Cygwin, and more, and supports both Bash and
+Zsh shells. See the documentation on the [SDKMAN! website](https://sdkman.io).
+
+Open your favourite terminal and enter the following:
+
+```bash
+# Install SDKMAN!; if the environment needs tweaking, the installer
+# will prompt you accordingly and ask you to restart.
+$ curl -s https://get.sdkman.io | bash
+
+# Next, open a new terminal or enter:
+$ source "$HOME/.sdkman/bin/sdkman-init.sh"
+
+# Lastly, verify that the installation succeeded:
+$ sdk version
+```
+
+### Setup Java
+
+To install Java, open your favourite terminal and enter the following:
+
+```bash
+# List the AdoptOpenJDK versions:
+$ sdk list java
+
+# To install Java 11:
+$ sdk install java 11.0.10.hs-adpt
+
+# To install Java 8:
+$ sdk install java 8.0.292.hs-adpt
+```
+
+### Setup Apache Spark
+
+To install Apache Spark, open your favourite terminal and enter the following:
+
+```bash
+# List the Apache Spark versions:
+$ sdk list spark
+
+# To install Spark 3:
+$ sdk install spark 3.0.2
+```
+
+### Poetry
+
+Useful Poetry [commands](https://python-poetry.org/docs/cli/#search):
+
+```bash
+poetry install
+
+poetry update
+
+# --tree: List the dependencies as a tree.
+# --latest (-l): Show the latest version.
+# --outdated (-o): Show the latest version but only for packages that are outdated.
+poetry show -o
+```
+
+## Running Tests Locally
+
+Take a look at the tests in `tests/dataquality` and `tests/jobs`.
+
+```bash
+$ poetry run pytest
+```
+
+%package help
+Summary: Development documents and examples for pydeequ
+Provides: python3-pydeequ-doc
+%description help
+# PyDeequ
+
+PyDeequ is a Python API for [Deequ](https://github.com/awslabs/deequ), a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. PyDeequ is written to support usage of Deequ in Python.
+
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) ![Coverage](https://img.shields.io/badge/coverage-90%25-green)
+
+Deequ has four main components:
+- Metrics Computation:
+  - `Profiles` leverages Analyzers to analyze each column of a dataset.
+  - `Analyzers` serve as the foundational module that computes metrics for data profiling and validation at scale.
+- Constraint Suggestion:
+  - Specify rules for groups of Analyzers to run over a dataset; they return a collection of suggested constraints to use in a Verification Suite.
+- Constraint Verification:
+  - Perform data validation on a dataset with respect to the constraints you define.
+- Metrics Repository:
+  - Allows persistence and tracking of Deequ runs over time.
+
+![PyDeequ architecture](imgs/pydeequ_architecture.jpg)
+
+## 🎉 Announcements 🎉
+- As of v0.1.8, PyDeequ officially supports Spark 3! Just make sure the environment variable `SPARK_VERSION` is set to your Spark version!
+- We've released a blog post on integrating PyDeequ with AWS, leveraging services such as AWS Glue, Athena, and SageMaker! Check it out: [Monitor data quality in your data lake using PyDeequ and AWS Glue](https://aws.amazon.com/blogs/big-data/monitor-data-quality-in-your-data-lake-using-pydeequ-and-aws-glue/).
+- Check out the [PyDeequ Release Announcement Blogpost](https://aws.amazon.com/blogs/big-data/testing-data-quality-at-scale-with-pydeequ/) with a tutorial walking through the Amazon Reviews dataset!
+- Join the PyDeequ community on [PyDeequ Slack](https://join.slack.com/t/pydeequ/shared_invite/zt-te6bntpu-yaqPy7bhiN8Lu0NxpZs47Q) to chat with the devs!
+
+## Quickstart
+
+The following will get you started with basic usage. For more in-depth examples, see the executable Jupyter notebooks for each module in the [`tutorials/`](tutorials/) directory. For documentation on the supported interfaces, view the [documentation](https://pydeequ.readthedocs.io/).
+
+### Installation
+
+You can install [PyDeequ via pip](https://pypi.org/project/pydeequ/).
+
+```bash
+pip install pydeequ
+```
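+
+As noted in the announcements above, PyDeequ uses the `SPARK_VERSION` environment variable to resolve the matching Deequ Maven coordinates. A minimal sketch, assuming a Spark 3.0 installation (adjust the value to your environment):
+
+```python
+import os
+
+# Assumption: Spark 3.0 is installed. Set this before importing pydeequ so it
+# can pick the matching Deequ Maven coordinates.
+os.environ["SPARK_VERSION"] = "3.0"
+```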
+
+### Set up a PySpark session
+```python
+from pyspark.sql import SparkSession, Row
+import pydeequ
+
+spark = (SparkSession
+ .builder
+ .config("spark.jars.packages", pydeequ.deequ_maven_coord)
+ .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
+ .getOrCreate())
+
+df = spark.sparkContext.parallelize([
+ Row(a="foo", b=1, c=5),
+ Row(a="bar", b=2, c=6),
+ Row(a="baz", b=3, c=None)]).toDF()
+```
+
+### Analyzers
+
+```python
+from pydeequ.analyzers import *
+
+analysisResult = AnalysisRunner(spark) \
+ .onData(df) \
+ .addAnalyzer(Size()) \
+ .addAnalyzer(Completeness("b")) \
+ .run()
+
+analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
+analysisResult_df.show()
+```
+
+### Profile
+
+```python
+from pydeequ.profiles import *
+
+result = ColumnProfilerRunner(spark) \
+ .onData(df) \
+ .run()
+
+for col, profile in result.profiles.items():
+ print(profile)
+```
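+
+Each value in `result.profiles` is a column profile object carrying per-column statistics. A hedged sketch of inspecting one profile, assuming the `completeness` and `approximateNumDistinctValues` attribute names (modeled on Deequ's standard column profile):
+
+```python
+# Assumption: attribute names mirror Deequ's StandardColumnProfile.
+b_profile = result.profiles["b"]
+print(b_profile.completeness)                  # fraction of non-null values
+print(b_profile.approximateNumDistinctValues)  # approximate distinct count
+```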
+
+### Constraint Suggestions
+
+```python
+from pydeequ.suggestions import *
+
+suggestionResult = ConstraintSuggestionRunner(spark) \
+ .onData(df) \
+ .addConstraintRule(DEFAULT()) \
+ .run()
+
+# Constraint Suggestions in JSON format
+print(suggestionResult)
+```
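+
+The runner returns the suggestions as a plain JSON-style dictionary, so individual suggestions can be inspected directly. A sketch, assuming the `constraint_suggestions`, `description`, and `code_for_constraint` field names:
+
+```python
+# Assumption: field names match the JSON layout printed above.
+for suggestion in suggestionResult["constraint_suggestions"]:
+    print(f"Constraint: {suggestion['description']}")
+    print(f"Code:       {suggestion['code_for_constraint']}")
+```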
+
+### Constraint Verification
+
+```python
+from pydeequ.checks import *
+from pydeequ.verification import *
+
+check = Check(spark, CheckLevel.Warning, "Review Check")
+
+checkResult = VerificationSuite(spark) \
+ .onData(df) \
+ .addCheck(
+ check.hasSize(lambda x: x >= 3) \
+ .hasMin("b", lambda x: x == 0) \
+ .isComplete("c") \
+ .isUnique("a") \
+ .isContainedIn("a", ["foo", "bar", "baz"]) \
+ .isNonNegative("b")) \
+ .run()
+
+checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
+checkResult_df.show()
+```
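+
+Because the results come back as a regular Spark DataFrame, failed constraints can be isolated with ordinary DataFrame operations, for example:
+
+```python
+# Keep only the constraints that did not pass.
+checkResult_df.filter(checkResult_df.constraint_status != "Success") \
+    .show(truncate=False)
+```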
+
+### Repository
+
+Save to a Metrics Repository by adding the `useRepository()` and `saveOrAppendResult()` calls to your Analysis Runner.
+```python
+from pydeequ.repository import *
+from pydeequ.analyzers import *
+
+metrics_file = FileSystemMetricsRepository.helper_metrics_file(spark, 'metrics.json')
+repository = FileSystemMetricsRepository(spark, metrics_file)
+key_tags = {'tag': 'pydeequ hello world'}
+resultKey = ResultKey(spark, ResultKey.current_milli_time(), key_tags)
+
+analysisResult = AnalysisRunner(spark) \
+ .onData(df) \
+ .addAnalyzer(ApproxCountDistinct('b')) \
+ .useRepository(repository) \
+ .saveOrAppendResult(resultKey) \
+ .run()
+```
+
+To load previous runs, use the `repository` object to read the stored results back in.
+
+```python
+result_metrep_df = repository.load() \
+ .before(ResultKey.current_milli_time()) \
+ .forAnalyzers([ApproxCountDistinct('b')]) \
+ .getSuccessMetricsAsDataFrame()
+```
+
+## [Contributing](https://github.com/awslabs/python-deequ/blob/master/CONTRIBUTING.md)
+Please refer to the [contributing doc](https://github.com/awslabs/python-deequ/blob/master/CONTRIBUTING.md) for how to contribute to PyDeequ.
+
+## [License](https://github.com/awslabs/python-deequ/blob/master/LICENSE)
+
+This library is licensed under the Apache 2.0 License.
+
+******
+
+## Contributing Developer Setup
+
+1. Setup [SDKMAN](#setup-sdkman)
+1. Setup [Java](#setup-java)
+1. Setup [Apache Spark](#setup-apache-spark)
+1. Install [Poetry](#poetry)
+1. Run [tests locally](#running-tests-locally)
+
+### Setup SDKMAN
+
+SDKMAN! is a tool for managing parallel versions of multiple Software Development Kits on any Unix-based
+system. It provides a convenient command-line interface for installing, switching, removing, and listing
+candidates. SDKMAN! installs smoothly on macOS, Linux, WSL, Cygwin, and more, and supports both Bash and
+Zsh shells. See the documentation on the [SDKMAN! website](https://sdkman.io).
+
+Open your favourite terminal and enter the following:
+
+```bash
+# Install SDKMAN!; if the environment needs tweaking, the installer
+# will prompt you accordingly and ask you to restart.
+$ curl -s https://get.sdkman.io | bash
+
+# Next, open a new terminal or enter:
+$ source "$HOME/.sdkman/bin/sdkman-init.sh"
+
+# Lastly, verify that the installation succeeded:
+$ sdk version
+```
+
+### Setup Java
+
+To install Java, open your favourite terminal and enter the following:
+
+```bash
+# List the AdoptOpenJDK versions:
+$ sdk list java
+
+# To install Java 11:
+$ sdk install java 11.0.10.hs-adpt
+
+# To install Java 8:
+$ sdk install java 8.0.292.hs-adpt
+```
+
+### Setup Apache Spark
+
+To install Apache Spark, open your favourite terminal and enter the following:
+
+```bash
+# List the Apache Spark versions:
+$ sdk list spark
+
+# To install Spark 3:
+$ sdk install spark 3.0.2
+```
+
+### Poetry
+
+Useful Poetry [commands](https://python-poetry.org/docs/cli/#search):
+
+```bash
+poetry install
+
+poetry update
+
+# --tree: List the dependencies as a tree.
+# --latest (-l): Show the latest version.
+# --outdated (-o): Show the latest version but only for packages that are outdated.
+poetry show -o
+```
+
+## Running Tests Locally
+
+Take a look at the tests in `tests/dataquality` and `tests/jobs`.
+
+```bash
+$ poetry run pytest
+```
+
+%prep
+%autosetup -n pydeequ-1.0.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pydeequ -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon Apr 10 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.1-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..38b4303
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+d896c16e1e117018eb12f77f32db4a5a pydeequ-1.0.1.tar.gz