%global _empty_manifest_terminate_build 0
Name:           python-pyspark-test
Version:        0.2.0
Release:        1
Summary:        Check that left and right Spark DataFrames are equal.
License:        Apache Software License (Apache 2.0)
URL:            https://github.com/debugger24/pyspark-test
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/f8/a9/3ca6c0f3289da348d25693adb4f80e3d8b2389dea603f222feae4dd78e76/pyspark_test-0.2.0.tar.gz
BuildArch:      noarch

Requires:       python3-pyspark

%description
# pyspark-test

[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Unit Test](https://github.com/debugger24/pyspark-test/workflows/Unit%20Test/badge.svg?branch=main)](https://github.com/debugger24/pyspark-test/actions?query=workflow%3A%22Unit+Test%22)
[![PyPI version](https://badge.fury.io/py/pyspark-test.svg)](https://badge.fury.io/py/pyspark-test)
[![Downloads](https://pepy.tech/badge/pyspark-test)](https://pepy.tech/project/pyspark-test)

Check that left and right Spark DataFrames are equal.

This function compares two Spark DataFrames and reports any differences. It is inspired by the pandas testing module, but targets PySpark and is intended for use in unit tests. Additional parameters allow varying the strictness of the equality checks performed.

# Installation

```
pip install pyspark-test
```

# Usage

```py
assert_pyspark_df_equal(left_df, right_df)
```

## Additional Arguments

* `check_dtype` : Compare the data types of the DataFrame columns. Default: `True`.
* `check_column_names` : Compare column names. Default: `False`. Not required if data types are already being checked.
* `check_columns_in_order` : Require the columns to appear in the same order. Default: `False`.
* `order_by` : Column names by which both DataFrames are sorted before comparing. Default: `None`.

# Example

```py
import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import *

from pyspark_test import assert_pyspark_df_equal

sc = SparkContext.getOrCreate()
spark_session = SparkSession(sc)

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
```
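The optional arguments listed above can be combined in a single call. The snippet below is a minimal sketch that is not part of the upstream README: it reuses `df_1` and `df_2` from the example above, and it assumes `order_by` accepts a list of column names, as suggested by the argument description.

```py
# Sketch only: argument names taken from the list above; the list form of
# order_by is an assumption based on its description ("column names").
assert_pyspark_df_equal(
    df_1,
    df_2,
    check_dtype=True,             # also compare column data types
    check_columns_in_order=False, # do not require the same column order
    order_by=['col_a'],           # sort both DataFrames by col_a before comparing
)
```
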
%package -n python3-pyspark-test
Summary:        Check that left and right Spark DataFrames are equal.
Provides:       python-pyspark-test
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip
%description -n python3-pyspark-test
# pyspark-test

[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Unit Test](https://github.com/debugger24/pyspark-test/workflows/Unit%20Test/badge.svg?branch=main)](https://github.com/debugger24/pyspark-test/actions?query=workflow%3A%22Unit+Test%22)
[![PyPI version](https://badge.fury.io/py/pyspark-test.svg)](https://badge.fury.io/py/pyspark-test)
[![Downloads](https://pepy.tech/badge/pyspark-test)](https://pepy.tech/project/pyspark-test)

Check that left and right Spark DataFrames are equal.

This function compares two Spark DataFrames and reports any differences. It is inspired by the pandas testing module, but targets PySpark and is intended for use in unit tests. Additional parameters allow varying the strictness of the equality checks performed.

# Installation

```
pip install pyspark-test
```

# Usage

```py
assert_pyspark_df_equal(left_df, right_df)
```

## Additional Arguments

* `check_dtype` : Compare the data types of the DataFrame columns. Default: `True`.
* `check_column_names` : Compare column names. Default: `False`. Not required if data types are already being checked.
* `check_columns_in_order` : Require the columns to appear in the same order. Default: `False`.
* `order_by` : Column names by which both DataFrames are sorted before comparing. Default: `None`.

# Example

```py
import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import *

from pyspark_test import assert_pyspark_df_equal

sc = SparkContext.getOrCreate()
spark_session = SparkSession(sc)

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
```

%package help
Summary:        Development documents and examples for pyspark-test
Provides:       python3-pyspark-test-doc
%description help
# pyspark-test

[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Unit Test](https://github.com/debugger24/pyspark-test/workflows/Unit%20Test/badge.svg?branch=main)](https://github.com/debugger24/pyspark-test/actions?query=workflow%3A%22Unit+Test%22)
[![PyPI version](https://badge.fury.io/py/pyspark-test.svg)](https://badge.fury.io/py/pyspark-test)
[![Downloads](https://pepy.tech/badge/pyspark-test)](https://pepy.tech/project/pyspark-test)

Check that left and right Spark DataFrames are equal.

This function compares two Spark DataFrames and reports any differences. It is inspired by the pandas testing module, but targets PySpark and is intended for use in unit tests.
Additional parameters allow varying the strictness of the equality checks performed.

# Installation

```
pip install pyspark-test
```

# Usage

```py
assert_pyspark_df_equal(left_df, right_df)
```

## Additional Arguments

* `check_dtype` : Compare the data types of the DataFrame columns. Default: `True`.
* `check_column_names` : Compare column names. Default: `False`. Not required if data types are already being checked.
* `check_columns_in_order` : Require the columns to appear in the same order. Default: `False`.
* `order_by` : Column names by which both DataFrames are sorted before comparing. Default: `None`.

# Example

```py
import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import *

from pyspark_test import assert_pyspark_df_equal

sc = SparkContext.getOrCreate()
spark_session = SparkSession(sc)

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
```

%prep
%autosetup -n pyspark-test-0.2.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pyspark-test -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Sun Apr 23 2023 Python_Bot - 0.2.0-1
- Package Spec generated