%global _empty_manifest_terminate_build 0
Name: python-soda-sql-core
Version: 2.2.2
Release: 1
Summary: Soda SQL library & CLI
License: Apache Software License
URL: https://pypi.org/project/soda-sql-core/
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/bb/5b/16a61b4e206e03f78ecea9e8bdff5f96601d119cb891777156f997b51587/soda-sql-core-2.2.2.tar.gz
BuildArch: noarch

Requires: python3-markupsafe
Requires: python3-Jinja2
Requires: python3-click
Requires: python3-pyyaml
Requires: python3-requests
Requires: python3-Deprecated
Requires: python3-opentelemetry-api
Requires: python3-opentelemetry-exporter-otlp-proto-http
Requires: python3-protobuf

%description
Data testing, monitoring and profiling for SQL accessible data.

**What does Soda SQL do?**

Soda SQL allows you to

* Stop your pipeline when bad data is detected
* Extract metrics and column profiles through efficient SQL
* Keep full control over metrics and queries through declarative config files

**Why Soda SQL?**

To protect the consumers of your data against silent data issues, it is best practice to profile and test your data:

* as it lands in your warehouse,
* after every important data processing step,
* right before consumption.

This way you prevent delivery of bad data to downstream consumers. You will spend less time firefighting and gain a better reputation.

**How does Soda SQL work?**

Soda SQL is a Command Line Interface (CLI) and a Python library to measure and test your data using SQL.

As input, Soda SQL uses YAML configuration files that include:

* SQL connection details
* What metrics to compute
* What tests to run on the measurements

Based on those configuration files, Soda SQL performs scans. A scan performs all measurements and runs all tests associated with one table. Typically a scan is executed after new data has arrived. All soda-sql configuration files can be checked into your version control system as part of your pipeline code.

> Want to try Soda SQL? Head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get started straight away!

**"[Show me the metrics](https://www.youtube.com/watch?v=1-mOKMq19zU)"**

Let's walk through an example. Simple metrics and tests can be configured in scan YAML configuration files.
An example of the contents of such a file:

```yaml
metrics:
    - row_count
    - missing_count
    - missing_percentage
    - values_count
    - values_percentage
    - valid_count
    - valid_percentage
    - invalid_count
    - invalid_percentage
    - min
    - max
    - avg
    - sum
    - min_length
    - max_length
    - avg_length
    - distinct
    - unique_count
    - duplicate_count
    - uniqueness
    - maxs
    - mins
    - frequent_values
    - histogram
columns:
    ID:
        metrics:
            - distinct
            - duplicate_count
        valid_format: uuid
        tests:
            duplicate_count == 0
    CATEGORY:
        missing_values:
            - N/A
            - No category
        tests:
            missing_percentage < 3
    SIZE:
        tests:
            max - min < 20
sql_metrics:
    - sql: |
        SELECT sum(volume) as total_volume_us
        FROM CUSTOMER_TRANSACTIONS
        WHERE country = 'US'
      tests:
        - total_volume_us > 5000
```

Based on these configuration files, Soda SQL scans your data each time new data arrives, like this:

```bash
$ soda scan ./soda/metrics my_warehouse my_dataset
Soda 1.0 scan for dataset my_dataset on prod my_warehouse
  | SELECT column_name, data_type, is_nullable
  | FROM information_schema.columns
  | WHERE lower(table_name) = 'customers'
  |   AND table_catalog = 'datasource.database'
  |   AND table_schema = 'datasource.schema'
  - 0.256 seconds
Found 4 columns: ID, NAME, CREATE_DATE, COUNTRY
  | SELECT
  |  COUNT(*),
  |  COUNT(CASE WHEN ID IS NULL THEN 1 END),
  |  COUNT(CASE WHEN ID IS NOT NULL AND ID regexp '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' THEN 1 END),
  |  MIN(LENGTH(ID)),
  |  AVG(LENGTH(ID)),
  |  MAX(LENGTH(ID)),
  | FROM customers
  - 0.557 seconds
row_count : 23543
missing   : 23
invalid   : 0
min_length: 9
avg_length: 9
max_length: 9

...more queries...

47 measurements computed
23 tests executed
All is good. No tests failed. Scan took 23.307 seconds
```

The next step is to add Soda SQL scans to your favorite data pipeline orchestration solution, such as:

* Airflow
* AWS Glue
* Prefect
* Dagster
* Fivetran
* Matillion
* Luigi

If you like the goals of this project, encourage us! Star [sodadata/soda-sql on GitHub](https://github.com/sodadata/soda-sql).

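To make the scan idea described above more concrete, here is a minimal sketch of "compute measurements with one aggregate SQL query, then evaluate test expressions against them". It is not Soda SQL's actual implementation or API: it uses Python's built-in `sqlite3` module, and the table, its rows, and the helper names `build_sample_table`, `measure`, `run_tests` and `scan` are all hypothetical.

```python
import sqlite3


def build_sample_table():
    # Hypothetical in-memory table; names and rows are made up for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT, size INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("a1", 3), ("b2", 10), (None, 20), ("d4", 7)],
    )
    return conn


def measure(conn):
    # A single aggregate query yields several measurements at once,
    # in the spirit of the scan log shown above.
    row = conn.execute(
        "SELECT COUNT(*), "
        "COUNT(CASE WHEN id IS NULL THEN 1 END), "
        "MIN(size), MAX(size) "
        "FROM customers"
    ).fetchone()
    row_count, missing_count, min_size, max_size = row
    return {
        "row_count": row_count,
        "missing_count": missing_count,
        "missing_percentage": 100.0 * missing_count / row_count,
        "min": min_size,
        "max": max_size,
    }


def run_tests(measurements, tests):
    # Evaluate each expression with the measurement names as the only
    # visible variables. eval() keeps the sketch short; it is an assumption
    # of this example, not a statement about how Soda SQL evaluates tests.
    return {
        test: bool(eval(test, {"__builtins__": {}}, dict(measurements)))
        for test in tests
    }


def scan(conn, tests):
    # A "scan" here means: take all measurements, then run all tests.
    measurements = measure(conn)
    results = run_tests(measurements, tests)
    failed = [test for test, passed in results.items() if not passed]
    return measurements, failed


conn = build_sample_table()
measurements, failed = scan(conn, ["max - min < 20", "missing_percentage < 3"])
# A pipeline step could pass this value to sys.exit() to stop on bad data.
exit_code = 1 if failed else 0
```

With the sample rows, one of the four `id` values is missing, so the `missing_percentage < 3` test fails and the sketch reports a non-zero exit code, which is the signal an orchestrator would use to stop the pipeline.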
> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going!

%package -n python3-soda-sql-core
Summary: Soda SQL library & CLI
Provides: python-soda-sql-core
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-soda-sql-core
Data testing, monitoring and profiling for SQL accessible data.
> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going!

%package help
Summary: Development documents and examples for soda-sql-core
Provides: python3-soda-sql-core-doc

%description help
Data testing, monitoring and profiling for SQL accessible data.
> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going!

%prep
%autosetup -n soda-sql-core-2.2.2

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-soda-sql-core -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot