author | CoprDistGit <infra@openeuler.org> | 2023-05-05 07:29:37 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-05-05 07:29:37 +0000 |
commit | 20241071398ebd85c43d682bfacf102eab0661ba | |
tree | 1bfaab22490f539b725984bc9160e293a02fb061 | |
parent | 1c66c67f8c22baecfad56d2afc203cf640f1b9f8 | |
automatic import of python-soda-sql-core (openeuler20.03)
-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-soda-sql-core.spec | 552 |
-rw-r--r-- | sources | 1 |
3 files changed, 554 insertions, 0 deletions
@@ -0,0 +1 @@ +/soda-sql-core-2.2.2.tar.gz diff --git a/python-soda-sql-core.spec b/python-soda-sql-core.spec new file mode 100644 index 0000000..51b7b40 --- /dev/null +++ b/python-soda-sql-core.spec @@ -0,0 +1,552 @@ +%global _empty_manifest_terminate_build 0 +Name: python-soda-sql-core +Version: 2.2.2 +Release: 1 +Summary: Soda SQL library & CLI +License: Apache Software License +URL: https://pypi.org/project/soda-sql-core/ +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/bb/5b/16a61b4e206e03f78ecea9e8bdff5f96601d119cb891777156f997b51587/soda-sql-core-2.2.2.tar.gz +BuildArch: noarch + +Requires: python3-markupsafe +Requires: python3-Jinja2 +Requires: python3-click +Requires: python3-pyyaml +Requires: python3-requests +Requires: python3-Deprecated +Requires: python3-opentelemetry-api +Requires: python3-opentelemetry-exporter-otlp-proto-http +Requires: python3-protobuf + +%description +<p align="center"><img src="https://raw.githubusercontent.com/sodadata/docs/main/assets/images/soda-banner.png" alt="Soda logo" /></p> + +<h1 align="center">Soda SQL</h1> +<p align="center"><b>Data testing, monitoring and profiling for SQL accessible data.</b></p> + +<p align="center"> + <a href="https://github.com/sodadata/soda-sql/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg" alt="License: Apache 2.0"></a> + <a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a> + <a href="https://pypi.org/project/soda-sql/"><img alt="Pypi Soda SQL" src="https://img.shields.io/badge/pypi-soda%20sql-green.svg"></a> + <a href="https://github.com/sodadata/soda-sql/actions/workflows/build.yml"><img alt="Build soda-sql" src="https://github.com/sodadata/soda-sql/actions/workflows/build.yml/badge.svg"></a> +</p> + +**What does Soda SQL do?** + +Soda SQL allows you to + + * Stop your pipeline when bad data is detected + * Extract metrics 
and column profiles through super efficient SQL + * Full control over metrics and queries through declarative config files + +**Why Soda SQL?** + +To protect against silent data issues for the consumers of your data, +it's best-practice to profile and test your data: + + * as it lands in your warehouse, + * after every important data processing step + * right before consumption. + +This way you will prevent delivery of bad data to downstream consumers. +You will spend less time firefighting and gain a better reputation. + +**How does Soda SQL work?** + +Soda SQL is a Command Line Interface (CLI) and a Python library to measure +and test your data using SQL. + +As input, Soda SQL uses YAML configuration files that include: + * SQL connection details + * What metrics to compute + * What tests to run on the measurements + +Based on those configuration files, Soda SQL will perform scans. A scan +performs all measurements and runs all tests associated with one table. Typically +a scan is executed after new data has arrived. All soda-sql configuration files +can be checked into your version control system as part of your pipeline +code. + +> Want to try Soda SQL? Head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get started straight away! + +**"[Show me the metrics](https://www.youtube.com/watch?v=1-mOKMq19zU)"** + +Let's walk through an example. Simple metrics and tests can be configured in scan YAML configuration +files. 
An example of the contents of such a file: + +```yaml +metrics: + - row_count + - missing_count + - missing_percentage + - values_count + - values_percentage + - valid_count + - valid_percentage + - invalid_count + - invalid_percentage + - min + - max + - avg + - sum + - min_length + - max_length + - avg_length + - distinct + - unique_count + - duplicate_count + - uniqueness + - maxs + - mins + - frequent_values + - histogram +columns: + ID: + metrics: + - distinct + - duplicate_count + valid_format: uuid + tests: + duplicate_count == 0 + CATEGORY: + missing_values: + - N/A + - No category + tests: + missing_percentage < 3 + SIZE: + tests: + max - min < 20 +sql_metrics: + - sql: | + SELECT sum(volume) as total_volume_us + FROM CUSTOMER_TRANSACTIONS + WHERE country = 'US' + tests: + - total_volume_us > 5000 +``` + +Based on these configuration files, Soda SQL will scan your data +each time new data arrived like this: + +```bash +$ soda scan ./soda/metrics my_warehouse my_dataset +Soda 1.0 scan for dataset my_dataset on prod my_warehouse + | SELECT column_name, data_type, is_nullable + | FROM information_schema.columns + | WHERE lower(table_name) = 'customers' + | AND table_catalog = 'datasource.database' + | AND table_schema = 'datasource.schema' + - 0.256 seconds +Found 4 columns: ID, NAME, CREATE_DATE, COUNTRY + | SELECT + | COUNT(*), + | COUNT(CASE WHEN ID IS NULL THEN 1 END), + | COUNT(CASE WHEN ID IS NOT NULL AND ID regexp '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' THEN 1 END), + | MIN(LENGTH(ID)), + | AVG(LENGTH(ID)), + | MAX(LENGTH(ID)), + | FROM customers + - 0.557 seconds +row_count : 23543 +missing : 23 +invalid : 0 +min_length: 9 +avg_length: 9 +max_length: 9 + +...more queries... + +47 measurements computed +23 tests executed +All is good. No tests failed. 
Scan took 23.307 seconds +``` + +The next step is to add Soda SQL scans in your favorite +data pipeline orchestration solution like: + +* Airflow +* AWS Glue +* Prefect +* Dagster +* Fivetran +* Matillion +* Luigi + +If you like the goals of this project, encourage us! Star [sodadata/soda-sql on Github](https://github.com/sodadata/soda-sql). + +> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going! + + +%package -n python3-soda-sql-core +Summary: Soda SQL library & CLI +Provides: python-soda-sql-core +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-soda-sql-core +<p align="center"><img src="https://raw.githubusercontent.com/sodadata/docs/main/assets/images/soda-banner.png" alt="Soda logo" /></p> + +<h1 align="center">Soda SQL</h1> +<p align="center"><b>Data testing, monitoring and profiling for SQL accessible data.</b></p> + +<p align="center"> + <a href="https://github.com/sodadata/soda-sql/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg" alt="License: Apache 2.0"></a> + <a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a> + <a href="https://pypi.org/project/soda-sql/"><img alt="Pypi Soda SQL" src="https://img.shields.io/badge/pypi-soda%20sql-green.svg"></a> + <a href="https://github.com/sodadata/soda-sql/actions/workflows/build.yml"><img alt="Build soda-sql" src="https://github.com/sodadata/soda-sql/actions/workflows/build.yml/badge.svg"></a> +</p> + +**What does Soda SQL do?** + +Soda SQL allows you to + + * Stop your pipeline when bad data is detected + * Extract metrics and column profiles through super efficient SQL + * Full control over metrics and queries through declarative config files + +**Why Soda SQL?** + +To protect against silent 
data issues for the consumers of your data, +it's best-practice to profile and test your data: + + * as it lands in your warehouse, + * after every important data processing step + * right before consumption. + +This way you will prevent delivery of bad data to downstream consumers. +You will spend less time firefighting and gain a better reputation. + +**How does Soda SQL work?** + +Soda SQL is a Command Line Interface (CLI) and a Python library to measure +and test your data using SQL. + +As input, Soda SQL uses YAML configuration files that include: + * SQL connection details + * What metrics to compute + * What tests to run on the measurements + +Based on those configuration files, Soda SQL will perform scans. A scan +performs all measurements and runs all tests associated with one table. Typically +a scan is executed after new data has arrived. All soda-sql configuration files +can be checked into your version control system as part of your pipeline +code. + +> Want to try Soda SQL? Head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get started straight away! + +**"[Show me the metrics](https://www.youtube.com/watch?v=1-mOKMq19zU)"** + +Let's walk through an example. Simple metrics and tests can be configured in scan YAML configuration +files. 
An example of the contents of such a file: + +```yaml +metrics: + - row_count + - missing_count + - missing_percentage + - values_count + - values_percentage + - valid_count + - valid_percentage + - invalid_count + - invalid_percentage + - min + - max + - avg + - sum + - min_length + - max_length + - avg_length + - distinct + - unique_count + - duplicate_count + - uniqueness + - maxs + - mins + - frequent_values + - histogram +columns: + ID: + metrics: + - distinct + - duplicate_count + valid_format: uuid + tests: + duplicate_count == 0 + CATEGORY: + missing_values: + - N/A + - No category + tests: + missing_percentage < 3 + SIZE: + tests: + max - min < 20 +sql_metrics: + - sql: | + SELECT sum(volume) as total_volume_us + FROM CUSTOMER_TRANSACTIONS + WHERE country = 'US' + tests: + - total_volume_us > 5000 +``` + +Based on these configuration files, Soda SQL will scan your data +each time new data arrived like this: + +```bash +$ soda scan ./soda/metrics my_warehouse my_dataset +Soda 1.0 scan for dataset my_dataset on prod my_warehouse + | SELECT column_name, data_type, is_nullable + | FROM information_schema.columns + | WHERE lower(table_name) = 'customers' + | AND table_catalog = 'datasource.database' + | AND table_schema = 'datasource.schema' + - 0.256 seconds +Found 4 columns: ID, NAME, CREATE_DATE, COUNTRY + | SELECT + | COUNT(*), + | COUNT(CASE WHEN ID IS NULL THEN 1 END), + | COUNT(CASE WHEN ID IS NOT NULL AND ID regexp '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' THEN 1 END), + | MIN(LENGTH(ID)), + | AVG(LENGTH(ID)), + | MAX(LENGTH(ID)), + | FROM customers + - 0.557 seconds +row_count : 23543 +missing : 23 +invalid : 0 +min_length: 9 +avg_length: 9 +max_length: 9 + +...more queries... + +47 measurements computed +23 tests executed +All is good. No tests failed. 
Scan took 23.307 seconds +``` + +The next step is to add Soda SQL scans in your favorite +data pipeline orchestration solution like: + +* Airflow +* AWS Glue +* Prefect +* Dagster +* Fivetran +* Matillion +* Luigi + +If you like the goals of this project, encourage us! Star [sodadata/soda-sql on Github](https://github.com/sodadata/soda-sql). + +> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going! + + +%package help +Summary: Development documents and examples for soda-sql-core +Provides: python3-soda-sql-core-doc +%description help +<p align="center"><img src="https://raw.githubusercontent.com/sodadata/docs/main/assets/images/soda-banner.png" alt="Soda logo" /></p> + +<h1 align="center">Soda SQL</h1> +<p align="center"><b>Data testing, monitoring and profiling for SQL accessible data.</b></p> + +<p align="center"> + <a href="https://github.com/sodadata/soda-sql/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg" alt="License: Apache 2.0"></a> + <a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a> + <a href="https://pypi.org/project/soda-sql/"><img alt="Pypi Soda SQL" src="https://img.shields.io/badge/pypi-soda%20sql-green.svg"></a> + <a href="https://github.com/sodadata/soda-sql/actions/workflows/build.yml"><img alt="Build soda-sql" src="https://github.com/sodadata/soda-sql/actions/workflows/build.yml/badge.svg"></a> +</p> + +**What does Soda SQL do?** + +Soda SQL allows you to + + * Stop your pipeline when bad data is detected + * Extract metrics and column profiles through super efficient SQL + * Full control over metrics and queries through declarative config files + +**Why Soda SQL?** + +To protect against silent data issues for the consumers of your data, +it's best-practice to profile and test your data: + 
+ * as it lands in your warehouse, + * after every important data processing step + * right before consumption. + +This way you will prevent delivery of bad data to downstream consumers. +You will spend less time firefighting and gain a better reputation. + +**How does Soda SQL work?** + +Soda SQL is a Command Line Interface (CLI) and a Python library to measure +and test your data using SQL. + +As input, Soda SQL uses YAML configuration files that include: + * SQL connection details + * What metrics to compute + * What tests to run on the measurements + +Based on those configuration files, Soda SQL will perform scans. A scan +performs all measurements and runs all tests associated with one table. Typically +a scan is executed after new data has arrived. All soda-sql configuration files +can be checked into your version control system as part of your pipeline +code. + +> Want to try Soda SQL? Head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get started straight away! + +**"[Show me the metrics](https://www.youtube.com/watch?v=1-mOKMq19zU)"** + +Let's walk through an example. Simple metrics and tests can be configured in scan YAML configuration +files. 
An example of the contents of such a file: + +```yaml +metrics: + - row_count + - missing_count + - missing_percentage + - values_count + - values_percentage + - valid_count + - valid_percentage + - invalid_count + - invalid_percentage + - min + - max + - avg + - sum + - min_length + - max_length + - avg_length + - distinct + - unique_count + - duplicate_count + - uniqueness + - maxs + - mins + - frequent_values + - histogram +columns: + ID: + metrics: + - distinct + - duplicate_count + valid_format: uuid + tests: + duplicate_count == 0 + CATEGORY: + missing_values: + - N/A + - No category + tests: + missing_percentage < 3 + SIZE: + tests: + max - min < 20 +sql_metrics: + - sql: | + SELECT sum(volume) as total_volume_us + FROM CUSTOMER_TRANSACTIONS + WHERE country = 'US' + tests: + - total_volume_us > 5000 +``` + +Based on these configuration files, Soda SQL will scan your data +each time new data arrived like this: + +```bash +$ soda scan ./soda/metrics my_warehouse my_dataset +Soda 1.0 scan for dataset my_dataset on prod my_warehouse + | SELECT column_name, data_type, is_nullable + | FROM information_schema.columns + | WHERE lower(table_name) = 'customers' + | AND table_catalog = 'datasource.database' + | AND table_schema = 'datasource.schema' + - 0.256 seconds +Found 4 columns: ID, NAME, CREATE_DATE, COUNTRY + | SELECT + | COUNT(*), + | COUNT(CASE WHEN ID IS NULL THEN 1 END), + | COUNT(CASE WHEN ID IS NOT NULL AND ID regexp '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' THEN 1 END), + | MIN(LENGTH(ID)), + | AVG(LENGTH(ID)), + | MAX(LENGTH(ID)), + | FROM customers + - 0.557 seconds +row_count : 23543 +missing : 23 +invalid : 0 +min_length: 9 +avg_length: 9 +max_length: 9 + +...more queries... + +47 measurements computed +23 tests executed +All is good. No tests failed. 
Scan took 23.307 seconds +``` + +The next step is to add Soda SQL scans in your favorite +data pipeline orchestration solution like: + +* Airflow +* AWS Glue +* Prefect +* Dagster +* Fivetran +* Matillion +* Luigi + +If you like the goals of this project, encourage us! Star [sodadata/soda-sql on Github](https://github.com/sodadata/soda-sql). + +> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going! + + +%prep +%autosetup -n soda-sql-core-2.2.2 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-soda-sql-core -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 2.2.2-1 +- Package Spec generated @@ -0,0 +1 @@ +978d3398f6f03c9d505cda51c94db1c6 soda-sql-core-2.2.2.tar.gz |
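
Note on the `%install` scriptlet in the spec above: the file-list step can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not part of the commit and not how rpmbuild itself works. The function name `collect_filelist` is hypothetical; it only mirrors what `find <dir> -type f -printf "/%h/%f\n"` produces for the four directories the spec checks, relative to the build root.

```python
import os


def collect_filelist(buildroot, subdirs=("usr/lib", "usr/lib64", "usr/bin", "usr/sbin")):
    """Return "/usr/..." style paths for regular files found under buildroot.

    Hypothetical helper: approximates the shell loop that appends to
    filelist.lst in the spec's %install section.
    """
    entries = []
    for subdir in subdirs:
        top = os.path.join(buildroot, subdir)
        if not os.path.isdir(top):
            continue  # same role as the `if [ -d ... ]` guards in the spec
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # `find -type f` matches regular files only, so skip symlinks
                if os.path.islink(full) or not os.path.isfile(full):
                    continue
                rel = os.path.relpath(full, buildroot)
                # Prefix with "/" so the entry reads as an installed path
                entries.append("/" + rel.replace(os.sep, "/"))
    return entries
```

The resulting entries are what `%files -n python3-soda-sql-core -f filelist.lst` consumes. The spec builds `doclist.lst` the same way from `usr/share/man`, except that it appends `.gz` to each entry.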