-rw-r--r-- .gitignore 1
-rw-r--r-- python-soda-sql-core.spec 552
-rw-r--r-- sources 1
3 files changed, 554 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..54441a3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/soda-sql-core-2.2.2.tar.gz
diff --git a/python-soda-sql-core.spec b/python-soda-sql-core.spec
new file mode 100644
index 0000000..51b7b40
--- /dev/null
+++ b/python-soda-sql-core.spec
@@ -0,0 +1,552 @@
+%global _empty_manifest_terminate_build 0
+Name: python-soda-sql-core
+Version: 2.2.2
+Release: 1
+Summary: Soda SQL library & CLI
+License: Apache Software License
+URL: https://pypi.org/project/soda-sql-core/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/bb/5b/16a61b4e206e03f78ecea9e8bdff5f96601d119cb891777156f997b51587/soda-sql-core-2.2.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-markupsafe
+Requires: python3-Jinja2
+Requires: python3-click
+Requires: python3-pyyaml
+Requires: python3-requests
+Requires: python3-Deprecated
+Requires: python3-opentelemetry-api
+Requires: python3-opentelemetry-exporter-otlp-proto-http
+Requires: python3-protobuf
+
+%description
+<p align="center"><img src="https://raw.githubusercontent.com/sodadata/docs/main/assets/images/soda-banner.png" alt="Soda logo" /></p>
+
+<h1 align="center">Soda SQL</h1>
+<p align="center"><b>Data testing, monitoring and profiling for SQL accessible data.</b></p>
+
+<p align="center">
+ <a href="https://github.com/sodadata/soda-sql/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg" alt="License: Apache 2.0"></a>
+ <a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a>
+ <a href="https://pypi.org/project/soda-sql/"><img alt="Pypi Soda SQL" src="https://img.shields.io/badge/pypi-soda%20sql-green.svg"></a>
+ <a href="https://github.com/sodadata/soda-sql/actions/workflows/build.yml"><img alt="Build soda-sql" src="https://github.com/sodadata/soda-sql/actions/workflows/build.yml/badge.svg"></a>
+</p>
+
+**What does Soda SQL do?**
+
+Soda SQL allows you to
+
+ * Stop your pipeline when bad data is detected
+ * Extract metrics and column profiles through super efficient SQL
+ * Keep full control over metrics and queries through declarative config files
+
+**Why Soda SQL?**
+
+To protect against silent data issues for the consumers of your data,
+it's best-practice to profile and test your data:
+
+ * as it lands in your warehouse,
+ * after every important data processing step
+ * right before consumption.
+
+This way you will prevent delivery of bad data to downstream consumers.
+You will spend less time firefighting and gain a better reputation.
+
+**How does Soda SQL work?**
+
+Soda SQL is a Command Line Interface (CLI) and a Python library to measure
+and test your data using SQL.
+
+As input, Soda SQL uses YAML configuration files that include:
+ * SQL connection details
+ * What metrics to compute
+ * What tests to run on the measurements
+
+Based on those configuration files, Soda SQL will perform scans. A scan
+performs all measurements and runs all tests associated with one table. Typically
+a scan is executed after new data has arrived. All soda-sql configuration files
+can be checked into your version control system as part of your pipeline
+code.
+
+> Want to try Soda SQL? Head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get started straight away!
+
+**"[Show me the metrics](https://www.youtube.com/watch?v=1-mOKMq19zU)"**
+
+Let's walk through an example. Simple metrics and tests can be configured in scan YAML configuration
+files. An example of the contents of such a file:
+
+```yaml
+metrics:
+  - row_count
+  - missing_count
+  - missing_percentage
+  - values_count
+  - values_percentage
+  - valid_count
+  - valid_percentage
+  - invalid_count
+  - invalid_percentage
+  - min
+  - max
+  - avg
+  - sum
+  - min_length
+  - max_length
+  - avg_length
+  - distinct
+  - unique_count
+  - duplicate_count
+  - uniqueness
+  - maxs
+  - mins
+  - frequent_values
+  - histogram
+columns:
+  ID:
+    metrics:
+      - distinct
+      - duplicate_count
+    valid_format: uuid
+    tests:
+      duplicate_count == 0
+  CATEGORY:
+    missing_values:
+      - N/A
+      - No category
+    tests:
+      missing_percentage < 3
+  SIZE:
+    tests:
+      max - min < 20
+sql_metrics:
+  - sql: |
+      SELECT sum(volume) as total_volume_us
+      FROM CUSTOMER_TRANSACTIONS
+      WHERE country = 'US'
+    tests:
+      - total_volume_us > 5000
+```
+
+Based on these configuration files, Soda SQL will scan your data
+each time new data arrives, like this:
+
+```bash
+$ soda scan ./soda/metrics my_warehouse my_dataset
+Soda 1.0 scan for dataset my_dataset on prod my_warehouse
+ | SELECT column_name, data_type, is_nullable
+ | FROM information_schema.columns
+ | WHERE lower(table_name) = 'customers'
+ | AND table_catalog = 'datasource.database'
+ | AND table_schema = 'datasource.schema'
+ - 0.256 seconds
+Found 4 columns: ID, NAME, CREATE_DATE, COUNTRY
+ | SELECT
+ | COUNT(*),
+ | COUNT(CASE WHEN ID IS NULL THEN 1 END),
+ | COUNT(CASE WHEN ID IS NOT NULL AND ID regexp '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' THEN 1 END),
+ | MIN(LENGTH(ID)),
+ | AVG(LENGTH(ID)),
+ | MAX(LENGTH(ID)),
+ | FROM customers
+ - 0.557 seconds
+row_count : 23543
+missing : 23
+invalid : 0
+min_length: 9
+avg_length: 9
+max_length: 9
+
+...more queries...
+
+47 measurements computed
+23 tests executed
+All is good. No tests failed. Scan took 23.307 seconds
+```
+
+The next step is to add Soda SQL scans in your favorite
+data pipeline orchestration solution like:
+
+* Airflow
+* AWS Glue
+* Prefect
+* Dagster
+* Fivetran
+* Matillion
+* Luigi
+
+If you like the goals of this project, encourage us! Star [sodadata/soda-sql on GitHub](https://github.com/sodadata/soda-sql).
+
+> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going!
+
+
+%package -n python3-soda-sql-core
+Summary: Soda SQL library & CLI
+Provides: python-soda-sql-core
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-soda-sql-core
+<p align="center"><img src="https://raw.githubusercontent.com/sodadata/docs/main/assets/images/soda-banner.png" alt="Soda logo" /></p>
+
+<h1 align="center">Soda SQL</h1>
+<p align="center"><b>Data testing, monitoring and profiling for SQL accessible data.</b></p>
+
+<p align="center">
+ <a href="https://github.com/sodadata/soda-sql/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg" alt="License: Apache 2.0"></a>
+ <a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a>
+ <a href="https://pypi.org/project/soda-sql/"><img alt="Pypi Soda SQL" src="https://img.shields.io/badge/pypi-soda%20sql-green.svg"></a>
+ <a href="https://github.com/sodadata/soda-sql/actions/workflows/build.yml"><img alt="Build soda-sql" src="https://github.com/sodadata/soda-sql/actions/workflows/build.yml/badge.svg"></a>
+</p>
+
+**What does Soda SQL do?**
+
+Soda SQL allows you to
+
+ * Stop your pipeline when bad data is detected
+ * Extract metrics and column profiles through super efficient SQL
+ * Keep full control over metrics and queries through declarative config files
+
+**Why Soda SQL?**
+
+To protect against silent data issues for the consumers of your data,
+it's best-practice to profile and test your data:
+
+ * as it lands in your warehouse,
+ * after every important data processing step
+ * right before consumption.
+
+This way you will prevent delivery of bad data to downstream consumers.
+You will spend less time firefighting and gain a better reputation.
+
+**How does Soda SQL work?**
+
+Soda SQL is a Command Line Interface (CLI) and a Python library to measure
+and test your data using SQL.
+
+As input, Soda SQL uses YAML configuration files that include:
+ * SQL connection details
+ * What metrics to compute
+ * What tests to run on the measurements
+
+Based on those configuration files, Soda SQL will perform scans. A scan
+performs all measurements and runs all tests associated with one table. Typically
+a scan is executed after new data has arrived. All soda-sql configuration files
+can be checked into your version control system as part of your pipeline
+code.
+
+> Want to try Soda SQL? Head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get started straight away!
+
+**"[Show me the metrics](https://www.youtube.com/watch?v=1-mOKMq19zU)"**
+
+Let's walk through an example. Simple metrics and tests can be configured in scan YAML configuration
+files. An example of the contents of such a file:
+
+```yaml
+metrics:
+  - row_count
+  - missing_count
+  - missing_percentage
+  - values_count
+  - values_percentage
+  - valid_count
+  - valid_percentage
+  - invalid_count
+  - invalid_percentage
+  - min
+  - max
+  - avg
+  - sum
+  - min_length
+  - max_length
+  - avg_length
+  - distinct
+  - unique_count
+  - duplicate_count
+  - uniqueness
+  - maxs
+  - mins
+  - frequent_values
+  - histogram
+columns:
+  ID:
+    metrics:
+      - distinct
+      - duplicate_count
+    valid_format: uuid
+    tests:
+      duplicate_count == 0
+  CATEGORY:
+    missing_values:
+      - N/A
+      - No category
+    tests:
+      missing_percentage < 3
+  SIZE:
+    tests:
+      max - min < 20
+sql_metrics:
+  - sql: |
+      SELECT sum(volume) as total_volume_us
+      FROM CUSTOMER_TRANSACTIONS
+      WHERE country = 'US'
+    tests:
+      - total_volume_us > 5000
+```
+
+Based on these configuration files, Soda SQL will scan your data
+each time new data arrives, like this:
+
+```bash
+$ soda scan ./soda/metrics my_warehouse my_dataset
+Soda 1.0 scan for dataset my_dataset on prod my_warehouse
+ | SELECT column_name, data_type, is_nullable
+ | FROM information_schema.columns
+ | WHERE lower(table_name) = 'customers'
+ | AND table_catalog = 'datasource.database'
+ | AND table_schema = 'datasource.schema'
+ - 0.256 seconds
+Found 4 columns: ID, NAME, CREATE_DATE, COUNTRY
+ | SELECT
+ | COUNT(*),
+ | COUNT(CASE WHEN ID IS NULL THEN 1 END),
+ | COUNT(CASE WHEN ID IS NOT NULL AND ID regexp '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' THEN 1 END),
+ | MIN(LENGTH(ID)),
+ | AVG(LENGTH(ID)),
+ | MAX(LENGTH(ID)),
+ | FROM customers
+ - 0.557 seconds
+row_count : 23543
+missing : 23
+invalid : 0
+min_length: 9
+avg_length: 9
+max_length: 9
+
+...more queries...
+
+47 measurements computed
+23 tests executed
+All is good. No tests failed. Scan took 23.307 seconds
+```
+
+The next step is to add Soda SQL scans in your favorite
+data pipeline orchestration solution like:
+
+* Airflow
+* AWS Glue
+* Prefect
+* Dagster
+* Fivetran
+* Matillion
+* Luigi
+
+If you like the goals of this project, encourage us! Star [sodadata/soda-sql on GitHub](https://github.com/sodadata/soda-sql).
+
+> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going!
+
+
+%package help
+Summary: Development documents and examples for soda-sql-core
+Provides: python3-soda-sql-core-doc
+%description help
+<p align="center"><img src="https://raw.githubusercontent.com/sodadata/docs/main/assets/images/soda-banner.png" alt="Soda logo" /></p>
+
+<h1 align="center">Soda SQL</h1>
+<p align="center"><b>Data testing, monitoring and profiling for SQL accessible data.</b></p>
+
+<p align="center">
+ <a href="https://github.com/sodadata/soda-sql/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg" alt="License: Apache 2.0"></a>
+ <a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a>
+ <a href="https://pypi.org/project/soda-sql/"><img alt="Pypi Soda SQL" src="https://img.shields.io/badge/pypi-soda%20sql-green.svg"></a>
+ <a href="https://github.com/sodadata/soda-sql/actions/workflows/build.yml"><img alt="Build soda-sql" src="https://github.com/sodadata/soda-sql/actions/workflows/build.yml/badge.svg"></a>
+</p>
+
+**What does Soda SQL do?**
+
+Soda SQL allows you to
+
+ * Stop your pipeline when bad data is detected
+ * Extract metrics and column profiles through super efficient SQL
+ * Keep full control over metrics and queries through declarative config files
+
+**Why Soda SQL?**
+
+To protect against silent data issues for the consumers of your data,
+it's best-practice to profile and test your data:
+
+ * as it lands in your warehouse,
+ * after every important data processing step
+ * right before consumption.
+
+This way you will prevent delivery of bad data to downstream consumers.
+You will spend less time firefighting and gain a better reputation.
+
+**How does Soda SQL work?**
+
+Soda SQL is a Command Line Interface (CLI) and a Python library to measure
+and test your data using SQL.
+
+As input, Soda SQL uses YAML configuration files that include:
+ * SQL connection details
+ * What metrics to compute
+ * What tests to run on the measurements
+
+Based on those configuration files, Soda SQL will perform scans. A scan
+performs all measurements and runs all tests associated with one table. Typically
+a scan is executed after new data has arrived. All soda-sql configuration files
+can be checked into your version control system as part of your pipeline
+code.
+
+> Want to try Soda SQL? Head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get started straight away!
+
+**"[Show me the metrics](https://www.youtube.com/watch?v=1-mOKMq19zU)"**
+
+Let's walk through an example. Simple metrics and tests can be configured in scan YAML configuration
+files. An example of the contents of such a file:
+
+```yaml
+metrics:
+  - row_count
+  - missing_count
+  - missing_percentage
+  - values_count
+  - values_percentage
+  - valid_count
+  - valid_percentage
+  - invalid_count
+  - invalid_percentage
+  - min
+  - max
+  - avg
+  - sum
+  - min_length
+  - max_length
+  - avg_length
+  - distinct
+  - unique_count
+  - duplicate_count
+  - uniqueness
+  - maxs
+  - mins
+  - frequent_values
+  - histogram
+columns:
+  ID:
+    metrics:
+      - distinct
+      - duplicate_count
+    valid_format: uuid
+    tests:
+      duplicate_count == 0
+  CATEGORY:
+    missing_values:
+      - N/A
+      - No category
+    tests:
+      missing_percentage < 3
+  SIZE:
+    tests:
+      max - min < 20
+sql_metrics:
+  - sql: |
+      SELECT sum(volume) as total_volume_us
+      FROM CUSTOMER_TRANSACTIONS
+      WHERE country = 'US'
+    tests:
+      - total_volume_us > 5000
+```
+
+Based on these configuration files, Soda SQL will scan your data
+each time new data arrives, like this:
+
+```bash
+$ soda scan ./soda/metrics my_warehouse my_dataset
+Soda 1.0 scan for dataset my_dataset on prod my_warehouse
+ | SELECT column_name, data_type, is_nullable
+ | FROM information_schema.columns
+ | WHERE lower(table_name) = 'customers'
+ | AND table_catalog = 'datasource.database'
+ | AND table_schema = 'datasource.schema'
+ - 0.256 seconds
+Found 4 columns: ID, NAME, CREATE_DATE, COUNTRY
+ | SELECT
+ | COUNT(*),
+ | COUNT(CASE WHEN ID IS NULL THEN 1 END),
+ | COUNT(CASE WHEN ID IS NOT NULL AND ID regexp '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' THEN 1 END),
+ | MIN(LENGTH(ID)),
+ | AVG(LENGTH(ID)),
+ | MAX(LENGTH(ID)),
+ | FROM customers
+ - 0.557 seconds
+row_count : 23543
+missing : 23
+invalid : 0
+min_length: 9
+avg_length: 9
+max_length: 9
+
+...more queries...
+
+47 measurements computed
+23 tests executed
+All is good. No tests failed. Scan took 23.307 seconds
+```
+
+The next step is to add Soda SQL scans in your favorite
+data pipeline orchestration solution like:
+
+* Airflow
+* AWS Glue
+* Prefect
+* Dagster
+* Fivetran
+* Matillion
+* Luigi
+
+If you like the goals of this project, encourage us! Star [sodadata/soda-sql on GitHub](https://github.com/sodadata/soda-sql).
+
+> Next, head over to our ['Quick start tutorial'](https://docs.soda.io/soda-sql/getting-started/5_min_tutorial.html) and get your first project going!
+
+
+%prep
+%autosetup -n soda-sql-core-2.2.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-soda-sql-core -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 2.2.2-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..d2f2fb0
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+978d3398f6f03c9d505cda51c94db1c6 soda-sql-core-2.2.2.tar.gz