author    CoprDistGit <infra@openeuler.org>    2023-05-29 13:04:58 +0000
committer CoprDistGit <infra@openeuler.org>    2023-05-29 13:04:58 +0000
commit    56e45a88b06ba09b25a6eb764f342b48e756cd5a (patch)
tree      81ff5d1f4ff71a2123d63e8df46939f8e967cbc6
parent    6c29cd042049805f68aa9c91e1118ace8cb4afbf (diff)

automatic import of python-gcp-airflow-foundations-dev

 .gitignore                              |   1 +
 python-gcp-airflow-foundations-dev.spec | 258 ++++++++++++++++++++++++++++++++
 sources                                 |   1 +
 3 files changed, 260 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index e69de29..c9eba06 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/gcp-airflow-foundations-dev-11.1.tar.gz
diff --git a/python-gcp-airflow-foundations-dev.spec b/python-gcp-airflow-foundations-dev.spec
new file mode 100644
index 0000000..cd0fdb9
--- /dev/null
+++ b/python-gcp-airflow-foundations-dev.spec
@@ -0,0 +1,258 @@
+%global _empty_manifest_terminate_build 0
+Name: python-gcp-airflow-foundations-dev
+Version: 11.1
+Release: 1
+Summary: Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse
+License:	Apache-2.0
+URL: https://github.com/badal-io/gcp-airflow-foundations
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/fe/f8/34a85459324b090d0b8d3b22ddb3b3dfc42e7e6ed7d6e4ca34f2a222c8d4/gcp-airflow-foundations-dev-11.1.tar.gz
+BuildArch: noarch
+
+
+%description
+# gcp-airflow-foundations
+[![PyPI version](https://badge.fury.io/py/gcp-airflow-foundations.svg)](https://badge.fury.io/py/gcp-airflow-foundations)
+[![Cloud Build Status](https://storage.googleapis.com/public-cloudbuild/build/cloudbuild_status.svg)](https://storage.googleapis.com/public-cloudbuild/build/cloudbuild_status.svg)
+[![Documentation Status](https://readthedocs.org/projects/gcp-airflow-foundations/badge/?version=latest)](https://gcp-airflow-foundations.readthedocs.io/en/latest/?badge=latest)
+
+
+![airflow](./docs/_static/airflow_diagram.png)
+
+Airflow is an awesome open-source orchestration framework and the go-to choice for building data ingestion pipelines on GCP (using Composer, a hosted Airflow service). However, most companies using it face the same set of problems:
+- **Learning curve**: Airflow requires Python knowledge and has some gotchas that take time to learn. Moreover, writing a Python DAG for every single table that needs to be ingested becomes cumbersome, so most companies end up building utilities that generate DAGs from configuration files, both to simplify DAG creation and to allow non-developers to configure ingestion.
+- **Data lake and data pipeline design best practices**: Airflow only provides the building blocks; users still need to understand and implement the nuances of building proper ingestion pipelines for the data lake or data warehouse platform they are using.
+- **Core reusability and best-practice enforcement across the enterprise**: Usually each team maintains its own Airflow source code and deployment.
+
+We have written an opinionated yet flexible ingestion framework for building ingestion pipelines into a BigQuery data warehouse. It supports the following features:
+
+- **Zero-code, config-file-based ingestion** - anybody can start ingesting from the growing number of supported sources just by providing a simple configuration file. No Python or Airflow knowledge is required.
+- **Modular and extendable** - The core of the framework is a lightweight library. Ingestion sources are added as plugins, and a new source can be added by extending the provided base classes.
+- **Opinionated automatic creation of an ODS (Operational Data Store) and HDS (Historical Data Store)** in BigQuery, while enforcing best practices such as schema migration, data quality validation, idempotency, partitioning, etc.
+- **Dataflow job support** for ingesting large datasets from SQL sources and deploying jobs into a specific network or shared VPC.
+- Support for **advanced Airflow features** for job prioritization, such as slots and priorities.
+- Integration with **GCP data services** such as DLP and Data Catalog [work in progress].
+- **Well tested** - We maintain a rich suite of both unit and integration tests.
+
+## Installing from PyPI
+```bash
+pip install 'gcp-airflow-foundations'
+```
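+
+If you are deploying to Cloud Composer rather than running Airflow locally, the package can also be added to the environment as a PyPI dependency. A minimal sketch using gcloud; the environment name, location, and pinned version are placeholders to replace with your own:
+```bash
+# Placeholder environment name, location, and version -- substitute your own values.
+gcloud composer environments update my-composer-env \
+    --location us-central1 \
+    --update-pypi-package "gcp-airflow-foundations==<version>"
+```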
+
+## Full Documentation
+See the [gcp-airflow-foundations documentation](https://gcp-airflow-foundations.readthedocs.io/en/latest/) for more details.
+
+## Running locally
+
+### Sample DAGs
+Sample DAGs that ingest publicly available GCS files can be found in the dags folder; they start as soon as Airflow is run locally. For them to run successfully, please ensure the following (a gcloud/bq sketch follows this list):
+- Enable the BigQuery, Cloud Storage, Cloud DLP, and Data Catalog APIs
+- Create BigQuery datasets for the HDS and ODS
+- Create a DLP Inspect template in DLP
+- Create a policy tag in Data Catalog
+- Update the gcp_project, location, and dataset values, as well as the DLP and policy-tag configs, with your newly created values
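+
+For example, the API enablement and dataset creation above can be scripted with gcloud and bq; the project ID, dataset names, and location below are placeholders, and the DLP Inspect template and policy tag are easiest to create in the Cloud Console:
+```bash
+# Placeholder project ID -- replace with your test project.
+PROJECT_ID=my-test-project
+
+# Enable the required APIs: BigQuery, Cloud Storage, Cloud DLP, Data Catalog.
+gcloud services enable bigquery.googleapis.com storage.googleapis.com \
+    dlp.googleapis.com datacatalog.googleapis.com --project "$PROJECT_ID"
+
+# Create BigQuery datasets for the ODS and HDS (names and location are placeholders).
+bq --location=US mk --dataset "${PROJECT_ID}:ods"
+bq --location=US mk --dataset "${PROJECT_ID}:hds"
+```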
+
+### Using Service Account
+- Create a service account in GCP and save its key as ```helpers/key/keys.json``` (don't worry, it is in .gitignore and will not be pushed to the git repo)
+- Run Airflow locally (Airflow UI will be accessible at http://localhost:8080): ```docker-compose up```
+- Default authentication values for the Airflow UI are provided in lines 96, 97 of ```docker-composer.yaml```
+### Using user IAM
+- Uncomment line 11 in ```docker-composer.yaml```
+- Set the env var PROJECT_ID to your test project
+- Authorize gcloud to access the Cloud Platform with Google user credentials: ```helpers/scripts/gcp-auth.sh```
+- Run Airflow locally (Airflow UI will be accessible at http://localhost:8080): ```docker-compose up```
+- Default authentication values for the Airflow UI are provided in lines 96, 97 of ```docker-composer.yaml```
+### Running tests
+- Run unit tests: ```./tests/airflow "pytest tests/unit"```
+- Run unit tests with a coverage report: ```./tests/airflow "pytest --cov=gcp_airflow_foundations tests/unit"```
+- Run integration tests: ```./tests/airflow "pytest tests/integration"```
+- Rebuild the Docker image if requirements have changed: ```docker-compose build```
+
+# Contributing
+## Install pre-commit hook
+Install pre-commit hooks for linting, format checking, etc.
+
+- Install the pre-commit Python lib locally: ```pip install pre-commit```
+- Install the pre-commit hooks for the repo: ```pre-commit install``` (see the usage example below)
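+
+Once installed, the hooks run automatically on every ```git commit```; they can also be run manually against the whole repository:
+```bash
+# Run all configured pre-commit hooks on every file in the repo.
+pre-commit run --all-files
+```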
+
+%package -n python3-gcp-airflow-foundations-dev
+Summary: Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse
+Provides: python-gcp-airflow-foundations-dev
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-gcp-airflow-foundations-dev
+# gcp-airflow-foundations
+[![PyPI version](https://badge.fury.io/py/gcp-airflow-foundations.svg)](https://badge.fury.io/py/gcp-airflow-foundations)
+[![Cloud Build Status](https://storage.googleapis.com/public-cloudbuild/build/cloudbuild_status.svg)](https://storage.googleapis.com/public-cloudbuild/build/cloudbuild_status.svg)
+[![Documentation Status](https://readthedocs.org/projects/gcp-airflow-foundations/badge/?version=latest)](https://gcp-airflow-foundations.readthedocs.io/en/latest/?badge=latest)
+
+
+![airflow](./docs/_static/airflow_diagram.png)
+
+Airflow is an awesome open-source orchestration framework and the go-to choice for building data ingestion pipelines on GCP (using Composer, a hosted Airflow service). However, most companies using it face the same set of problems:
+- **Learning curve**: Airflow requires Python knowledge and has some gotchas that take time to learn. Moreover, writing a Python DAG for every single table that needs to be ingested becomes cumbersome, so most companies end up building utilities that generate DAGs from configuration files, both to simplify DAG creation and to allow non-developers to configure ingestion.
+- **Data lake and data pipeline design best practices**: Airflow only provides the building blocks; users still need to understand and implement the nuances of building proper ingestion pipelines for the data lake or data warehouse platform they are using.
+- **Core reusability and best-practice enforcement across the enterprise**: Usually each team maintains its own Airflow source code and deployment.
+
+We have written an opinionated yet flexible ingestion framework for building ingestion pipelines into a BigQuery data warehouse. It supports the following features:
+
+- **Zero-code, config-file-based ingestion** - anybody can start ingesting from the growing number of supported sources just by providing a simple configuration file. No Python or Airflow knowledge is required.
+- **Modular and extendable** - The core of the framework is a lightweight library. Ingestion sources are added as plugins, and a new source can be added by extending the provided base classes.
+- **Opinionated automatic creation of an ODS (Operational Data Store) and HDS (Historical Data Store)** in BigQuery, while enforcing best practices such as schema migration, data quality validation, idempotency, partitioning, etc.
+- **Dataflow job support** for ingesting large datasets from SQL sources and deploying jobs into a specific network or shared VPC.
+- Support for **advanced Airflow features** for job prioritization, such as slots and priorities.
+- Integration with **GCP data services** such as DLP and Data Catalog [work in progress].
+- **Well tested** - We maintain a rich suite of both unit and integration tests.
+
+## Installing from PyPI
+```bash
+pip install 'gcp-airflow-foundations'
+```
+
+## Full Documentation
+See the [gcp-airflow-foundations documentation](https://gcp-airflow-foundations.readthedocs.io/en/latest/) for more details.
+
+## Running locally
+
+### Sample DAGs
+Sample DAGs that ingest publicly available GCS files can be found in the dags folder; they start as soon as Airflow is run locally. For them to run successfully, please ensure the following:
+- Enable the BigQuery, Cloud Storage, Cloud DLP, and Data Catalog APIs
+- Create BigQuery datasets for the HDS and ODS
+- Create a DLP Inspect template in DLP
+- Create a policy tag in Data Catalog
+- Update the gcp_project, location, and dataset values, as well as the DLP and policy-tag configs, with your newly created values
+
+### Using Service Account
+- Create a service account in GCP and save its key as ```helpers/key/keys.json``` (don't worry, it is in .gitignore and will not be pushed to the git repo)
+- Run Airflow locally (Airflow UI will be accessible at http://localhost:8080): ```docker-compose up```
+- Default authentication values for the Airflow UI are provided in lines 96, 97 of ```docker-composer.yaml```
+### Using user IAM
+- Uncomment line 11 in ```docker-composer.yaml```
+- Set the env var PROJECT_ID to your test project
+- Authorize gcloud to access the Cloud Platform with Google user credentials: ```helpers/scripts/gcp-auth.sh```
+- Run Airflow locally (Airflow UI will be accessible at http://localhost:8080): ```docker-compose up```
+- Default authentication values for the Airflow UI are provided in lines 96, 97 of ```docker-composer.yaml```
+### Running tests
+- Run unit tests: ```./tests/airflow "pytest tests/unit"```
+- Run unit tests with a coverage report: ```./tests/airflow "pytest --cov=gcp_airflow_foundations tests/unit"```
+- Run integration tests: ```./tests/airflow "pytest tests/integration"```
+- Rebuild the Docker image if requirements have changed: ```docker-compose build```
+
+# Contributing
+## Install pre-commit hook
+Install pre-commit hooks for linting, format checking, etc.
+
+- Install the pre-commit Python lib locally: ```pip install pre-commit```
+- Install the pre-commit hooks for the repo: ```pre-commit install```
+
+%package help
+Summary: Development documents and examples for gcp-airflow-foundations-dev
+Provides: python3-gcp-airflow-foundations-dev-doc
+%description help
+# gcp-airflow-foundations
+[![PyPI version](https://badge.fury.io/py/gcp-airflow-foundations.svg)](https://badge.fury.io/py/gcp-airflow-foundations)
+[![Cloud Build Status](https://storage.googleapis.com/public-cloudbuild/build/cloudbuild_status.svg)](https://storage.googleapis.com/public-cloudbuild/build/cloudbuild_status.svg)
+[![Documentation Status](https://readthedocs.org/projects/gcp-airflow-foundations/badge/?version=latest)](https://gcp-airflow-foundations.readthedocs.io/en/latest/?badge=latest)
+
+
+![airflow](./docs/_static/airflow_diagram.png)
+
+Airflow is an awesome open-source orchestration framework and the go-to choice for building data ingestion pipelines on GCP (using Composer, a hosted Airflow service). However, most companies using it face the same set of problems:
+- **Learning curve**: Airflow requires Python knowledge and has some gotchas that take time to learn. Moreover, writing a Python DAG for every single table that needs to be ingested becomes cumbersome, so most companies end up building utilities that generate DAGs from configuration files, both to simplify DAG creation and to allow non-developers to configure ingestion.
+- **Data lake and data pipeline design best practices**: Airflow only provides the building blocks; users still need to understand and implement the nuances of building proper ingestion pipelines for the data lake or data warehouse platform they are using.
+- **Core reusability and best-practice enforcement across the enterprise**: Usually each team maintains its own Airflow source code and deployment.
+
+We have written an opinionated yet flexible ingestion framework for building ingestion pipelines into a BigQuery data warehouse. It supports the following features:
+
+- **Zero-code, config-file-based ingestion** - anybody can start ingesting from the growing number of supported sources just by providing a simple configuration file. No Python or Airflow knowledge is required.
+- **Modular and extendable** - The core of the framework is a lightweight library. Ingestion sources are added as plugins, and a new source can be added by extending the provided base classes.
+- **Opinionated automatic creation of an ODS (Operational Data Store) and HDS (Historical Data Store)** in BigQuery, while enforcing best practices such as schema migration, data quality validation, idempotency, partitioning, etc.
+- **Dataflow job support** for ingesting large datasets from SQL sources and deploying jobs into a specific network or shared VPC.
+- Support for **advanced Airflow features** for job prioritization, such as slots and priorities.
+- Integration with **GCP data services** such as DLP and Data Catalog [work in progress].
+- **Well tested** - We maintain a rich suite of both unit and integration tests.
+
+## Installing from PyPI
+```bash
+pip install 'gcp-airflow-foundations'
+```
+
+## Full Documentation
+See the [gcp-airflow-foundations documentation](https://gcp-airflow-foundations.readthedocs.io/en/latest/) for more details.
+
+## Running locally
+
+### Sample DAGs
+Sample DAGs that ingest publicly available GCS files can be found in the dags folder; they start as soon as Airflow is run locally. For them to run successfully, please ensure the following:
+- Enable the BigQuery, Cloud Storage, Cloud DLP, and Data Catalog APIs
+- Create BigQuery datasets for the HDS and ODS
+- Create a DLP Inspect template in DLP
+- Create a policy tag in Data Catalog
+- Update the gcp_project, location, and dataset values, as well as the DLP and policy-tag configs, with your newly created values
+
+### Using Service Account
+- Create a service account in GCP and save its key as ```helpers/key/keys.json``` (don't worry, it is in .gitignore and will not be pushed to the git repo)
+- Run Airflow locally (Airflow UI will be accessible at http://localhost:8080): ```docker-compose up```
+- Default authentication values for the Airflow UI are provided in lines 96, 97 of ```docker-composer.yaml```
+### Using user IAM
+- Uncomment line 11 in ```docker-composer.yaml```
+- Set the env var PROJECT_ID to your test project
+- Authorize gcloud to access the Cloud Platform with Google user credentials: ```helpers/scripts/gcp-auth.sh```
+- Run Airflow locally (Airflow UI will be accessible at http://localhost:8080): ```docker-compose up```
+- Default authentication values for the Airflow UI are provided in lines 96, 97 of ```docker-composer.yaml```
+### Running tests
+- Run unit tests: ```./tests/airflow "pytest tests/unit"```
+- Run unit tests with a coverage report: ```./tests/airflow "pytest --cov=gcp_airflow_foundations tests/unit"```
+- Run integration tests: ```./tests/airflow "pytest tests/integration"```
+- Rebuild the Docker image if requirements have changed: ```docker-compose build```
+
+# Contributing
+## Install pre-commit hook
+Install pre-commit hooks for linting, format checking, etc.
+
+- Install the pre-commit Python lib locally: ```pip install pre-commit```
+- Install the pre-commit hooks for the repo: ```pre-commit install```
+
+%prep
+%autosetup -n gcp-airflow-foundations-dev-11.1
+
+%build
+%py3_build
+
+%install
+%py3_install
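+# Stage any upstream doc/example directories into the package doc directory.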
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
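+# Build the runtime file list (and man-page doc list) from the staged buildroot.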
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-gcp-airflow-foundations-dev -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 11.1-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..bf11313
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+0bf7f90420ed1071e2c16315be69dad0 gcp-airflow-foundations-dev-11.1.tar.gz