| author    | CoprDistGit <infra@openeuler.org> | 2023-04-12 02:32:20 +0000 |
|-----------|-----------------------------------|---------------------------|
| committer | CoprDistGit <infra@openeuler.org> | 2023-04-12 02:32:20 +0000 |
| commit    | 3a043fc710e9fcf97968d90eff68b632c6a78219 | |
| tree      | 55bce903a9745fcd5938da9340d281f273960a89 | |
| parent    | adafd1f38a946f77b5cb2bb424edc9a32e285a09 | |
automatic import of python-amsterdam-schema-tools
-rw-r--r-- | .gitignore | 1
-rw-r--r-- | python-amsterdam-schema-tools.spec | 720
-rw-r--r-- | sources | 1
3 files changed, 722 insertions, 0 deletions
@@ -0,0 +1 @@
+/amsterdam-schema-tools-5.9.1.tar.gz
diff --git a/python-amsterdam-schema-tools.spec b/python-amsterdam-schema-tools.spec
new file mode 100644
index 0000000..b88faac
--- /dev/null
+++ b/python-amsterdam-schema-tools.spec
@@ -0,0 +1,720 @@
+%global _empty_manifest_terminate_build 0
+Name: python-amsterdam-schema-tools
+Version: 5.9.1
+Release: 1
+Summary: Tools to work with Amsterdam Schema.
+License: Mozilla Public License 2.0
+URL: https://github.com/amsterdam/schema-tools
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/cc/93/7d005febae3e5fa62b86617715d028fd9a2b28c5d1f35cc0d409342719f3/amsterdam-schema-tools-5.9.1.tar.gz
+BuildArch: noarch
+
+Requires: python3-sqlalchemy
+Requires: python3-geoalchemy2
+Requires: python3-psycopg2
+Requires: python3-pg-grant
+Requires: python3-click
+Requires: python3-deepdiff
+Requires: python3-jsonlines
+Requires: python3-jsonschema[format]
+Requires: python3-json-encoder
+Requires: python3-ndjson
+Requires: python3-shapely
+Requires: python3-string-utils
+Requires: python3-dateutil
+Requires: python3-requests
+Requires: python3-jinja2
+Requires: python3-mappyfile
+Requires: python3-methodtools
+Requires: python3-jsonpath-rw
+Requires: python3-orjson
+Requires: python3-more-ds
+Requires: python3-factory-boy
+Requires: python3-build
+Requires: python3-twine
+Requires: python3-environ
+Requires: python3-django
+Requires: python3-django-postgres-unlimited-varchar
+Requires: python3-django-gisserver
+Requires: python3-django-environ
+Requires: python3-django-db-comments
+Requires: python3-factory-boy
+Requires: python3-confluent-kafka
+Requires: python3-types-requests
+Requires: python3-types-click
+Requires: python3-types-python-dateutil
+Requires: python3-flake8
+Requires: python3-flake8-colors
+Requires: python3-flake8-raise
+Requires: python3-flake8-bandit
+Requires: python3-flake8-bugbear
+Requires: python3-flake8-builtins
+Requires: python3-flake8-comprehensions
+Requires: python3-flake8-docstrings
+Requires: python3-flake8-implicit-str-concat
+Requires: python3-flake8-print
+Requires: python3-flake8-rst
+Requires: python3-flake8-string-format
+Requires: python3-flake8-logging-format
+Requires: python3-pytest
+Requires: python3-pytest-cov
+Requires: python3-pytest-django
+Requires: python3-pytest-sqlalchemy
+
+%description
+# amsterdam-schema-tools
+
+Set of libraries and tools to work with Amsterdam schema.
+
+Install the package with: `pip install amsterdam-schema-tools`. This installs
+the library and a command-line tool called `schema`, with various subcommands.
+A listing can be obtained from `schema --help`.
+
+Subcommands that talk to a PostgreSQL database expect either a `DATABASE_URL`
+environment variable or a command line option `--db-url` with a DSN.
+
+Many subcommands want to know where to find schema files. Most will look in a
+directory of schemas denoted by the `SCHEMA_URL` environment variable or the
+`--schema-url` command line option. E.g.,
+
+    schema create tables --schema-url=myschemas mydataset
+
+will try to load the schema for `mydataset` from
+`myschemas/mydataset/dataset.json`.
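+
+For instance, a minimal shell setup could look like the following (the DSN and
+the `myschemas` directory are placeholder values, not part of the package):
+
+    # assumes a local PostgreSQL database and a checkout of the schema files
+    export DATABASE_URL=postgresql://user:password@localhost:5432/mydb
+    export SCHEMA_URL=./myschemas
+    schema create tables mydataset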
+
+
+## Generate amsterdam schema from existing database tables
+
+The `--prefix` argument controls whether table prefixes are removed in the
+schema, because that is required for Django models.
+
+As an example we can generate a BAG schema. Point `DATABASE_URL` to the
+`bag_v11` database and then run:
+
+    schema show tablenames | sort | awk '/^bag_/{print}' | xargs schema introspect db bag --prefix bag_ | jq
+
+**jq** formats the output nicely, and it can be redirected directly to the
+correct directory in the schemas repository.
+
+## Express amsterdam schema information in relational tables
+
+Amsterdam schema is expressed as jsonschema. However, to make it easier for people with a
+more relational mind- or toolset, it is possible to express amsterdam schema as a set of
+relational tables. These tables are *meta_dataset*, *meta_table* and *meta_field*.
+
+It is possible to convert a jsonschema into the relational table structure and vice versa.
+
+This command imports an existing dataset in jsonschema format into the relational tables:
+
+    schema import schema <id of dataset>
+
+To convert from relational tables back to jsonschema:
+
+    schema show schema <id of dataset>
+
+
+## Generating amsterdam schema from existing GeoJSON files
+
+The following commands can be used to inspect and import the GeoJSON files:
+
+    schema introspect geojson <dataset-id> *.geojson > schema.json
+    edit schema.json  # fine-tune the table names
+    schema import geojson schema.json <table1> file1.geojson
+    schema import geojson schema.json <table2> file2.geojson
+
+## Importing GOB events
+
+The schematools library has a module that reads GOB events into database tables that are
+defined by an Amsterdam schema. This module can be used to read GOB events from a Kafka stream.
+It is also possible to read GOB events from a batch file with line-separated events using:
+
+    schema import events <path-to-dataset> <path-to-file-with-events>
+
+
+## Export datasets
+
+Datasets can be exported to different file formats. Currently supported are geopackage,
+csv and jsonlines. The command for exporting the dataset tables is:
+
+    schema export [geopackage|csv|jsonlines] <id of dataset>
+
+The command has several command-line options that can be used. Documentation about these
+flags can be shown using the `--help` option.
+
+
+## Schema Tools as a pre-commit hook
+
+Included in the project is a `pre-commit` hook
+that can validate schema files
+in a project such as [amsterdam-schema](https://github.com/Amsterdam/amsterdam-schema).
+
+To configure it,
+extend the `.pre-commit-config.yaml`
+in the project with the schema file definitions as follows:
+
+```yaml
+  - repo: https://github.com/Amsterdam/schema-tools
+    rev: v3.5.0
+    hooks:
+      - id: validate-schema
+        args: ['https://schemas.data.amsterdam.nl/schema@v1.2.0#']
+        exclude: |
+          (?x)^(
+              schema.+|           # exclude meta schemas
+              datasets/index.json
+          )$
+```
+
+`args` is a one-element list
+containing the URL to the Amsterdam Meta Schema.
+
+`validate-schema` will only process `json` files.
+However, not all `json` files are Amsterdam schema files.
+To exclude files or directories, use `exclude` with a pattern.
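+
+Once configured, the hook can be exercised with the standard `pre-commit`
+commands (these are generic `pre-commit` invocations, not specific to this
+package):
+
+    pre-commit install                          # run the hook on every commit
+    pre-commit run validate-schema --all-files  # one-off check of all files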
+
+`pre-commit` depends on properly tagged revisions of its hooks.
+Hence, we should not only bump version numbers on updates to this package,
+but also commit a tag with the version number; see below.
+
+## Doing a release
+
+(This is for schema-tools developers.)
+
+We use GitHub pull requests. If your PR should produce a new release of
+schema-tools, make sure one of the commits increments the version number in
+``setup.cfg`` appropriately. Then,
+
+* merge the commit in GitHub, after review;
+* pull the code from GitHub and merge it into the master branch,
+  ``git checkout master && git fetch origin && git merge --ff-only origin/master``;
+* tag the release X.Y.Z with ``git tag -a vX.Y.Z -m "Bump to vX.Y.Z"``;
+* push the tag to GitHub with ``git push origin --tags``;
+* release to PyPI: ``make upload`` (requires the PyPI secret).
+
+
+## Mocking data
+
+The schematools library contains two Django management commands to generate
+mock data. The first one is `create_mock_data`, which generates mock data for all
+the datasets that are found at the configured schema location `SCHEMA_URL`
+(where `SCHEMA_URL` can be configured to point to a path on the local filesystem).
+
+The `create_mock_data` command processes all datasets. However, it is possible
+to limit this by adding positional arguments. These positional arguments can be
+dataset ids or paths to the location of the `dataset.json` on the local filesystem.
+
+Furthermore, the command has some options, e.g. to change
+the default number of generated records (`--size`) or to reverse the meaning of the positional
+arguments using `--exclude`.
+
+To avoid duplicate primary keys on subsequent runs, the `--start-at` option can be used
+to start the autonumbering of primary keys at an offset.
+
+E.g. to generate 5 records for the `bag` and `gebieden` datasets, starting the
+autonumbering of primary keys at 50:
+
+```
+ django create_mock_data bag gebieden --size 5 --start-at 50
+```
+
+To generate records for all datasets, except for the `fietspaaltjes` dataset:
+
+```
+ django create_mock_data fietspaaltjes --exclude # or -x
+```
+
+To generate records for the `bbga` dataset, by loading the schema from the local filesystem:
+
+```
+ django create_mock_data <path-to-bbga-schema>/datasets.json
+```
+
+During record generation in `create_mock_data`, the relations are not added,
+so foreign key fields will be filled with NULL values.
+
+There is a second management command `relate_mock_data` that can be used to
+add the relations. This command supports positional arguments for datasets
+in the same way as `create_mock_data`.
+Furthermore, the command also has the `--exclude` option to reverse the meaning
+of the positional dataset arguments.
+
+E.g. to add relations to all datasets:
+
+```
+ django relate_mock_data
+```
+
+To add relations for `bag` and `gebieden` only:
+
+```
+ django relate_mock_data bag gebieden
+```
+
+To add relations for all datasets except `meetbouten`:
+
+```
+ django relate_mock_data meetbouten --exclude # or -x
+```
+
+NB. When only a subset of the datasets is being mocked, the command can fail when datasets that
+are involved in a relation are missing, so make sure to include all relevant
+datasets.
+
+For convenience an additional management command `truncate_tables` has been added
+to truncate all tables.
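+
+Putting the two commands together, a typical mocking run (the dataset ids are
+just the examples from above) first creates the records and then fills in the
+relations:
+
+```
+ django create_mock_data bag gebieden --size 5 --start-at 50
+ django relate_mock_data bag gebieden
+```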
+
+
+%package -n python3-amsterdam-schema-tools
+Summary: Tools to work with Amsterdam Schema.
+Provides: python-amsterdam-schema-tools
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-amsterdam-schema-tools
+# amsterdam-schema-tools
+
+Set of libraries and tools to work with Amsterdam schema.
+
+Install the package with: `pip install amsterdam-schema-tools`. This installs
+the library and a command-line tool called `schema`, with various subcommands.
+A listing can be obtained from `schema --help`.
+
+Subcommands that talk to a PostgreSQL database expect either a `DATABASE_URL`
+environment variable or a command line option `--db-url` with a DSN.
+
+Many subcommands want to know where to find schema files. Most will look in a
+directory of schemas denoted by the `SCHEMA_URL` environment variable or the
+`--schema-url` command line option. E.g.,
+
+    schema create tables --schema-url=myschemas mydataset
+
+will try to load the schema for `mydataset` from
+`myschemas/mydataset/dataset.json`.
+
+
+## Generate amsterdam schema from existing database tables
+
+The `--prefix` argument controls whether table prefixes are removed in the
+schema, because that is required for Django models.
+
+As an example we can generate a BAG schema. Point `DATABASE_URL` to the
+`bag_v11` database and then run:
+
+    schema show tablenames | sort | awk '/^bag_/{print}' | xargs schema introspect db bag --prefix bag_ | jq
+
+**jq** formats the output nicely, and it can be redirected directly to the
+correct directory in the schemas repository.
+
+## Express amsterdam schema information in relational tables
+
+Amsterdam schema is expressed as jsonschema. However, to make it easier for people with a
+more relational mind- or toolset, it is possible to express amsterdam schema as a set of
+relational tables. These tables are *meta_dataset*, *meta_table* and *meta_field*.
+
+It is possible to convert a jsonschema into the relational table structure and vice versa.
+
+This command imports an existing dataset in jsonschema format into the relational tables:
+
+    schema import schema <id of dataset>
+
+To convert from relational tables back to jsonschema:
+
+    schema show schema <id of dataset>
+
+
+## Generating amsterdam schema from existing GeoJSON files
+
+The following commands can be used to inspect and import the GeoJSON files:
+
+    schema introspect geojson <dataset-id> *.geojson > schema.json
+    edit schema.json  # fine-tune the table names
+    schema import geojson schema.json <table1> file1.geojson
+    schema import geojson schema.json <table2> file2.geojson
+
+## Importing GOB events
+
+The schematools library has a module that reads GOB events into database tables that are
+defined by an Amsterdam schema. This module can be used to read GOB events from a Kafka stream.
+It is also possible to read GOB events from a batch file with line-separated events using:
+
+    schema import events <path-to-dataset> <path-to-file-with-events>
+
+
+## Export datasets
+
+Datasets can be exported to different file formats. Currently supported are geopackage,
+csv and jsonlines. The command for exporting the dataset tables is:
+
+    schema export [geopackage|csv|jsonlines] <id of dataset>
+
+The command has several command-line options that can be used. Documentation about these
+flags can be shown using the `--help` option.
+
+
+## Schema Tools as a pre-commit hook
+
+Included in the project is a `pre-commit` hook
+that can validate schema files
+in a project such as [amsterdam-schema](https://github.com/Amsterdam/amsterdam-schema).
+
+To configure it,
+extend the `.pre-commit-config.yaml`
+in the project with the schema file definitions as follows:
+
+```yaml
+  - repo: https://github.com/Amsterdam/schema-tools
+    rev: v3.5.0
+    hooks:
+      - id: validate-schema
+        args: ['https://schemas.data.amsterdam.nl/schema@v1.2.0#']
+        exclude: |
+          (?x)^(
+              schema.+|           # exclude meta schemas
+              datasets/index.json
+          )$
+```
+
+`args` is a one-element list
+containing the URL to the Amsterdam Meta Schema.
+
+`validate-schema` will only process `json` files.
+However, not all `json` files are Amsterdam schema files.
+To exclude files or directories, use `exclude` with a pattern.
+
+`pre-commit` depends on properly tagged revisions of its hooks.
+Hence, we should not only bump version numbers on updates to this package,
+but also commit a tag with the version number; see below.
+
+## Doing a release
+
+(This is for schema-tools developers.)
+
+We use GitHub pull requests. If your PR should produce a new release of
+schema-tools, make sure one of the commits increments the version number in
+``setup.cfg`` appropriately. Then,
+
+* merge the commit in GitHub, after review;
+* pull the code from GitHub and merge it into the master branch,
+  ``git checkout master && git fetch origin && git merge --ff-only origin/master``;
+* tag the release X.Y.Z with ``git tag -a vX.Y.Z -m "Bump to vX.Y.Z"``;
+* push the tag to GitHub with ``git push origin --tags``;
+* release to PyPI: ``make upload`` (requires the PyPI secret).
+
+
+## Mocking data
+
+The schematools library contains two Django management commands to generate
+mock data. The first one is `create_mock_data`, which generates mock data for all
+the datasets that are found at the configured schema location `SCHEMA_URL`
+(where `SCHEMA_URL` can be configured to point to a path on the local filesystem).
+
+The `create_mock_data` command processes all datasets. However, it is possible
+to limit this by adding positional arguments. These positional arguments can be
+dataset ids or paths to the location of the `dataset.json` on the local filesystem.
+
+Furthermore, the command has some options, e.g. to change
+the default number of generated records (`--size`) or to reverse the meaning of the positional
+arguments using `--exclude`.
+
+To avoid duplicate primary keys on subsequent runs, the `--start-at` option can be used
+to start the autonumbering of primary keys at an offset.
+
+E.g. to generate 5 records for the `bag` and `gebieden` datasets, starting the
+autonumbering of primary keys at 50:
+
+```
+ django create_mock_data bag gebieden --size 5 --start-at 50
+```
+
+To generate records for all datasets, except for the `fietspaaltjes` dataset:
+
+```
+ django create_mock_data fietspaaltjes --exclude # or -x
+```
+
+To generate records for the `bbga` dataset, by loading the schema from the local filesystem:
+
+```
+ django create_mock_data <path-to-bbga-schema>/datasets.json
+```
+
+During record generation in `create_mock_data`, the relations are not added,
+so foreign key fields will be filled with NULL values.
+
+There is a second management command `relate_mock_data` that can be used to
+add the relations. This command supports positional arguments for datasets
+in the same way as `create_mock_data`.
+Furthermore, the command also has the `--exclude` option to reverse the meaning
+of the positional dataset arguments.
+
+E.g. to add relations to all datasets:
+
+```
+ django relate_mock_data
+```
+
+To add relations for `bag` and `gebieden` only:
+
+```
+ django relate_mock_data bag gebieden
+```
+
+To add relations for all datasets except `meetbouten`:
+
+```
+ django relate_mock_data meetbouten --exclude # or -x
+```
+
+NB. When only a subset of the datasets is being mocked, the command can fail when datasets that
+are involved in a relation are missing, so make sure to include all relevant
+datasets.
+
+For convenience an additional management command `truncate_tables` has been added
+to truncate all tables.
+
+
+%package help
+Summary: Development documents and examples for amsterdam-schema-tools
+Provides: python3-amsterdam-schema-tools-doc
+%description help
+# amsterdam-schema-tools
+
+Set of libraries and tools to work with Amsterdam schema.
+
+Install the package with: `pip install amsterdam-schema-tools`. This installs
+the library and a command-line tool called `schema`, with various subcommands.
+A listing can be obtained from `schema --help`.
+
+Subcommands that talk to a PostgreSQL database expect either a `DATABASE_URL`
+environment variable or a command line option `--db-url` with a DSN.
+
+Many subcommands want to know where to find schema files. Most will look in a
+directory of schemas denoted by the `SCHEMA_URL` environment variable or the
+`--schema-url` command line option. E.g.,
+
+    schema create tables --schema-url=myschemas mydataset
+
+will try to load the schema for `mydataset` from
+`myschemas/mydataset/dataset.json`.
+
+
+## Generate amsterdam schema from existing database tables
+
+The `--prefix` argument controls whether table prefixes are removed in the
+schema, because that is required for Django models.
+
+As an example we can generate a BAG schema. Point `DATABASE_URL` to the
+`bag_v11` database and then run:
+
+    schema show tablenames | sort | awk '/^bag_/{print}' | xargs schema introspect db bag --prefix bag_ | jq
+
+**jq** formats the output nicely, and it can be redirected directly to the
+correct directory in the schemas repository.
+
+## Express amsterdam schema information in relational tables
+
+Amsterdam schema is expressed as jsonschema. However, to make it easier for people with a
+more relational mind- or toolset, it is possible to express amsterdam schema as a set of
+relational tables. These tables are *meta_dataset*, *meta_table* and *meta_field*.
+
+It is possible to convert a jsonschema into the relational table structure and vice versa.
+
+This command imports an existing dataset in jsonschema format into the relational tables:
+
+    schema import schema <id of dataset>
+
+To convert from relational tables back to jsonschema:
+
+    schema show schema <id of dataset>
+
+
+## Generating amsterdam schema from existing GeoJSON files
+
+The following commands can be used to inspect and import the GeoJSON files:
+
+    schema introspect geojson <dataset-id> *.geojson > schema.json
+    edit schema.json  # fine-tune the table names
+    schema import geojson schema.json <table1> file1.geojson
+    schema import geojson schema.json <table2> file2.geojson
+
+## Importing GOB events
+
+The schematools library has a module that reads GOB events into database tables that are
+defined by an Amsterdam schema. This module can be used to read GOB events from a Kafka stream.
+It is also possible to read GOB events from a batch file with line-separated events using:
+
+    schema import events <path-to-dataset> <path-to-file-with-events>
+
+
+## Export datasets
+
+Datasets can be exported to different file formats. Currently supported are geopackage,
+csv and jsonlines. The command for exporting the dataset tables is:
+
+    schema export [geopackage|csv|jsonlines] <id of dataset>
+
+The command has several command-line options that can be used. Documentation about these
+flags can be shown using the `--help` option.
+
+
+## Schema Tools as a pre-commit hook
+
+Included in the project is a `pre-commit` hook
+that can validate schema files
+in a project such as [amsterdam-schema](https://github.com/Amsterdam/amsterdam-schema).
+
+To configure it,
+extend the `.pre-commit-config.yaml`
+in the project with the schema file definitions as follows:
+
+```yaml
+  - repo: https://github.com/Amsterdam/schema-tools
+    rev: v3.5.0
+    hooks:
+      - id: validate-schema
+        args: ['https://schemas.data.amsterdam.nl/schema@v1.2.0#']
+        exclude: |
+          (?x)^(
+              schema.+|           # exclude meta schemas
+              datasets/index.json
+          )$
+```
+
+`args` is a one-element list
+containing the URL to the Amsterdam Meta Schema.
+
+`validate-schema` will only process `json` files.
+However, not all `json` files are Amsterdam schema files.
+To exclude files or directories, use `exclude` with a pattern.
+
+`pre-commit` depends on properly tagged revisions of its hooks.
+Hence, we should not only bump version numbers on updates to this package,
+but also commit a tag with the version number; see below.
+
+## Doing a release
+
+(This is for schema-tools developers.)
+
+We use GitHub pull requests. If your PR should produce a new release of
+schema-tools, make sure one of the commits increments the version number in
+``setup.cfg`` appropriately. Then,
+
+* merge the commit in GitHub, after review;
+* pull the code from GitHub and merge it into the master branch,
+  ``git checkout master && git fetch origin && git merge --ff-only origin/master``;
+* tag the release X.Y.Z with ``git tag -a vX.Y.Z -m "Bump to vX.Y.Z"``;
+* push the tag to GitHub with ``git push origin --tags``;
+* release to PyPI: ``make upload`` (requires the PyPI secret).
+
+
+## Mocking data
+
+The schematools library contains two Django management commands to generate
+mock data. The first one is `create_mock_data`, which generates mock data for all
+the datasets that are found at the configured schema location `SCHEMA_URL`
+(where `SCHEMA_URL` can be configured to point to a path on the local filesystem).
+
+The `create_mock_data` command processes all datasets. However, it is possible
+to limit this by adding positional arguments. These positional arguments can be
+dataset ids or paths to the location of the `dataset.json` on the local filesystem.
+
+Furthermore, the command has some options, e.g. to change
+the default number of generated records (`--size`) or to reverse the meaning of the positional
+arguments using `--exclude`.
+
+To avoid duplicate primary keys on subsequent runs, the `--start-at` option can be used
+to start the autonumbering of primary keys at an offset.
+
+E.g. to generate 5 records for the `bag` and `gebieden` datasets, starting the
+autonumbering of primary keys at 50:
+
+```
+ django create_mock_data bag gebieden --size 5 --start-at 50
+```
+
+To generate records for all datasets, except for the `fietspaaltjes` dataset:
+
+```
+ django create_mock_data fietspaaltjes --exclude # or -x
+```
+
+To generate records for the `bbga` dataset, by loading the schema from the local filesystem:
+
+```
+ django create_mock_data <path-to-bbga-schema>/datasets.json
+```
+
+During record generation in `create_mock_data`, the relations are not added,
+so foreign key fields will be filled with NULL values.
+
+There is a second management command `relate_mock_data` that can be used to
+add the relations. This command supports positional arguments for datasets
+in the same way as `create_mock_data`.
+Furthermore, the command also has the `--exclude` option to reverse the meaning
+of the positional dataset arguments.
+
+E.g. to add relations to all datasets:
+
+```
+ django relate_mock_data
+```
+
+To add relations for `bag` and `gebieden` only:
+
+```
+ django relate_mock_data bag gebieden
+```
+
+To add relations for all datasets except `meetbouten`:
+
+```
+ django relate_mock_data meetbouten --exclude # or -x
+```
+
+NB. When only a subset of the datasets is being mocked, the command can fail when datasets that
+are involved in a relation are missing, so make sure to include all relevant
+datasets.
+
+For convenience an additional management command `truncate_tables` has been added
+to truncate all tables.
+
+
+%prep
+%autosetup -n amsterdam-schema-tools-5.9.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-amsterdam-schema-tools -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 5.9.1-1
+- Package Spec generated
@@ -0,0 +1 @@
+0c0dc27b6a22578c73bebdc13c92b9dd amsterdam-schema-tools-5.9.1.tar.gz