-rw-r--r--  .gitignore              1
-rw-r--r--  python-cldfbench.spec 857
-rw-r--r--  sources                 1
3 files changed, 859 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..192c420 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/cldfbench-1.13.0.tar.gz
diff --git a/python-cldfbench.spec b/python-cldfbench.spec
new file mode 100644
index 0000000..039fa58
--- /dev/null
+++ b/python-cldfbench.spec
@@ -0,0 +1,857 @@
+%global _empty_manifest_terminate_build 0
+Name: python-cldfbench
+Version: 1.13.0
+Release: 1
+Summary: Python library implementing a CLDF workbench
+License: Apache-2.0
+URL: https://github.com/cldf/cldfbench
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/98/1c/d6c3f474712c65e0834b729df285ce0b8cd813a7e78c37c201e305ef3817/cldfbench-1.13.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-appdirs
+Requires: python3-cldfcatalog
+Requires: python3-clldutils
+Requires: python3-csvw
+Requires: python3-pycldf
+Requires: python3-pytest
+Requires: python3-requests
+Requires: python3-rfc3986
+Requires: python3-termcolor
+Requires: python3-tqdm
+Requires: python3-zenodoclient
+Requires: python3-importlib-metadata
+Requires: python3-pyclts
+Requires: python3-pyconcepticon
+Requires: python3-build
+Requires: python3-flake8
+Requires: python3-twine
+Requires: python3-wheel
+Requires: python3-sphinx
+Requires: python3-sphinx-autodoc-typehints
+Requires: python3-sphinx-rtd-theme
+Requires: python3-openpyxl
+Requires: python3-xlrd
+Requires: python3-pyglottolog
+Requires: python3-odfpy
+Requires: python3-packaging
+Requires: python3-pytest-cov
+Requires: python3-pytest-mock
+Requires: python3-tox
+
+%description
+# cldfbench
+Tooling to create [CLDF](https://cldf.clld.org) datasets from existing data.
+
+[![Build Status](https://github.com/cldf/cldfbench/workflows/tests/badge.svg)](https://github.com/cldf/cldfbench/actions?query=workflow%3Atests)
+[![Documentation Status](https://readthedocs.org/projects/cldfbench/badge/?version=latest)](https://cldfbench.readthedocs.io/en/latest/?badge=latest)
+[![PyPI](https://img.shields.io/pypi/v/cldfbench.svg)](https://pypi.org/project/cldfbench)
+
+
+## Overview
+
+This package provides tools to curate cross-linguistic data, with the goal of
+packaging it as [CLDF](https://cldf.clld.org) datasets.
+
+In particular, it supports a workflow where:
+- "raw" source data is downloaded to a `raw/` subdirectory,
+- and subsequently converted to one or more CLDF datasets in a `cldf/` subdirectory, with the help of:
+ - configuration data in an `etc/` directory and
+ - custom Python code (a subclass of [`cldfbench.Dataset`](src/cldfbench/dataset.py) which implements the workflow actions).
+
+This workflow is supported via:
+- a commandline interface `cldfbench` which calls the workflow actions as [subcommands](src/cldfbench/commands),
+- a `cldfbench.Dataset` base class, which must be subclassed in a custom module
+ to hook custom code into the workflow.
+
+With this workflow and the separation of the data into three directories we want
+to provide a workbench for transparently deriving CLDF data from data that has been
+published before. In particular we want to delineate clearly:
+- what forms part of the original or source data (`raw`),
+- what kind of information is added by the curators of the CLDF dataset (`etc`)
+- and what data was derived using the workbench (`cldf`).
+
+
+### Further reading
+
+This paper introduces `cldfbench` and uses an extended, real-world example:
+
+> Forkel, R., & List, J.-M. (2020). CLDFBench: Give your cross-linguistic data a lift. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, et al. (Eds.), Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 6995-7002). Paris: European Language Resources Association (ELRA). [[PDF]](https://pure.mpg.de/pubman/item/item_3231858_1/component/file_3231859/shh2600.pdf)
+
+
+## Installation
+
+`cldfbench` can be installed via `pip` - preferably in a
+[virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) - by running:
+```shell script
+pip install cldfbench
+```
+
+`cldfbench` provides some functionality that relies on python
+packages which are not needed for the core functionality. These are specified as [extras](https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies) and can be installed using syntax like:
+```shell
+pip install cldfbench[<extras>]
+```
+where `<extras>` is a comma-separated list of names from the following list:
+- `excel`: support for reading spreadsheet data.
+- `glottolog`: support for accessing [Glottolog data](https://github.com/glottolog/glottolog).
+- `concepticon`: support for accessing [Concepticon data](https://github.com/concepticon/concepticon-data).
+- `clts`: support for accessing [CLTS data](https://github.com/cldf-clts/clts).
+
+
+## The command line interface `cldfbench`
+
+Installing the python package will also install a command `cldfbench` available on
+the command line:
+```shell script
+$ cldfbench -h
+usage: cldfbench [-h] [--log-level LOG_LEVEL] COMMAND ...
+
+optional arguments:
+ -h, --help show this help message and exit
+ --log-level LOG_LEVEL
+ log level [ERROR|WARN|INFO|DEBUG] (default: 20)
+
+available commands:
+ Run "COMMAND -h" to get help for a specific command.
+
+ COMMAND
+ check Run generic CLDF checks
+ ...
+```
+
+As shown above, run `cldfbench -h` to get help, and `cldfbench COMMAND -h` to get
+help on individual subcommands, e.g. `cldfbench new -h` to read about the usage
+of the `new` subcommand.
+
+
+### Dataset discovery
+
+Most `cldfbench` commands operate on an existing dataset (unlike `new`, which
+creates a new one). Datasets can be discovered in two ways:
+
+1. Via the python module (i.e. the `*.py` file, containing the `Dataset` subclass).
+ To use this mode of discovery, pass the path to the python module
+ as `DATASET` argument, when required by a command.
+
+2. Via [entry point](https://packaging.python.org/specifications/entry-points/) and
+ dataset ID. To use this mode, specify the name of the entry point as value of
+ the `--entry-point` option (or use the default name `cldfbench.dataset`) and
+ the `Dataset.id` as `DATASET` argument.
+
+Discovery via entry point is particularly useful for commands that can operate
+on multiple datasets. To select **all** datasets advertising a given entry point,
+pass `"_"` (i.e. an underscore) as `DATASET` argument.
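+
+For illustration, advertising a dataset via the default `cldfbench.dataset` entry
+point could look roughly like this in a dataset package's `setup.py` (a minimal
+sketch; package, module and entry point names are hypothetical):
+
+```python
+from setuptools import setup
+
+setup(
+    name='cldfbench_mydataset',          # hypothetical package name
+    py_modules=['cldfbench_mydataset'],  # the module containing the Dataset subclass
+    install_requires=['cldfbench'],
+    entry_points={
+        'cldfbench.dataset': [
+            # Points at the Dataset subclass defined in the module.
+            'mydataset=cldfbench_mydataset:Dataset',
+        ],
+    },
+)
+```
+
+With such an entry point installed, commands can locate the dataset by its
+`Dataset.id`; without it, passing the path to `cldfbench_mydataset.py` as
+`DATASET` argument still works.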
+
+
+## Workflow
+
+For a full example of the `cldfbench` curation workflow, see [the tutorial](doc/tutorial.md).
+
+
+### Creating a skeleton for a new dataset directory
+
+A directory containing stub entries for a dataset can be created by running
+
+```bash
+cldfbench new
+```
+
+This will create the following layout (where `<ID>` stands for the chosen dataset ID):
+```
+<ID>/
+├── cldf # A stub directory for the CLDF data
+│   └── README.md
+├── cldfbench_<ID>.py # The python module, providing the Dataset subclass
+├── etc # A stub directory for the configuration data
+│   └── README.md
+├── metadata.json # The metadata provided to the subcommand serialized as JSON
+├── raw # A stub directory for the raw data
+│   └── README.md
+├── setup.cfg # Python setup config, providing defaults for test integration
+├── setup.py # Python setup file, making the dataset "installable"
+├── test.py # The python code to run for dataset validation
+└── .travis.yml # Integrate the validation with Travis-CI
+```
+
+
+### Implementing CLDF creation
+
+`cldfbench` provides tools to make CLDF creation simple. Still, each dataset is
+different, and so each dataset will have to provide its own custom code to do so.
+This custom code goes into the `cmd_makecldf` method of the `Dataset` subclass in
+the dataset's python module.
+(See also the [API documentation of `cldfbench.Dataset`](https://cldfbench.readthedocs.io/en/latest/dataset.html).)
+
+Typically, this code will make use of one or more
+[`cldfbench.CLDFSpec`](src/cldfbench/cldf.py) instances, which describe what kind of CLDF to create. A `CLDFSpec` also gives access to a
+[`cldfbench.CLDFWriter`](src/cldfbench/cldf.py) instance, which wraps a `pycldf.Dataset`.
+
+The main interfaces to these objects are:
+- `cldfbench.Dataset.cldf_specs`: a method returning specifications of all CLDF datasets
+ that are created by the dataset,
+- `cldfbench.Dataset.cldf_writer`: a method returning an initialized `CLDFWriter`
+ associated with a particular `CLDFSpec`.
+
+`cldfbench` supports several scenarios of CLDF creation:
+- The typical use case is turning raw data into a single CLDF dataset. This would
+ require instantiating one `CLDFWriter` writer in the `cmd_makecldf` method, and
+ the defaults of `CLDFSpec` will probably be ok. Since this is the most common and
+ simplest case, it is supported with some extra "sugar": The initialized `CLDFWriter`
+ is available as `args.writer` when `cmd_makecldf` is called (see the sketch after this list).
+- But it is also possible to create multiple CLDF datasets:
+ - For a dataset containing both lexical and typological data, it may be appropriate
+ to create a `Wordlist` and a `StructureDataset`. To do so, one would have to
+ call `cldf_writer` twice, passing in an appropriate `CLDFSpec`. Note that if
+ both CLDF datasets are created in the same directory, they can share the
+ `LanguageTable` - but would have to specify distinct file names for the
+ `ParameterTable`, passing distinct values to `CLDFSpec.data_fnames`.
+ - When creating multiple datasets of the same CLDF module, e.g. to split a large dataset into smaller chunks, care must be taken to also disambiguate the name
+ of the metadata file, passing distinct values to `CLDFSpec.metadata_fname`.
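+
+For the common single-dataset case described in the first bullet above, the relevant
+part of the dataset module might look roughly like this (a minimal sketch with
+hypothetical IDs and data; it assumes the dataset's `cldf/` directory is available
+as `self.cldf_dir` and that the writer exposes the wrapped `pycldf.Dataset` as
+`.cldf` and collects rows to be written in `.objects`):
+
+```python
+import pathlib
+
+from cldfbench import CLDFSpec, Dataset as BaseDataset
+
+
+class Dataset(BaseDataset):
+    id = 'mydataset'                     # hypothetical dataset ID
+    dir = pathlib.Path(__file__).parent  # the layout created by `cldfbench new`
+
+    def cldf_specs(self):
+        # A single CLDF dataset; `module` selects the CLDF module to create.
+        return CLDFSpec(dir=self.cldf_dir, module='Wordlist')
+
+    def cmd_makecldf(self, args):
+        # With a single CLDFSpec, the initialized CLDFWriter is passed in as args.writer.
+        args.writer.cldf.add_component('LanguageTable')
+        args.writer.objects['LanguageTable'].append(
+            {'ID': 'abcd1234', 'Name': 'Example Language'})
+```
+
+For the multi-dataset scenarios, `cldf_specs` would instead return several
+`CLDFSpec` instances (with distinct `data_fnames` and/or `metadata_fname`), and
+`cmd_makecldf` would obtain one writer per spec via `Dataset.cldf_writer` rather
+than relying on `args.writer`.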
+
+When creating CLDF, it is also often useful to have standard reference catalogs
+accessible, in particular Glottolog. See the section on [Catalogs](#catalogs) for
+a description of how this is supported by `cldfbench`.
+
+
+### Catalogs
+
+Linking data to reference catalogs is a major goal of CLDF; thus, `cldfbench`
+provides tools to make catalog access and maintenance easier. Catalog data must be
+accessible in local clones of the data repository. `cldfbench` provides commands:
+- `catconfig` to create the clones and make them known through a configuration file,
+- `catinfo` to get an overview of the installed catalogs and their versions,
+- `catupdate` to update local clones from the upstream repositories.
+
+See:
+
+- https://cldfbench.readthedocs.io/en/latest/catalogs.html
+
+for a list of reference catalogs which are currently supported in `cldfbench`.
+
+
+### Curating a dataset on GitHub
+
+One of the design goals of CLDF was to specify a data format that plays well with
+version control. Thus, it's natural - and actually recommended - to curate a CLDF
+dataset in a version controlled repository. The most popular way to do this in a
+collaborative fashion is by using a [git](https://git-scm.com/) repository hosted on
+[GitHub](https://github.com).
+
+The directory layout supported by `cldfbench` caters to this use case in several ways:
+- Each directory contains a file `README.md`, which will be rendered as a human-readable
+ description when browsing the repository on GitHub.
+- The file `.travis.yml` contains the configuration for hooking up a repository with
+ [Travis CI](https://www.travis-ci.org/), to provide continuous consistency checking
+ of the data.
+
+
+### Archiving a dataset with Zenodo
+
+Curating a dataset on GitHub also provides a simple way to archive and publish
+released versions of the data. You can hook up your repository with [Zenodo](https://zenodo.org) (following [this guide](https://guides.github.com/activities/citable-code/)). Then, Zenodo will pick up any released package, assign a DOI to it, archive it and
+make it accessible in the long term.
+
+Some notes:
+- Hook-up with Zenodo requires the repository to be public (not private).
+- Consider associating the repository with an institutional account on GitHub and Zenodo. Currently, only the user account that registered a repository on Zenodo can change any metadata of its releases later on.
+- Once released and archived with Zenodo, it's a good idea to add the DOI assigned by Zenodo to the release description on GitHub.
+- To make sure a release is picked up by Zenodo, the version number must start with a letter, e.g. "v1.0" - **not** "1.0".
+
+Thus, with a setup as described here, you can make sure you create [FAIR data](https://en.wikipedia.org/wiki/FAIR_data).
+
+
+## Extending `cldfbench`
+
+`cldfbench` can be extended or built upon in various ways - typically by customizing core functionality in new python packages. To support particular types of raw data, you might want a custom `Dataset` class, or to support a particular type of CLDF data, you would customize `CLDFWriter`.
+
+In addition to extending `cldfbench` using the standard methods of object-oriented programming, there are two more ways of extending `cldfbench`:
+
+
+### Commands
+
+A python package (or a dataset) can provide additional subcommands to be run from `cldfbench`.
+For more info see the [`commands.README`](src/cldfbench/commands/README.md).
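+
+For illustration only, such a subcommand is typically a small python module; the
+sketch below assumes the `register(parser)`/`run(args)` convention described in the
+commands README, with a hypothetical command that just prints a greeting:
+
+```python
+"""
+Say hello (a hypothetical example subcommand).
+"""
+
+
+def register(parser):
+    # Add command-specific options to the argparse parser.
+    parser.add_argument('--name', default='world', help='who to greet')
+
+
+def run(args):
+    # The parsed command line options are available on args.
+    print('hello, {}!'.format(args.name))
+```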
+
+
+### Custom dataset templates
+
+A python package can provide alternative dataset templates to be run with `cldfbench new`.
+Such templates are implemented by:
+- a subclass of `cldfbench.Template`,
+- which is advertised using an entry point `cldfbench.scaffold`:
+
+```python
+ entry_points={
+ 'cldfbench.scaffold': [
+ 'template_name=mypackage.scaffold:DerivedTemplate',
+ ],
+ },
+```
+
+
+
+
+%package -n python3-cldfbench
+Summary: Python library implementing a CLDF workbench
+Provides: python-cldfbench
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-cldfbench
+# cldfbench
+Tooling to create [CLDF](https://cldf.clld.org) datasets from existing data.
+
+[![Build Status](https://github.com/cldf/cldfbench/workflows/tests/badge.svg)](https://github.com/cldf/cldfbench/actions?query=workflow%3Atests)
+[![Documentation Status](https://readthedocs.org/projects/cldfbench/badge/?version=latest)](https://cldfbench.readthedocs.io/en/latest/?badge=latest)
+[![PyPI](https://img.shields.io/pypi/v/cldfbench.svg)](https://pypi.org/project/cldfbench)
+
+
+## Overview
+
+This package provides tools to curate cross-linguistic data, with the goal of
+packaging it as [CLDF](https://cldf.clld.org) datasets.
+
+In particular, it supports a workflow where:
+- "raw" source data is downloaded to a `raw/` subdirectory,
+- and subsequently converted to one or more CLDF datasets in a `cldf/` subdirectory, with the help of:
+ - configuration data in an `etc/` directory and
+ - custom Python code (a subclass of [`cldfbench.Dataset`](src/cldfbench/dataset.py) which implements the workflow actions).
+
+This workflow is supported via:
+- a commandline interface `cldfbench` which calls the workflow actions as [subcommands](src/cldfbench/commands),
+- a `cldfbench.Dataset` base class, which must be subclassed in a custom module
+ to hook custom code into the workflow.
+
+With this workflow and the separation of the data into three directories we want
+to provide a workbench for transparently deriving CLDF data from data that has been
+published before. In particular we want to delineate clearly:
+- what forms part of the original or source data (`raw`),
+- what kind of information is added by the curators of the CLDF dataset (`etc`)
+- and what data was derived using the workbench (`cldf`).
+
+
+### Further reading
+
+This paper introduces `cldfbench` and uses an extended, real-world example:
+
+> Forkel, R., & List, J.-M. (2020). CLDFBench: Give your cross-linguistic data a lift. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, et al. (Eds.), Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 6995-7002). Paris: European Language Resources Association (ELRA). [[PDF]](https://pure.mpg.de/pubman/item/item_3231858_1/component/file_3231859/shh2600.pdf)
+
+
+## Installation
+
+`cldfbench` can be installed via `pip` - preferably in a
+[virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) - by running:
+```shell script
+pip install cldfbench
+```
+
+`cldfbench` provides some functionality that relies on python
+packages which are not needed for the core functionality. These are specified as [extras](https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies) and can be installed using syntax like:
+```shell
+pip install cldfbench[<extras>]
+```
+where `<extras>` is a comma-separated list of names from the following list:
+- `excel`: support for reading spreadsheet data.
+- `glottolog`: support for accessing [Glottolog data](https://github.com/glottolog/glottolog).
+- `concepticon`: support for accessing [Concepticon data](https://github.com/concepticon/concepticon-data).
+- `clts`: support for accessing [CLTS data](https://github.com/cldf-clts/clts).
+
+
+## The command line interface `cldfbench`
+
+Installing the python package will also install a command `cldfbench` available on
+the command line:
+```shell script
+$ cldfbench -h
+usage: cldfbench [-h] [--log-level LOG_LEVEL] COMMAND ...
+
+optional arguments:
+ -h, --help show this help message and exit
+ --log-level LOG_LEVEL
+ log level [ERROR|WARN|INFO|DEBUG] (default: 20)
+
+available commands:
+ Run "COMMAND -h" to get help for a specific command.
+
+ COMMAND
+ check Run generic CLDF checks
+ ...
+```
+
+As shown above, run `cldfbench -h` to get help, and `cldfbench COMMAND -h` to get
+help on individual subcommands, e.g. `cldfbench new -h` to read about the usage
+of the `new` subcommand.
+
+
+### Dataset discovery
+
+Most `cldfbench` commands operate on an existing dataset (unlike `new`, which
+creates a new one). Datasets can be discovered in two ways:
+
+1. Via the python module (i.e. the `*.py` file, containing the `Dataset` subclass).
+ To use this mode of discovery, pass the path to the python module
+ as `DATASET` argument, when required by a command.
+
+2. Via [entry point](https://packaging.python.org/specifications/entry-points/) and
+ dataset ID. To use this mode, specify the name of the entry point as value of
+ the `--entry-point` option (or use the default name `cldfbench.dataset`) and
+ the `Dataset.id` as `DATASET` argument.
+
+Discovery via entry point is particularly useful for commands that can operate
+on multiple datasets. To select **all** datasets advertising a given entry point,
+pass `"_"` (i.e. an underscore) as `DATASET` argument.
+
+
+## Workflow
+
+For a full example of the `cldfbench` curation workflow, see [the tutorial](doc/tutorial.md).
+
+
+### Creating a skeleton for a new dataset directory
+
+A directory containing stub entries for a dataset can be created by running
+
+```bash
+cldfbench new
+```
+
+This will create the following layout (where `<ID>` stands for the chosen dataset ID):
+```
+<ID>/
+├── cldf # A stub directory for the CLDF data
+│   └── README.md
+├── cldfbench_<ID>.py # The python module, providing the Dataset subclass
+├── etc # A stub directory for the configuration data
+│   └── README.md
+├── metadata.json # The metadata provided to the subcommand serialized as JSON
+├── raw # A stub directory for the raw data
+│   └── README.md
+├── setup.cfg # Python setup config, providing defaults for test integration
+├── setup.py # Python setup file, making the dataset "installable"
+├── test.py # The python code to run for dataset validation
+└── .travis.yml # Integrate the validation with Travis-CI
+```
+
+
+### Implementing CLDF creation
+
+`cldfbench` provides tools to make CLDF creation simple. Still, each dataset is
+different, and so each dataset will have to provide its own custom code to do so.
+This custom code goes into the `cmd_makecldf` method of the `Dataset` subclass in
+the dataset's python module.
+(See also the [API documentation of `cldfbench.Dataset`](https://cldfbench.readthedocs.io/en/latest/dataset.html).)
+
+Typically, this code will make use of one or more
+[`cldfbench.CLDFSpec`](src/cldfbench/cldf.py) instances, which describe what kind of CLDF to create. A `CLDFSpec` also gives access to a
+[`cldfbench.CLDFWriter`](src/cldfbench/cldf.py) instance, which wraps a `pycldf.Dataset`.
+
+The main interfaces to these objects are:
+- `cldfbench.Dataset.cldf_specs`: a method returning specifications of all CLDF datasets
+ that are created by the dataset,
+- `cldfbench.Dataset.cldf_writer`: a method returning an initialized `CLDFWriter`
+ associated with a particular `CLDFSpec`.
+
+`cldfbench` supports several scenarios of CLDF creation:
+- The typical use case is turning raw data into a single CLDF dataset. This would
+ require instantiating one `CLDFWriter` writer in the `cmd_makecldf` method, and
+ the defaults of `CLDFSpec` will probably be ok. Since this is the most common and
+ simplest case, it is supported with some extra "sugar": The initialized `CLDFWriter`
+ is available as `args.writer` when `cmd_makecldf` is called.
+- But it is also possible to create multiple CLDF datasets:
+ - For a dataset containing both lexical and typological data, it may be appropriate
+ to create a `Wordlist` and a `StructureDataset`. To do so, one would have to
+ call `cldf_writer` twice, passing in an appropriate `CLDFSpec`. Note that if
+ both CLDF datasets are created in the same directory, they can share the
+ `LanguageTable` - but would have to specify distinct file names for the
+ `ParameterTable`, passing distinct values to `CLDFSpec.data_fnames`.
+ - When creating multiple datasets of the same CLDF module, e.g. to split a large dataset into smaller chunks, care must be taken to also disambiguate the name
+ of the metadata file, passing distinct values to `CLDFSpec.metadata_fname`.
+
+When creating CLDF, it is also often useful to have standard reference catalogs
+accessible, in particular Glottolog. See the section on [Catalogs](#catalogs) for
+a description of how this is supported by `cldfbench`.
+
+
+### Catalogs
+
+Linking data to reference catalogs is a major goal of CLDF; thus, `cldfbench`
+provides tools to make catalog access and maintenance easier. Catalog data must be
+accessible in local clones of the data repository. `cldfbench` provides commands:
+- `catconfig` to create the clones and make them known through a configuration file,
+- `catinfo` to get an overview of the installed catalogs and their versions,
+- `catupdate` to update local clones from the upstream repositories.
+
+See:
+
+- https://cldfbench.readthedocs.io/en/latest/catalogs.html
+
+for a list of reference catalogs which are currently supported in `cldfbench`.
+
+
+### Curating a dataset on GitHub
+
+One of the design goals of CLDF was to specify a data format that plays well with
+version control. Thus, it's natural - and actually recommended - to curate a CLDF
+dataset in a version controlled repository. The most popular way to do this in a
+collaborative fashion is by using a [git](https://git-scm.com/) repository hosted on
+[GitHub](https://github.com).
+
+The directory layout supported by `cldfbench` caters to this use case in several ways:
+- Each directory contains a file `README.md`, which will be rendered as a human-readable
+ description when browsing the repository on GitHub.
+- The file `.travis.yml` contains the configuration for hooking up a repository with
+ [Travis CI](https://www.travis-ci.org/), to provide continuous consistency checking
+ of the data.
+
+
+### Archiving a dataset with Zenodo
+
+Curating a dataset on GitHub also provides a simple way to archive and publish
+released versions of the data. You can hook up your repository with [Zenodo](https://zenodo.org) (following [this guide](https://guides.github.com/activities/citable-code/)). Then, Zenodo will pick up any released package, assign a DOI to it, archive it and
+make it accessible in the long term.
+
+Some notes:
+- Hook-up with Zenodo requires the repository to be public (not private).
+- Consider associating the repository with an institutional account on GitHub and Zenodo. Currently, only the user account that registered a repository on Zenodo can change any metadata of its releases later on.
+- Once released and archived with Zenodo, it's a good idea to add the DOI assigned by Zenodo to the release description on GitHub.
+- To make sure a release is picked up by Zenodo, the version number must start with a letter, e.g. "v1.0" - **not** "1.0".
+
+Thus, with a setup as described here, you can make sure you create [FAIR data](https://en.wikipedia.org/wiki/FAIR_data).
+
+
+## Extending `cldfbench`
+
+`cldfbench` can be extended or built upon in various ways - typically by customizing core functionality in new python packages. To support particular types of raw data, you might want a custom `Dataset` class, or to support a particular type of CLDF data, you would customize `CLDFWriter`.
+
+In addition to extending `cldfbench` using the standard methods of object-oriented programming, there are two more ways of extending `cldfbench`:
+
+
+### Commands
+
+A python package (or a dataset) can provide additional subcommands to be run from `cldfbench`.
+For more info see the [`commands.README`](src/cldfbench/commands/README.md).
+
+
+### Custom dataset templates
+
+A python package can provide alternative dataset templates to be run with `cldfbench new`.
+Such templates are implemented by:
+- a subclass of `cldfbench.Template`,
+- which is advertised using an entry point `cldfbench.scaffold`:
+
+```python
+ entry_points={
+ 'cldfbench.scaffold': [
+ 'template_name=mypackage.scaffold:DerivedTemplate',
+ ],
+ },
+```
+
+
+
+
+%package help
+Summary: Development documents and examples for cldfbench
+Provides: python3-cldfbench-doc
+%description help
+# cldfbench
+Tooling to create [CLDF](https://cldf.clld.org) datasets from existing data.
+
+[![Build Status](https://github.com/cldf/cldfbench/workflows/tests/badge.svg)](https://github.com/cldf/cldfbench/actions?query=workflow%3Atests)
+[![Documentation Status](https://readthedocs.org/projects/cldfbench/badge/?version=latest)](https://cldfbench.readthedocs.io/en/latest/?badge=latest)
+[![PyPI](https://img.shields.io/pypi/v/cldfbench.svg)](https://pypi.org/project/cldfbench)
+
+
+## Overview
+
+This package provides tools to curate cross-linguistic data, with the goal of
+packaging it as [CLDF](https://cldf.clld.org) datasets.
+
+In particular, it supports a workflow where:
+- "raw" source data is downloaded to a `raw/` subdirectory,
+- and subsequently converted to one or more CLDF datasets in a `cldf/` subdirectory, with the help of:
+ - configuration data in an `etc/` directory and
+ - custom Python code (a subclass of [`cldfbench.Dataset`](src/cldfbench/dataset.py) which implements the workflow actions).
+
+This workflow is supported via:
+- a commandline interface `cldfbench` which calls the workflow actions as [subcommands](src/cldfbench/commands),
+- a `cldfbench.Dataset` base class, which must be subclassed in a custom module
+ to hook custom code into the workflow.
+
+With this workflow and the separation of the data into three directories we want
+to provide a workbench for transparently deriving CLDF data from data that has been
+published before. In particular we want to delineate clearly:
+- what forms part of the original or source data (`raw`),
+- what kind of information is added by the curators of the CLDF dataset (`etc`)
+- and what data was derived using the workbench (`cldf`).
+
+
+### Further reading
+
+This paper introduces `cldfbench` and uses an extended, real-world example:
+
+> Forkel, R., & List, J.-M. (2020). CLDFBench: Give your cross-linguistic data a lift. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, et al. (Eds.), Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 6995-7002). Paris: European Language Resources Association (ELRA). [[PDF]](https://pure.mpg.de/pubman/item/item_3231858_1/component/file_3231859/shh2600.pdf)
+
+
+## Installation
+
+`cldfbench` can be installed via `pip` - preferably in a
+[virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) - by running:
+```shell script
+pip install cldfbench
+```
+
+`cldfbench` provides some functionality that relies on python
+packages which are not needed for the core functionality. These are specified as [extras](https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies) and can be installed using syntax like:
+```shell
+pip install cldfbench[<extras>]
+```
+where `<extras>` is a comma-separated list of names from the following list:
+- `excel`: support for reading spreadsheet data.
+- `glottolog`: support for accessing [Glottolog data](https://github.com/glottolog/glottolog).
+- `concepticon`: support for accessing [Concepticon data](https://github.com/concepticon/concepticon-data).
+- `clts`: support for accessing [CLTS data](https://github.com/cldf-clts/clts).
+
+
+## The command line interface `cldfbench`
+
+Installing the python package will also install a command `cldfbench` available on
+the command line:
+```shell script
+$ cldfbench -h
+usage: cldfbench [-h] [--log-level LOG_LEVEL] COMMAND ...
+
+optional arguments:
+ -h, --help show this help message and exit
+ --log-level LOG_LEVEL
+ log level [ERROR|WARN|INFO|DEBUG] (default: 20)
+
+available commands:
+ Run "COMMAND -h" to get help for a specific command.
+
+ COMMAND
+ check Run generic CLDF checks
+ ...
+```
+
+As shown above, run `cldfbench -h` to get help, and `cldfbench COMMAND -h` to get
+help on individual subcommands, e.g. `cldfbench new -h` to read about the usage
+of the `new` subcommand.
+
+
+### Dataset discovery
+
+Most `cldfbench` commands operate on an existing dataset (unlike `new`, which
+creates a new one). Datasets can be discovered in two ways:
+
+1. Via the python module (i.e. the `*.py` file, containing the `Dataset` subclass).
+ To use this mode of discovery, pass the path to the python module
+ as `DATASET` argument, when required by a command.
+
+2. Via [entry point](https://packaging.python.org/specifications/entry-points/) and
+ dataset ID. To use this mode, specify the name of the entry point as value of
+ the `--entry-point` option (or use the default name `cldfbench.dataset`) and
+ the `Dataset.id` as `DATASET` argument.
+
+Discovery via entry point is particularly useful for commands that can operate
+on multiple datasets. To select **all** datasets advertising a given entry point,
+pass `"_"` (i.e. an underscore) as `DATASET` argument.
+
+
+## Workflow
+
+For a full example of the `cldfbench` curation workflow, see [the tutorial](doc/tutorial.md).
+
+
+### Creating a skeleton for a new dataset directory
+
+A directory containing stub entries for a dataset can be created by running
+
+```bash
+cldfbench new
+```
+
+This will create the following layout (where `<ID>` stands for the chosen dataset ID):
+```
+<ID>/
+├── cldf # A stub directory for the CLDF data
+│   └── README.md
+├── cldfbench_<ID>.py # The python module, providing the Dataset subclass
+├── etc # A stub directory for the configuration data
+│   └── README.md
+├── metadata.json # The metadata provided to the subcommand serialized as JSON
+├── raw # A stub directory for the raw data
+│   └── README.md
+├── setup.cfg # Python setup config, providing defaults for test integration
+├── setup.py # Python setup file, making the dataset "installable"
+├── test.py # The python code to run for dataset validation
+└── .travis.yml # Integrate the validation with Travis-CI
+```
+
+
+### Implementing CLDF creation
+
+`cldfbench` provides tools to make CLDF creation simple. Still, each dataset is
+different, and so each dataset will have to provide its own custom code to do so.
+This custom code goes into the `cmd_makecldf` method of the `Dataset` subclass in
+the dataset's python module.
+(See also the [API documentation of `cldfbench.Dataset`](https://cldfbench.readthedocs.io/en/latest/dataset.html).)
+
+Typically, this code will make use of one or more
+[`cldfbench.CLDFSpec`](src/cldfbench/cldf.py) instances, which describe what kind of CLDF to create. A `CLDFSpec` also gives access to a
+[`cldfbench.CLDFWriter`](src/cldfbench/cldf.py) instance, which wraps a `pycldf.Dataset`.
+
+The main interfaces to these objects are:
+- `cldfbench.Dataset.cldf_specs`: a method returning specifications of all CLDF datasets
+ that are created by the dataset,
+- `cldfbench.Dataset.cldf_writer`: a method returning an initialized `CLDFWriter`
+ associated with a particular `CLDFSpec`.
+
+`cldfbench` supports several scenarios of CLDF creation:
+- The typical use case is turning raw data into a single CLDF dataset. This would
+ require instantiating one `CLDFWriter` writer in the `cmd_makecldf` method, and
+ the defaults of `CLDFSpec` will probably be ok. Since this is the most common and
+ simplest case, it is supported with some extra "sugar": The initialized `CLDFWriter`
+ is available as `args.writer` when `cmd_makecldf` is called.
+- But it is also possible to create multiple CLDF datasets:
+ - For a dataset containing both lexical and typological data, it may be appropriate
+ to create a `Wordlist` and a `StructureDataset`. To do so, one would have to
+ call `cldf_writer` twice, passing in an appropriate `CLDFSpec`. Note that if
+ both CLDF datasets are created in the same directory, they can share the
+ `LanguageTable` - but would have to specify distinct file names for the
+ `ParameterTable`, passing distinct values to `CLDFSpec.data_fnames`.
+ - When creating multiple datasets of the same CLDF module, e.g. to split a large dataset into smaller chunks, care must be taken to also disambiguate the name
+ of the metadata file, passing distinct values to `CLDFSpec.metadata_fname`.
+
+When creating CLDF, it is also often useful to have standard reference catalogs
+accessible, in particular Glottolog. See the section on [Catalogs](#catalogs) for
+a description of how this is supported by `cldfbench`.
+
+
+### Catalogs
+
+Linking data to reference catalogs is a major goal of CLDF; thus, `cldfbench`
+provides tools to make catalog access and maintenance easier. Catalog data must be
+accessible in local clones of the data repository. `cldfbench` provides commands:
+- `catconfig` to create the clones and make them known through a configuration file,
+- `catinfo` to get an overview of the installed catalogs and their versions,
+- `catupdate` to update local clones from the upstream repositories.
+
+See:
+
+- https://cldfbench.readthedocs.io/en/latest/catalogs.html
+
+for a list of reference catalogs which are currently supported in `cldfbench`.
+
+
+### Curating a dataset on GitHub
+
+One of the design goals of CLDF was to specify a data format that plays well with
+version control. Thus, it's natural - and actually recommended - to curate a CLDF
+dataset in a version controlled repository. The most popular way to do this in a
+collaborative fashion is by using a [git](https://git-scm.com/) repository hosted on
+[GitHub](https://github.com).
+
+The directory layout supported by `cldfbench` caters to this use case in several ways:
+- Each directory contains a file `README.md`, which will be rendered as a human-readable
+ description when browsing the repository on GitHub.
+- The file `.travis.yml` contains the configuration for hooking up a repository with
+ [Travis CI](https://www.travis-ci.org/), to provide continuous consistency checking
+ of the data.
+
+
+### Archiving a dataset with Zenodo
+
+Curating a dataset on GitHub also provides a simple way to archive and publish
+released versions of the data. You can hook up your repository with [Zenodo](https://zenodo.org) (following [this guide](https://guides.github.com/activities/citable-code/)). Then, Zenodo will pick up any released package, assign a DOI to it, archive it and
+make it accessible in the long term.
+
+Some notes:
+- Hook-up with Zenodo requires the repository to be public (not private).
+- Consider associating the repository with an institutional account on GitHub and Zenodo. Currently, only the user account that registered a repository on Zenodo can change any metadata of its releases later on.
+- Once released and archived with Zenodo, it's a good idea to add the DOI assigned by Zenodo to the release description on GitHub.
+- To make sure a release is picked up by Zenodo, the version number must start with a letter, e.g. "v1.0" - **not** "1.0".
+
+Thus, with a setup as described here, you can make sure you create [FAIR data](https://en.wikipedia.org/wiki/FAIR_data).
+
+
+## Extending `cldfbench`
+
+`cldfbench` can be extended or built upon in various ways - typically by customizing core functionality in new python packages. To support particular types of raw data, you might want a custom `Dataset` class, or to support a particular type of CLDF data, you would customize `CLDFWriter`.
+
+In addition to extending `cldfbench` using the standard methods of object-oriented programming, there are two more ways of extending `cldfbench`:
+
+
+### Commands
+
+A python package (or a dataset) can provide additional subcommands to be run from `cldfbench`.
+For more info see the [`commands.README`](src/cldfbench/commands/README.md).
+
+
+### Custom dataset templates
+
+A python package can provide alternative dataset templates to be run with `cldfbench new`.
+Such templates are implemented by:
+- a subclass of `cldfbench.Template`,
+- which is advertised using an entry point `cldfbench.scaffold`:
+
+```python
+ entry_points={
+ 'cldfbench.scaffold': [
+ 'template_name=mypackage.scaffold:DerivedTemplate',
+ ],
+ },
+```
+
+
+
+
+%prep
+%autosetup -n cldfbench-1.13.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-cldfbench -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 1.13.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..fad3a47
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+e82b71ffa3ed6b2fac06acc795feaed8 cldfbench-1.13.0.tar.gz