%global _empty_manifest_terminate_build 0
Name:		python-wiktionary-de-parser
Version:	0.9.5
Release:	1
Summary:	Extracts data from German Wiktionary dump files. Allows you to add your own extraction methods 🚀
License:	MIT
URL:		https://github.com/gambolputty/wiktionary-de-parser
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/d6/e6/d91d18aff8de3b01402413043ea9a53c83ea83d36bed6ba6c47f37be6ab8/wiktionary-de-parser-0.9.5.tar.gz
BuildArch:	noarch

Requires:	python3-lxml
Requires:	python3-mwparserfromhell

%description
# wiktionary-de-parser

This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.

## Installation

`pip install wiktionary-de-parser`

## Features

- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
- Allows you to add your own extraction methods (pass them as argument)
- Yields per section, not per page (a word can have multiple meanings --> multiple sections of a Wiktionary pages)

## Usage

```python
from bz2 import BZ2File
from wiktionary_de_parser import Parser

bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
bz_file = BZ2File(bzfile_path)

for record in Parser(bz_file):
    if 'lang_code' not in record or record['lang_code'] != 'de':
      continue
    # do stuff with 'record'
```

Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).

### Adding new extraction methods

An extraction method takes the following arguments:

- `title` (_string_): The title of the current Wiktionary page
- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
- `current_record` (_Dict_): A dictionary with all values of the current iteration (e. g. `current_record['lang_code']`)

It must return a `Dict` with the results or `False` if the record was processed unsuccesfully.

```python
# Create a new extraction method
def my_method(title, text, current_record):
  # do stuff
  return {'my_field': my_data} if my_data else False

# Pass a list with all extraction methods to the class constructor:
for record in Parser(bz_file, custom_methods=[my_method]):
    print(record['my_field'])
```

## Output
Example output for the word "Abend":
```python
{'flexion': {'Akkusativ Plural': 'Abende',
             'Akkusativ Singular': 'Abend',
             'Dativ Plural': 'Abenden',
             'Dativ Singular': 'Abend',
             'Genitiv Plural': 'Abende',
             'Genitiv Singular': 'Abends',
             'Genus': 'm',
             'Nominativ Plural': 'Abende',
             'Nominativ Singular': 'Abend'},
 'inflected': False,
 'ipa': ['ˈaːbn̩t', 'ˈaːbm̩t'],
 'lang': 'Deutsch',
 'lang_code': 'de',
 'lemma': 'Abend',
 'pos': {'Substantiv': []},
 'rhymes': ['aːbn̩t'],
 'syllables': ['Abend'],
 'title': 'Abend'}
```

## Development
This project uses [Poetry](https://python-poetry.org/).

1. Install [Poetry](https://python-poetry.org/).
2. Clone this repository
3. Run `poetry install` inside of the project folder to install dependencies.
4. Change `wiktionary_de_parser/run.py` to your needs.
5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser. Or `poetry run pytest` to run tests.

## License

[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt


%package -n python3-wiktionary-de-parser
Summary:	Extracts data from German Wiktionary dump files. Allows you to add your own extraction methods 🚀
Provides:	python-wiktionary-de-parser
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-wiktionary-de-parser
# wiktionary-de-parser

This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.

## Installation

`pip install wiktionary-de-parser`

## Features

- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
- Allows you to add your own extraction methods (pass them as argument)
- Yields per section, not per page (a word can have multiple meanings --> multiple sections of a Wiktionary pages)

## Usage

```python
from bz2 import BZ2File
from wiktionary_de_parser import Parser

bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
bz_file = BZ2File(bzfile_path)

for record in Parser(bz_file):
    if 'lang_code' not in record or record['lang_code'] != 'de':
      continue
    # do stuff with 'record'
```

Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).

### Adding new extraction methods

An extraction method takes the following arguments:

- `title` (_string_): The title of the current Wiktionary page
- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
- `current_record` (_Dict_): A dictionary with all values of the current iteration (e. g. `current_record['lang_code']`)

It must return a `Dict` with the results or `False` if the record was processed unsuccesfully.

```python
# Create a new extraction method
def my_method(title, text, current_record):
  # do stuff
  return {'my_field': my_data} if my_data else False

# Pass a list with all extraction methods to the class constructor:
for record in Parser(bz_file, custom_methods=[my_method]):
    print(record['my_field'])
```

## Output
Example output for the word "Abend":
```python
{'flexion': {'Akkusativ Plural': 'Abende',
             'Akkusativ Singular': 'Abend',
             'Dativ Plural': 'Abenden',
             'Dativ Singular': 'Abend',
             'Genitiv Plural': 'Abende',
             'Genitiv Singular': 'Abends',
             'Genus': 'm',
             'Nominativ Plural': 'Abende',
             'Nominativ Singular': 'Abend'},
 'inflected': False,
 'ipa': ['ˈaːbn̩t', 'ˈaːbm̩t'],
 'lang': 'Deutsch',
 'lang_code': 'de',
 'lemma': 'Abend',
 'pos': {'Substantiv': []},
 'rhymes': ['aːbn̩t'],
 'syllables': ['Abend'],
 'title': 'Abend'}
```

## Development
This project uses [Poetry](https://python-poetry.org/).

1. Install [Poetry](https://python-poetry.org/).
2. Clone this repository
3. Run `poetry install` inside of the project folder to install dependencies.
4. Change `wiktionary_de_parser/run.py` to your needs.
5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser. Or `poetry run pytest` to run tests.

## License

[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt


%package help
Summary:	Development documents and examples for wiktionary-de-parser
Provides:	python3-wiktionary-de-parser-doc
%description help
# wiktionary-de-parser

This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.

## Installation

`pip install wiktionary-de-parser`

## Features

- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
- Allows you to add your own extraction methods (pass them as argument)
- Yields per section, not per page (a word can have multiple meanings --> multiple sections of a Wiktionary pages)

## Usage

```python
from bz2 import BZ2File
from wiktionary_de_parser import Parser

bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
bz_file = BZ2File(bzfile_path)

for record in Parser(bz_file):
    if 'lang_code' not in record or record['lang_code'] != 'de':
      continue
    # do stuff with 'record'
```

Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).

### Adding new extraction methods

An extraction method takes the following arguments:

- `title` (_string_): The title of the current Wiktionary page
- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
- `current_record` (_Dict_): A dictionary with all values of the current iteration (e. g. `current_record['lang_code']`)

It must return a `Dict` with the results or `False` if the record was processed unsuccesfully.

```python
# Create a new extraction method
def my_method(title, text, current_record):
  # do stuff
  return {'my_field': my_data} if my_data else False

# Pass a list with all extraction methods to the class constructor:
for record in Parser(bz_file, custom_methods=[my_method]):
    print(record['my_field'])
```

## Output
Example output for the word "Abend":
```python
{'flexion': {'Akkusativ Plural': 'Abende',
             'Akkusativ Singular': 'Abend',
             'Dativ Plural': 'Abenden',
             'Dativ Singular': 'Abend',
             'Genitiv Plural': 'Abende',
             'Genitiv Singular': 'Abends',
             'Genus': 'm',
             'Nominativ Plural': 'Abende',
             'Nominativ Singular': 'Abend'},
 'inflected': False,
 'ipa': ['ˈaːbn̩t', 'ˈaːbm̩t'],
 'lang': 'Deutsch',
 'lang_code': 'de',
 'lemma': 'Abend',
 'pos': {'Substantiv': []},
 'rhymes': ['aːbn̩t'],
 'syllables': ['Abend'],
 'title': 'Abend'}
```

## Development
This project uses [Poetry](https://python-poetry.org/).

1. Install [Poetry](https://python-poetry.org/).
2. Clone this repository
3. Run `poetry install` inside of the project folder to install dependencies.
4. Change `wiktionary_de_parser/run.py` to your needs.
5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser. Or `poetry run pytest` to run tests.

## License

[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt


%prep
%autosetup -n wiktionary-de-parser-0.9.5

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-wiktionary-de-parser -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 0.9.5-1
- Package Spec generated