%global _empty_manifest_terminate_build 0
Name: python-wiktionary-de-parser
Version: 0.9.5
Release: 1
Summary: Extracts data from German Wiktionary dump files. Allows you to add your own extraction methods 🚀
License: MIT
URL: https://github.com/gambolputty/wiktionary-de-parser
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/d6/e6/d91d18aff8de3b01402413043ea9a53c83ea83d36bed6ba6c47f37be6ab8/wiktionary-de-parser-0.9.5.tar.gz
BuildArch: noarch

Requires: python3-lxml
Requires: python3-mwparserfromhell

%description
# wiktionary-de-parser

This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.

## Installation

`pip install wiktionary-de-parser`

## Features

- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
- Allows you to add your own extraction methods (pass them as an argument)
- Yields per section, not per page (a word can have multiple meanings, so a Wiktionary page can have multiple sections)

## Usage

```python
from bz2 import BZ2File
from wiktionary_de_parser import Parser

bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
bz_file = BZ2File(bzfile_path)

for record in Parser(bz_file):
    if 'lang_code' not in record or record['lang_code'] != 'de':
        continue
    # do stuff with 'record'
```

Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).

### Adding new extraction methods

An extraction method takes the following arguments:

- `title` (_string_): The title of the current Wiktionary page
- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
- `current_record` (_Dict_): A dictionary with all values of the current iteration (e.g. `current_record['lang_code']`)

It must return a `Dict` with the results, or `False` if the record was processed unsuccessfully.
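As a concrete illustration of the signature described above, here is a minimal sketch of an extraction method that collects the targets of internal wiki links found in the section's Wikitext. The names `extract_links` and `LINK_TARGET` and the result key `links` are hypothetical and not part of this package; the sample text is made up for the demonstration.

```python
import re

# Matches the target part of an internal wiki link,
# e.g. "Abend" in [[Abend]] or "Nachmittag" in [[Nachmittag|label]].
LINK_TARGET = re.compile(r"\[\[([^\]\|#]+)")


def extract_links(title, text, current_record):
    """Hypothetical extraction method: collect internal link targets."""
    targets = [m.strip() for m in LINK_TARGET.findall(text) if m.strip()]
    # Remove duplicates while keeping first-seen order
    unique = list(dict.fromkeys(targets))
    # Follow the documented contract: a dict on success, False otherwise
    return {'links': unique} if unique else False


sample = "Der [[Abend]] folgt auf den [[Nachmittag|späten Nachmittag]]."
print(extract_links('Abend', sample, {}))
# {'links': ['Abend', 'Nachmittag']}
```

Because the function only depends on its three arguments, it can be tried out on sample strings like this before it is passed to the parser.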
```python
# Create a new extraction method
def my_method(title, text, current_record):
    # do stuff
    return {'my_field': my_data} if my_data else False

# Pass a list with all extraction methods to the class constructor:
for record in Parser(bz_file, custom_methods=[my_method]):
    print(record['my_field'])
```

## Output

Example output for the word "Abend":

```python
{'flexion': {'Akkusativ Plural': 'Abende',
             'Akkusativ Singular': 'Abend',
             'Dativ Plural': 'Abenden',
             'Dativ Singular': 'Abend',
             'Genitiv Plural': 'Abende',
             'Genitiv Singular': 'Abends',
             'Genus': 'm',
             'Nominativ Plural': 'Abende',
             'Nominativ Singular': 'Abend'},
 'inflected': False,
 'ipa': ['ˈaːbn̩t', 'ˈaːbm̩t'],
 'lang': 'Deutsch',
 'lang_code': 'de',
 'lemma': 'Abend',
 'pos': {'Substantiv': []},
 'rhymes': ['aːbn̩t'],
 'syllables': ['Abend'],
 'title': 'Abend'}
```

## Development

This project uses [Poetry](https://python-poetry.org/).

1. Install [Poetry](https://python-poetry.org/).
2. Clone this repository.
3. Run `poetry install` inside of the project folder to install dependencies.
4. Change `wiktionary_de_parser/run.py` to your needs.
5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser. Or `poetry run pytest` to run tests.

## License

[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt

%package -n python3-wiktionary-de-parser
Summary: Extracts data from German Wiktionary dump files. Allows you to add your own extraction methods 🚀
Provides: python-wiktionary-de-parser
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-wiktionary-de-parser
# wiktionary-de-parser

This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.
## Installation

`pip install wiktionary-de-parser`

## Features

- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
- Allows you to add your own extraction methods (pass them as an argument)
- Yields per section, not per page (a word can have multiple meanings, so a Wiktionary page can have multiple sections)

## Usage

```python
from bz2 import BZ2File
from wiktionary_de_parser import Parser

bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
bz_file = BZ2File(bzfile_path)

for record in Parser(bz_file):
    if 'lang_code' not in record or record['lang_code'] != 'de':
        continue
    # do stuff with 'record'
```

Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).

### Adding new extraction methods

An extraction method takes the following arguments:

- `title` (_string_): The title of the current Wiktionary page
- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
- `current_record` (_Dict_): A dictionary with all values of the current iteration (e.g. `current_record['lang_code']`)

It must return a `Dict` with the results, or `False` if the record was processed unsuccessfully.
```python
# Create a new extraction method
def my_method(title, text, current_record):
    # do stuff
    return {'my_field': my_data} if my_data else False

# Pass a list with all extraction methods to the class constructor:
for record in Parser(bz_file, custom_methods=[my_method]):
    print(record['my_field'])
```

## Output

Example output for the word "Abend":

```python
{'flexion': {'Akkusativ Plural': 'Abende',
             'Akkusativ Singular': 'Abend',
             'Dativ Plural': 'Abenden',
             'Dativ Singular': 'Abend',
             'Genitiv Plural': 'Abende',
             'Genitiv Singular': 'Abends',
             'Genus': 'm',
             'Nominativ Plural': 'Abende',
             'Nominativ Singular': 'Abend'},
 'inflected': False,
 'ipa': ['ˈaːbn̩t', 'ˈaːbm̩t'],
 'lang': 'Deutsch',
 'lang_code': 'de',
 'lemma': 'Abend',
 'pos': {'Substantiv': []},
 'rhymes': ['aːbn̩t'],
 'syllables': ['Abend'],
 'title': 'Abend'}
```

## Development

This project uses [Poetry](https://python-poetry.org/).

1. Install [Poetry](https://python-poetry.org/).
2. Clone this repository.
3. Run `poetry install` inside of the project folder to install dependencies.
4. Change `wiktionary_de_parser/run.py` to your needs.
5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser. Or `poetry run pytest` to run tests.

## License

[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt

%package help
Summary: Development documents and examples for wiktionary-de-parser
Provides: python3-wiktionary-de-parser-doc

%description help
# wiktionary-de-parser

This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.
## Installation

`pip install wiktionary-de-parser`

## Features

- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
- Allows you to add your own extraction methods (pass them as an argument)
- Yields per section, not per page (a word can have multiple meanings, so a Wiktionary page can have multiple sections)

## Usage

```python
from bz2 import BZ2File
from wiktionary_de_parser import Parser

bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
bz_file = BZ2File(bzfile_path)

for record in Parser(bz_file):
    if 'lang_code' not in record or record['lang_code'] != 'de':
        continue
    # do stuff with 'record'
```

Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).

### Adding new extraction methods

An extraction method takes the following arguments:

- `title` (_string_): The title of the current Wiktionary page
- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
- `current_record` (_Dict_): A dictionary with all values of the current iteration (e.g. `current_record['lang_code']`)

It must return a `Dict` with the results, or `False` if the record was processed unsuccessfully.
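A custom method can also build on values that are already present in `current_record`. The sketch below assumes that built-in fields such as `lang_code` have been filled in before custom methods are called, which the parameter description suggests but which should be checked against the parser itself. The function name `count_title_words` and the key `title_word_count` are hypothetical.

```python
def count_title_words(title, text, current_record):
    """Hypothetical extraction method that only handles German entries."""
    # Assumption: 'lang_code' is already set when custom methods run.
    if current_record.get('lang_code') != 'de':
        return False
    words = title.split()
    # Return a dict on success, False if there is nothing to report
    return {'title_word_count': len(words)} if words else False


print(count_title_words('guten Abend', '', {'lang_code': 'de'}))
# {'title_word_count': 2}
print(count_title_words('evening', '', {'lang_code': 'en'}))
# False
```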
```python
# Create a new extraction method
def my_method(title, text, current_record):
    # do stuff
    return {'my_field': my_data} if my_data else False

# Pass a list with all extraction methods to the class constructor:
for record in Parser(bz_file, custom_methods=[my_method]):
    print(record['my_field'])
```

## Output

Example output for the word "Abend":

```python
{'flexion': {'Akkusativ Plural': 'Abende',
             'Akkusativ Singular': 'Abend',
             'Dativ Plural': 'Abenden',
             'Dativ Singular': 'Abend',
             'Genitiv Plural': 'Abende',
             'Genitiv Singular': 'Abends',
             'Genus': 'm',
             'Nominativ Plural': 'Abende',
             'Nominativ Singular': 'Abend'},
 'inflected': False,
 'ipa': ['ˈaːbn̩t', 'ˈaːbm̩t'],
 'lang': 'Deutsch',
 'lang_code': 'de',
 'lemma': 'Abend',
 'pos': {'Substantiv': []},
 'rhymes': ['aːbn̩t'],
 'syllables': ['Abend'],
 'title': 'Abend'}
```

## Development

This project uses [Poetry](https://python-poetry.org/).

1. Install [Poetry](https://python-poetry.org/).
2. Clone this repository.
3. Run `poetry install` inside of the project folder to install dependencies.
4. Change `wiktionary_de_parser/run.py` to your needs.
5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser. Or `poetry run pytest` to run tests.
## License

[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt

%prep
%autosetup -n wiktionary-de-parser-0.9.5

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-wiktionary-de-parser -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 17 2023 Python_Bot - 0.9.5-1
- Package Spec generated