Diffstat (limited to 'python-wiktionary-de-parser.spec')
-rw-r--r-- python-wiktionary-de-parser.spec | 335
1 files changed, 335 insertions, 0 deletions
diff --git a/python-wiktionary-de-parser.spec b/python-wiktionary-de-parser.spec
new file mode 100644
index 0000000..83ab00c
--- /dev/null
+++ b/python-wiktionary-de-parser.spec
@@ -0,0 +1,335 @@
+%global _empty_manifest_terminate_build 0
+Name: python-wiktionary-de-parser
+Version: 0.9.5
+Release: 1
+Summary: Extracts data from German Wiktionary dump files. Allows you to add your own extraction methods 🚀
+License: MIT
+URL: https://github.com/gambolputty/wiktionary-de-parser
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/d6/e6/d91d18aff8de3b01402413043ea9a53c83ea83d36bed6ba6c47f37be6ab8/wiktionary-de-parser-0.9.5.tar.gz
+BuildArch: noarch
+
+Requires: python3-lxml
+Requires: python3-mwparserfromhell
+
+%description
+# wiktionary-de-parser
+
+This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.
+
+## Installation
+
+`pip install wiktionary-de-parser`
+
+## Features
+
+- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
+- Allows you to add your own extraction methods (pass them as argument)
+- Yields per section, not per page (a word can have multiple meanings, and therefore multiple sections on a Wiktionary page)
+
+## Usage
+
+```python
+from bz2 import BZ2File
+from wiktionary_de_parser import Parser
+
+bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
+bz_file = BZ2File(bzfile_path)
+
+for record in Parser(bz_file):
+    if 'lang_code' not in record or record['lang_code'] != 'de':
+        continue
+    # do stuff with 'record'
+```
+
+Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).
+
+### Adding new extraction methods
+
+An extraction method takes the following arguments:
+
+- `title` (_string_): The title of the current Wiktionary page
+- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
+- `current_record` (_Dict_): A dictionary with all values of the current iteration (e. g. `current_record['lang_code']`)
+
+It must return a `Dict` with the results, or `False` if the record could not be processed successfully.
+
+```python
+# Create a new extraction method
+def my_method(title, text, current_record):
+    my_data = text.strip()  # placeholder: replace with your own extraction logic
+    return {'my_field': my_data} if my_data else False
+
+# Pass a list with all extraction methods to the class constructor:
+for record in Parser(bz_file, custom_methods=[my_method]):
+    print(record['my_field'])
+```
+
+## Output
+Example output for the word "Abend":
+```python
+{'flexion': {'Akkusativ Plural': 'Abende',
+             'Akkusativ Singular': 'Abend',
+             'Dativ Plural': 'Abenden',
+             'Dativ Singular': 'Abend',
+             'Genitiv Plural': 'Abende',
+             'Genitiv Singular': 'Abends',
+             'Genus': 'm',
+             'Nominativ Plural': 'Abende',
+             'Nominativ Singular': 'Abend'},
+ 'inflected': False,
+ 'ipa': ['ˈaːbn̊t', 'ˈaːbm̊t'],
+ 'lang': 'Deutsch',
+ 'lang_code': 'de',
+ 'lemma': 'Abend',
+ 'pos': {'Substantiv': []},
+ 'rhymes': ['aːbn̊t'],
+ 'syllables': ['Abend'],
+ 'title': 'Abend'}
+```
+
+## Development
+This project uses [Poetry](https://python-poetry.org/).
+
+1. Install [Poetry](https://python-poetry.org/).
+2. Clone this repository
+3. Run `poetry install` inside of the project folder to install dependencies.
+4. Change `wiktionary_de_parser/run.py` to your needs.
+5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser, or `poetry run pytest` to run the tests.
+
+## License
+
+[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt
+
+
+%package -n python3-wiktionary-de-parser
+Summary: Extracts data from German Wiktionary dump files. Allows you to add your own extraction methods 🚀
+Provides: python-wiktionary-de-parser
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-wiktionary-de-parser
+# wiktionary-de-parser
+
+This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.
+
+## Installation
+
+`pip install wiktionary-de-parser`
+
+## Features
+
+- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
+- Allows you to add your own extraction methods (pass them as argument)
+- Yields per section, not per page (a word can have multiple meanings, and therefore multiple sections on a Wiktionary page)
+
+## Usage
+
+```python
+from bz2 import BZ2File
+from wiktionary_de_parser import Parser
+
+bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
+bz_file = BZ2File(bzfile_path)
+
+for record in Parser(bz_file):
+    if 'lang_code' not in record or record['lang_code'] != 'de':
+        continue
+    # do stuff with 'record'
+```
+
+Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).
+
+### Adding new extraction methods
+
+An extraction method takes the following arguments:
+
+- `title` (_string_): The title of the current Wiktionary page
+- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
+- `current_record` (_Dict_): A dictionary with all values of the current iteration (e. g. `current_record['lang_code']`)
+
+It must return a `Dict` with the results, or `False` if the record could not be processed successfully.
+
+```python
+# Create a new extraction method
+def my_method(title, text, current_record):
+    my_data = text.strip()  # placeholder: replace with your own extraction logic
+    return {'my_field': my_data} if my_data else False
+
+# Pass a list with all extraction methods to the class constructor:
+for record in Parser(bz_file, custom_methods=[my_method]):
+    print(record['my_field'])
+```
+
+## Output
+Example output for the word "Abend":
+```python
+{'flexion': {'Akkusativ Plural': 'Abende',
+             'Akkusativ Singular': 'Abend',
+             'Dativ Plural': 'Abenden',
+             'Dativ Singular': 'Abend',
+             'Genitiv Plural': 'Abende',
+             'Genitiv Singular': 'Abends',
+             'Genus': 'm',
+             'Nominativ Plural': 'Abende',
+             'Nominativ Singular': 'Abend'},
+ 'inflected': False,
+ 'ipa': ['ˈaːbn̊t', 'ˈaːbm̊t'],
+ 'lang': 'Deutsch',
+ 'lang_code': 'de',
+ 'lemma': 'Abend',
+ 'pos': {'Substantiv': []},
+ 'rhymes': ['aːbn̊t'],
+ 'syllables': ['Abend'],
+ 'title': 'Abend'}
+```
+
+## Development
+This project uses [Poetry](https://python-poetry.org/).
+
+1. Install [Poetry](https://python-poetry.org/).
+2. Clone this repository
+3. Run `poetry install` inside of the project folder to install dependencies.
+4. Change `wiktionary_de_parser/run.py` to your needs.
+5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser, or `poetry run pytest` to run the tests.
+
+## License
+
+[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt
+
+
+%package help
+Summary: Development documents and examples for wiktionary-de-parser
+Provides: python3-wiktionary-de-parser-doc
+%description help
+# wiktionary-de-parser
+
+This is a Python module to extract data from German Wiktionary XML files (for Python 3.7+). It allows you to add your own extraction methods.
+
+## Installation
+
+`pip install wiktionary-de-parser`
+
+## Features
+
+- Extracts flexion tables, genus, IPA, language, lemma, part of speech (basic), syllables, raw Wikitext
+- Allows you to add your own extraction methods (pass them as argument)
+- Yields per section, not per page (a word can have multiple meanings, and therefore multiple sections on a Wiktionary page)
+
+## Usage
+
+```python
+from bz2 import BZ2File
+from wiktionary_de_parser import Parser
+
+bzfile_path = '/tmp/dewiktionary-latest-pages-articles-multistream.xml.bz2'
+bz_file = BZ2File(bzfile_path)
+
+for record in Parser(bz_file):
+    if 'lang_code' not in record or record['lang_code'] != 'de':
+        continue
+    # do stuff with 'record'
+```
+
+Note: In this example we load a compressed Wiktionary dump file that was [obtained from here](https://dumps.wikimedia.org/dewiktionary/latest).
+
+### Adding new extraction methods
+
+An extraction method takes the following arguments:
+
+- `title` (_string_): The title of the current Wiktionary page
+- `text` (_string_): The [Wikitext](https://en.wikipedia.org/wiki/Wiki#Editing) of the current word entry/section
+- `current_record` (_Dict_): A dictionary with all values of the current iteration (e. g. `current_record['lang_code']`)
+
+It must return a `Dict` with the results, or `False` if the record could not be processed successfully.
+
+```python
+# Create a new extraction method
+def my_method(title, text, current_record):
+    my_data = text.strip()  # placeholder: replace with your own extraction logic
+    return {'my_field': my_data} if my_data else False
+
+# Pass a list with all extraction methods to the class constructor:
+for record in Parser(bz_file, custom_methods=[my_method]):
+    print(record['my_field'])
+```
+
+## Output
+Example output for the word "Abend":
+```python
+{'flexion': {'Akkusativ Plural': 'Abende',
+             'Akkusativ Singular': 'Abend',
+             'Dativ Plural': 'Abenden',
+             'Dativ Singular': 'Abend',
+             'Genitiv Plural': 'Abende',
+             'Genitiv Singular': 'Abends',
+             'Genus': 'm',
+             'Nominativ Plural': 'Abende',
+             'Nominativ Singular': 'Abend'},
+ 'inflected': False,
+ 'ipa': ['ˈaːbn̊t', 'ˈaːbm̊t'],
+ 'lang': 'Deutsch',
+ 'lang_code': 'de',
+ 'lemma': 'Abend',
+ 'pos': {'Substantiv': []},
+ 'rhymes': ['aːbn̊t'],
+ 'syllables': ['Abend'],
+ 'title': 'Abend'}
+```
+
+## Development
+This project uses [Poetry](https://python-poetry.org/).
+
+1. Install [Poetry](https://python-poetry.org/).
+2. Clone this repository
+3. Run `poetry install` inside of the project folder to install dependencies.
+4. Change `wiktionary_de_parser/run.py` to your needs.
+5. Run `poetry run python wiktionary_de_parser/run.py` to run the parser, or `poetry run pytest` to run the tests.
+
+## License
+
+[MIT](https://github.com/gambolputty/wiktionary-de-parser/blob/master/LICENSE.md) © Gregor Weichbrodt
+
+
+%prep
+%autosetup -n wiktionary-de-parser-0.9.5
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-wiktionary-de-parser -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 0.9.5-1
+- Package Spec generated