diff options
author | CoprDistGit <infra@openeuler.org> | 2023-06-20 04:55:49 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-06-20 04:55:49 +0000 |
commit | e74335478fb1cf0db38f149bf7584f8ebb8fd4fd (patch) | |
tree | 958ad6f5b77868acb0ef0202ffb0f75367facc06 | |
parent | 02a5bd6d67c13d6ffbf15fd78970a561c165aa38 (diff) |
automatic import of python-pyglossaryopeneuler20.03
-rw-r--r-- | .gitignore | 1 | ||||
-rw-r--r-- | python-pyglossary.spec | 1244 | ||||
-rw-r--r-- | sources | 1 |
3 files changed, 1246 insertions, 0 deletions
@@ -0,0 +1 @@ +/pyglossary-4.6.1.tar.gz diff --git a/python-pyglossary.spec b/python-pyglossary.spec new file mode 100644 index 0000000..2205d2d --- /dev/null +++ b/python-pyglossary.spec @@ -0,0 +1,1244 @@ +%global _empty_manifest_terminate_build 0 +Name: python-pyglossary +Version: 4.6.1 +Release: 1 +Summary: A tool for converting dictionary files aka glossaries. +License: GPLv3+ +URL: https://github.com/ilius/pyglossary +Source0: https://mirrors.aliyun.com/pypi/web/packages/24/be/c1652b5ed637d83b33955d73c5a311309ef5328c2e30b197c84a35d3403d/pyglossary-4.6.1.tar.gz +BuildArch: noarch + +Requires: python3-PyICU +Requires: python3-PyYAML +Requires: python3-beautifulsoup4 +Requires: python3-html5lib +Requires: python3-libzim +Requires: python3-lxml +Requires: python3-marisa-trie +Requires: python3-lzo + +%description +# PyGlossary + +A tool for converting dictionary files aka glossaries. + +The primary purpose is to be able to use our offline glossaries in any Open +Source dictionary we like on any OS/device. + +There are countless formats, and my time is limited, so I implement formats that +seem more useful for myself, or for Open Source community. Also diversity of +languages is taken into account. Pull requests are welcome. + +## Screenshots + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/44-gtk-txt-stardict-aryanpur-dark.png" width="50%" height="50%"/> + +Linux - Gtk3-based interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40b-tk-bgl-epub-es-en-2.png" width="50%" height="50%"/> + +Windows - Tkinter-based interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/32-cmd-freedict-mids-de-ru.png" width="50%" height="50%"/> + +Linux - command-line interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40-cmdi-termux-zim-slob-en-med.jpg" width="50%" height="50%"/> + +Android Termux - interactive command-line interface + +## Supported formats + +| Format | | Extension | Read | Write | +| ------------------------------------------------------- | :-: | :-------------: | :--: | :---: | +| [Aard 2 (slob)](./doc/p/aard2_slob.md) | 🔢 | .slob | ✔ | ✔ | +| [ABBYY Lingvo DSL](./doc/p/dsl.md) | 📝 | .dsl | ✔ | | +| [Almaany.com](./doc/p/almaany.md) (SQLite3, Arabic) | 🔢 | .db | ✔ | | +| [AppleDict Binary](./doc/p/appledict_bin.md) | 🔢 | .dictionary | ✔ | ❌ | +| [AppleDict Source](./doc/p/appledict.md) | 📁 | | | ✔ | +| [Babylon BGL](./doc/p/babylon_bgl.md) | 🔢 | .bgl | ✔ | ❌ | +| [CC-CEDICT](./doc/p/cc_cedict.md) (Chinese) | 📝 | | ✔ | ❌ | +| [cc-kedict](./doc/p/cc_kedict.md) (Korean) | 📝 | | ✔ | ❌ | +| [CSV](./doc/p/csv.md) | 📝 | .csv | ✔ | ✔ | +| [Dict.cc](./doc/p/dict_cc.md) (SQLite3, German) | 🔢 | .db | ✔ | | +| [DICT.org / Dictd server](./doc/p/dict_org.md) | 📁 | (📝.index) | ✔ | ✔ | +| [DICT.org / dictfmt source](./doc/p/dict_org_source.md) | 📝 | (.dtxt) | | ✔ | +| [dictunformat output file](./doc/p/dictunformat.md) | 📝 | (.dictunformat) | ✔ | | +| [DictionaryForMIDs](./doc/p/dicformids.md) | 📁 | (📁.mids) | ✔ | ✔ | +| [DigitalNK](./doc/p/digitalnk.md) (SQLite3, N-Korean) | 🔢 | .db | ✔ | | +| [DIKT JSON](./doc/p/dikt_json.md) | 📝 | (.json) | | ✔ | +| [EDLIN](./doc/p/edlin.md) | 📁 | .edlin | ✔ | ✔ | +| [EPUB-2 E-Book](./doc/p/epub2.md) | 📦 | .epub | ❌ | ✔ | +| [FreeDict](./doc/p/freedict.md) | 📝 | .tei | ✔ | ❌ | +| [Gettext Source](./doc/p/gettext_po.md) | 📝 | .po | ✔ | ✔ | +| [HTML Directory (by file size)](./doc/p/html_dir.md) | 📁 | | ❌ | ✔ | +| [JMDict](./doc/p/jmdict.md) (Japanese) | 📝 | | ✔ | ❌ | +| [JSON](./doc/p/json.md) | 📝 | .json | | ✔ | +| [Kobo E-Reader Dictionary](./doc/p/kobo.md) | 📦 | .kobo.zip | ❌ | ✔ | +| [Kobo E-Reader Dictfile](./doc/p/kobo_dictfile.md) | 📝 | .df | ✔ | ✔ | +| [Lingoes Source](./doc/p/lingoes_ldf.md) | 📝 | .ldf | ✔ | ✔ | +| [Mobipocket E-Book](./doc/p/mobi.md) | 🔢 | .mobi | ❌ | ✔ | +| [Octopus MDict](./doc/p/octopus_mdict.md) | 🔢 | .mdx | ✔ | ❌ | +| [SQL](./doc/p/sql.md) | 📝 | .sql | ❌ | ✔ | +| [StarDict](./doc/p/stardict.md) | 📁 | (📝.ifo) | ✔ | ✔ | +| [StarDict Textual File](./doc/p/stardict_textual.md) | 📝 | (.xml) | ✔ | ✔ | +| [Tabfile](./doc/p/tabfile.md) | 📝 | .txt, .tab | ✔ | ✔ | +| [Wordset.org](./doc/p/wordset.md) | 📁 | | ✔ | | +| [XDXF](./doc/p/xdxf.md) | 📝 | .xdxf | ✔ | ❌ | +| [Yomichan](./doc/p/yomichan.md) | 📦 | (.zip) | | ✔ | +| [Zim (Kiwix)](./doc/p/zim.md) | 🔢 | .zim | ✔ | | + +Legend: + +- 📁 Directory +- 📝 Text file +- 📦 Package/archive file +- 🔢 Binary file +- ✔ Supported +- ❌ Will not be supported + +**Note**: SQLite-based formats are not detected by extension (`.db`); +So you need to select the format (with UI or `--read-format` flag). +**Also don't confuse SQLite-based formats with [SQLite mode](#sqlite-mode).** + +## Requirements + +PyGlossary requires **Python 3.9 or higher**, and works in practically all +modern operating systems. While primarily designed for *GNU/Linux*, it works +on *Windows*, *Mac OS X* and other Unix-based operating systems as well. + +As shown in the screenshots, there are multiple User Interface types (multiple +ways to use the program). + +- **Gtk3-based interface**, uses [PyGI (Python Gobject Introspection)](http://pygobject.readthedocs.io/en/latest/getting_started.html) + You can install it on: + + - Debian/Ubuntu: `apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0` + - openSUSE: `zypper install python3-gobject gtk3` + - Fedora: `dnf install pygobject3 python3-gobject gtk3` + - ArchLinux: + - `pacman -S python-gobject gtk3` + - https://aur.archlinux.org/packages/pyglossary/ + - Mac OS X: `brew install pygobject3 gtk+3` + - Nix / NixOS: `nix-shell -p pkgs.gobject-introspection python38Packages.pygobject3 python38Packages.pycairo` + +- **Tkinter-based interface**, works in the lack of Gtk. Specially on + Windows where Tkinter library is installed with the Python itself. + You can also install it on: + + - Debian/Ubuntu: `apt-get install python3-tk tix` + - openSUSE: `zypper install python3-tk tix` + - Fedora: `yum install python3-tkinter tix` + - Mac OS X: read <https://www.python.org/download/mac/tcltk/> + - Nix / NixOS: `nix-shell -p python38Packages.tkinter tix` + +- **Command-line interface**, works in all operating systems without + any specific requirements, just type: + + `python3 main.py --help` + + - **Interactive command-line interface** + - Requires: `pip install prompt_toolkit` + - Perfect for mobile devices (like Termux on Android) where no GUI is available + - Automatically selected if output file argument is not passed **and** one of these: + - On Linux and `$DISPLAY` environment variable is empty or not set + - For example when you are using a remote Linux machine over SSH + - On Mac and no `tkinter` module is found + - Manually select with `--cmd` or `--ui=cmd` + - Minimally: `python3 main.py --cmd` + - You can still pass input file, or any flag/option + - If both input and output files are passed, non-interactive cmd ui will be default + - If you are writing a script, you can pass `--no-interactive` to force disable interactive ui + - Then you have to pass both input and output file arguments + - Don't forget to use *Up/Down* or *Tab* keys in prompts! + - Up/Down key shows you recent values you have used + - Tab key shows available values/options + - You can press Control+C (on Linux/Windows) at any prompt to exit + +## UI (User Interface) selection + +When you run PyGlossary without any command-line arguments or options/flags, +PyGlossary tries to find PyGI and open the Gtk3-based interface. If it fails, +it tries to find Tkinter and open the Tkinter-based interface. If that fails, +it tries to find `prompt_toolkit` and run interactive command-line interface. +And if none of these libraries are found, it exits with an error. + +But you can explicitly determine the user interface type using `--ui` + +- `python3 main.py --ui=gtk` +- `python3 main.py --ui=tk` +- `python3 main.py --ui=cmd` + +## Installation on Windows + +- [Download and install Python](https://www.python.org/downloads/windows/) (3.9 or above) +- Open Start -> type Command -> right-click on Command Prompt -> Run as administrator +- To ensure you have `pip`, run: `python -m ensurepip --upgrade` +- To install, run: `pip install --upgrade pyglossary` +- Now you should be able to run `pyglossary` command +- If command was not found, make sure Python environment variables are set up: + <img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/windows-python39-env-vars.png" width="50%" height="50%"/> + +## Feature-specific requirements + +- Using [Sort by Locale](#sorting) feature requires [PyICU](./doc/pyicu.md) + +- Using `--remove-html-all` flag requires: + + `pip install lxml beautifulsoup4` + +Some formats have additional requirements. +If you have trouble with any format, please check the [link given for that format](#supported-formats) to see its documentations. + +**Using Termux on Android?** See [doc/termux.md](./doc/termux.md) + +## Configuration + +See [doc/config.rst](./doc/config.rst). + +## Direct and indirect modes + +Indirect mode means the input glossary is completely read and loaded into RAM, then converted +into the output format. This was the only method available in old versions (before [3.0.0](https://github.com/ilius/pyglossary/releases/tag/3.0.0)). + +Direct mode means entries are one-at-a-time read, processed and written into output glossary. + +Direct mode was added to limit the memory usage for large glossaries; But it may reduce the +conversion time for most cases as well. + +Converting glossaries into these formats requires [sorting](#sorting) entries: + +- [StarDict](./doc/p/stardict.md) +- [EPUB-2](./doc/p/epub2.md) +- [Mobipocket E-Book](./doc/p/mobi.md) + +That's why direct mode will not work for these formats, and PyGlossary has to +switch to indirect mode (or it previously had to, see [SQLite mode](#sqlite-mode)). + +For other formats, direct mode will be the default. You may override this by `--indirect` flag. + +## SQLite mode + +As mentioned above, converting glossaries to some specific formats will +need them to loaded into RAM. + +This can be problematic if the glossary is too big to fit into RAM. That's when +you should try adding `--sqlite` flag to your command. Then it uses SQLite3 as intermediate +storage for storing, sorting and then fetching entries. This fixes the memory issue, and may +even reduce running time of conversion (depending on your home directory storage). + +The temporary SQLite file is stored in [cache directory](#cache-directory) then +deleted after conversion (unless you pass `--no-cleanup` flag). + +SQLite mode is automatically enabled for writing these formats if `auto_sqlite` +[config parameter](./doc/config.rst) is `true` (which is the default). +This also applies to when you pass `--sort` flag for any format. +You may use `--no-sqlite` to override this and switch to indirect mode. + +Currently you can not disable alternates in SQLite mode (`--no-alts` is ignored). + +## Sorting + +There are two things than can activate sorting entries: + +- Output format requires sorting (as explained [above](#direct-and-indirect-modes)) +- You pass `--sort` flag in command line. + +In the case of passing `--sort`, you can also pass: + +- `--sort-key` to select sort key aka sorting order (including locale), see [doc/sort-key.md](./doc/sort-key.md) + +- `--sort-encoding` to change the encoding used for sort + + - UTF-8 is the default encoding for all sort keys and all output formats (unless mentioned otherwise) + - This will only effect the order of entries, and will not corrupt words / definition + - Non-encodable characters are replaced with `?` byte (*only for sorting*) + - Conflicts with `--sort-locale` + +## Cache directory + +Cache directory is used for storing temporary files which are either moved or deleted +after conversion. You can pass `--no-cleanup` flag in order to keep them. + +The path for cache directory: + +- Linux or BSD: `~/.cache/pyglossary/` +- Mac: `~/Library/Caches/PyGlossary/` +- Windows: `C:\Users\USERNAME\AppData\Local\PyGlossary\Cache\` + +## User plugins + +If you want to add your own plugin without adding it to source code directory, +or you want to use a plugin that has been removed from repository, +you can place it in this directory: + +- Linux or BSD: `~/.pyglossary/plugins/` +- Mac: `~/Library/Preferences/PyGlossary/plugins/` +- Windows: `C:\Users\USERNAME\AppData\Roaming\PyGlossary\plugins\` + +## Using PyGlossary as a Python library + +There are a few examples in [doc/lib-examples](./doc/lib-examples) directory. + +Here is a basic script that converts any supported glossary format to [Tabfile](./doc/p/tabfile.md): + +```python +import sys +from pyglossary import Glossary + +# Glossary.init() should be called only once, so make sure you put it +# in the right place +Glossary.init() + +glos = Glossary() +glos.convert( + inputFilename=sys.argv[1], + outputFilename=f"{sys.argv[1]}.txt", + # although it can detect format for *.txt, you can still pass outputFormat + outputFormat="Tabfile", + # you can pass readOptions or writeOptions as a dict + # writeOptions={"encoding": "utf-8"}, +) +``` + +And if you choose to use `glossary_v2`: + +```python +import sys +from pyglossary.glossary_v2 import ConvertArgs, Glossary + +# Glossary.init() should be called only once, so make sure you put it +# in the right place +Glossary.init() + +glos = Glossary() +glos.convert(ConvertArgs( + inputFilename=sys.argv[1], + outputFilename=f"{sys.argv[1]}.txt", + # although it can detect format for *.txt, you can still pass outputFormat + outputFormat="Tabfile", + # you can pass readOptions or writeOptions as a dict + # writeOptions={"encoding": "utf-8"}, +)) +``` + +You may look at docstring of `Glossary.convert` for full list of keyword arguments. + + +If you need to add entries inside your Python program (rather than converting one glossary into another), then you use `write` instead of `convert`, here is an example: + +```python +from pyglossary import Glossary + +Glossary.init() + +glos = Glossary() +mydict = { + "a": "test1", + "b": "test2", + "c": "test3", +} +for word, defi in mydict.items(): + glos.addEntryObj(glos.newEntry( + word, + defi, + defiFormat="m", # "m" for plain text, "h" for HTML + )) + +glos.setInfo("title", "My Test StarDict") +glos.setInfo("author", "John Doe") +glos.write("test.ifo", format="Stardict") +``` + +**Note:** `addEntryObj` is renamed to `addEntry` in `pyglossary.glossary_v2`. + +**Note:** Switching to `glossary_v2` is optional and recommended. + + +And if you need to read a glossary from file into a `Glossary` object in RAM (without immediately converting it), you can use `glos.read(filename, format=inputFormat)`. Be wary of RAM usage in this case. + +If you want to include images, css, js or other files in a glossary that you are creating, you need to add them as **Data Entries**, for example: + +```python +with open(os.path.join(imageDir, "a.jpeg")) as fp: + glos.addEntry(glos.newDataEntry("img/a.jpeg", fp.read())) +``` + +The first argument to `newDataEntry` must be the relative path (that generally html codes of your definitions points to). + +## Internal glossary structure + +A glossary contains a number of entries. + +Each entry contains: + +- Headword (title or main phrase for lookup) +- Alternates (some alternative phrases for lookup) +- Definition + +In PyGlossary, headword and alternates together are accessible as a single Python list `entry.l_word` + +`entry.defi` is the definition as a Python Unicode `str`. Also `entry.b_defi` is definition in UTF-8 byte array. + +`entry.defiFormat` is definition format. If definition is plaintext (not rich text), the value is `m`. And if it's in HTML (contains any html tag), then `defiFormat` is `h`. The value `x` is also allowed for XFXF, but XDXF is not widely supported in dictionary applications. + +There is another type of entry which is called **Data Entry**, and generally contains an image, audio, css, or any other file that was included in input glossary. For data entries: + +- `entry.s_word` is file name (and `l_word` is still a list containing this string), +- `entry.defiFormat` is `b` +- `entry.data` gives the content of file in `bytes`. + +## Entry filters + +Entry filters are internal objects that modify words/definition of entries, +or remove entries (in some special cases). + +Like several filters in a pipe which connects a `reader` object to a `writer` object +(with both of their classes defined in plugins and instantiated in `Glossary` class). + +You can enable/disable some of these filters using config parameters / command like flags, which +are documented in [doc/config.rst](./doc/config.rst). + +The full list of entry filters is also documented in [doc/entry-filters.md](./doc/entry-filters.md). + + +%package -n python3-pyglossary +Summary: A tool for converting dictionary files aka glossaries. +Provides: python-pyglossary +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-pyglossary +# PyGlossary + +A tool for converting dictionary files aka glossaries. + +The primary purpose is to be able to use our offline glossaries in any Open +Source dictionary we like on any OS/device. + +There are countless formats, and my time is limited, so I implement formats that +seem more useful for myself, or for Open Source community. Also diversity of +languages is taken into account. Pull requests are welcome. + +## Screenshots + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/44-gtk-txt-stardict-aryanpur-dark.png" width="50%" height="50%"/> + +Linux - Gtk3-based interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40b-tk-bgl-epub-es-en-2.png" width="50%" height="50%"/> + +Windows - Tkinter-based interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/32-cmd-freedict-mids-de-ru.png" width="50%" height="50%"/> + +Linux - command-line interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40-cmdi-termux-zim-slob-en-med.jpg" width="50%" height="50%"/> + +Android Termux - interactive command-line interface + +## Supported formats + +| Format | | Extension | Read | Write | +| ------------------------------------------------------- | :-: | :-------------: | :--: | :---: | +| [Aard 2 (slob)](./doc/p/aard2_slob.md) | 🔢 | .slob | ✔ | ✔ | +| [ABBYY Lingvo DSL](./doc/p/dsl.md) | 📝 | .dsl | ✔ | | +| [Almaany.com](./doc/p/almaany.md) (SQLite3, Arabic) | 🔢 | .db | ✔ | | +| [AppleDict Binary](./doc/p/appledict_bin.md) | 🔢 | .dictionary | ✔ | ❌ | +| [AppleDict Source](./doc/p/appledict.md) | 📁 | | | ✔ | +| [Babylon BGL](./doc/p/babylon_bgl.md) | 🔢 | .bgl | ✔ | ❌ | +| [CC-CEDICT](./doc/p/cc_cedict.md) (Chinese) | 📝 | | ✔ | ❌ | +| [cc-kedict](./doc/p/cc_kedict.md) (Korean) | 📝 | | ✔ | ❌ | +| [CSV](./doc/p/csv.md) | 📝 | .csv | ✔ | ✔ | +| [Dict.cc](./doc/p/dict_cc.md) (SQLite3, German) | 🔢 | .db | ✔ | | +| [DICT.org / Dictd server](./doc/p/dict_org.md) | 📁 | (📝.index) | ✔ | ✔ | +| [DICT.org / dictfmt source](./doc/p/dict_org_source.md) | 📝 | (.dtxt) | | ✔ | +| [dictunformat output file](./doc/p/dictunformat.md) | 📝 | (.dictunformat) | ✔ | | +| [DictionaryForMIDs](./doc/p/dicformids.md) | 📁 | (📁.mids) | ✔ | ✔ | +| [DigitalNK](./doc/p/digitalnk.md) (SQLite3, N-Korean) | 🔢 | .db | ✔ | | +| [DIKT JSON](./doc/p/dikt_json.md) | 📝 | (.json) | | ✔ | +| [EDLIN](./doc/p/edlin.md) | 📁 | .edlin | ✔ | ✔ | +| [EPUB-2 E-Book](./doc/p/epub2.md) | 📦 | .epub | ❌ | ✔ | +| [FreeDict](./doc/p/freedict.md) | 📝 | .tei | ✔ | ❌ | +| [Gettext Source](./doc/p/gettext_po.md) | 📝 | .po | ✔ | ✔ | +| [HTML Directory (by file size)](./doc/p/html_dir.md) | 📁 | | ❌ | ✔ | +| [JMDict](./doc/p/jmdict.md) (Japanese) | 📝 | | ✔ | ❌ | +| [JSON](./doc/p/json.md) | 📝 | .json | | ✔ | +| [Kobo E-Reader Dictionary](./doc/p/kobo.md) | 📦 | .kobo.zip | ❌ | ✔ | +| [Kobo E-Reader Dictfile](./doc/p/kobo_dictfile.md) | 📝 | .df | ✔ | ✔ | +| [Lingoes Source](./doc/p/lingoes_ldf.md) | 📝 | .ldf | ✔ | ✔ | +| [Mobipocket E-Book](./doc/p/mobi.md) | 🔢 | .mobi | ❌ | ✔ | +| [Octopus MDict](./doc/p/octopus_mdict.md) | 🔢 | .mdx | ✔ | ❌ | +| [SQL](./doc/p/sql.md) | 📝 | .sql | ❌ | ✔ | +| [StarDict](./doc/p/stardict.md) | 📁 | (📝.ifo) | ✔ | ✔ | +| [StarDict Textual File](./doc/p/stardict_textual.md) | 📝 | (.xml) | ✔ | ✔ | +| [Tabfile](./doc/p/tabfile.md) | 📝 | .txt, .tab | ✔ | ✔ | +| [Wordset.org](./doc/p/wordset.md) | 📁 | | ✔ | | +| [XDXF](./doc/p/xdxf.md) | 📝 | .xdxf | ✔ | ❌ | +| [Yomichan](./doc/p/yomichan.md) | 📦 | (.zip) | | ✔ | +| [Zim (Kiwix)](./doc/p/zim.md) | 🔢 | .zim | ✔ | | + +Legend: + +- 📁 Directory +- 📝 Text file +- 📦 Package/archive file +- 🔢 Binary file +- ✔ Supported +- ❌ Will not be supported + +**Note**: SQLite-based formats are not detected by extension (`.db`); +So you need to select the format (with UI or `--read-format` flag). +**Also don't confuse SQLite-based formats with [SQLite mode](#sqlite-mode).** + +## Requirements + +PyGlossary requires **Python 3.9 or higher**, and works in practically all +modern operating systems. While primarily designed for *GNU/Linux*, it works +on *Windows*, *Mac OS X* and other Unix-based operating systems as well. + +As shown in the screenshots, there are multiple User Interface types (multiple +ways to use the program). + +- **Gtk3-based interface**, uses [PyGI (Python Gobject Introspection)](http://pygobject.readthedocs.io/en/latest/getting_started.html) + You can install it on: + + - Debian/Ubuntu: `apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0` + - openSUSE: `zypper install python3-gobject gtk3` + - Fedora: `dnf install pygobject3 python3-gobject gtk3` + - ArchLinux: + - `pacman -S python-gobject gtk3` + - https://aur.archlinux.org/packages/pyglossary/ + - Mac OS X: `brew install pygobject3 gtk+3` + - Nix / NixOS: `nix-shell -p pkgs.gobject-introspection python38Packages.pygobject3 python38Packages.pycairo` + +- **Tkinter-based interface**, works in the lack of Gtk. Specially on + Windows where Tkinter library is installed with the Python itself. + You can also install it on: + + - Debian/Ubuntu: `apt-get install python3-tk tix` + - openSUSE: `zypper install python3-tk tix` + - Fedora: `yum install python3-tkinter tix` + - Mac OS X: read <https://www.python.org/download/mac/tcltk/> + - Nix / NixOS: `nix-shell -p python38Packages.tkinter tix` + +- **Command-line interface**, works in all operating systems without + any specific requirements, just type: + + `python3 main.py --help` + + - **Interactive command-line interface** + - Requires: `pip install prompt_toolkit` + - Perfect for mobile devices (like Termux on Android) where no GUI is available + - Automatically selected if output file argument is not passed **and** one of these: + - On Linux and `$DISPLAY` environment variable is empty or not set + - For example when you are using a remote Linux machine over SSH + - On Mac and no `tkinter` module is found + - Manually select with `--cmd` or `--ui=cmd` + - Minimally: `python3 main.py --cmd` + - You can still pass input file, or any flag/option + - If both input and output files are passed, non-interactive cmd ui will be default + - If you are writing a script, you can pass `--no-interactive` to force disable interactive ui + - Then you have to pass both input and output file arguments + - Don't forget to use *Up/Down* or *Tab* keys in prompts! + - Up/Down key shows you recent values you have used + - Tab key shows available values/options + - You can press Control+C (on Linux/Windows) at any prompt to exit + +## UI (User Interface) selection + +When you run PyGlossary without any command-line arguments or options/flags, +PyGlossary tries to find PyGI and open the Gtk3-based interface. If it fails, +it tries to find Tkinter and open the Tkinter-based interface. If that fails, +it tries to find `prompt_toolkit` and run interactive command-line interface. +And if none of these libraries are found, it exits with an error. + +But you can explicitly determine the user interface type using `--ui` + +- `python3 main.py --ui=gtk` +- `python3 main.py --ui=tk` +- `python3 main.py --ui=cmd` + +## Installation on Windows + +- [Download and install Python](https://www.python.org/downloads/windows/) (3.9 or above) +- Open Start -> type Command -> right-click on Command Prompt -> Run as administrator +- To ensure you have `pip`, run: `python -m ensurepip --upgrade` +- To install, run: `pip install --upgrade pyglossary` +- Now you should be able to run `pyglossary` command +- If command was not found, make sure Python environment variables are set up: + <img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/windows-python39-env-vars.png" width="50%" height="50%"/> + +## Feature-specific requirements + +- Using [Sort by Locale](#sorting) feature requires [PyICU](./doc/pyicu.md) + +- Using `--remove-html-all` flag requires: + + `pip install lxml beautifulsoup4` + +Some formats have additional requirements. +If you have trouble with any format, please check the [link given for that format](#supported-formats) to see its documentations. + +**Using Termux on Android?** See [doc/termux.md](./doc/termux.md) + +## Configuration + +See [doc/config.rst](./doc/config.rst). + +## Direct and indirect modes + +Indirect mode means the input glossary is completely read and loaded into RAM, then converted +into the output format. This was the only method available in old versions (before [3.0.0](https://github.com/ilius/pyglossary/releases/tag/3.0.0)). + +Direct mode means entries are one-at-a-time read, processed and written into output glossary. + +Direct mode was added to limit the memory usage for large glossaries; But it may reduce the +conversion time for most cases as well. + +Converting glossaries into these formats requires [sorting](#sorting) entries: + +- [StarDict](./doc/p/stardict.md) +- [EPUB-2](./doc/p/epub2.md) +- [Mobipocket E-Book](./doc/p/mobi.md) + +That's why direct mode will not work for these formats, and PyGlossary has to +switch to indirect mode (or it previously had to, see [SQLite mode](#sqlite-mode)). + +For other formats, direct mode will be the default. You may override this by `--indirect` flag. + +## SQLite mode + +As mentioned above, converting glossaries to some specific formats will +need them to loaded into RAM. + +This can be problematic if the glossary is too big to fit into RAM. That's when +you should try adding `--sqlite` flag to your command. Then it uses SQLite3 as intermediate +storage for storing, sorting and then fetching entries. This fixes the memory issue, and may +even reduce running time of conversion (depending on your home directory storage). + +The temporary SQLite file is stored in [cache directory](#cache-directory) then +deleted after conversion (unless you pass `--no-cleanup` flag). + +SQLite mode is automatically enabled for writing these formats if `auto_sqlite` +[config parameter](./doc/config.rst) is `true` (which is the default). +This also applies to when you pass `--sort` flag for any format. +You may use `--no-sqlite` to override this and switch to indirect mode. + +Currently you can not disable alternates in SQLite mode (`--no-alts` is ignored). + +## Sorting + +There are two things than can activate sorting entries: + +- Output format requires sorting (as explained [above](#direct-and-indirect-modes)) +- You pass `--sort` flag in command line. + +In the case of passing `--sort`, you can also pass: + +- `--sort-key` to select sort key aka sorting order (including locale), see [doc/sort-key.md](./doc/sort-key.md) + +- `--sort-encoding` to change the encoding used for sort + + - UTF-8 is the default encoding for all sort keys and all output formats (unless mentioned otherwise) + - This will only effect the order of entries, and will not corrupt words / definition + - Non-encodable characters are replaced with `?` byte (*only for sorting*) + - Conflicts with `--sort-locale` + +## Cache directory + +Cache directory is used for storing temporary files which are either moved or deleted +after conversion. You can pass `--no-cleanup` flag in order to keep them. + +The path for cache directory: + +- Linux or BSD: `~/.cache/pyglossary/` +- Mac: `~/Library/Caches/PyGlossary/` +- Windows: `C:\Users\USERNAME\AppData\Local\PyGlossary\Cache\` + +## User plugins + +If you want to add your own plugin without adding it to source code directory, +or you want to use a plugin that has been removed from repository, +you can place it in this directory: + +- Linux or BSD: `~/.pyglossary/plugins/` +- Mac: `~/Library/Preferences/PyGlossary/plugins/` +- Windows: `C:\Users\USERNAME\AppData\Roaming\PyGlossary\plugins\` + +## Using PyGlossary as a Python library + +There are a few examples in [doc/lib-examples](./doc/lib-examples) directory. + +Here is a basic script that converts any supported glossary format to [Tabfile](./doc/p/tabfile.md): + +```python +import sys +from pyglossary import Glossary + +# Glossary.init() should be called only once, so make sure you put it +# in the right place +Glossary.init() + +glos = Glossary() +glos.convert( + inputFilename=sys.argv[1], + outputFilename=f"{sys.argv[1]}.txt", + # although it can detect format for *.txt, you can still pass outputFormat + outputFormat="Tabfile", + # you can pass readOptions or writeOptions as a dict + # writeOptions={"encoding": "utf-8"}, +) +``` + +And if you choose to use `glossary_v2`: + +```python +import sys +from pyglossary.glossary_v2 import ConvertArgs, Glossary + +# Glossary.init() should be called only once, so make sure you put it +# in the right place +Glossary.init() + +glos = Glossary() +glos.convert(ConvertArgs( + inputFilename=sys.argv[1], + outputFilename=f"{sys.argv[1]}.txt", + # although it can detect format for *.txt, you can still pass outputFormat + outputFormat="Tabfile", + # you can pass readOptions or writeOptions as a dict + # writeOptions={"encoding": "utf-8"}, +)) +``` + +You may look at docstring of `Glossary.convert` for full list of keyword arguments. + + +If you need to add entries inside your Python program (rather than converting one glossary into another), then you use `write` instead of `convert`, here is an example: + +```python +from pyglossary import Glossary + +Glossary.init() + +glos = Glossary() +mydict = { + "a": "test1", + "b": "test2", + "c": "test3", +} +for word, defi in mydict.items(): + glos.addEntryObj(glos.newEntry( + word, + defi, + defiFormat="m", # "m" for plain text, "h" for HTML + )) + +glos.setInfo("title", "My Test StarDict") +glos.setInfo("author", "John Doe") +glos.write("test.ifo", format="Stardict") +``` + +**Note:** `addEntryObj` is renamed to `addEntry` in `pyglossary.glossary_v2`. + +**Note:** Switching to `glossary_v2` is optional and recommended. + + +And if you need to read a glossary from file into a `Glossary` object in RAM (without immediately converting it), you can use `glos.read(filename, format=inputFormat)`. Be wary of RAM usage in this case. + +If you want to include images, css, js or other files in a glossary that you are creating, you need to add them as **Data Entries**, for example: + +```python +with open(os.path.join(imageDir, "a.jpeg")) as fp: + glos.addEntry(glos.newDataEntry("img/a.jpeg", fp.read())) +``` + +The first argument to `newDataEntry` must be the relative path (that generally html codes of your definitions points to). + +## Internal glossary structure + +A glossary contains a number of entries. + +Each entry contains: + +- Headword (title or main phrase for lookup) +- Alternates (some alternative phrases for lookup) +- Definition + +In PyGlossary, headword and alternates together are accessible as a single Python list `entry.l_word` + +`entry.defi` is the definition as a Python Unicode `str`. Also `entry.b_defi` is definition in UTF-8 byte array. + +`entry.defiFormat` is definition format. If definition is plaintext (not rich text), the value is `m`. And if it's in HTML (contains any html tag), then `defiFormat` is `h`. The value `x` is also allowed for XFXF, but XDXF is not widely supported in dictionary applications. + +There is another type of entry which is called **Data Entry**, and generally contains an image, audio, css, or any other file that was included in input glossary. For data entries: + +- `entry.s_word` is file name (and `l_word` is still a list containing this string), +- `entry.defiFormat` is `b` +- `entry.data` gives the content of file in `bytes`. + +## Entry filters + +Entry filters are internal objects that modify words/definition of entries, +or remove entries (in some special cases). + +Like several filters in a pipe which connects a `reader` object to a `writer` object +(with both of their classes defined in plugins and instantiated in `Glossary` class). + +You can enable/disable some of these filters using config parameters / command like flags, which +are documented in [doc/config.rst](./doc/config.rst). + +The full list of entry filters is also documented in [doc/entry-filters.md](./doc/entry-filters.md). + + +%package help +Summary: Development documents and examples for pyglossary +Provides: python3-pyglossary-doc +%description help +# PyGlossary + +A tool for converting dictionary files aka glossaries. + +The primary purpose is to be able to use our offline glossaries in any Open +Source dictionary we like on any OS/device. + +There are countless formats, and my time is limited, so I implement formats that +seem more useful for myself, or for Open Source community. Also diversity of +languages is taken into account. Pull requests are welcome. + +## Screenshots + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/44-gtk-txt-stardict-aryanpur-dark.png" width="50%" height="50%"/> + +Linux - Gtk3-based interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40b-tk-bgl-epub-es-en-2.png" width="50%" height="50%"/> + +Windows - Tkinter-based interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/32-cmd-freedict-mids-de-ru.png" width="50%" height="50%"/> + +Linux - command-line interface + +______________________________________________________________________ + +<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40-cmdi-termux-zim-slob-en-med.jpg" width="50%" height="50%"/> + +Android Termux - interactive command-line interface + +## Supported formats + +| Format | | Extension | Read | Write | +| ------------------------------------------------------- | :-: | :-------------: | :--: | :---: | +| [Aard 2 (slob)](./doc/p/aard2_slob.md) | 🔢 | .slob | ✔ | ✔ | +| [ABBYY Lingvo DSL](./doc/p/dsl.md) | 📝 | .dsl | ✔ | | +| [Almaany.com](./doc/p/almaany.md) (SQLite3, Arabic) | 🔢 | .db | ✔ | | +| [AppleDict Binary](./doc/p/appledict_bin.md) | 🔢 | .dictionary | ✔ | ❌ | +| [AppleDict Source](./doc/p/appledict.md) | 📁 | | | ✔ | +| [Babylon BGL](./doc/p/babylon_bgl.md) | 🔢 | .bgl | ✔ | ❌ | +| [CC-CEDICT](./doc/p/cc_cedict.md) (Chinese) | 📝 | | ✔ | ❌ | +| [cc-kedict](./doc/p/cc_kedict.md) (Korean) | 📝 | | ✔ | ❌ | +| [CSV](./doc/p/csv.md) | 📝 | .csv | ✔ | ✔ | +| [Dict.cc](./doc/p/dict_cc.md) (SQLite3, German) | 🔢 | .db | ✔ | | +| [DICT.org / Dictd server](./doc/p/dict_org.md) | 📁 | (📝.index) | ✔ | ✔ | +| [DICT.org / dictfmt source](./doc/p/dict_org_source.md) | 📝 | (.dtxt) | | ✔ | +| [dictunformat output file](./doc/p/dictunformat.md) | 📝 | (.dictunformat) | ✔ | | +| [DictionaryForMIDs](./doc/p/dicformids.md) | 📁 | (📁.mids) | ✔ | ✔ | +| [DigitalNK](./doc/p/digitalnk.md) (SQLite3, N-Korean) | 🔢 | .db | ✔ | | +| [DIKT JSON](./doc/p/dikt_json.md) | 📝 | (.json) | | ✔ | +| [EDLIN](./doc/p/edlin.md) | 📁 | .edlin | ✔ | ✔ | +| [EPUB-2 E-Book](./doc/p/epub2.md) | 📦 | .epub | ❌ | ✔ | +| [FreeDict](./doc/p/freedict.md) | 📝 | .tei | ✔ | ❌ | +| [Gettext Source](./doc/p/gettext_po.md) | 📝 | .po | ✔ | ✔ | +| [HTML Directory (by file size)](./doc/p/html_dir.md) | 📁 | | ❌ | ✔ | +| [JMDict](./doc/p/jmdict.md) (Japanese) | 📝 | | ✔ | ❌ | +| [JSON](./doc/p/json.md) | 📝 | .json | | ✔ | +| [Kobo E-Reader Dictionary](./doc/p/kobo.md) | 📦 | .kobo.zip | ❌ | ✔ | +| [Kobo E-Reader Dictfile](./doc/p/kobo_dictfile.md) | 📝 | .df | ✔ | ✔ | +| [Lingoes Source](./doc/p/lingoes_ldf.md) | 📝 | .ldf | ✔ | ✔ | +| [Mobipocket E-Book](./doc/p/mobi.md) | 🔢 | .mobi | ❌ | ✔ | +| [Octopus MDict](./doc/p/octopus_mdict.md) | 🔢 | .mdx | ✔ | ❌ | +| [SQL](./doc/p/sql.md) | 📝 | .sql | ❌ | ✔ | +| [StarDict](./doc/p/stardict.md) | 📁 | (📝.ifo) | ✔ | ✔ | +| [StarDict Textual File](./doc/p/stardict_textual.md) | 📝 | (.xml) | ✔ | ✔ | +| [Tabfile](./doc/p/tabfile.md) | 📝 | .txt, .tab | ✔ | ✔ | +| [Wordset.org](./doc/p/wordset.md) | 📁 | | ✔ | | +| [XDXF](./doc/p/xdxf.md) | 📝 | .xdxf | ✔ | ❌ | +| [Yomichan](./doc/p/yomichan.md) | 📦 | (.zip) | | ✔ | +| [Zim (Kiwix)](./doc/p/zim.md) | 🔢 | .zim | ✔ | | + +Legend: + +- 📁 Directory +- 📝 Text file +- 📦 Package/archive file +- 🔢 Binary file +- ✔ Supported +- ❌ Will not be supported + +**Note**: SQLite-based formats are not detected by extension (`.db`); +So you need to select the format (with UI or `--read-format` flag). +**Also don't confuse SQLite-based formats with [SQLite mode](#sqlite-mode).** + +## Requirements + +PyGlossary requires **Python 3.9 or higher**, and works in practically all +modern operating systems. While primarily designed for *GNU/Linux*, it works +on *Windows*, *Mac OS X* and other Unix-based operating systems as well. + +As shown in the screenshots, there are multiple User Interface types (multiple +ways to use the program). + +- **Gtk3-based interface**, uses [PyGI (Python Gobject Introspection)](http://pygobject.readthedocs.io/en/latest/getting_started.html) + You can install it on: + + - Debian/Ubuntu: `apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0` + - openSUSE: `zypper install python3-gobject gtk3` + - Fedora: `dnf install pygobject3 python3-gobject gtk3` + - ArchLinux: + - `pacman -S python-gobject gtk3` + - https://aur.archlinux.org/packages/pyglossary/ + - Mac OS X: `brew install pygobject3 gtk+3` + - Nix / NixOS: `nix-shell -p pkgs.gobject-introspection python38Packages.pygobject3 python38Packages.pycairo` + +- **Tkinter-based interface**, works in the lack of Gtk. Specially on + Windows where Tkinter library is installed with the Python itself. + You can also install it on: + + - Debian/Ubuntu: `apt-get install python3-tk tix` + - openSUSE: `zypper install python3-tk tix` + - Fedora: `yum install python3-tkinter tix` + - Mac OS X: read <https://www.python.org/download/mac/tcltk/> + - Nix / NixOS: `nix-shell -p python38Packages.tkinter tix` + +- **Command-line interface**, works in all operating systems without + any specific requirements, just type: + + `python3 main.py --help` + + - **Interactive command-line interface** + - Requires: `pip install prompt_toolkit` + - Perfect for mobile devices (like Termux on Android) where no GUI is available + - Automatically selected if output file argument is not passed **and** one of these: + - On Linux and `$DISPLAY` environment variable is empty or not set + - For example when you are using a remote Linux machine over SSH + - On Mac and no `tkinter` module is found + - Manually select with `--cmd` or `--ui=cmd` + - Minimally: `python3 main.py --cmd` + - You can still pass input file, or any flag/option + - If both input and output files are passed, non-interactive cmd ui will be default + - If you are writing a script, you can pass `--no-interactive` to force disable interactive ui + - Then you have to pass both input and output file arguments + - Don't forget to use *Up/Down* or *Tab* keys in prompts! + - Up/Down key shows you recent values you have used + - Tab key shows available values/options + - You can press Control+C (on Linux/Windows) at any prompt to exit + +## UI (User Interface) selection + +When you run PyGlossary without any command-line arguments or options/flags, +PyGlossary tries to find PyGI and open the Gtk3-based interface. If it fails, +it tries to find Tkinter and open the Tkinter-based interface. If that fails, +it tries to find `prompt_toolkit` and run interactive command-line interface. +And if none of these libraries are found, it exits with an error. + +But you can explicitly determine the user interface type using `--ui` + +- `python3 main.py --ui=gtk` +- `python3 main.py --ui=tk` +- `python3 main.py --ui=cmd` + +## Installation on Windows + +- [Download and install Python](https://www.python.org/downloads/windows/) (3.9 or above) +- Open Start -> type Command -> right-click on Command Prompt -> Run as administrator +- To ensure you have `pip`, run: `python -m ensurepip --upgrade` +- To install, run: `pip install --upgrade pyglossary` +- Now you should be able to run `pyglossary` command +- If command was not found, make sure Python environment variables are set up: + <img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/windows-python39-env-vars.png" width="50%" height="50%"/> + +## Feature-specific requirements + +- Using [Sort by Locale](#sorting) feature requires [PyICU](./doc/pyicu.md) + +- Using `--remove-html-all` flag requires: + + `pip install lxml beautifulsoup4` + +Some formats have additional requirements. +If you have trouble with any format, please check the [link given for that format](#supported-formats) to see its documentations. + +**Using Termux on Android?** See [doc/termux.md](./doc/termux.md) + +## Configuration + +See [doc/config.rst](./doc/config.rst). + +## Direct and indirect modes + +Indirect mode means the input glossary is completely read and loaded into RAM, then converted +into the output format. This was the only method available in old versions (before [3.0.0](https://github.com/ilius/pyglossary/releases/tag/3.0.0)). + +Direct mode means entries are one-at-a-time read, processed and written into output glossary. + +Direct mode was added to limit the memory usage for large glossaries; But it may reduce the +conversion time for most cases as well. + +Converting glossaries into these formats requires [sorting](#sorting) entries: + +- [StarDict](./doc/p/stardict.md) +- [EPUB-2](./doc/p/epub2.md) +- [Mobipocket E-Book](./doc/p/mobi.md) + +That's why direct mode will not work for these formats, and PyGlossary has to +switch to indirect mode (or it previously had to, see [SQLite mode](#sqlite-mode)). + +For other formats, direct mode will be the default. You may override this by `--indirect` flag. + +## SQLite mode + +As mentioned above, converting glossaries to some specific formats will +need them to loaded into RAM. + +This can be problematic if the glossary is too big to fit into RAM. That's when +you should try adding `--sqlite` flag to your command. Then it uses SQLite3 as intermediate +storage for storing, sorting and then fetching entries. This fixes the memory issue, and may +even reduce running time of conversion (depending on your home directory storage). + +The temporary SQLite file is stored in [cache directory](#cache-directory) then +deleted after conversion (unless you pass `--no-cleanup` flag). + +SQLite mode is automatically enabled for writing these formats if `auto_sqlite` +[config parameter](./doc/config.rst) is `true` (which is the default). +This also applies to when you pass `--sort` flag for any format. +You may use `--no-sqlite` to override this and switch to indirect mode. + +Currently you can not disable alternates in SQLite mode (`--no-alts` is ignored). + +## Sorting + +There are two things than can activate sorting entries: + +- Output format requires sorting (as explained [above](#direct-and-indirect-modes)) +- You pass `--sort` flag in command line. + +In the case of passing `--sort`, you can also pass: + +- `--sort-key` to select sort key aka sorting order (including locale), see [doc/sort-key.md](./doc/sort-key.md) + +- `--sort-encoding` to change the encoding used for sort + + - UTF-8 is the default encoding for all sort keys and all output formats (unless mentioned otherwise) + - This will only effect the order of entries, and will not corrupt words / definition + - Non-encodable characters are replaced with `?` byte (*only for sorting*) + - Conflicts with `--sort-locale` + +## Cache directory + +Cache directory is used for storing temporary files which are either moved or deleted +after conversion. You can pass `--no-cleanup` flag in order to keep them. + +The path for cache directory: + +- Linux or BSD: `~/.cache/pyglossary/` +- Mac: `~/Library/Caches/PyGlossary/` +- Windows: `C:\Users\USERNAME\AppData\Local\PyGlossary\Cache\` + +## User plugins + +If you want to add your own plugin without adding it to source code directory, +or you want to use a plugin that has been removed from repository, +you can place it in this directory: + +- Linux or BSD: `~/.pyglossary/plugins/` +- Mac: `~/Library/Preferences/PyGlossary/plugins/` +- Windows: `C:\Users\USERNAME\AppData\Roaming\PyGlossary\plugins\` + +## Using PyGlossary as a Python library + +There are a few examples in [doc/lib-examples](./doc/lib-examples) directory. + +Here is a basic script that converts any supported glossary format to [Tabfile](./doc/p/tabfile.md): + +```python +import sys +from pyglossary import Glossary + +# Glossary.init() should be called only once, so make sure you put it +# in the right place +Glossary.init() + +glos = Glossary() +glos.convert( + inputFilename=sys.argv[1], + outputFilename=f"{sys.argv[1]}.txt", + # although it can detect format for *.txt, you can still pass outputFormat + outputFormat="Tabfile", + # you can pass readOptions or writeOptions as a dict + # writeOptions={"encoding": "utf-8"}, +) +``` + +And if you choose to use `glossary_v2`: + +```python +import sys +from pyglossary.glossary_v2 import ConvertArgs, Glossary + +# Glossary.init() should be called only once, so make sure you put it +# in the right place +Glossary.init() + +glos = Glossary() +glos.convert(ConvertArgs( + inputFilename=sys.argv[1], + outputFilename=f"{sys.argv[1]}.txt", + # although it can detect format for *.txt, you can still pass outputFormat + outputFormat="Tabfile", + # you can pass readOptions or writeOptions as a dict + # writeOptions={"encoding": "utf-8"}, +)) +``` + +You may look at docstring of `Glossary.convert` for full list of keyword arguments. + + +If you need to add entries inside your Python program (rather than converting one glossary into another), then you use `write` instead of `convert`, here is an example: + +```python +from pyglossary import Glossary + +Glossary.init() + +glos = Glossary() +mydict = { + "a": "test1", + "b": "test2", + "c": "test3", +} +for word, defi in mydict.items(): + glos.addEntryObj(glos.newEntry( + word, + defi, + defiFormat="m", # "m" for plain text, "h" for HTML + )) + +glos.setInfo("title", "My Test StarDict") +glos.setInfo("author", "John Doe") +glos.write("test.ifo", format="Stardict") +``` + +**Note:** `addEntryObj` is renamed to `addEntry` in `pyglossary.glossary_v2`. + +**Note:** Switching to `glossary_v2` is optional and recommended. + + +And if you need to read a glossary from file into a `Glossary` object in RAM (without immediately converting it), you can use `glos.read(filename, format=inputFormat)`. Be wary of RAM usage in this case. + +If you want to include images, css, js or other files in a glossary that you are creating, you need to add them as **Data Entries**, for example: + +```python +with open(os.path.join(imageDir, "a.jpeg")) as fp: + glos.addEntry(glos.newDataEntry("img/a.jpeg", fp.read())) +``` + +The first argument to `newDataEntry` must be the relative path (that generally html codes of your definitions points to). + +## Internal glossary structure + +A glossary contains a number of entries. + +Each entry contains: + +- Headword (title or main phrase for lookup) +- Alternates (some alternative phrases for lookup) +- Definition + +In PyGlossary, headword and alternates together are accessible as a single Python list `entry.l_word` + +`entry.defi` is the definition as a Python Unicode `str`. Also `entry.b_defi` is definition in UTF-8 byte array. + +`entry.defiFormat` is definition format. If definition is plaintext (not rich text), the value is `m`. And if it's in HTML (contains any html tag), then `defiFormat` is `h`. The value `x` is also allowed for XFXF, but XDXF is not widely supported in dictionary applications. + +There is another type of entry which is called **Data Entry**, and generally contains an image, audio, css, or any other file that was included in input glossary. For data entries: + +- `entry.s_word` is file name (and `l_word` is still a list containing this string), +- `entry.defiFormat` is `b` +- `entry.data` gives the content of file in `bytes`. + +## Entry filters + +Entry filters are internal objects that modify words/definition of entries, +or remove entries (in some special cases). + +Like several filters in a pipe which connects a `reader` object to a `writer` object +(with both of their classes defined in plugins and instantiated in `Glossary` class). + +You can enable/disable some of these filters using config parameters / command like flags, which +are documented in [doc/config.rst](./doc/config.rst). + +The full list of entry filters is also documented in [doc/entry-filters.md](./doc/entry-filters.md). + + +%prep +%autosetup -n pyglossary-4.6.1 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-pyglossary -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 4.6.1-1 +- Package Spec generated @@ -0,0 +1 @@ +5f0ed823eb07c4559302925194003770 pyglossary-4.6.1.tar.gz |