%global _empty_manifest_terminate_build 0
Name: python-pycld3
Version: 0.22
Release: 1
Summary: CLD3 Python bindings
License: Apache 2.0
URL: https://github.com/bsolomon1124/pycld3
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/6b/d0/b180a38c983062877f72dffe876de58dad216a5be26d05b04f9ae4050e4b/pycld3-0.22.tar.gz
BuildArch: noarch

%description
# `pycld3`

Python bindings to the Compact Language Detector v3 (CLD3).

[![CircleCI](https://circleci.com/gh/bsolomon1124/pycld3.svg?style=svg)](https://circleci.com/gh/bsolomon1124/pycld3)
[![License](https://img.shields.io/github/license/bsolomon1124/pycld3.svg)](https://github.com/bsolomon1124/pycld3/blob/master/LICENSE)
[![PyPI](https://img.shields.io/pypi/v/pycld3.svg)](https://pypi.org/project/pycld3/)
[![Wheel](https://img.shields.io/pypi/wheel/pycld3)](https://img.shields.io/pypi/wheel/pycld3)
[![Status](https://img.shields.io/pypi/status/pycld3.svg)](https://pypi.org/project/pycld3/)
[![Python](https://img.shields.io/pypi/pyversions/pycld3.svg)](https://pypi.org/project/pycld3)
[![Implementation](https://img.shields.io/pypi/implementation/pycld3)](https://pypi.org/project/pycld3)

## Newer Alternative: `gcld3`

**Note**: Since the original publication of this `pycld3`, Google's `cld3` authors have published the Python package [gcld3](https://pypi.org/project/gcld3/), which are official Python bindings built with [pybind](https://github.com/pybind/pybind11).

Please check that project out as it is part of the canonical `cld3` repository and will likely stay in better lock step with any `cld3` changes over time.

## Overview

This package contains Python bindings (via Cython) to Google's [CLD3](https://github.com/google/cld3/) library.

```python
>>> import cld3
>>> cld3.get_language("影響包含對氣候的變化以及自然資源的枯竭程度")
LanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)
```

The library outputs BCP-47-style language codes. For some languages, output is differentiated by script.
Language and script names are taken from Unicode CLDR. It supports over 100 languages/scripts.
See the full list of [supported languages/scripts](https://github.com/google/cld3/blob/master/README.md#supported-languages) in Google's CLD3 documentation.

## Installing with Wheels: Supported Versions and Platforms

This project supports **CPython versions 3.6 through 3.9.**

We publish [wheels](https://pypi.org/project/pycld3/#files) for the following matrix:

- **MacOS**: CPython 3.6 through 3.9
- **Linux**: CPython 3.6 through 3.9 ([manylinux1](https://www.python.org/dev/peps/pep-0513/#the-manylinux1-policy))

The wheels for both MacOS and manylinux1 include the external protobuf library copied into the wheel itself via [auditwheel](https://github.com/pypa/auditwheel) or [delocate](https://github.com/matthew-brett/delocate) so that you won't need to install any extra non-PyPI dependencies.

If you are installing on one of the variants listed above, you should **not** need to have `protoc` or `libprotobuf` installed:

```bash
python -m pip install -U pycld3
```

## Installing from Source: Prerequisites

If you are not on a platform variant that is eligible to use a wheel, you may still be able to use `pycld3` via its [source distribution](https://docs.python.org/3/distutils/sourcedist.html) (`tar.gz`), but a bit more work is required to install.

Namely, you'll also need:

- the Protobuf compiler (the `protoc` executable)
- the Protobuf development headers and `libprotoc` library
- a compiler, preferably `g++`

Please consult [the official protobuf repository](https://github.com/protocolbuffers/protobuf) for information on installing Protobuf. The project contains an [Installation README](https://github.com/protocolbuffers/protobuf/tree/master/src) that covers installation on Windows and Unix.

If for whatever reason you are on a Unix host but unable to use the wheels (for instance, if you have an i686 architecture), here is a quick-and-dirty guide to installing.
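
Before picking one of the platform recipes, it can help to check whether a published wheel is likely to match your interpreter at all. The snippet below is a minimal sketch under stated assumptions: `wheel_likely_available`, `SUPPORTED_MINORS`, and `WHEEL_SYSTEMS` are hypothetical names that simply restate the wheel matrix above, and the check deliberately ignores CPU architecture and libc flavor, which also affect wheel eligibility (i686 hosts and Alpine Linux, for example).

```python
import platform
import sys

# Assumption: these constants mirror the wheel matrix described in this README.
SUPPORTED_MINORS = range(6, 10)      # CPython 3.6 through 3.9
WHEEL_SYSTEMS = {"Linux", "Darwin"}  # manylinux1 and MacOS wheels


def wheel_likely_available(version=None, implementation=None, system=None):
    """Return True if a published wheel plausibly matches this interpreter.

    Arguments default to the running interpreter; pass explicit values to
    reason about another environment.
    """
    major, minor = (version or sys.version_info)[:2]
    implementation = implementation or platform.python_implementation()
    system = system or platform.system()
    return (
        implementation == "CPython"
        and major == 3
        and minor in SUPPORTED_MINORS
        and system in WHEEL_SYSTEMS
    )


if __name__ == "__main__":
    print(wheel_likely_available())
```

A `False` result suggests that `pip` will fall back to the source distribution, in which case the Protobuf prerequisites above apply.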
### Debian/Ubuntu

```bash
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends \
    g++ \
    protobuf-compiler \
    libprotobuf-dev
python -m pip install -U pycld3
```

### Alpine Linux

_Note_: [Alpine Linux does not support PyPI wheels](https://pythonspeed.com/articles/alpine-docker-python/) as of April 2020. The steps below are mandatory on Alpine Linux because you will need to install from the source distribution. If the situation permits, using a Debian distro should be much easier (and faster).

```bash
apk --update add g++ protobuf protobuf-dev
python -m pip install -U pycld3
```

### CentOS/RHEL

Install from source, as root/UID 0:

```bash
sudo su -
set -ex
pushd /opt
PROTOBUF_VERSION='3.11.4'
yum update -y
yum install -y autoconf automake gcc-c++ glibc-headers gzip libtool make python3-devel zlib-devel
curl -Lo /opt/protobuf.tar.gz \
    "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOBUF_VERSION}/protobuf-cpp-${PROTOBUF_VERSION}.tar.gz"
tar -xzvf protobuf.tar.gz
rm -f protobuf.tar.gz
pushd "protobuf-${PROTOBUF_VERSION}"
./configure --with-zlib --disable-debug && make && make install && ldconfig --verbose
popd && rm -rf "protobuf-${PROTOBUF_VERSION}" && popd && set +ex
python -m pip install -U pycld3
```

Note: the steps above are for CentOS 8. For earlier versions, you may need to replace:

- `gcc-c++` with `g++`
- `python3-devel` with `python-devel`

### MacOS/Homebrew

```bash
brew update
brew upgrade protobuf || brew install -v protobuf
python -m pip install -U pycld3
```

### Windows

Please consult Protobuf's [C++ Installation - Windows](https://github.com/protocolbuffers/protobuf/tree/master/src#c-installation---windows) section for help with installing Protobuf on Windows.

If you would like to help contribute Windows wheels (preferably as a job within the project's CI/CD pipelines), please [file an issue](https://github.com/bsolomon1124/pycld3).
## Usage

`cld3` exports two module-level functions, `get_language()` and `get_frequent_languages()`:

```python
>>> import cld3

>>> cld3.get_language("影響包含對氣候的變化以及自然資源的枯竭程度")
LanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)

>>> cld3.get_language("This is a test")
LanguagePrediction(language='en', probability=0.9999980926513672, is_reliable=True, proportion=1.0)

>>> for lang in cld3.get_frequent_languages(
...     "This piece of text is in English. Този текст е на Български.",
...     num_langs=3
... ):
...     print(lang)
...
LanguagePrediction(language='bg', probability=0.9173890948295593, is_reliable=True, proportion=0.5853658318519592)
LanguagePrediction(language='en', probability=0.9999790191650391, is_reliable=True, proportion=0.4146341383457184)
```

## FAQ

### `cld3` incorrectly detects my input. How can I fix this?

A first resort is to **preprocess (clean) your input text** based on conditions specific to your program.

A salient example is to remove URLs and email addresses from the input. **CLD3 (unlike [CLD2](https://github.com/CLD2Owners/cld2)) does almost none of this cleaning for you**, in the spirit of not penalizing other users with overhead that they may not need.

Here's such an example using a simplified URL regex from _Regular Expressions Cookbook, 2nd ed._:

```python
>>> import re
>>> import cld3

# cld3 does not ignore the URL components by default
>>> s = "Je veux que: https://site.english.com/this/is/a/url/path/component#fragment"
>>> cld3.get_language(s)
LanguagePrediction(language='en', probability=0.5319557189941406, is_reliable=False, proportion=1.0)

>>> url_re = r"\b(?:https?://|www\.)[a-z0-9-]+(\.[a-z0-9-]+)+(?:[/?].*)?"
>>> new_s = re.sub(url_re, "", s)
>>> new_s
'Je veux que: '
>>> cld3.get_language(new_s)
LanguagePrediction(language='fr', probability=0.9799421429634094, is_reliable=True, proportion=1.0)
```

_Note_: This URL regex aims for simplicity.
It requires a domain name, and doesn't allow a username or password; it allows the scheme (http or https) to be omitted if it can be inferred from the subdomain (www). Source: _Regular Expressions Cookbook, 2nd ed._ - Goyvaerts & Levithan.

**In some other cases, you cannot fix the incorrect detection.** Language detection algorithms in general may perform poorly with very short inputs. Rarely should you trust the output of something like `detect("hi")`. Keep this limitation in mind regardless of what library you are using.

Please remember that, at the end of the day, this project is just a Python wrapper to the CLD3 C++ library that does the actual heavy lifting.

### I'm seeing an error during `pip` installation. How can I fix this?

First, please make sure you have read the [installation](#installing-with-wheels-supported-versions-and-platforms) section and that you have installed Protobuf if necessary.

If that doesn't help, please [file an issue](https://github.com/bsolomon1124/pycld3/issues) in this repository. The build process for this project is somewhat complex because it involves both Cython and Protobuf, but I do my best to make it work everywhere possible.

### Protobuf is installed, but I'm still seeing "cannot open shared object file"

If you've installed Protobuf, but are seeing an error such as:

```
ImportError: libprotobuf.so.22: cannot open shared object file: No such file or directory
```

This likely means that Python is not finding the `libprotobuf` shared object, possibly because `ldconfig` didn't do what it was supposed to. You may need to tell it where to look.
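
As a rough first check from Python itself, the standard library's `ctypes.util.find_library` can report whether a protobuf shared library is discoverable by name. This is only a sketch: `protobuf_lookup_status` is a hypothetical helper, and `find_library` uses its own platform-specific search heuristics, so its answer may differ from what the dynamic loader does when `cld3` is imported.

```python
import ctypes.util


def protobuf_lookup_status(find_library=ctypes.util.find_library):
    """Describe whether a protobuf shared library can be located by name.

    The lookup function is injectable so the helper can be exercised
    without protobuf being installed.
    """
    # find_library("protobuf") looks for libprotobuf and returns a name such
    # as "libprotobuf.so.22", or None when nothing is found.
    name = find_library("protobuf")
    if name is None:
        return "libprotobuf not found on the default search paths"
    return "libprotobuf resolved as " + name


if __name__ == "__main__":
    print(protobuf_lookup_status())
```

If the helper reports that nothing was found even though Protobuf is installed, the library probably lives in a directory that is not on the default search path.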
You can find where the library sits via:

```bash
$ find /usr -name 'libprotoc.so' \( -type l -o -type f \)
/usr/local/lib/libprotoc.so
```

Then, you can add the directory containing this file to `LD_LIBRARY_PATH`:

```bash
export LD_LIBRARY_PATH="$(dirname $(find /usr -name 'libprotoc.so' \( -type l -o -type f \))):$LD_LIBRARY_PATH"
```

You can quickly test that this worked:

```bash
$ python -c 'import cld3; print(cld3.get_language("影響包含對氣候的變化以及自然資源的枯竭程度"))'
LanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)
```

### Authors

This repository contains a fork of [`google/cld3`](https://github.com/google/cld3/) at commit 06f695f. The license for `google/cld3` can be found at [LICENSES/CLD3\_LICENSE](https://github.com/bsolomon1124/pycld3/blob/master/LICENSES/CLD3_LICENSE).

This repository is a combination of changes [introduced](https://github.com/google/cld3/issues/15) by [various forks](https://github.com/google/cld3/network/members) of `google/cld3` by the following people:

- Johannes Baiter ([@jbaiter](https://github.com/jbaiter))
- Elizabeth Myers ([@Elizafox](https://github.com/Elizafox))
- Witold Bołt ([@houp](https://github.com/houp))
- Alfredo Luque ([@iamthebot](https://github.com/iamthebot))
- WISESIGHT ([@wisesight](https://github.com/wisesight))
- RNogales ([@RNogales94](https://github.com/RNogales94))
- Brad Solomon ([@bsolomon1124](https://github.com/bsolomon1124))

%package -n python3-pycld3
Summary: CLD3 Python bindings
Provides: python-pycld3
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-pycld3
# `pycld3`

Python bindings to the Compact Language Detector v3 (CLD3).

%package help
Summary: Development documents and examples for pycld3
Provides: python3-pycld3-doc
%description help
# `pycld3`

Python bindings to the Compact Language Detector v3 (CLD3).

%prep
%autosetup -n pycld3-0.22

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pycld3 -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Sun Apr 23 2023 Python_Bot - 0.22-1
- Package Spec generated