author | CoprDistGit <infra@openeuler.org> | 2023-04-11 20:19:33 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-04-11 20:19:33 +0000 |
commit | 9f0f4b14ee9b649b065296b47d3a981c4d816a52 (patch) | |
tree | 77a02e8954a3279a5d6447e3b9ba78340b3ba10e | |
parent | 5aedbc5b7da54cf2907aca7bbf6f6d6679608e1e (diff) |
automatic import of python-spacy-cld
-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-spacy-cld.spec | 159 |
-rw-r--r-- | sources | 1 |
3 files changed, 161 insertions, 0 deletions
@@ -0,0 +1 @@
+/spacy_cld-0.1.0.tar.gz
diff --git a/python-spacy-cld.spec b/python-spacy-cld.spec
new file mode 100644
index 0000000..7ce0d9f
--- /dev/null
+++ b/python-spacy-cld.spec
@@ -0,0 +1,159 @@
+%global _empty_manifest_terminate_build 0
+Name: python-spacy-cld
+Version: 0.1.0
+Release: 1
+Summary: spaCy pipeline component for guessing the language of Doc and Span objects.
+License: MIT
+URL: https://github.com/nickdavidhaynes/spacy-cld
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/e3/3b/f5344007259b5beb0a8e0d7b9e6b0d2c5c4dcfe674bc94b7497bcc201ee0/spacy_cld-0.1.0.tar.gz
+BuildArch: noarch
+
+
+%description
+# spaCy-CLD: Bringing simple language detection to spaCy
+
+## Installation
+
+`pip install spacy_cld`
+
+## Usage
+
+Adding the spaCy-CLD component to the processing pipeline is relatively simple:
+
+```
+import spacy
+from spacy_cld import LanguageDetector
+
+nlp = spacy.load('en')
+language_detector = LanguageDetector()
+nlp.add_pipe(language_detector)
+doc = nlp('This is some English text.')
+
+doc._.languages # ['en']
+doc._.language_scores['en'] # 0.96
+```
+
+spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).
+
+## Under the hood
+
+spacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document, and reports a confidence score (reported in with each language.
+
+For additional details, see the linked project pages for PYCLD2 and CLD2.
+
+%package -n python3-spacy-cld
+Summary: spaCy pipeline component for guessing the language of Doc and Span objects.
+Provides: python-spacy-cld
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-spacy-cld
+# spaCy-CLD: Bringing simple language detection to spaCy
+
+## Installation
+
+`pip install spacy_cld`
+
+## Usage
+
+Adding the spaCy-CLD component to the processing pipeline is relatively simple:
+
+```
+import spacy
+from spacy_cld import LanguageDetector
+
+nlp = spacy.load('en')
+language_detector = LanguageDetector()
+nlp.add_pipe(language_detector)
+doc = nlp('This is some English text.')
+
+doc._.languages # ['en']
+doc._.language_scores['en'] # 0.96
+```
+
+spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).
+
+## Under the hood
+
+spacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document, and reports a confidence score (reported in with each language.
+
+For additional details, see the linked project pages for PYCLD2 and CLD2.
+
+%package help
+Summary: Development documents and examples for spacy-cld
+Provides: python3-spacy-cld-doc
+%description help
+# spaCy-CLD: Bringing simple language detection to spaCy
+
+## Installation
+
+`pip install spacy_cld`
+
+## Usage
+
+Adding the spaCy-CLD component to the processing pipeline is relatively simple:
+
+```
+import spacy
+from spacy_cld import LanguageDetector
+
+nlp = spacy.load('en')
+language_detector = LanguageDetector()
+nlp.add_pipe(language_detector)
+doc = nlp('This is some English text.')
+
+doc._.languages # ['en']
+doc._.language_scores['en'] # 0.96
+```
+
+spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).
+
+## Under the hood
+
+spacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document, and reports a confidence score (reported in with each language.
+
+For additional details, see the linked project pages for PYCLD2 and CLD2.
+
+%prep
+%autosetup -n spacy-cld-0.1.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-spacy-cld -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.0-1
+- Package Spec generated
@@ -0,0 +1 @@
+0572f0ff474332ec85c0b348ad248619 spacy_cld-0.1.0.tar.gz
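
The last hunk adds a `sources` entry that pairs a 32-character hex digest with the tarball name. The sketch below shows one way such an entry could be checked against a downloaded tarball. It assumes the digest is MD5 (inferred only from its length, not stated in the commit), and the helper names `parse_sources_line` and `md5_matches` are hypothetical, not part of any packaging tool.

```python
import hashlib
from pathlib import Path


def parse_sources_line(line: str) -> tuple[str, str]:
    """Split one 'digest filename' entry into its two fields."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_matches(path: Path, expected_digest: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks and compare with the recorded digest.

    Assumption: the recorded digest is MD5, judged from its 32 hex characters.
    """
    md5 = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest() == expected_digest.lower()


if __name__ == "__main__":
    # Entry copied from the `sources` hunk of this commit.
    entry = "0572f0ff474332ec85c0b348ad248619 spacy_cld-0.1.0.tar.gz"
    digest, filename = parse_sources_line(entry)
    tarball = Path(filename)
    if tarball.exists():
        print(filename, "OK" if md5_matches(tarball, digest) else "MISMATCH")
    else:
        print(filename, "not found; fetch Source0 first")
```

Reading the file in chunks keeps memory use flat for large tarballs; for a file this small a single read would work equally well.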