author: CoprDistGit <infra@openeuler.org> 2023-04-11 20:19:33 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-04-11 20:19:33 +0000
commit: 9f0f4b14ee9b649b065296b47d3a981c4d816a52
tree: 77a02e8954a3279a5d6447e3b9ba78340b3ba10e
parent: 5aedbc5b7da54cf2907aca7bbf6f6d6679608e1e
automatic import of python-spacy-cld
-rw-r--r-- .gitignore 1
-rw-r--r-- python-spacy-cld.spec 159
-rw-r--r-- sources 1
3 files changed, 161 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..5b00b0e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/spacy_cld-0.1.0.tar.gz
diff --git a/python-spacy-cld.spec b/python-spacy-cld.spec
new file mode 100644
index 0000000..7ce0d9f
--- /dev/null
+++ b/python-spacy-cld.spec
@@ -0,0 +1,159 @@
+%global _empty_manifest_terminate_build 0
+Name: python-spacy-cld
+Version: 0.1.0
+Release: 1
+Summary: spaCy pipeline component for guessing the language of Doc and Span objects.
+License: MIT
+URL: https://github.com/nickdavidhaynes/spacy-cld
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/e3/3b/f5344007259b5beb0a8e0d7b9e6b0d2c5c4dcfe674bc94b7497bcc201ee0/spacy_cld-0.1.0.tar.gz
+BuildArch: noarch
+
+
+%description
+# spaCy-CLD: Bringing simple language detection to spaCy
+
+## Installation
+
+`pip install spacy_cld`
+
+## Usage
+
+Adding the spaCy-CLD component to the processing pipeline is relatively simple:
+
+```
+import spacy
+from spacy_cld import LanguageDetector
+
+nlp = spacy.load('en')
+language_detector = LanguageDetector()
+nlp.add_pipe(language_detector)
+doc = nlp('This is some English text.')
+
+doc._.languages # ['en']
+doc._.language_scores['en'] # 0.96
+```
+
+spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).
+
+## Under the hood
+
+spaCy-CLD is a small extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.
+
+For additional details, see the linked project pages for PYCLD2 and CLD2.
+
+%package -n python3-spacy-cld
+Summary: spaCy pipeline component for guessing the language of Doc and Span objects.
+Provides: python-spacy-cld
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-spacy-cld
+# spaCy-CLD: Bringing simple language detection to spaCy
+
+## Installation
+
+`pip install spacy_cld`
+
+## Usage
+
+Adding the spaCy-CLD component to the processing pipeline is relatively simple:
+
+```
+import spacy
+from spacy_cld import LanguageDetector
+
+nlp = spacy.load('en')
+language_detector = LanguageDetector()
+nlp.add_pipe(language_detector)
+doc = nlp('This is some English text.')
+
+doc._.languages # ['en']
+doc._.language_scores['en'] # 0.96
+```
+
+spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).
+
+## Under the hood
+
+spaCy-CLD is a small extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.
+
+For additional details, see the linked project pages for PYCLD2 and CLD2.
+
+%package help
+Summary: Development documents and examples for spacy-cld
+Provides: python3-spacy-cld-doc
+%description help
+# spaCy-CLD: Bringing simple language detection to spaCy
+
+## Installation
+
+`pip install spacy_cld`
+
+## Usage
+
+Adding the spaCy-CLD component to the processing pipeline is relatively simple:
+
+```
+import spacy
+from spacy_cld import LanguageDetector
+
+nlp = spacy.load('en')
+language_detector = LanguageDetector()
+nlp.add_pipe(language_detector)
+doc = nlp('This is some English text.')
+
+doc._.languages # ['en']
+doc._.language_scores['en'] # 0.96
+```
+
+spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).
+
+## Under the hood
+
+spaCy-CLD is a small extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.
+
+For additional details, see the linked project pages for PYCLD2 and CLD2.
+
+%prep
+%autosetup -n spacy-cld-0.1.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-spacy-cld -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..e8121b9
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+0572f0ff474332ec85c0b348ad248619 spacy_cld-0.1.0.tar.gz