%global _empty_manifest_terminate_build 0
Name:           python-spacy-cld
Version:        0.1.0
Release:        1
Summary:        spaCy pipeline component for guessing the language of Doc and Span objects
License:        MIT
URL:            https://github.com/nickdavidhaynes/spacy-cld
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/e3/3b/f5344007259b5beb0a8e0d7b9e6b0d2c5c4dcfe674bc94b7497bcc201ee0/spacy_cld-0.1.0.tar.gz
BuildArch:      noarch

%description
# spaCy-CLD: Bringing simple language detection to spaCy

## Installation

`pip install spacy_cld`

## Usage

Adding the spaCy-CLD component to the processing pipeline is relatively simple:

```
import spacy
from spacy_cld import LanguageDetector

nlp = spacy.load('en')
language_detector = LanguageDetector()
nlp.add_pipe(language_detector)
doc = nlp('This is some English text.')

doc._.languages              # ['en']
doc._.language_scores['en']  # 0.96
```

spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).

## Under the hood

spacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.

For additional details, see the linked project pages for PYCLD2 and CLD2.

%package -n python3-spacy-cld
Summary:        spaCy pipeline component for guessing the language of Doc and Span objects
Provides:       python-spacy-cld
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip
%description -n python3-spacy-cld
# spaCy-CLD: Bringing simple language detection to spaCy

## Installation

`pip install spacy_cld`

## Usage

Adding the spaCy-CLD component to the processing pipeline is relatively simple:

```
import spacy
from spacy_cld import LanguageDetector

nlp = spacy.load('en')
language_detector = LanguageDetector()
nlp.add_pipe(language_detector)
doc = nlp('This is some English text.')

doc._.languages              # ['en']
doc._.language_scores['en']  # 0.96
```

spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).

## Under the hood

spacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.

For additional details, see the linked project pages for PYCLD2 and CLD2.
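As a rough illustration of what the wrapped library returns, the following sketch calls PYCLD2 directly rather than going through the spaCy pipeline. The tuple layout shown in the comments follows the PYCLD2 documentation and is an assumption about that library, not part of the spacy-cld API:

```
import pycld2 as cld2

# Sketch: query CLD2 directly through PYCLD2 (the library spacy-cld wraps).
# cld2.detect returns a reliability flag, the number of text bytes analysed,
# and up to three (name, code, percent, score) entries.
is_reliable, text_bytes, details = cld2.detect('This is some English text.')

print(is_reliable)                # True when the guess is considered reliable
for name, code, percent, score in details:
    if code != 'un':              # skip the 'unknown' placeholder entries
        print(code, percent)      # e.g. 'en' with the share of matched text
```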
%package help
Summary:        Development documents and examples for spacy-cld
Provides:       python3-spacy-cld-doc
%description help
# spaCy-CLD: Bringing simple language detection to spaCy

## Installation

`pip install spacy_cld`

## Usage

Adding the spaCy-CLD component to the processing pipeline is relatively simple:

```
import spacy
from spacy_cld import LanguageDetector

nlp = spacy.load('en')
language_detector = LanguageDetector()
nlp.add_pipe(language_detector)
doc = nlp('This is some English text.')

doc._.languages              # ['en']
doc._.language_scores['en']  # 0.96
```

spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).

## Under the hood

spacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.

For additional details, see the linked project pages for PYCLD2 and CLD2.

%prep
%autosetup -n spacy-cld-0.1.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-spacy-cld -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Apr 25 2023 Python_Bot - 0.1.0-1
- Package Spec generated