| author | CoprDistGit <infra@openeuler.org> | 2023-05-05 11:11:55 +0000 |
|---|---|---|
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-05 11:11:55 +0000 |
| commit | 9ca9b8d2eff00bf4521bb3e27dd90e7dc9e9a097 (patch) | |
| tree | d227ad4852ad9976f09bb05d6a115973899100c9 | |
| parent | 2938b3d0baf14338c43c708ce639d51dfc62bb48 (diff) | |
automatic import of python-spacymoji (openeuler20.03)
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-spacymoji.spec | 386 |
| -rw-r--r-- | sources | 1 |
3 files changed, 388 insertions, 0 deletions
@@ -0,0 +1 @@
+/spacymoji-3.0.1.tar.gz
diff --git a/python-spacymoji.spec b/python-spacymoji.spec
new file mode 100644
index 0000000..bfe5e8c
--- /dev/null
+++ b/python-spacymoji.spec
@@ -0,0 +1,386 @@
+%global _empty_manifest_terminate_build 0
+Name: python-spacymoji
+Version: 3.0.1
+Release: 1
+Summary: spaCy pipeline component for adding emoji meta data to Doc, Token and Span objects
+License: MIT
+URL: https://github.com/explosion/spacymoji
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/2d/69/91125a437c48a2c5d40ff89a7adc659dcc4e371223f83540bf1ae990ffd3/spacymoji-3.0.1.tar.gz
+BuildArch: noarch
+
+Requires: python3-spacy
+Requires: python3-emoji
+
+%description
+# spacymoji: emoji for spaCy
+
+[spaCy](https://spacy.io) extension and pipeline component
+for adding emoji meta data to `Doc` objects. Detects emoji consisting of one
+or more unicode characters, and can optionally merge multi-char emoji (combined
+pictures, emoji with skin tone modifiers) into one token. Human-readable emoji
+descriptions are added as a custom attribute, and an optional lookup table can
+be provided for your own descriptions. The extension sets the custom `Doc`,
+`Token` and `Span` attributes `._.is_emoji`, `._.emoji_desc`, `._.has_emoji` and `._.emoji`. You can read more about custom pipeline components and extension attributes [here](https://spacy.io/usage/processing-pipelines).
+
+Emoji are matched using spaCy's [`PhraseMatcher`](https://spacy.io/api/phrasematcher), and looked up in the data
+table provided by the [`emoji` package](https://github.com/carpedm20/emoji).
+
+[](https://dev.azure.com/explosion-ai/public/_build?definitionId=22)
+[](https://github.com/explosion/spacymoji/releases)
+[](https://pypi.org/project/spacymoji/)
+
+# ⏳ Installation
+
+`spacymoji` requires `spacy` v3.0.0 or higher. For spaCy v2.x, instally `spacymoji==2.0.0`.
+
+```bash
+pip install spacymoji
+```
+
+# ☝️ Usage
+
+Import the component and add it anywhere in your pipeline using the string
+name of the `"emoji"` component factory:
+
+```python
+import spacy
+
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("emoji", first=True)
+doc = nlp("This is a test 😻 👍🏿")
+assert doc._.has_emoji is True
+assert doc[2:5]._.has_emoji is True
+assert doc[0]._.is_emoji is False
+assert doc[4]._.is_emoji is True
+assert doc[5]._.emoji_desc == "thumbs up dark skin tone"
+assert len(doc._.emoji) == 2
+assert doc._.emoji[1] == ("👍🏿", 5, "thumbs up dark skin tone")
+```
+
+`spacymoji` only cares about the token text, so you can use it on a blank
+`Language` instance (it should work for all
+[available languages](https://spacy.io/usage/models#languages)!), or in
+a pipeline with a loaded pipeline. If your pipeline
+includes a tagger, parser and entity recognizer, make sure to add the emoji
+component as `first=True`, so the spans are merged right after tokenization,
+and _before_ the document is parsed. If your text contains a lot of emoji, this
+might even give you a nice boost in parser accuracy.
+
+## Available attributes
+
+The extension sets attributes on the `Doc`, `Span` and `Token`. You can
+change the attribute names (and other parameters of the Emoji component) by passing
+them via the `config` parameter in the `nlp.add_pipe(...)` method. For more details
+on custom components and attributes, see the
+[processing pipelines documentation](https://spacy.io/usage/processing-pipelines#custom-components).
+
+| Attribute | Type | Description |
+| -------------------- | -------------------------- | ------------------------------------------------------------- |
+| `Token._.is_emoji` | bool | Whether the token is an emoji. |
+| `Token._.emoji_desc` | str | A human-readable description of the emoji. |
+| `Doc._.has_emoji` | bool | Whether the document contains emoji. |
+| `Doc._.emoji` | List[Tuple[str, int, str]] | `(emoji, index, description)` tuples of the document's emoji. |
+| `Span._.has_emoji` | bool | Whether the span contains emoji. |
+| `Span._.emoji` | List[Tuple[str, int, str]] | `(emoji, index, description)` tuples of the span's emoji. |
+
+## Settings
+
+You can configure the `emoji` factory by setting any of the following parameters in
+the `config` dictionary:
+
+| Setting | Type | Description |
+| ------------- | ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
+| `attrs` | Tuple[str, str, str, str] | Attributes to set on the `._` property. Defaults to `('has_emoji', 'is_emoji', 'emoji_desc', 'emoji')`. |
+| `pattern_id` | str | ID of match pattern, defaults to `'EMOJI'`. Can be changed to avoid ID conflicts. |
+| `merge_spans` | bool | Merge spans containing multi-character emoji, defaults to `True`. Will only merge combined emoji resulting in one icon, not sequences. |
+| `lookup` | Dict[str, str] | Optional lookup table that maps emoji strings to custom descriptions, e.g. translations or other annotations. |
+
+```python
+emoji_config = {"attrs": ("has_e", "is_e", "e_desc", "e"), lookup={"👨🎤": "David Bowie"})
+nlp.add_pipe(emoji, first=True, config=emoji_config)
+doc = nlp("We can be 👨🎤 heroes")
+assert doc[3]._.is_e
+assert doc[3]._.e_desc == "David Bowie"
+```
+
+If you're training a pipeline, you can define the component config in your [`config.cfg`](https://spacy.io/usage/training):
+
+```ini
+[nlp]
+pipeline = ["emoji", "ner"]
+# ...
+
+[components.emoji]
+factory = "emoji"
+merge_spans = false
+```
+
+
+
+
+%package -n python3-spacymoji
+Summary: spaCy pipeline component for adding emoji meta data to Doc, Token and Span objects
+Provides: python-spacymoji
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-spacymoji
+# spacymoji: emoji for spaCy
+
+[spaCy](https://spacy.io) extension and pipeline component
+for adding emoji meta data to `Doc` objects. Detects emoji consisting of one
+or more unicode characters, and can optionally merge multi-char emoji (combined
+pictures, emoji with skin tone modifiers) into one token. Human-readable emoji
+descriptions are added as a custom attribute, and an optional lookup table can
+be provided for your own descriptions. The extension sets the custom `Doc`,
+`Token` and `Span` attributes `._.is_emoji`, `._.emoji_desc`, `._.has_emoji` and `._.emoji`. You can read more about custom pipeline components and extension attributes [here](https://spacy.io/usage/processing-pipelines).
+
+Emoji are matched using spaCy's [`PhraseMatcher`](https://spacy.io/api/phrasematcher), and looked up in the data
+table provided by the [`emoji` package](https://github.com/carpedm20/emoji).
+
+[](https://dev.azure.com/explosion-ai/public/_build?definitionId=22)
+[](https://github.com/explosion/spacymoji/releases)
+[](https://pypi.org/project/spacymoji/)
+
+# ⏳ Installation
+
+`spacymoji` requires `spacy` v3.0.0 or higher. For spaCy v2.x, instally `spacymoji==2.0.0`.
+
+```bash
+pip install spacymoji
+```
+
+# ☝️ Usage
+
+Import the component and add it anywhere in your pipeline using the string
+name of the `"emoji"` component factory:
+
+```python
+import spacy
+
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("emoji", first=True)
+doc = nlp("This is a test 😻 👍🏿")
+assert doc._.has_emoji is True
+assert doc[2:5]._.has_emoji is True
+assert doc[0]._.is_emoji is False
+assert doc[4]._.is_emoji is True
+assert doc[5]._.emoji_desc == "thumbs up dark skin tone"
+assert len(doc._.emoji) == 2
+assert doc._.emoji[1] == ("👍🏿", 5, "thumbs up dark skin tone")
+```
+
+`spacymoji` only cares about the token text, so you can use it on a blank
+`Language` instance (it should work for all
+[available languages](https://spacy.io/usage/models#languages)!), or in
+a pipeline with a loaded pipeline. If your pipeline
+includes a tagger, parser and entity recognizer, make sure to add the emoji
+component as `first=True`, so the spans are merged right after tokenization,
+and _before_ the document is parsed. If your text contains a lot of emoji, this
+might even give you a nice boost in parser accuracy.
+
+## Available attributes
+
+The extension sets attributes on the `Doc`, `Span` and `Token`. You can
+change the attribute names (and other parameters of the Emoji component) by passing
+them via the `config` parameter in the `nlp.add_pipe(...)` method. For more details
+on custom components and attributes, see the
+[processing pipelines documentation](https://spacy.io/usage/processing-pipelines#custom-components).
+
+| Attribute | Type | Description |
+| -------------------- | -------------------------- | ------------------------------------------------------------- |
+| `Token._.is_emoji` | bool | Whether the token is an emoji. |
+| `Token._.emoji_desc` | str | A human-readable description of the emoji. |
+| `Doc._.has_emoji` | bool | Whether the document contains emoji. |
+| `Doc._.emoji` | List[Tuple[str, int, str]] | `(emoji, index, description)` tuples of the document's emoji. |
+| `Span._.has_emoji` | bool | Whether the span contains emoji. |
+| `Span._.emoji` | List[Tuple[str, int, str]] | `(emoji, index, description)` tuples of the span's emoji. |
+
+## Settings
+
+You can configure the `emoji` factory by setting any of the following parameters in
+the `config` dictionary:
+
+| Setting | Type | Description |
+| ------------- | ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
+| `attrs` | Tuple[str, str, str, str] | Attributes to set on the `._` property. Defaults to `('has_emoji', 'is_emoji', 'emoji_desc', 'emoji')`. |
+| `pattern_id` | str | ID of match pattern, defaults to `'EMOJI'`. Can be changed to avoid ID conflicts. |
+| `merge_spans` | bool | Merge spans containing multi-character emoji, defaults to `True`. Will only merge combined emoji resulting in one icon, not sequences. |
+| `lookup` | Dict[str, str] | Optional lookup table that maps emoji strings to custom descriptions, e.g. translations or other annotations. |
+
+```python
+emoji_config = {"attrs": ("has_e", "is_e", "e_desc", "e"), lookup={"👨🎤": "David Bowie"})
+nlp.add_pipe(emoji, first=True, config=emoji_config)
+doc = nlp("We can be 👨🎤 heroes")
+assert doc[3]._.is_e
+assert doc[3]._.e_desc == "David Bowie"
+```
+
+If you're training a pipeline, you can define the component config in your [`config.cfg`](https://spacy.io/usage/training):
+
+```ini
+[nlp]
+pipeline = ["emoji", "ner"]
+# ...
+
+[components.emoji]
+factory = "emoji"
+merge_spans = false
+```
+
+
+
+
+%package help
+Summary: Development documents and examples for spacymoji
+Provides: python3-spacymoji-doc
+%description help
+# spacymoji: emoji for spaCy
+
+[spaCy](https://spacy.io) extension and pipeline component
+for adding emoji meta data to `Doc` objects. Detects emoji consisting of one
+or more unicode characters, and can optionally merge multi-char emoji (combined
+pictures, emoji with skin tone modifiers) into one token. Human-readable emoji
+descriptions are added as a custom attribute, and an optional lookup table can
+be provided for your own descriptions. The extension sets the custom `Doc`,
+`Token` and `Span` attributes `._.is_emoji`, `._.emoji_desc`, `._.has_emoji` and `._.emoji`. You can read more about custom pipeline components and extension attributes [here](https://spacy.io/usage/processing-pipelines).
+
+Emoji are matched using spaCy's [`PhraseMatcher`](https://spacy.io/api/phrasematcher), and looked up in the data
+table provided by the [`emoji` package](https://github.com/carpedm20/emoji).
+
+[](https://dev.azure.com/explosion-ai/public/_build?definitionId=22)
+[](https://github.com/explosion/spacymoji/releases)
+[](https://pypi.org/project/spacymoji/)
+
+# ⏳ Installation
+
+`spacymoji` requires `spacy` v3.0.0 or higher. For spaCy v2.x, instally `spacymoji==2.0.0`.
+
+```bash
+pip install spacymoji
+```
+
+# ☝️ Usage
+
+Import the component and add it anywhere in your pipeline using the string
+name of the `"emoji"` component factory:
+
+```python
+import spacy
+
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("emoji", first=True)
+doc = nlp("This is a test 😻 👍🏿")
+assert doc._.has_emoji is True
+assert doc[2:5]._.has_emoji is True
+assert doc[0]._.is_emoji is False
+assert doc[4]._.is_emoji is True
+assert doc[5]._.emoji_desc == "thumbs up dark skin tone"
+assert len(doc._.emoji) == 2
+assert doc._.emoji[1] == ("👍🏿", 5, "thumbs up dark skin tone")
+```
+
+`spacymoji` only cares about the token text, so you can use it on a blank
+`Language` instance (it should work for all
+[available languages](https://spacy.io/usage/models#languages)!), or in
+a pipeline with a loaded pipeline. If your pipeline
+includes a tagger, parser and entity recognizer, make sure to add the emoji
+component as `first=True`, so the spans are merged right after tokenization,
+and _before_ the document is parsed. If your text contains a lot of emoji, this
+might even give you a nice boost in parser accuracy.
+
+## Available attributes
+
+The extension sets attributes on the `Doc`, `Span` and `Token`. You can
+change the attribute names (and other parameters of the Emoji component) by passing
+them via the `config` parameter in the `nlp.add_pipe(...)` method. For more details
+on custom components and attributes, see the
+[processing pipelines documentation](https://spacy.io/usage/processing-pipelines#custom-components).
+
+| Attribute | Type | Description |
+| -------------------- | -------------------------- | ------------------------------------------------------------- |
+| `Token._.is_emoji` | bool | Whether the token is an emoji. |
+| `Token._.emoji_desc` | str | A human-readable description of the emoji. |
+| `Doc._.has_emoji` | bool | Whether the document contains emoji. |
+| `Doc._.emoji` | List[Tuple[str, int, str]] | `(emoji, index, description)` tuples of the document's emoji. |
+| `Span._.has_emoji` | bool | Whether the span contains emoji. |
+| `Span._.emoji` | List[Tuple[str, int, str]] | `(emoji, index, description)` tuples of the span's emoji. |
+
+## Settings
+
+You can configure the `emoji` factory by setting any of the following parameters in
+the `config` dictionary:
+
+| Setting | Type | Description |
+| ------------- | ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
+| `attrs` | Tuple[str, str, str, str] | Attributes to set on the `._` property. Defaults to `('has_emoji', 'is_emoji', 'emoji_desc', 'emoji')`. |
+| `pattern_id` | str | ID of match pattern, defaults to `'EMOJI'`. Can be changed to avoid ID conflicts. |
+| `merge_spans` | bool | Merge spans containing multi-character emoji, defaults to `True`. Will only merge combined emoji resulting in one icon, not sequences. |
+| `lookup` | Dict[str, str] | Optional lookup table that maps emoji strings to custom descriptions, e.g. translations or other annotations. |
+
+```python
+emoji_config = {"attrs": ("has_e", "is_e", "e_desc", "e"), lookup={"👨🎤": "David Bowie"})
+nlp.add_pipe(emoji, first=True, config=emoji_config)
+doc = nlp("We can be 👨🎤 heroes")
+assert doc[3]._.is_e
+assert doc[3]._.e_desc == "David Bowie"
+```
+
+If you're training a pipeline, you can define the component config in your [`config.cfg`](https://spacy.io/usage/training):
+
+```ini
+[nlp]
+pipeline = ["emoji", "ner"]
+# ...
+
+[components.emoji]
+factory = "emoji"
+merge_spans = false
+```
+
+
+
+
+%prep
+%autosetup -n spacymoji-3.0.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-spacymoji -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 3.0.1-1
+- Package Spec generated
@@ -0,0 +1 @@
+2c81778d55a5603b891e2c8edf4a4ade spacymoji-3.0.1.tar.gz
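
The `Settings` snippet carried in the three `%description` blocks is not valid Python as imported: the dict literal switches to keyword syntax (`lookup=`) and closes with `)`, and `emoji` is passed as a bare name instead of the `"emoji"` factory string used in the Usage example. The following is a minimal sketch of what a working form might look like, not a change to the committed spec. It assumes spaCy v3 and spacymoji are installed; `build_emoji_config` and `run_demo` are hypothetical helper names introduced here, and the emoji is written with escapes on the assumption that the README means the "man singer" sequence.

```python
def build_emoji_config():
    """Return the README's example settings as a valid dict literal."""
    # Assumption: the README emoji is "man singer" (man + zero-width joiner + microphone).
    man_singer = "\U0001F468\u200D\U0001F3A4"
    return {
        # Custom names for the has_emoji, is_emoji, emoji_desc and emoji attributes.
        "attrs": ("has_e", "is_e", "e_desc", "e"),
        # Custom description for one emoji, overriding the default lookup.
        "lookup": {man_singer: "David Bowie"},
    }


def run_demo():
    """Hypothetical helper: needs spaCy v3 and spacymoji installed to run."""
    import spacy  # imported lazily so the config helper works without spaCy

    nlp = spacy.blank("en")
    # The factory is referenced by its string name, as in the Usage example.
    nlp.add_pipe("emoji", first=True, config=build_emoji_config())
    doc = nlp("We can be \U0001F468\u200D\U0001F3A4 heroes")
    # Collect the renamed token attributes instead of asserting fixed indices.
    return [(token.text, token._.is_e, token._.e_desc) for token in doc]
```

Calling `run_demo()` in an environment with both packages installed should return one tuple per token, which can be compared against the assertions quoted in the description text.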
