-rw-r--r--  .gitignore                  1
-rw-r--r--  python-spacy-stanza.spec  680
-rw-r--r--  sources                     1
3 files changed, 682 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..0ed979d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/spacy-stanza-1.0.3.tar.gz
diff --git a/python-spacy-stanza.spec b/python-spacy-stanza.spec
new file mode 100644
index 0000000..6fc0840
--- /dev/null
+++ b/python-spacy-stanza.spec
@@ -0,0 +1,680 @@
+%global _empty_manifest_terminate_build 0
+Name: python-spacy-stanza
+Version: 1.0.3
+Release: 1
+Summary: Use the latest Stanza (StanfordNLP) research models directly in spaCy
+License: MIT
+URL: https://explosion.ai
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/95/54/b30f48236ab4701a52e4304a69e01b561f8ac5778f3defa14dc720644461/spacy-stanza-1.0.3.tar.gz
+BuildArch: noarch
+
+Requires: python3-spacy
+Requires: python3-stanza
+
+%description
+<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
+
+# spaCy + Stanza (formerly StanfordNLP)
+
+This package wraps the [Stanza](https://github.com/stanfordnlp/stanza)
+(formerly StanfordNLP) library, so you can use Stanford's models in a
+[spaCy](https://spacy.io) pipeline. The Stanford models achieved top accuracy
+in the CoNLL 2017 and 2018 shared tasks, which involve tokenization,
+part-of-speech tagging, morphological analysis, lemmatization and labeled
+dependency parsing in 68 languages. As of v1.0, Stanza also supports named
+entity recognition for selected languages.
+
+> ⚠️ Previous versions of this package were available as
+> [`spacy-stanfordnlp`](https://pypi.python.org/pypi/spacy-stanfordnlp).
+
+[![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/17/master.svg?logo=azure-pipelines&style=flat-square)](https://dev.azure.com/explosion-ai/public/_build?definitionId=17)
+[![PyPi](https://img.shields.io/pypi/v/spacy-stanza.svg?style=flat-square)](https://pypi.python.org/pypi/spacy-stanza)
+[![GitHub](https://img.shields.io/github/release/explosion/spacy-stanza/all.svg?style=flat-square)](https://github.com/explosion/spacy-stanza)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)
+
+Using this wrapper, you'll be able to use the following annotations, computed by
+your pretrained `stanza` model:
+
+- Statistical tokenization (reflected in the `Doc` and its tokens)
+- Lemmatization (`token.lemma` and `token.lemma_`)
+- Part-of-speech tagging (`token.tag`, `token.tag_`, `token.pos`, `token.pos_`)
+- Morphological analysis (`token.morph`)
+- Dependency parsing (`token.dep`, `token.dep_`, `token.head`)
+- Named entity recognition (`doc.ents`, `token.ent_type`, `token.ent_type_`, `token.ent_iob`, `token.ent_iob_`)
+- Sentence segmentation (`doc.sents`)
+
+## ⌛️ Installation
+
+As of v1.0.0 `spacy-stanza` is only compatible with **spaCy v3.x**. To install
+the most recent version:
+
+```bash
+pip install spacy-stanza
+```
+
+For spaCy v2, install v0.2.x and refer to the [v0.2.x usage
+documentation](https://github.com/explosion/spacy-stanza/tree/v0.2.x#-usage--examples):
+
+```bash
+pip install "spacy-stanza<0.3.0"
+```
+
+Make sure to also
+[download](https://stanfordnlp.github.io/stanza/download_models.html) one of
+the [pre-trained Stanza
+models](https://stanfordnlp.github.io/stanza/models.html).
+
+## 📖 Usage & Examples
+
+> ⚠️ **Important note:** This package has been refactored to take advantage of
+> [spaCy v3.0](https://spacy.io). Previous versions that were built for [spaCy
+> v2.x](https://v2.spacy.io) worked considerably differently. Please see
+> previous tagged versions of this README for documentation on prior versions.
+
+Use `spacy_stanza.load_pipeline()` to create an `nlp` object that you can use to
+process a text with a Stanza pipeline and create a spaCy [`Doc`
+object](https://spacy.io/api/doc). By default, both the spaCy pipeline and the
+Stanza pipeline will be initialized with the same `lang`, e.g. "en":
+
+```python
+import stanza
+import spacy_stanza
+
+# Download the stanza model if necessary
+stanza.download("en")
+
+# Initialize the pipeline
+nlp = spacy_stanza.load_pipeline("en")
+
+doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
+for token in doc:
+ print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
+print(doc.ents)
+```
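+
+Sentence boundaries and morphological features from the list above are set on
+the same `Doc`. As a quick illustration, continuing from the snippet above:
+
+```python
+# Sentence segmentation provided by the Stanza pipeline
+for sent in doc.sents:
+    print(sent.text)
+
+# Morphological features as spaCy MorphAnalysis objects
+print([(token.text, str(token.morph)) for token in doc])
+```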
+
+If language data for the given language is available in spaCy, the respective
+language class can be used as the base for the `nlp` object – for example,
+`English()`. This lets you use spaCy's lexical attributes like `is_stop` or
+`like_num`. The `nlp` object follows the same API as any other spaCy `Language`
+class – so you can visualize the `Doc` objects with displaCy, add custom
+components to the pipeline, use the rule-based matcher and do pretty much
+anything else you'd normally do in spaCy.
+
+```python
+# Access spaCy's lexical attributes
+print([token.is_stop for token in doc])
+print([token.like_num for token in doc])
+
+# Visualize dependencies
+from spacy import displacy
+displacy.serve(doc) # or displacy.render if you're in a Jupyter notebook
+
+# Process texts with nlp.pipe
+for doc in nlp.pipe(["Lots of texts", "Even more texts", "..."]):
+ print(doc.text)
+
+# Combine with your own custom pipeline components
+from spacy import Language
+@Language.component("custom_component")
+def custom_component(doc):
+ # Do something to the doc here
+ print(f"Custom component called: {doc.text}")
+ return doc
+
+nlp.add_pipe("custom_component")
+doc = nlp("Some text")
+
+# Serialize attributes to a numpy array
+np_array = doc.to_array(['ORTH', 'LEMMA', 'POS'])
+```
+
+### Stanza Pipeline options
+
+Additional options for the Stanza
+[`Pipeline`](https://stanfordnlp.github.io/stanza/pipeline.html#pipeline) can be
+provided as keyword arguments following the `Pipeline` API:
+
+- Provide the Stanza language as `lang`. For Stanza languages without spaCy
+ support, use "xx" for the spaCy language setting:
+
+ ```python
+ # Initialize a pipeline for Coptic
+ nlp = spacy_stanza.load_pipeline("xx", lang="cop")
+ ```
+
+- Provide Stanza pipeline settings following the `Pipeline` API:
+
+ ```python
+ # Initialize a German pipeline with the `hdt` package
+ nlp = spacy_stanza.load_pipeline("de", package="hdt")
+ ```
+
+- Tokenize with spaCy rather than the statistical tokenizer (only for English):
+
+ ```python
+ nlp = spacy_stanza.load_pipeline("en", processors={"tokenize": "spacy"})
+ ```
+
+- Provide any additional processor settings as additional keyword arguments:
+
+ ```python
+ # Provide pretokenized texts (whitespace tokenization)
+ nlp = spacy_stanza.load_pipeline("de", tokenize_pretokenized=True)
+ ```
+
+The spaCy config specifies all `Pipeline` options in the `[nlp.tokenizer]`
+block. For example, the config for the last example above, a German pipeline
+with pretokenized texts:
+
+```ini
+[nlp.tokenizer]
+@tokenizers = "spacy_stanza.PipelineAsTokenizer.v1"
+lang = "de"
+dir = null
+package = "default"
+logging_level = null
+verbose = null
+use_gpu = true
+
+[nlp.tokenizer.kwargs]
+tokenize_pretokenized = true
+
+[nlp.tokenizer.processors]
+```
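+
+Since the tokenizer is registered as `spacy_stanza.PipelineAsTokenizer.v1`, an
+equivalent pipeline can also be assembled directly from such a config via
+spaCy's config overrides. A minimal sketch, assuming that importing
+`spacy_stanza` registers the tokenizer and that its defaults cover any
+settings you omit:
+
+```python
+import spacy
+import spacy_stanza  # noqa: F401 -- registers "spacy_stanza.PipelineAsTokenizer.v1"
+
+config = {
+    "nlp": {
+        "tokenizer": {
+            "@tokenizers": "spacy_stanza.PipelineAsTokenizer.v1",
+            "lang": "de",
+            "kwargs": {"tokenize_pretokenized": True},
+        }
+    }
+}
+nlp = spacy.blank("de", config=config)
+doc = nlp("Alles ist gut .")
+print([token.text for token in doc])
+```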
+
+### Serialization
+
+The full Stanza pipeline configuration is stored in the spaCy pipeline
+[config](https://spacy.io/usage/training#config), so you can save and load the
+pipeline just like any other `nlp` pipeline:
+
+```python
+import spacy
+
+# Save to a local directory
+nlp.to_disk("./stanza-spacy-model")
+
+# Reload the pipeline
+nlp = spacy.load("./stanza-spacy-model")
+```
+
+Note that this **does not save any Stanza model data by default**. The Stanza
+models are very large, so for now, this package expects you to download the
+models separately with `stanza.download()` and have them available either in
+the default model directory or in the path specified under
+`[nlp.tokenizer.dir]` in the config.
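+
+A minimal sketch of pointing both steps at a non-default directory; the
+`/data/stanza_resources` path below is purely illustrative:
+
+```python
+import stanza
+import spacy_stanza
+
+# Download the Stanza models into a custom directory
+stanza.download("de", model_dir="/data/stanza_resources")
+
+# Load them from the same directory (stored as [nlp.tokenizer.dir] in the config)
+nlp = spacy_stanza.load_pipeline("de", dir="/data/stanza_resources")
+```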
+
+### Adding additional spaCy pipeline components
+
+By default, the spaCy pipeline in the `nlp` object returned by
+`spacy_stanza.load_pipeline()` will be empty, because all `stanza` attributes
+are computed and set within the custom tokenizer,
+[`StanzaTokenizer`](spacy_stanza/tokenizer.py). But since it's a regular `nlp`
+object, you can add your own components to the pipeline. For example, you could
+add [your own custom text classification
+component](https://spacy.io/usage/training) with `nlp.add_pipe("textcat",
+source=source_nlp)`, or augment the named entities with your own rule-based
+patterns using the [`EntityRuler`
+component](https://spacy.io/usage/rule-based-matching#entityruler).
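+
+As a minimal sketch of the `EntityRuler` route (the `ORG` pattern below is
+purely illustrative), rule-based patterns can be layered on top of the
+entities predicted by Stanza:
+
+```python
+# By default the entity_ruler only adds spans that don't overlap existing
+# entities, so Stanza's predictions are kept and the patterns fill the gaps.
+ruler = nlp.add_pipe("entity_ruler")
+ruler.add_patterns([{"label": "ORG", "pattern": "Explosion AI"}])
+
+doc = nlp("Barack Obama later visited Explosion AI in Berlin.")
+print([(ent.text, ent.label_) for ent in doc.ents])
+```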
+
+
+%package -n python3-spacy-stanza
+Summary: Use the latest Stanza (StanfordNLP) research models directly in spaCy
+Provides: python-spacy-stanza
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-spacy-stanza
+<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
+
+# spaCy + Stanza (formerly StanfordNLP)
+
+This package wraps the [Stanza](https://github.com/stanfordnlp/stanza)
+(formerly StanfordNLP) library, so you can use Stanford's models in a
+[spaCy](https://spacy.io) pipeline. The Stanford models achieved top accuracy
+in the CoNLL 2017 and 2018 shared tasks, which involve tokenization,
+part-of-speech tagging, morphological analysis, lemmatization and labeled
+dependency parsing in 68 languages. As of v1.0, Stanza also supports named
+entity recognition for selected languages.
+
+> ⚠️ Previous versions of this package were available as
+> [`spacy-stanfordnlp`](https://pypi.python.org/pypi/spacy-stanfordnlp).
+
+[![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/17/master.svg?logo=azure-pipelines&style=flat-square)](https://dev.azure.com/explosion-ai/public/_build?definitionId=17)
+[![PyPi](https://img.shields.io/pypi/v/spacy-stanza.svg?style=flat-square)](https://pypi.python.org/pypi/spacy-stanza)
+[![GitHub](https://img.shields.io/github/release/explosion/spacy-stanza/all.svg?style=flat-square)](https://github.com/explosion/spacy-stanza)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)
+
+Using this wrapper, you'll be able to use the following annotations, computed by
+your pretrained `stanza` model:
+
+- Statistical tokenization (reflected in the `Doc` and its tokens)
+- Lemmatization (`token.lemma` and `token.lemma_`)
+- Part-of-speech tagging (`token.tag`, `token.tag_`, `token.pos`, `token.pos_`)
+- Morphological analysis (`token.morph`)
+- Dependency parsing (`token.dep`, `token.dep_`, `token.head`)
+- Named entity recognition (`doc.ents`, `token.ent_type`, `token.ent_type_`, `token.ent_iob`, `token.ent_iob_`)
+- Sentence segmentation (`doc.sents`)
+
+## ⌛️ Installation
+
+As of v1.0.0 `spacy-stanza` is only compatible with **spaCy v3.x**. To install
+the most recent version:
+
+```bash
+pip install spacy-stanza
+```
+
+For spaCy v2, install v0.2.x and refer to the [v0.2.x usage
+documentation](https://github.com/explosion/spacy-stanza/tree/v0.2.x#-usage--examples):
+
+```bash
+pip install "spacy-stanza<0.3.0"
+```
+
+Make sure to also
+[download](https://stanfordnlp.github.io/stanza/download_models.html) one of
+the [pre-trained Stanza
+models](https://stanfordnlp.github.io/stanza/models.html).
+
+## 📖 Usage & Examples
+
+> ⚠️ **Important note:** This package has been refactored to take advantage of
+> [spaCy v3.0](https://spacy.io). Previous versions that were built for [spaCy
+> v2.x](https://v2.spacy.io) worked considerably differently. Please see
+> previous tagged versions of this README for documentation on prior versions.
+
+Use `spacy_stanza.load_pipeline()` to create an `nlp` object that you can use to
+process a text with a Stanza pipeline and create a spaCy [`Doc`
+object](https://spacy.io/api/doc). By default, both the spaCy pipeline and the
+Stanza pipeline will be initialized with the same `lang`, e.g. "en":
+
+```python
+import stanza
+import spacy_stanza
+
+# Download the stanza model if necessary
+stanza.download("en")
+
+# Initialize the pipeline
+nlp = spacy_stanza.load_pipeline("en")
+
+doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
+for token in doc:
+ print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
+print(doc.ents)
+```
+
+If language data for the given language is available in spaCy, the respective
+language class can be used as the base for the `nlp` object – for example,
+`English()`. This lets you use spaCy's lexical attributes like `is_stop` or
+`like_num`. The `nlp` object follows the same API as any other spaCy `Language`
+class – so you can visualize the `Doc` objects with displaCy, add custom
+components to the pipeline, use the rule-based matcher and do pretty much
+anything else you'd normally do in spaCy.
+
+```python
+# Access spaCy's lexical attributes
+print([token.is_stop for token in doc])
+print([token.like_num for token in doc])
+
+# Visualize dependencies
+from spacy import displacy
+displacy.serve(doc) # or displacy.render if you're in a Jupyter notebook
+
+# Process texts with nlp.pipe
+for doc in nlp.pipe(["Lots of texts", "Even more texts", "..."]):
+ print(doc.text)
+
+# Combine with your own custom pipeline components
+from spacy import Language
+@Language.component("custom_component")
+def custom_component(doc):
+ # Do something to the doc here
+ print(f"Custom component called: {doc.text}")
+ return doc
+
+nlp.add_pipe("custom_component")
+doc = nlp("Some text")
+
+# Serialize attributes to a numpy array
+np_array = doc.to_array(['ORTH', 'LEMMA', 'POS'])
+```
+
+### Stanza Pipeline options
+
+Additional options for the Stanza
+[`Pipeline`](https://stanfordnlp.github.io/stanza/pipeline.html#pipeline) can be
+provided as keyword arguments following the `Pipeline` API:
+
+- Provide the Stanza language as `lang`. For Stanza languages without spaCy
+ support, use "xx" for the spaCy language setting:
+
+ ```python
+ # Initialize a pipeline for Coptic
+ nlp = spacy_stanza.load_pipeline("xx", lang="cop")
+ ```
+
+- Provide Stanza pipeline settings following the `Pipeline` API:
+
+ ```python
+ # Initialize a German pipeline with the `hdt` package
+ nlp = spacy_stanza.load_pipeline("de", package="hdt")
+ ```
+
+- Tokenize with spaCy rather than the statistical tokenizer (only for English):
+
+ ```python
+ nlp = spacy_stanza.load_pipeline("en", processors={"tokenize": "spacy"})
+ ```
+
+- Provide any additional processor settings as additional keyword arguments:
+
+ ```python
+ # Provide pretokenized texts (whitespace tokenization)
+ nlp = spacy_stanza.load_pipeline("de", tokenize_pretokenized=True)
+ ```
+
+The spaCy config specifies all `Pipeline` options in the `[nlp.tokenizer]`
+block. For example, the config for the last example above, a German pipeline
+with pretokenized texts:
+
+```ini
+[nlp.tokenizer]
+@tokenizers = "spacy_stanza.PipelineAsTokenizer.v1"
+lang = "de"
+dir = null
+package = "default"
+logging_level = null
+verbose = null
+use_gpu = true
+
+[nlp.tokenizer.kwargs]
+tokenize_pretokenized = true
+
+[nlp.tokenizer.processors]
+```
+
+### Serialization
+
+The full Stanza pipeline configuration is stored in the spaCy pipeline
+[config](https://spacy.io/usage/training#config), so you can save and load the
+pipeline just like any other `nlp` pipeline:
+
+```python
+import spacy
+
+# Save to a local directory
+nlp.to_disk("./stanza-spacy-model")
+
+# Reload the pipeline
+nlp = spacy.load("./stanza-spacy-model")
+```
+
+Note that this **does not save any Stanza model data by default**. The Stanza
+models are very large, so for now, this package expects you to download the
+models separately with `stanza.download()` and have them available either in
+the default model directory or in the path specified under
+`[nlp.tokenizer.dir]` in the config.
+
+### Adding additional spaCy pipeline components
+
+By default, the spaCy pipeline in the `nlp` object returned by
+`spacy_stanza.load_pipeline()` will be empty, because all `stanza` attributes
+are computed and set within the custom tokenizer,
+[`StanzaTokenizer`](spacy_stanza/tokenizer.py). But since it's a regular `nlp`
+object, you can add your own components to the pipeline. For example, you could
+add [your own custom text classification
+component](https://spacy.io/usage/training) with `nlp.add_pipe("textcat",
+source=source_nlp)`, or augment the named entities with your own rule-based
+patterns using the [`EntityRuler`
+component](https://spacy.io/usage/rule-based-matching#entityruler).
+
+
+%package help
+Summary: Development documents and examples for spacy-stanza
+Provides: python3-spacy-stanza-doc
+%description help
+<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
+
+# spaCy + Stanza (formerly StanfordNLP)
+
+This package wraps the [Stanza](https://github.com/stanfordnlp/stanza)
+(formerly StanfordNLP) library, so you can use Stanford's models in a
+[spaCy](https://spacy.io) pipeline. The Stanford models achieved top accuracy
+in the CoNLL 2017 and 2018 shared tasks, which involve tokenization,
+part-of-speech tagging, morphological analysis, lemmatization and labeled
+dependency parsing in 68 languages. As of v1.0, Stanza also supports named
+entity recognition for selected languages.
+
+> ⚠️ Previous versions of this package were available as
+> [`spacy-stanfordnlp`](https://pypi.python.org/pypi/spacy-stanfordnlp).
+
+[![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/17/master.svg?logo=azure-pipelines&style=flat-square)](https://dev.azure.com/explosion-ai/public/_build?definitionId=17)
+[![PyPi](https://img.shields.io/pypi/v/spacy-stanza.svg?style=flat-square)](https://pypi.python.org/pypi/spacy-stanza)
+[![GitHub](https://img.shields.io/github/release/explosion/spacy-stanza/all.svg?style=flat-square)](https://github.com/explosion/spacy-stanza)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)
+
+Using this wrapper, you'll be able to use the following annotations, computed by
+your pretrained `stanza` model:
+
+- Statistical tokenization (reflected in the `Doc` and its tokens)
+- Lemmatization (`token.lemma` and `token.lemma_`)
+- Part-of-speech tagging (`token.tag`, `token.tag_`, `token.pos`, `token.pos_`)
+- Morphological analysis (`token.morph`)
+- Dependency parsing (`token.dep`, `token.dep_`, `token.head`)
+- Named entity recognition (`doc.ents`, `token.ent_type`, `token.ent_type_`, `token.ent_iob`, `token.ent_iob_`)
+- Sentence segmentation (`doc.sents`)
+
+## ⌛️ Installation
+
+As of v1.0.0 `spacy-stanza` is only compatible with **spaCy v3.x**. To install
+the most recent version:
+
+```bash
+pip install spacy-stanza
+```
+
+For spaCy v2, install v0.2.x and refer to the [v0.2.x usage
+documentation](https://github.com/explosion/spacy-stanza/tree/v0.2.x#-usage--examples):
+
+```bash
+pip install "spacy-stanza<0.3.0"
+```
+
+Make sure to also
+[download](https://stanfordnlp.github.io/stanza/download_models.html) one of
+the [pre-trained Stanza
+models](https://stanfordnlp.github.io/stanza/models.html).
+
+## 📖 Usage & Examples
+
+> ⚠️ **Important note:** This package has been refactored to take advantage of
+> [spaCy v3.0](https://spacy.io). Previous versions that were built for [spaCy
+> v2.x](https://v2.spacy.io) worked considerably differently. Please see
+> previous tagged versions of this README for documentation on prior versions.
+
+Use `spacy_stanza.load_pipeline()` to create an `nlp` object that you can use to
+process a text with a Stanza pipeline and create a spaCy [`Doc`
+object](https://spacy.io/api/doc). By default, both the spaCy pipeline and the
+Stanza pipeline will be initialized with the same `lang`, e.g. "en":
+
+```python
+import stanza
+import spacy_stanza
+
+# Download the stanza model if necessary
+stanza.download("en")
+
+# Initialize the pipeline
+nlp = spacy_stanza.load_pipeline("en")
+
+doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
+for token in doc:
+ print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
+print(doc.ents)
+```
+
+If language data for the given language is available in spaCy, the respective
+language class can be used as the base for the `nlp` object – for example,
+`English()`. This lets you use spaCy's lexical attributes like `is_stop` or
+`like_num`. The `nlp` object follows the same API as any other spaCy `Language`
+class – so you can visualize the `Doc` objects with displaCy, add custom
+components to the pipeline, use the rule-based matcher and do pretty much
+anything else you'd normally do in spaCy.
+
+```python
+# Access spaCy's lexical attributes
+print([token.is_stop for token in doc])
+print([token.like_num for token in doc])
+
+# Visualize dependencies
+from spacy import displacy
+displacy.serve(doc) # or displacy.render if you're in a Jupyter notebook
+
+# Process texts with nlp.pipe
+for doc in nlp.pipe(["Lots of texts", "Even more texts", "..."]):
+ print(doc.text)
+
+# Combine with your own custom pipeline components
+from spacy import Language
+@Language.component("custom_component")
+def custom_component(doc):
+ # Do something to the doc here
+ print(f"Custom component called: {doc.text}")
+ return doc
+
+nlp.add_pipe("custom_component")
+doc = nlp("Some text")
+
+# Serialize attributes to a numpy array
+np_array = doc.to_array(['ORTH', 'LEMMA', 'POS'])
+```
+
+### Stanza Pipeline options
+
+Additional options for the Stanza
+[`Pipeline`](https://stanfordnlp.github.io/stanza/pipeline.html#pipeline) can be
+provided as keyword arguments following the `Pipeline` API:
+
+- Provide the Stanza language as `lang`. For Stanza languages without spaCy
+ support, use "xx" for the spaCy language setting:
+
+ ```python
+ # Initialize a pipeline for Coptic
+ nlp = spacy_stanza.load_pipeline("xx", lang="cop")
+ ```
+
+- Provide Stanza pipeline settings following the `Pipeline` API:
+
+ ```python
+ # Initialize a German pipeline with the `hdt` package
+ nlp = spacy_stanza.load_pipeline("de", package="hdt")
+ ```
+
+- Tokenize with spaCy rather than the statistical tokenizer (only for English):
+
+ ```python
+ nlp = spacy_stanza.load_pipeline("en", processors={"tokenize": "spacy"})
+ ```
+
+- Provide any additional processor settings as additional keyword arguments:
+
+ ```python
+ # Provide pretokenized texts (whitespace tokenization)
+ nlp = spacy_stanza.load_pipeline("de", tokenize_pretokenized=True)
+ ```
+
+The spaCy config specifies all `Pipeline` options in the `[nlp.tokenizer]`
+block. For example, the config for the last example above, a German pipeline
+with pretokenized texts:
+
+```ini
+[nlp.tokenizer]
+@tokenizers = "spacy_stanza.PipelineAsTokenizer.v1"
+lang = "de"
+dir = null
+package = "default"
+logging_level = null
+verbose = null
+use_gpu = true
+
+[nlp.tokenizer.kwargs]
+tokenize_pretokenized = true
+
+[nlp.tokenizer.processors]
+```
+
+### Serialization
+
+The full Stanza pipeline configuration is stored in the spaCy pipeline
+[config](https://spacy.io/usage/training#config), so you can save and load the
+pipeline just like any other `nlp` pipeline:
+
+```python
+import spacy
+
+# Save to a local directory
+nlp.to_disk("./stanza-spacy-model")
+
+# Reload the pipeline
+nlp = spacy.load("./stanza-spacy-model")
+```
+
+Note that this **does not save any Stanza model data by default**. The Stanza
+models are very large, so for now, this package expects you to download the
+models separately with `stanza.download()` and have them available either in
+the default model directory or in the path specified under
+`[nlp.tokenizer.dir]` in the config.
+
+### Adding additional spaCy pipeline components
+
+By default, the spaCy pipeline in the `nlp` object returned by
+`spacy_stanza.load_pipeline()` will be empty, because all `stanza` attributes
+are computed and set within the custom tokenizer,
+[`StanzaTokenizer`](spacy_stanza/tokenizer.py). But since it's a regular `nlp`
+object, you can add your own components to the pipeline. For example, you could
+add [your own custom text classification
+component](https://spacy.io/usage/training) with `nlp.add_pipe("textcat",
+source=source_nlp)`, or augment the named entities with your own rule-based
+patterns using the [`EntityRuler`
+component](https://spacy.io/usage/rule-based-matching#entityruler).
+
+
+%prep
+%autosetup -n spacy-stanza-1.0.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-spacy-stanza -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..7e5795c
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+91fecbe74c0a2a83a3dcb3029d15c39a spacy-stanza-1.0.3.tar.gz