-rw-r--r--  .gitignore                 1
-rw-r--r--  python-spacy-conll.spec 1084
-rw-r--r--  sources                    1
3 files changed, 1086 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..de9960a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/spacy_conll-3.4.0.tar.gz
diff --git a/python-spacy-conll.spec b/python-spacy-conll.spec
new file mode 100644
index 0000000..b82a6f2
--- /dev/null
+++ b/python-spacy-conll.spec
@@ -0,0 +1,1084 @@
+%global _empty_manifest_terminate_build 0
+Name: python-spacy-conll
+Version: 3.4.0
+Release: 1
+Summary: A custom pipeline component for spaCy that can convert any parsed Doc and its sentences into CoNLL-U format. Also provides a command line entry point.
+License: BSD-2-Clause
+URL: https://github.com/BramVanroy/spacy_conll
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/47/d7/d44f97e01ab22c2c21b0c7c6d581c5888a4049dfd2c249f3ee5f3b44426b/spacy_conll-3.4.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-spacy
+Requires: python3-dataclasses
+Requires: python3-pandas
+Requires: python3-spacy-udpipe
+Requires: python3-spacy-stanza
+Requires: python3-pytest
+Requires: python3-flake8
+Requires: python3-isort
+Requires: python3-black
+Requires: python3-pygments
+
+%description
+# Parsing to CoNLL with spaCy, spacy-stanza, and spacy-udpipe
+
+This module allows you to parse text into CoNLL-U format. You can use it as a command line tool, or embed it in your
+ own scripts by adding it as a custom pipeline component to a spaCy, `spacy-stanza`, or `spacy-udpipe` pipeline. It
+ also provides an easy-to-use function to quickly initialize a parser as well as a ConllParser class with built-in
+ functionality to parse files or text.
+
+Note that the module simply takes a parser's output and puts it in a formatted string adhering to the CoNLL-U
+ format. The output tags depend on the spaCy model used. If you want Universal Dependencies tags as output, I advise you
+ to use this library in combination with [spacy-stanza](https://github.com/explosion/spacy-stanza), which is a spaCy
+ interface using `stanza` and its models behind the scenes. Those models use the Universal Dependencies formalism and
+ yield state-of-the-art performance. `stanza` is a new and improved version of `stanfordnlp`. As an alternative to the
+ Stanford models, you can use the spaCy wrapper for `UDPipe`, [spacy-udpipe](https://github.com/TakeLab/spacy-udpipe),
+ which is slightly less accurate than `stanza` but much faster.
+
+
+## Installation
+
+By default, this package automatically installs only [spaCy](https://spacy.io/usage/models#section-quickstart) as
+ a dependency. Because [spaCy's models](https://spacy.io/usage/models) are not necessarily trained on Universal
+ Dependencies conventions, their output labels are not UD either. By using `spacy-stanza` or `spacy-udpipe`, we get
+ the easy-to-use interface of spaCy as a wrapper around `stanza` and `UDPipe` respectively, including their models that
+ *are* trained on UD data.
+
+**NOTE**: `spacy-stanza` and `spacy-udpipe` are not installed automatically as dependencies of this library, because
+ it might be too much overhead for those who don't need UD. If you wish to use their functionality, you have to install
+them manually or use one of the available options as described below.
+
+If you want to retrieve CoNLL info as a `pandas` DataFrame, this library will automatically export it if it detects
+ that `pandas` is installed. See the Usage section for more.
+
+To install the library, simply use pip.
+
+```shell
+# only includes spacy by default
+pip install spacy_conll
+```
+
+A number of options are available to make installation of additional dependencies easier:
+
+```shell
+# include spacy-stanza and spacy-udpipe
+pip install spacy_conll[parsers]
+# include pandas
+pip install spacy_conll[pd]
+# include pandas, spacy-stanza and spacy-udpipe
+pip install spacy_conll[all]
+# include pandas, spacy-stanza, spacy-udpipe, and additional libraries for testing and formatting
+pip install spacy_conll[dev]
+```
+
+
+## Usage
+
+When the ConllFormatter is added to a spaCy pipeline, it adds CoNLL properties for `Token`, sentence `Span` and `Doc`.
+ Note that arbitrary `Span`s are not included and do not receive these properties.
+
+On all three of these levels, two custom properties are exposed by default, `._.conll` and its string
+ representation `._.conll_str`. However, if you have `pandas` installed, then `._.conll_pd` will
+ be added automatically, too!
+
+- `._.conll`: raw CoNLL format
+ - in Token: a dictionary containing all the expected CoNLL fields as keys and the parsed properties as values.
+ - in sentence Span: a list of its tokens' `._.conll` dictionaries (list of dictionaries).
+ - in a Doc: a list of its sentences' `._.conll` lists (list of list of dictionaries).
+
+- `._.conll_str`: string representation of the CoNLL format
+ - in Token: tab-separated representation of the contents of the CoNLL fields ending with a newline.
+ - in sentence Span: the expected CoNLL format where each row represents a token. When
+ `ConllFormatter(include_headers=True)` is used, two header lines are included as well, as per the
+ [CoNLL format](https://universaldependencies.org/format.html#sentence-boundaries-and-comments).
+ - in Doc: all its sentences' `._.conll_str` combined and separated by new lines.
+
+- `._.conll_pd`: `pandas` representation of the CoNLL format
+ - in Token: a Series representation of this token's CoNLL properties.
+ - in sentence Span: a DataFrame representation of this sentence, with the CoNLL names as column headers.
+ - in Doc: a concatenation of its sentences' DataFrames, leading to a new DataFrame whose index is reset.
+
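+To make these properties concrete, here is a minimal sketch (assuming the `en_core_web_sm` model is installed;
+ the printed values depend on the model you use):
+
+```python
+import spacy
+
+import spacy_conll  # noqa: F401 -- registers the "conll_formatter" component
+
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("conll_formatter", last=True)
+doc = nlp("I like cookies.")
+
+# Doc level: the full CoNLL-U string of all sentences
+print(doc._.conll_str)
+
+# Sentence level: a list of per-token CoNLL dicts
+first_sent = next(doc.sents)
+print(first_sent._.conll)
+```
+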
+You can use `spacy_conll` in your own Python code as a custom pipeline component, or you can use the built-in
+ command-line script which offers typically needed functionality. See the following section for more.
+
+
+### In Python
+
+This library offers the ConllFormatter class which serves as a custom spaCy pipeline component. It can be instantiated
+ as follows. It is important that you import `spacy_conll` before adding the pipe!
+
+```python
+import spacy
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("conll_formatter", last=True)
+```
+
+Because this library supports different spaCy wrappers (`spacy`, `stanza`, and `udpipe`), a convenience function is
+ available as well. With `utils.init_parser` you can easily instantiate a parser with a single line. You can
+ find the function's signature below. Have a look at the [source code](spacy_conll/utils.py) to read more about all the
+ possible arguments or try out the [examples](examples/).
+
+**NOTE**: `is_tokenized` does not work for `spacy-udpipe`. Using `is_tokenized` for `spacy-stanza` also affects sentence
+ segmentation, effectively *only* splitting on new lines. With `spacy`, `is_tokenized` disables sentence splitting completely.
+
+```python
+def init_parser(
+ model_or_lang: str,
+ parser: str,
+ *,
+ is_tokenized: bool = False,
+ disable_sbd: bool = False,
+ exclude_spacy_components: Optional[List[str]] = None,
+ parser_opts: Optional[Dict] = None,
+ **kwargs,
+)
+```
+
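+To illustrate the note above, a small sketch of `is_tokenized` with the `spacy` parser (assuming the
+ `en_core_web_sm` model is installed):
+
+```python
+from spacy_conll import init_parser
+
+# The input is pre-tokenized (space-separated), so with `is_tokenized` the
+# `spacy` parser performs no sentence splitting at all.
+nlp = init_parser("en_core_web_sm", "spacy", is_tokenized=True)
+doc = nlp("I like cookies .")
+print(doc._.conll_str)
+```
+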
+For instance, if you want to load a Dutch `stanza` model in silent mode with the CoNLL formatter already attached, you
+ can simply use the following snippet. `parser_opts` is passed to the `stanza` pipeline initialisation automatically.
+ Any other keyword arguments (`kwargs`), on the other hand, are passed to the `ConllFormatter` initialisation.
+
+```python
+from spacy_conll import init_parser
+
+nlp = init_parser("nl", "stanza", parser_opts={"verbose": False})
+```
+
+The `ConllFormatter` allows you to customize the extension names, and you can also specify conversion maps for the
+output properties.
+
+To illustrate, here is an advanced example, showing the more complex options:
+
+- `ext_names`: changes the attribute names to a custom key by using a dictionary.
+- `conversion_maps`: a two-level dictionary that looks like `{field_name: {tag_name: replacement}}`. In
+ other words, you can specify in which field a certain value should be replaced by another. This is especially useful
+ when you are not satisfied with the tagset of a model and wish to change some tags to an alternative.
+- `field_names`: allows you to change the default CoNLL-U field names to your own custom names. Similar to the
+ conversion map above, you should use the default field names as keys and your own custom names as values.
+ Possible keys are: "ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS", "HEAD", "DEPREL", "DEPS", "MISC".
+
+The example below
+
+- shows how to manually add the component;
+- changes the custom attribute `conll_pd` to `pandas` (`conll_pd` is only available if `pandas` is installed);
+- converts any `nsubj` deprel tag to `subj`.
+
+```python
+import spacy
+
+
+nlp = spacy.load("en_core_web_sm")
+config = {"ext_names": {"conll_pd": "pandas"},
+ "conversion_maps": {"deprel": {"nsubj": "subj"}}}
+nlp.add_pipe("conll_formatter", config=config, last=True)
+doc = nlp("I like cookies.")
+print(doc._.pandas)
+```
+
+This is the same as:
+
+```python
+from spacy_conll import init_parser
+
+nlp = init_parser("en_core_web_sm",
+ "spacy",
+ ext_names={"conll_pd": "pandas"},
+ conversion_maps={"deprel": {"nsubj": "subj"}})
+doc = nlp("I like cookies.")
+print(doc._.pandas)
+```
+
+
+The snippets above will output a pandas DataFrame by using `._.pandas` rather than the standard
+`._.conll_pd`, with all occurrences of `nsubj` in the deprel field replaced by `subj`.
+
+```
+ ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
+0 1 I I PRON PRP Case=Nom|Number=Sing|Person=1|PronType=Prs 2 subj _ _
+1 2 like like VERB VBP Tense=Pres|VerbForm=Fin 0 ROOT _ _
+2 3 cookies cookie NOUN NNS Number=Plur 2 dobj _ SpaceAfter=No
+3 4 . . PUNCT . PunctType=Peri 2 punct _ SpaceAfter=No
+```
+
+Another initialization example that would replace the column names "UPOS" with "upostag" and "XPOS" with "xpostag":
+
+```python
+import spacy
+
+
+nlp = spacy.load("en_core_web_sm")
+config = {"field_names": {"UPOS": "upostag", "XPOS": "xpostag"}}
+nlp.add_pipe("conll_formatter", config=config, last=True)
+```
+
+#### Reading CoNLL into a spaCy object
+
+It is possible to read a CoNLL string or text file and parse it as a spaCy object. This can be useful if you have raw
+CoNLL data that you wish to process in different ways. The process is straightforward.
+
+```python
+from spacy_conll import init_parser
+from spacy_conll.parser import ConllParser
+
+
+nlp = ConllParser(init_parser("en_core_web_sm", "spacy"))
+
+# Parse a CoNLL file on disk:
+# doc = nlp.parse_conll_file_as_spacy("path/to/your/conll-sample.txt")
+
+# Or parse straight from raw text:
+conllstr = """
+# text = From the AP comes this story :
+1	From	from	ADP	IN	_	3	case	3:case	_
+2	the	the	DET	DT	Definite=Def|PronType=Art	3	det	3:det	_
+3	AP	AP	PROPN	NNP	Number=Sing	4	obl	4:obl:from	_
+4	comes	come	VERB	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	0:root	_
+5	this	this	DET	DT	Number=Sing|PronType=Dem	6	det	6:det	_
+6	story	story	NOUN	NN	Number=Sing	4	nsubj	4:nsubj	_
+"""
+doc = nlp.parse_conll_text_as_spacy(conllstr)
+
+# Multiple CoNLL entries (separated by two newlines) will be included as different sentences in the resulting Doc
+for sent in doc.sents:
+ for token in sent:
+ print(token.text, token.dep_, token.pos_)
+```
+
+### Command line
+
+Upon installation, a command-line script is added under the alias `parse-as-conll`. You can use it to parse a
+string or file into CoNLL format given a number of options.
+
+```shell
+parse-as-conll -h
+usage: parse-as-conll [-h] [-f INPUT_FILE] [-a INPUT_ENCODING] [-b INPUT_STR] [-o OUTPUT_FILE]
+ [-c OUTPUT_ENCODING] [-s] [-t] [-d] [-e] [-j N_PROCESS] [-v]
+ [--ignore_pipe_errors] [--no_split_on_newline]
+ model_or_lang {spacy,stanza,udpipe}
+
+Parse an input string or input file to CoNLL-U format using a spaCy-wrapped parser. The output
+can be written to stdout or a file, or both.
+
+positional arguments:
+ model_or_lang Model or language to use. SpaCy models must be pre-installed, stanza
+ and udpipe models will be downloaded automatically
+ {spacy,stanza,udpipe}
+ Which parser to use. Parsers other than 'spacy' need to be installed
+ separately. For 'stanza' you need 'spacy-stanza', and for 'udpipe' the
+ 'spacy-udpipe' library is required.
+
+optional arguments:
+ -h, --help show this help message and exit
+ -f INPUT_FILE, --input_file INPUT_FILE
+ Path to file with sentences to parse. Has precedence over 'input_str'.
+ (default: None)
+ -a INPUT_ENCODING, --input_encoding INPUT_ENCODING
+ Encoding of the input file. Default value is system default. (default:
+ cp1252)
+ -b INPUT_STR, --input_str INPUT_STR
+ Input string to parse. (default: None)
+ -o OUTPUT_FILE, --output_file OUTPUT_FILE
+ Path to output file. If not specified, the output will be printed on
+ standard output. (default: None)
+ -c OUTPUT_ENCODING, --output_encoding OUTPUT_ENCODING
+ Encoding of the output file. Default value is system default. (default:
+ cp1252)
+ -s, --disable_sbd Whether to disable spaCy automatic sentence boundary detection. In
+ practice, disabling means that every line will be parsed as one
+ sentence, regardless of its actual content. When 'is_tokenized' is
+ enabled, 'disable_sbd' is enabled automatically (see 'is_tokenized').
+ Only works when using 'spacy' as 'parser'. (default: False)
+ -t, --is_tokenized Whether your text has already been tokenized (space-seperated). Setting
+ this option has as an important consequence that no sentence splitting
+ at all will be done except splitting on new lines. So if your input is
+ a file, and you want to use pretokenised text, make sure that each line
+ contains exactly one sentence. (default: False)
+ -d, --include_headers
+ Whether to include headers before the output of every sentence. These
+ headers include the sentence text and the sentence ID as per the CoNLL
+ format. (default: False)
+ -e, --no_force_counting
+ Whether to disable force counting the 'sent_id', starting from 1 and
+ increasing for each sentence. Instead, 'sent_id' will depend on how
+ spaCy returns the sentences. Must have 'include_headers' enabled.
+ (default: False)
+ -j N_PROCESS, --n_process N_PROCESS
+ Number of processes to use in nlp.pipe(). -1 will use as many cores as
+ available. Might not work for a 'parser' other than 'spacy' depending
+ on your environment. (default: 1)
+ -v, --verbose Whether to always print the output to stdout, regardless of
+ 'output_file'. (default: False)
+ --ignore_pipe_errors Whether to ignore a priori errors concerning 'n_process' By default we
+ try to determine whether processing works on your system and stop
+ execution if we think it doesn't. If you know what you are doing, you
+ can ignore such pre-emptive errors, though, and run the code as-is,
+ which will then throw the default Python errors when applicable.
+ (default: False)
+ --no_split_on_newline
+ By default, the input file or string is split on newlines for faster
+ processing of the split up parts. If you want to disable that behavior,
+ you can use this flag. (default: False)
+```
+
+
+For example, parsing a single line, multi-sentence string:
+
+```shell
+parse-as-conll en_core_web_sm spacy --input_str "I like cookies. What about you?" --include_headers
+
+# sent_id = 1
+# text = I like cookies.
+1 I I PRON PRP Case=Nom|Number=Sing|Person=1|PronType=Prs 2 nsubj _ _
+2 like like VERB VBP Tense=Pres|VerbForm=Fin 0 ROOT _ _
+3 cookies cookie NOUN NNS Number=Plur 2 dobj _ SpaceAfter=No
+4 . . PUNCT . PunctType=Peri 2 punct _ _
+
+# sent_id = 2
+# text = What about you?
+1 What what PRON WP _ 2 dep _ _
+2 about about ADP IN _ 0 ROOT _ _
+3 you you PRON PRP Case=Acc|Person=2|PronType=Prs 2 pobj _ SpaceAfter=No
+4 ? ? PUNCT . PunctType=Peri 2 punct _ SpaceAfter=No
+```
+
+For example, parsing a large input file and writing output to a given output file, using four processes:
+
+```shell
+parse-as-conll en_core_web_sm spacy --input_file large-input.txt --output_file large-conll-output.txt --include_headers --disable_sbd -j 4
+```
+
+
+## Credits
+
+The first version of this library was inspired by initial work by [rgalhama](https://github.com/rgalhama/spaCy2CoNLLU)
+ and has evolved a lot since then.
+
+
+%package -n python3-spacy-conll
+Summary: A custom pipeline component for spaCy that can convert any parsed Doc and its sentences into CoNLL-U format. Also provides a command line entry point.
+Provides: python-spacy-conll
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-spacy-conll
+# Parsing to CoNLL with spaCy, spacy-stanza, and spacy-udpipe
+
+This module allows you to parse text into CoNLL-U format. You can use it as a command line tool, or embed it in your
+ own scripts by adding it as a custom pipeline component to a spaCy, `spacy-stanza`, or `spacy-udpipe` pipeline. It
+ also provides an easy-to-use function to quickly initialize a parser as well as a ConllParser class with built-in
+ functionality to parse files or text.
+
+Note that the module simply takes a parser's output and puts it in a formatted string adhering to the CoNLL-U
+ format. The output tags depend on the spaCy model used. If you want Universal Dependencies tags as output, I advise you
+ to use this library in combination with [spacy-stanza](https://github.com/explosion/spacy-stanza), which is a spaCy
+ interface using `stanza` and its models behind the scenes. Those models use the Universal Dependencies formalism and
+ yield state-of-the-art performance. `stanza` is a new and improved version of `stanfordnlp`. As an alternative to the
+ Stanford models, you can use the spaCy wrapper for `UDPipe`, [spacy-udpipe](https://github.com/TakeLab/spacy-udpipe),
+ which is slightly less accurate than `stanza` but much faster.
+
+
+## Installation
+
+By default, this package automatically installs only [spaCy](https://spacy.io/usage/models#section-quickstart) as
+ a dependency. Because [spaCy's models](https://spacy.io/usage/models) are not necessarily trained on Universal
+ Dependencies conventions, their output labels are not UD either. By using `spacy-stanza` or `spacy-udpipe`, we get
+ the easy-to-use interface of spaCy as a wrapper around `stanza` and `UDPipe` respectively, including their models that
+ *are* trained on UD data.
+
+**NOTE**: `spacy-stanza` and `spacy-udpipe` are not installed automatically as dependencies of this library, because
+ it might be too much overhead for those who don't need UD. If you wish to use their functionality, you have to install
+them manually or use one of the available options as described below.
+
+If you want to retrieve CoNLL info as a `pandas` DataFrame, this library will automatically export it if it detects
+ that `pandas` is installed. See the Usage section for more.
+
+To install the library, simply use pip.
+
+```shell
+# only includes spacy by default
+pip install spacy_conll
+```
+
+A number of options are available to make installation of additional dependencies easier:
+
+```shell
+# include spacy-stanza and spacy-udpipe
+pip install spacy_conll[parsers]
+# include pandas
+pip install spacy_conll[pd]
+# include pandas, spacy-stanza and spacy-udpipe
+pip install spacy_conll[all]
+# include pandas, spacy-stanza, spacy-udpipe, and additional libraries for testing and formatting
+pip install spacy_conll[dev]
+```
+
+
+## Usage
+
+When the ConllFormatter is added to a spaCy pipeline, it adds CoNLL properties for `Token`, sentence `Span` and `Doc`.
+ Note that arbitrary `Span`s are not included and do not receive these properties.
+
+On all three of these levels, two custom properties are exposed by default, `._.conll` and its string
+ representation `._.conll_str`. However, if you have `pandas` installed, then `._.conll_pd` will
+ be added automatically, too!
+
+- `._.conll`: raw CoNLL format
+ - in Token: a dictionary containing all the expected CoNLL fields as keys and the parsed properties as values.
+ - in sentence Span: a list of its tokens' `._.conll` dictionaries (list of dictionaries).
+ - in a Doc: a list of its sentences' `._.conll` lists (list of list of dictionaries).
+
+- `._.conll_str`: string representation of the CoNLL format
+ - in Token: tab-separated representation of the contents of the CoNLL fields ending with a newline.
+ - in sentence Span: the expected CoNLL format where each row represents a token. When
+ `ConllFormatter(include_headers=True)` is used, two header lines are included as well, as per the
+ [CoNLL format](https://universaldependencies.org/format.html#sentence-boundaries-and-comments).
+ - in Doc: all its sentences' `._.conll_str` combined and separated by new lines.
+
+- `._.conll_pd`: `pandas` representation of the CoNLL format
+ - in Token: a Series representation of this token's CoNLL properties.
+ - in sentence Span: a DataFrame representation of this sentence, with the CoNLL names as column headers.
+ - in Doc: a concatenation of its sentences' DataFrames, leading to a new DataFrame whose index is reset.
+
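+To make these properties concrete, here is a minimal sketch (assuming the `en_core_web_sm` model is installed;
+ the printed values depend on the model you use):
+
+```python
+import spacy
+
+import spacy_conll  # noqa: F401 -- registers the "conll_formatter" component
+
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("conll_formatter", last=True)
+doc = nlp("I like cookies.")
+
+# Doc level: the full CoNLL-U string of all sentences
+print(doc._.conll_str)
+
+# Sentence level: a list of per-token CoNLL dicts
+first_sent = next(doc.sents)
+print(first_sent._.conll)
+```
+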
+You can use `spacy_conll` in your own Python code as a custom pipeline component, or you can use the built-in
+ command-line script which offers typically needed functionality. See the following section for more.
+
+
+### In Python
+
+This library offers the ConllFormatter class which serves as a custom spaCy pipeline component. It can be instantiated
+ as follows. It is important that you import `spacy_conll` before adding the pipe!
+
+```python
+import spacy
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("conll_formatter", last=True)
+```
+
+Because this library supports different spaCy wrappers (`spacy`, `stanza`, and `udpipe`), a convenience function is
+ available as well. With `utils.init_parser` you can easily instantiate a parser with a single line. You can
+ find the function's signature below. Have a look at the [source code](spacy_conll/utils.py) to read more about all the
+ possible arguments or try out the [examples](examples/).
+
+**NOTE**: `is_tokenized` does not work for `spacy-udpipe`. Using `is_tokenized` for `spacy-stanza` also affects sentence
+ segmentation, effectively *only* splitting on new lines. With `spacy`, `is_tokenized` disables sentence splitting completely.
+
+```python
+def init_parser(
+ model_or_lang: str,
+ parser: str,
+ *,
+ is_tokenized: bool = False,
+ disable_sbd: bool = False,
+ exclude_spacy_components: Optional[List[str]] = None,
+ parser_opts: Optional[Dict] = None,
+ **kwargs,
+)
+```
+
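+To illustrate the note above, a small sketch of `is_tokenized` with the `spacy` parser (assuming the
+ `en_core_web_sm` model is installed):
+
+```python
+from spacy_conll import init_parser
+
+# The input is pre-tokenized (space-separated), so with `is_tokenized` the
+# `spacy` parser performs no sentence splitting at all.
+nlp = init_parser("en_core_web_sm", "spacy", is_tokenized=True)
+doc = nlp("I like cookies .")
+print(doc._.conll_str)
+```
+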
+For instance, if you want to load a Dutch `stanza` model in silent mode with the CoNLL formatter already attached, you
+ can simply use the following snippet. `parser_opts` is passed to the `stanza` pipeline initialisation automatically.
+ Any other keyword arguments (`kwargs`), on the other hand, are passed to the `ConllFormatter` initialisation.
+
+```python
+from spacy_conll import init_parser
+
+nlp = init_parser("nl", "stanza", parser_opts={"verbose": False})
+```
+
+The `ConllFormatter` allows you to customize the extension names, and you can also specify conversion maps for the
+output properties.
+
+To illustrate, here is an advanced example, showing the more complex options:
+
+- `ext_names`: changes the attribute names to a custom key by using a dictionary.
+- `conversion_maps`: a two-level dictionary that looks like `{field_name: {tag_name: replacement}}`. In
+ other words, you can specify in which field a certain value should be replaced by another. This is especially useful
+ when you are not satisfied with the tagset of a model and wish to change some tags to an alternative.
+- `field_names`: allows you to change the default CoNLL-U field names to your own custom names. Similar to the
+ conversion map above, you should use the default field names as keys and your own custom names as values.
+ Possible keys are: "ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS", "HEAD", "DEPREL", "DEPS", "MISC".
+
+The example below
+
+- shows how to manually add the component;
+- changes the custom attribute `conll_pd` to `pandas` (`conll_pd` is only available if `pandas` is installed);
+- converts any `nsubj` deprel tag to `subj`.
+
+```python
+import spacy
+
+
+nlp = spacy.load("en_core_web_sm")
+config = {"ext_names": {"conll_pd": "pandas"},
+ "conversion_maps": {"deprel": {"nsubj": "subj"}}}
+nlp.add_pipe("conll_formatter", config=config, last=True)
+doc = nlp("I like cookies.")
+print(doc._.pandas)
+```
+
+This is the same as:
+
+```python
+from spacy_conll import init_parser
+
+nlp = init_parser("en_core_web_sm",
+ "spacy",
+ ext_names={"conll_pd": "pandas"},
+ conversion_maps={"deprel": {"nsubj": "subj"}})
+doc = nlp("I like cookies.")
+print(doc._.pandas)
+```
+
+
+The snippets above will output a pandas DataFrame by using `._.pandas` rather than the standard
+`._.conll_pd`, with all occurrences of `nsubj` in the deprel field replaced by `subj`.
+
+```
+ ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
+0 1 I I PRON PRP Case=Nom|Number=Sing|Person=1|PronType=Prs 2 subj _ _
+1 2 like like VERB VBP Tense=Pres|VerbForm=Fin 0 ROOT _ _
+2 3 cookies cookie NOUN NNS Number=Plur 2 dobj _ SpaceAfter=No
+3 4 . . PUNCT . PunctType=Peri 2 punct _ SpaceAfter=No
+```
+
+Another initialization example that would replace the column names "UPOS" with "upostag" and "XPOS" with "xpostag":
+
+```python
+import spacy
+
+
+nlp = spacy.load("en_core_web_sm")
+config = {"field_names": {"UPOS": "upostag", "XPOS": "xpostag"}}
+nlp.add_pipe("conll_formatter", config=config, last=True)
+```
+
+#### Reading CoNLL into a spaCy object
+
+It is possible to read a CoNLL string or text file and parse it as a spaCy object. This can be useful if you have raw
+CoNLL data that you wish to process in different ways. The process is straightforward.
+
+```python
+from spacy_conll import init_parser
+from spacy_conll.parser import ConllParser
+
+
+nlp = ConllParser(init_parser("en_core_web_sm", "spacy"))
+
+# Parse a CoNLL file on disk:
+# doc = nlp.parse_conll_file_as_spacy("path/to/your/conll-sample.txt")
+
+# Or parse straight from raw text:
+conllstr = """
+# text = From the AP comes this story :
+1	From	from	ADP	IN	_	3	case	3:case	_
+2	the	the	DET	DT	Definite=Def|PronType=Art	3	det	3:det	_
+3	AP	AP	PROPN	NNP	Number=Sing	4	obl	4:obl:from	_
+4	comes	come	VERB	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	0:root	_
+5	this	this	DET	DT	Number=Sing|PronType=Dem	6	det	6:det	_
+6	story	story	NOUN	NN	Number=Sing	4	nsubj	4:nsubj	_
+"""
+doc = nlp.parse_conll_text_as_spacy(conllstr)
+
+# Multiple CoNLL entries (separated by two newlines) will be included as different sentences in the resulting Doc
+for sent in doc.sents:
+ for token in sent:
+ print(token.text, token.dep_, token.pos_)
+```
+
+### Command line
+
+Upon installation, a command-line script is added under the alias `parse-as-conll`. You can use it to parse a
+string or file into CoNLL format given a number of options.
+
+```shell
+parse-as-conll -h
+usage: parse-as-conll [-h] [-f INPUT_FILE] [-a INPUT_ENCODING] [-b INPUT_STR] [-o OUTPUT_FILE]
+ [-c OUTPUT_ENCODING] [-s] [-t] [-d] [-e] [-j N_PROCESS] [-v]
+ [--ignore_pipe_errors] [--no_split_on_newline]
+ model_or_lang {spacy,stanza,udpipe}
+
+Parse an input string or input file to CoNLL-U format using a spaCy-wrapped parser. The output
+can be written to stdout or a file, or both.
+
+positional arguments:
+ model_or_lang Model or language to use. SpaCy models must be pre-installed, stanza
+ and udpipe models will be downloaded automatically
+ {spacy,stanza,udpipe}
+ Which parser to use. Parsers other than 'spacy' need to be installed
+ separately. For 'stanza' you need 'spacy-stanza', and for 'udpipe' the
+ 'spacy-udpipe' library is required.
+
+optional arguments:
+ -h, --help show this help message and exit
+ -f INPUT_FILE, --input_file INPUT_FILE
+ Path to file with sentences to parse. Has precedence over 'input_str'.
+ (default: None)
+ -a INPUT_ENCODING, --input_encoding INPUT_ENCODING
+ Encoding of the input file. Default value is system default. (default:
+ cp1252)
+ -b INPUT_STR, --input_str INPUT_STR
+ Input string to parse. (default: None)
+ -o OUTPUT_FILE, --output_file OUTPUT_FILE
+ Path to output file. If not specified, the output will be printed on
+ standard output. (default: None)
+ -c OUTPUT_ENCODING, --output_encoding OUTPUT_ENCODING
+ Encoding of the output file. Default value is system default. (default:
+ cp1252)
+ -s, --disable_sbd Whether to disable spaCy automatic sentence boundary detection. In
+ practice, disabling means that every line will be parsed as one
+ sentence, regardless of its actual content. When 'is_tokenized' is
+ enabled, 'disable_sbd' is enabled automatically (see 'is_tokenized').
+ Only works when using 'spacy' as 'parser'. (default: False)
+ -t, --is_tokenized Whether your text has already been tokenized (space-seperated). Setting
+ this option has as an important consequence that no sentence splitting
+ at all will be done except splitting on new lines. So if your input is
+ a file, and you want to use pretokenised text, make sure that each line
+ contains exactly one sentence. (default: False)
+ -d, --include_headers
+ Whether to include headers before the output of every sentence. These
+ headers include the sentence text and the sentence ID as per the CoNLL
+ format. (default: False)
+ -e, --no_force_counting
+ Whether to disable force counting the 'sent_id', starting from 1 and
+ increasing for each sentence. Instead, 'sent_id' will depend on how
+ spaCy returns the sentences. Must have 'include_headers' enabled.
+ (default: False)
+ -j N_PROCESS, --n_process N_PROCESS
+ Number of processes to use in nlp.pipe(). -1 will use as many cores as
+ available. Might not work for a 'parser' other than 'spacy' depending
+ on your environment. (default: 1)
+ -v, --verbose Whether to always print the output to stdout, regardless of
+ 'output_file'. (default: False)
+ --ignore_pipe_errors Whether to ignore a priori errors concerning 'n_process' By default we
+ try to determine whether processing works on your system and stop
+ execution if we think it doesn't. If you know what you are doing, you
+ can ignore such pre-emptive errors, though, and run the code as-is,
+ which will then throw the default Python errors when applicable.
+ (default: False)
+ --no_split_on_newline
+ By default, the input file or string is split on newlines for faster
+ processing of the split up parts. If you want to disable that behavior,
+ you can use this flag. (default: False)
+```
+
+
+For example, parsing a single line, multi-sentence string:
+
+```shell
+parse-as-conll en_core_web_sm spacy --input_str "I like cookies. What about you?" --include_headers
+
+# sent_id = 1
+# text = I like cookies.
+1 I I PRON PRP Case=Nom|Number=Sing|Person=1|PronType=Prs 2 nsubj _ _
+2 like like VERB VBP Tense=Pres|VerbForm=Fin 0 ROOT _ _
+3 cookies cookie NOUN NNS Number=Plur 2 dobj _ SpaceAfter=No
+4 . . PUNCT . PunctType=Peri 2 punct _ _
+
+# sent_id = 2
+# text = What about you?
+1 What what PRON WP _ 2 dep _ _
+2 about about ADP IN _ 0 ROOT _ _
+3 you you PRON PRP Case=Acc|Person=2|PronType=Prs 2 pobj _ SpaceAfter=No
+4 ? ? PUNCT . PunctType=Peri 2 punct _ SpaceAfter=No
+```
+
+For example, parsing a large input file and writing output to a given output file, using four processes:
+
+```shell
+parse-as-conll en_core_web_sm spacy --input_file large-input.txt --output_file large-conll-output.txt --include_headers --disable_sbd -j 4
+```
+
+
+## Credits
+
+The first version of this library was inspired by initial work by [rgalhama](https://github.com/rgalhama/spaCy2CoNLLU)
+ and has evolved a lot since then.
+
+
+%package help
+Summary: Development documents and examples for spacy-conll
+Provides: python3-spacy-conll-doc
+%description help
+# Parsing to CoNLL with spaCy, spacy-stanza, and spacy-udpipe
+
+This module allows you to parse text into CoNLL-U format. You can use it as a command line tool, or embed it in your
+ own scripts by adding it as a custom pipeline component to a spaCy, `spacy-stanza`, or `spacy-udpipe` pipeline. It
+ also provides an easy-to-use function to quickly initialize a parser as well as a ConllParser class with built-in
+ functionality to parse files or text.
+
+Note that the module simply takes a parser's output and puts it in a formatted string adhering to the CoNLL-U
+ format. The output tags depend on the spaCy model used. If you want Universal Dependencies tags as output, I advise you
+ to use this library in combination with [spacy-stanza](https://github.com/explosion/spacy-stanza), which is a spaCy
+ interface using `stanza` and its models behind the scenes. Those models use the Universal Dependencies formalism and
+ yield state-of-the-art performance. `stanza` is a new and improved version of `stanfordnlp`. As an alternative to the
+ Stanford models, you can use the spaCy wrapper for `UDPipe`, [spacy-udpipe](https://github.com/TakeLab/spacy-udpipe),
+ which is slightly less accurate than `stanza` but much faster.
+
+
+## Installation
+
+By default, this package automatically installs only [spaCy](https://spacy.io/usage/models#section-quickstart) as
+ a dependency. Because [spaCy's models](https://spacy.io/usage/models) are not necessarily trained on Universal
+ Dependencies conventions, their output labels are not UD either. By using `spacy-stanza` or `spacy-udpipe`, we get
+ the easy-to-use interface of spaCy as a wrapper around `stanza` and `UDPipe` respectively, including their models that
+ *are* trained on UD data.
+
+**NOTE**: `spacy-stanza` and `spacy-udpipe` are not installed automatically as dependencies of this library, because
+ it might be too much overhead for those who don't need UD. If you wish to use their functionality, you have to install
+them manually or use one of the available options as described below.
+
+If you want to retrieve CoNLL info as a `pandas` DataFrame, this library will automatically export it if it detects
+ that `pandas` is installed. See the Usage section for more.
+
+To install the library, simply use pip.
+
+```shell
+# only includes spacy by default
+pip install spacy_conll
+```
+
+A number of options are available to make installation of additional dependencies easier:
+
+```shell
+# include spacy-stanza and spacy-udpipe
+pip install spacy_conll[parsers]
+# include pandas
+pip install spacy_conll[pd]
+# include pandas, spacy-stanza and spacy-udpipe
+pip install spacy_conll[all]
+# include pandas, spacy-stanza, spacy-udpipe, and additional libraries for testing and formatting
+pip install spacy_conll[dev]
+```
+
+
+## Usage
+
+When the ConllFormatter is added to a spaCy pipeline, it adds CoNLL properties for `Token`, sentence `Span` and `Doc`.
+ Note that arbitrary `Span`s are not included and do not receive these properties.
+
+On all three of these levels, two custom properties are exposed by default, `._.conll` and its string
+ representation `._.conll_str`. However, if you have `pandas` installed, then `._.conll_pd` will
+ be added automatically, too!
+
+- `._.conll`: raw CoNLL format
+ - in Token: a dictionary containing all the expected CoNLL fields as keys and the parsed properties as values.
+ - in sentence Span: a list of its tokens' `._.conll` dictionaries (list of dictionaries).
+ - in a Doc: a list of its sentences' `._.conll` lists (list of list of dictionaries).
+
+- `._.conll_str`: string representation of the CoNLL format
+ - in Token: tab-separated representation of the contents of the CoNLL fields ending with a newline.
+ - in sentence Span: the expected CoNLL format where each row represents a token. When
+ `ConllFormatter(include_headers=True)` is used, two header lines are included as well, as per the
+ [CoNLL format](https://universaldependencies.org/format.html#sentence-boundaries-and-comments).
+ - in Doc: all its sentences' `._.conll_str` combined and separated by new lines.
+
+- `._.conll_pd`: `pandas` representation of the CoNLL format
+ - in Token: a Series representation of this token's CoNLL properties.
+ - in sentence Span: a DataFrame representation of this sentence, with the CoNLL names as column headers.
+ - in Doc: a concatenation of its sentences' DataFrames, leading to a new DataFrame whose index is reset.
+
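+To make these properties concrete, here is a minimal sketch (assuming the `en_core_web_sm` model is installed;
+ the printed values depend on the model you use):
+
+```python
+import spacy
+
+import spacy_conll  # noqa: F401 -- registers the "conll_formatter" component
+
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("conll_formatter", last=True)
+doc = nlp("I like cookies.")
+
+# Doc level: the full CoNLL-U string of all sentences
+print(doc._.conll_str)
+
+# Sentence level: a list of per-token CoNLL dicts
+first_sent = next(doc.sents)
+print(first_sent._.conll)
+```
+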
+You can use `spacy_conll` in your own Python code as a custom pipeline component, or you can use the built-in
+ command-line script which offers typically needed functionality. See the following section for more.
+
+
+### In Python
+
+This library offers the ConllFormatter class which serves as a custom spaCy pipeline component. It can be instantiated
+ as follows. It is important that you import `spacy_conll` before adding the pipe!
+
+```python
+import spacy
+nlp = spacy.load("en_core_web_sm")
+nlp.add_pipe("conll_formatter", last=True)
+```
+
+Because this library supports different spaCy wrappers (`spacy`, `stanza`, and `udpipe`), a convenience function is
+ available as well. With `utils.init_parser` you can easily instantiate a parser with a single line. You can
+ find the function's signature below. Have a look at the [source code](spacy_conll/utils.py) to read more about all the
+ possible arguments or try out the [examples](examples/).
+
+**NOTE**: `is_tokenized` does not work for `spacy-udpipe`. Using `is_tokenized` for `spacy-stanza` also affects sentence
+ segmentation, effectively *only* splitting on new lines. With `spacy`, `is_tokenized` disables sentence splitting completely.
+
+```python
+def init_parser(
+ model_or_lang: str,
+ parser: str,
+ *,
+ is_tokenized: bool = False,
+ disable_sbd: bool = False,
+ exclude_spacy_components: Optional[List[str]] = None,
+ parser_opts: Optional[Dict] = None,
+ **kwargs,
+)
+```
+
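+To illustrate the note above, a small sketch of `is_tokenized` with the `spacy` parser (assuming the
+ `en_core_web_sm` model is installed):
+
+```python
+from spacy_conll import init_parser
+
+# The input is pre-tokenized (space-separated), so with `is_tokenized` the
+# `spacy` parser performs no sentence splitting at all.
+nlp = init_parser("en_core_web_sm", "spacy", is_tokenized=True)
+doc = nlp("I like cookies .")
+print(doc._.conll_str)
+```
+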
+For instance, if you want to load a Dutch `stanza` model in silent mode with the CoNLL formatter already attached, you
+ can simply use the following snippet. `parser_opts` is passed to the `stanza` pipeline initialisation automatically.
+ Any other keyword arguments (`kwargs`), on the other hand, are passed to the `ConllFormatter` initialisation.
+
+```python
+from spacy_conll import init_parser
+
+nlp = init_parser("nl", "stanza", parser_opts={"verbose": False})
+```
+
+The `ConllFormatter` allows you to customize the extension names, and you can also specify conversion maps for the
+output properties.
+
+To illustrate, here is an advanced example, showing the more complex options:
+
+- `ext_names`: changes the attribute names to a custom key by using a dictionary.
+- `conversion_maps`: a two-level dictionary that looks like `{field_name: {tag_name: replacement}}`. In
+ other words, you can specify in which field a certain value should be replaced by another. This is especially useful
+ when you are not satisfied with the tagset of a model and wish to change some tags to an alternative.
+- `field_names`: allows you to change the default CoNLL-U field names to your own custom names. Similar to the
+ conversion map above, you should use the default field names as keys and your own custom names as values.
+ Possible keys are: "ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS", "HEAD", "DEPREL", "DEPS", "MISC".
+
+The example below
+
+- shows how to manually add the component;
+- changes the custom attribute `conll_pd` to `pandas` (`conll_pd` is only available if `pandas` is installed);
+- converts any `nsubj` deprel tag to `subj`.
+
+```python
+import spacy
+
+
+nlp = spacy.load("en_core_web_sm")
+config = {"ext_names": {"conll_pd": "pandas"},
+ "conversion_maps": {"deprel": {"nsubj": "subj"}}}
+nlp.add_pipe("conll_formatter", config=config, last=True)
+doc = nlp("I like cookies.")
+print(doc._.pandas)
+```
+
+This is the same as:
+
+```python
+from spacy_conll import init_parser
+
+nlp = init_parser("en_core_web_sm",
+ "spacy",
+ ext_names={"conll_pd": "pandas"},
+ conversion_maps={"deprel": {"nsubj": "subj"}})
+doc = nlp("I like cookies.")
+print(doc._.pandas)
+```
+
+
+The snippets above will output a pandas DataFrame by using `._.pandas` rather than the standard
+`._.conll_pd`, with all occurrences of `nsubj` in the deprel field replaced by `subj`.
+
+```
+ ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
+0 1 I I PRON PRP Case=Nom|Number=Sing|Person=1|PronType=Prs 2 subj _ _
+1 2 like like VERB VBP Tense=Pres|VerbForm=Fin 0 ROOT _ _
+2 3 cookies cookie NOUN NNS Number=Plur 2 dobj _ SpaceAfter=No
+3 4 . . PUNCT . PunctType=Peri 2 punct _ SpaceAfter=No
+```
+
+Another initialization example that would replace the column names "UPOS" with "upostag" and "XPOS" with "xpostag":
+
+```python
+import spacy
+
+
+nlp = spacy.load("en_core_web_sm")
+config = {"field_names": {"UPOS": "upostag", "XPOS": "xpostag"}}
+nlp.add_pipe("conll_formatter", config=config, last=True)
+```
+
+#### Reading CoNLL into a spaCy object
+
+It is possible to read a CoNLL string or text file and parse it as a spaCy object. This can be useful if you have raw
+CoNLL data that you wish to process in different ways. The process is straightforward.
+
+```python
+from spacy_conll import init_parser
+from spacy_conll.parser import ConllParser
+
+
+nlp = ConllParser(init_parser("en_core_web_sm", "spacy"))
+
+# Parse a CoNLL file on disk:
+# doc = nlp.parse_conll_file_as_spacy("path/to/your/conll-sample.txt")
+
+# Or parse straight from raw text:
+conllstr = """
+# text = From the AP comes this story :
+1	From	from	ADP	IN	_	3	case	3:case	_
+2	the	the	DET	DT	Definite=Def|PronType=Art	3	det	3:det	_
+3	AP	AP	PROPN	NNP	Number=Sing	4	obl	4:obl:from	_
+4	comes	come	VERB	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	0:root	_
+5	this	this	DET	DT	Number=Sing|PronType=Dem	6	det	6:det	_
+6	story	story	NOUN	NN	Number=Sing	4	nsubj	4:nsubj	_
+"""
+doc = nlp.parse_conll_text_as_spacy(conllstr)
+
+# Multiple CoNLL entries (separated by two newlines) will be included as different sentences in the resulting Doc
+for sent in doc.sents:
+ for token in sent:
+ print(token.text, token.dep_, token.pos_)
+```
+
+### Command line
+
+Upon installation, a command-line script is added under the alias `parse-as-conll`. You can use it to parse a
+string or file into CoNLL format given a number of options.
+
+```shell
+parse-as-conll -h
+usage: parse-as-conll [-h] [-f INPUT_FILE] [-a INPUT_ENCODING] [-b INPUT_STR] [-o OUTPUT_FILE]
+ [-c OUTPUT_ENCODING] [-s] [-t] [-d] [-e] [-j N_PROCESS] [-v]
+ [--ignore_pipe_errors] [--no_split_on_newline]
+ model_or_lang {spacy,stanza,udpipe}
+
+Parse an input string or input file to CoNLL-U format using a spaCy-wrapped parser. The output
+can be written to stdout or a file, or both.
+
+positional arguments:
+ model_or_lang Model or language to use. SpaCy models must be pre-installed, stanza
+ and udpipe models will be downloaded automatically
+ {spacy,stanza,udpipe}
+ Which parser to use. Parsers other than 'spacy' need to be installed
+ separately. For 'stanza' you need 'spacy-stanza', and for 'udpipe' the
+ 'spacy-udpipe' library is required.
+
+optional arguments:
+ -h, --help show this help message and exit
+ -f INPUT_FILE, --input_file INPUT_FILE
+ Path to file with sentences to parse. Has precedence over 'input_str'.
+ (default: None)
+ -a INPUT_ENCODING, --input_encoding INPUT_ENCODING
+ Encoding of the input file. Default value is system default. (default:
+ cp1252)
+ -b INPUT_STR, --input_str INPUT_STR
+ Input string to parse. (default: None)
+ -o OUTPUT_FILE, --output_file OUTPUT_FILE
+ Path to output file. If not specified, the output will be printed on
+ standard output. (default: None)
+ -c OUTPUT_ENCODING, --output_encoding OUTPUT_ENCODING
+ Encoding of the output file. Default value is system default. (default:
+ cp1252)
+ -s, --disable_sbd Whether to disable spaCy automatic sentence boundary detection. In
+ practice, disabling means that every line will be parsed as one
+ sentence, regardless of its actual content. When 'is_tokenized' is
+ enabled, 'disable_sbd' is enabled automatically (see 'is_tokenized').
+ Only works when using 'spacy' as 'parser'. (default: False)
+ -t, --is_tokenized Whether your text has already been tokenized (space-seperated). Setting
+ this option has as an important consequence that no sentence splitting
+ at all will be done except splitting on new lines. So if your input is
+ a file, and you want to use pretokenised text, make sure that each line
+ contains exactly one sentence. (default: False)
+ -d, --include_headers
+ Whether to include headers before the output of every sentence. These
+ headers include the sentence text and the sentence ID as per the CoNLL
+ format. (default: False)
+ -e, --no_force_counting
+ Whether to disable force counting the 'sent_id', starting from 1 and
+ increasing for each sentence. Instead, 'sent_id' will depend on how
+ spaCy returns the sentences. Must have 'include_headers' enabled.
+ (default: False)
+ -j N_PROCESS, --n_process N_PROCESS
+ Number of processes to use in nlp.pipe(). -1 will use as many cores as
+ available. Might not work for a 'parser' other than 'spacy' depending
+ on your environment. (default: 1)
+ -v, --verbose Whether to always print the output to stdout, regardless of
+ 'output_file'. (default: False)
+ --ignore_pipe_errors Whether to ignore a priori errors concerning 'n_process' By default we
+ try to determine whether processing works on your system and stop
+ execution if we think it doesn't. If you know what you are doing, you
+ can ignore such pre-emptive errors, though, and run the code as-is,
+ which will then throw the default Python errors when applicable.
+ (default: False)
+ --no_split_on_newline
+ By default, the input file or string is split on newlines for faster
+ processing of the split up parts. If you want to disable that behavior,
+ you can use this flag. (default: False)
+```
+
+
+For example, parsing a single line, multi-sentence string:
+
+```shell
+parse-as-conll en_core_web_sm spacy --input_str "I like cookies. What about you?" --include_headers
+
+# sent_id = 1
+# text = I like cookies.
+1 I I PRON PRP Case=Nom|Number=Sing|Person=1|PronType=Prs 2 nsubj _ _
+2 like like VERB VBP Tense=Pres|VerbForm=Fin 0 ROOT _ _
+3 cookies cookie NOUN NNS Number=Plur 2 dobj _ SpaceAfter=No
+4 . . PUNCT . PunctType=Peri 2 punct _ _
+
+# sent_id = 2
+# text = What about you?
+1 What what PRON WP _ 2 dep _ _
+2 about about ADP IN _ 0 ROOT _ _
+3 you you PRON PRP Case=Acc|Person=2|PronType=Prs 2 pobj _ SpaceAfter=No
+4 ? ? PUNCT . PunctType=Peri 2 punct _ SpaceAfter=No
+```
+
+For example, parsing a large input file and writing output to a given output file, using four processes:
+
+```shell
+parse-as-conll en_core_web_sm spacy --input_file large-input.txt --output_file large-conll-output.txt --include_headers --disable_sbd -j 4
+```
+
+
+## Credits
+
+The first version of this library was inspired by initial work by [rgalhama](https://github.com/rgalhama/spaCy2CoNLLU)
+ and has evolved a lot since then.
+
+
+%prep
+%autosetup -n spacy-conll-3.4.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-spacy-conll -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 3.4.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..b1850ae
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+9cabacc829a31a9f3d0b39bfa2f64d39 spacy_conll-3.4.0.tar.gz