-rw-r--r--  .gitignore                       |   1
-rw-r--r--  python-contextualspellcheck.spec | 811
-rw-r--r--  sources                          |   1
3 files changed, 813 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..fec8c83 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/contextualSpellCheck-0.4.3.tar.gz
diff --git a/python-contextualspellcheck.spec b/python-contextualspellcheck.spec
new file mode 100644
index 0000000..985cdc0
--- /dev/null
+++ b/python-contextualspellcheck.spec
@@ -0,0 +1,811 @@
+%global _empty_manifest_terminate_build 0
+Name: python-contextualSpellCheck
+Version: 0.4.3
+Release: 1
+Summary: Contextual spell correction using BERT (bidirectional representations)
+License: MIT
+URL: https://github.com/R1j1t/contextualSpellCheck
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/09/1d/96d5d6be2c4d5eda07eb1d5f21a3480259dd0c2f066edb466f0d9b9e7309/contextualSpellCheck-0.4.3.tar.gz
+BuildArch: noarch
+
+Requires: python3-torch
+Requires: python3-editdistance
+Requires: python3-transformers
+Requires: python3-spacy
+
+%description
+# spellCheck
+<a href="https://github.com/R1j1t/contextualSpellCheck"><img src="https://user-images.githubusercontent.com/22280243/82138959-2852cd00-9842-11ea-918a-49b2a7873ef6.png" width="276" height="120" align="right" /></a>
+
+Contextual word checker for better suggestions
+
+[![license](https://img.shields.io/github/license/r1j1t/contextualSpellCheck)](https://github.com/R1j1t/contextualSpellCheck/blob/master/LICENSE)
+[![PyPI](https://img.shields.io/pypi/v/contextualSpellCheck?color=green)](https://pypi.org/project/contextualSpellCheck/)
+[![Python-Version](https://img.shields.io/badge/Python-3.6+-green)](https://github.com/R1j1t/contextualSpellCheck#install)
+[![Downloads](https://pepy.tech/badge/contextualspellcheck/week)](https://pepy.tech/project/contextualspellcheck)
+[![GitHub contributors](https://img.shields.io/github/contributors/r1j1t/contextualSpellCheck)](https://github.com/R1j1t/contextualSpellCheck/graphs/contributors)
+[![Help Wanted](https://img.shields.io/badge/Help%20Wanted-Task%20List-violet)](https://github.com/R1j1t/contextualSpellCheck#task-list)
+[![DOI](https://zenodo.org/badge/254703118.svg)](https://zenodo.org/badge/latestdoi/254703118)
+
+## Types of spelling mistakes
+
+It is essential to understand that deciding whether a candidate is a spelling error is itself a hard problem.
+
+> Spelling errors are broadly classified as non-word errors (NWE) and real word errors (RWE). If the misspelt string is a valid word in the language, then it is called an RWE, else it is an NWE.
+>
+> -- [Monojit Choudhury et al. (2007)][1]
+
+This package currently focuses on out-of-vocabulary (OOV), i.e. non-word error (NWE), correction using a BERT model. The idea behind using BERT is to exploit the sentence context when correcting an OOV word. To improve this package further, I would like to extend the functionality to identify RWEs, optimise the package, and improve the documentation.
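The core idea can be sketched without spaCy or transformers: a masked language model proposes scored fillers for the misspelt slot, and an edit-distance filter keeps only candidates close to the original surface form. The scores below are illustrative stand-ins for real BERT output, and `rank_candidates` is a hypothetical helper for illustration, not part of the package's API.

```python
# Sketch of NWE correction: filter masked-LM candidates by edit distance,
# then rank the survivors by model score. Scores are made-up examples.

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert/delete/substitute, cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # delete from a
                           cur[j - 1] + 1,         # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def rank_candidates(misspelt, scored_candidates, max_edit_dist=2):
    """Keep candidates within max_edit_dist of the misspelt form, best score first."""
    kept = [(w, p) for w, p in scored_candidates
            if edit_distance(misspelt, w) <= max_edit_dist]
    return sorted(kept, key=lambda wp: wp[1], reverse=True)

# Illustrative masked-LM scores for "Income was $9.4 [MASK] ..."
scores = [("million", 0.59422), ("billion", 0.24349),
          (",", 0.08809), ("trillion", 0.01835)]
print(rank_candidates("milion", scores))
# → [('million', 0.59422), ('billion', 0.24349)]
```

Note how the edit-distance filter discards contextually plausible but orthographically distant candidates such as `","` and `"trillion"`.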
+
+## Install
+
+The package can be installed using [pip](https://pypi.org/project/contextualSpellCheck/). You will need Python 3.6+.
+
+```bash
+pip install contextualSpellCheck
+```
+
+## Usage
+
+**Note:** For use in other languages, check the [`examples`](https://github.com/R1j1t/contextualSpellCheck/tree/master/examples) folder.
+
+### How to load the package in the spaCy pipeline
+
+```python
+>>> import contextualSpellCheck
+>>> import spacy
+>>> nlp = spacy.load("en_core_web_sm")
+>>>
+>>> ## We require NER to identify if a token is a PERSON
+>>> ## also require parser because we use `Token.sent` for context
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
+>>> contextualSpellCheck.add_to_pipe(nlp)
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'contextual spellchecker']
+>>>
+>>> doc = nlp('Income was $9.4 milion compared to the prior year of $2.7 milion.')
+>>> doc._.outcome_spellCheck
+'Income was $9.4 million compared to the prior year of $2.7 million.'
+```
+
+Or you can add it to the spaCy pipeline manually:
+
+```python
+>>> import spacy
+>>> import contextualSpellCheck
+>>>
+>>> nlp = spacy.load("en_core_web_sm")
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
+>>> # You can pass optional parameters to contextualSpellCheck,
+>>> # e.g. to set the maximum edit distance, use config={"max_edit_dist": 3}
+>>> nlp.add_pipe("contextual spellchecker")
+<contextualSpellCheck.contextualSpellCheck.ContextualSpellCheck object at 0x1049f82b0>
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'contextual spellchecker']
+>>>
+>>> doc = nlp("Income was $9.4 milion compared to the prior year of $2.7 milion.")
+>>> print(doc._.performed_spellCheck)
+True
+>>> print(doc._.outcome_spellCheck)
+Income was $9.4 million compared to the prior year of $2.7 million.
+```
+
+After adding `contextual spellchecker` to the pipeline, you use the pipeline as usual. The spell-check suggestions and other data can be accessed using [extensions](#extensions).
+
+### Using the pipeline
+
+```python
+>>> doc = nlp(u'Income was $9.4 milion compared to the prior year of $2.7 milion.')
+>>>
+>>> # Doc extension
+>>> print(doc._.contextual_spellCheck)
+True
+>>> print(doc._.performed_spellCheck)
+True
+>>> print(doc._.suggestions_spellCheck)
+{milion: 'million', milion: 'million'}
+>>> print(doc._.outcome_spellCheck)
+Income was $9.4 million compared to the prior year of $2.7 million.
+>>> print(doc._.score_spellCheck)
+{milion: [('million', 0.59422), ('billion', 0.24349), (',', 0.08809), ('trillion', 0.01835), ('Million', 0.00826), ('%', 0.00672), ('##M', 0.00591), ('annually', 0.0038), ('##B', 0.00205), ('USD', 0.00113)], milion: [('billion', 0.65934), ('million', 0.26185), ('trillion', 0.05391), ('##M', 0.0051), ('Million', 0.00425), ('##B', 0.00268), ('USD', 0.00153), ('##b', 0.00077), ('millions', 0.00059), ('%', 0.00041)]}
+>>>
+>>> # Token extension
+>>> print(doc[4]._.get_require_spellCheck)
+True
+>>> print(doc[4]._.get_suggestion_spellCheck)
+'million'
+>>> print(doc[4]._.score_spellCheck)
+[('million', 0.59422), ('billion', 0.24349), (',', 0.08809), ('trillion', 0.01835), ('Million', 0.00826), ('%', 0.00672), ('##M', 0.00591), ('annually', 0.0038), ('##B', 0.00205), ('USD', 0.00113)]
+>>>
+>>> # Span extension
+>>> print(doc[2:6]._.get_has_spellCheck)
+True
+>>> print(doc[2:6]._.score_spellCheck)
+{$: [], 9.4: [], milion: [('million', 0.59422), ('billion', 0.24349), (',', 0.08809), ('trillion', 0.01835), ('Million', 0.00826), ('%', 0.00672), ('##M', 0.00591), ('annually', 0.0038), ('##B', 0.00205), ('USD', 0.00113)], compared: []}
+```
+
+## Extensions
+
+To make usage easy, `contextual spellchecker` provides custom spaCy extensions that your code can consume, so it is easy to get the desired data. contextualSpellCheck provides extensions on the `Doc`, `Span` and `Token` level. The tables below summarise them.
+
+### `spaCy.Doc` level extensions
+
+| Extension | Type | Description | Default |
+|---|---|---|---|
+| `doc._.contextual_spellCheck` | `bool` | Whether contextualSpellCheck is added as an extension | `True` |
+| `doc._.performed_spellCheck` | `bool` | Whether contextualSpellCheck identified any misspellings and performed a correction | `False` |
+| `doc._.suggestions_spellCheck` | `{spaCy.Token: str}` | If corrections were performed, a mapping from each misspelt token (`spaCy.Token`) to its suggested word (`str`) | `{}` |
+| `doc._.outcome_spellCheck` | `str` | The corrected sentence (`str`) | `""` |
+| `doc._.score_spellCheck` | `{spaCy.Token: [(str, float)]}` | If corrections were identified, a mapping from each misspelt token (`spaCy.Token`) to its suggested words (`str`) with the probability (`float`) of each correction | `None` |
+
+### `spaCy.Span` level extensions
+
+| Extension | Type | Description | Default |
+|---|---|---|---|
+| `span._.get_has_spellCheck` | `bool` | Whether contextualSpellCheck identified any misspellings and performed a correction in this span | `False` |
+| `span._.score_spellCheck` | `{spaCy.Token: [(str, float)]}` | If corrections were identified, a mapping from each misspelt token in this `span` to its suggested words (`str`) with the probability (`float`) of each correction | `{spaCy.Token: []}` |
+
+### `spaCy.Token` level extensions
+
+| Extension | Type | Description | Default |
+|---|---|---|---|
+| `token._.get_require_spellCheck` | `bool` | Whether contextualSpellCheck identified a misspelling and performed a correction on this `token` | `False` |
+| `token._.get_suggestion_spellCheck` | `str` | If a correction was performed, the suggested word (`str`) | `""` |
+| `token._.score_spellCheck` | `[(str, float)]` | If corrections were identified, the suggested words (`str`) with the probability (`float`) of each correction | `[]` |
+
+## API
+
+At present, there is a simple GET API to get you started. You can run the app locally and play with it.
+
+Query: use the endpoint http://127.0.0.1:5000/?query=YOUR-QUERY
+Note: your browser will handle the URL encoding for you.
+
+```
+GET Request: http://localhost:5000/?query=Income%20was%20$9.4%20milion%20compared%20to%20the%20prior%20year%20of%20$2.7%20milion.
+```
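When calling the endpoint from code rather than a browser, the query text has to be percent-encoded explicitly. A minimal sketch with the Python standard library; the endpoint URL is taken from the example above, and nothing here is part of the package's own API:

```python
from urllib.parse import quote

# Percent-encode the query text for the GET endpoint shown above.
# quote() encodes spaces as %20 (and "$" as %24, which is equally valid).
base = "http://127.0.0.1:5000/"
text = "Income was $9.4 milion compared to the prior year of $2.7 milion."
url = f"{base}?query={quote(text)}"
print(url)
```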
+
+Response:
+
+```json
+{
+ "success": true,
+ "input": "Income was $9.4 milion compared to the prior year of $2.7 milion.",
+  "corrected": "Income was $9.4 million compared to the prior year of $2.7 million.",
+ "suggestion_score": {
+ "milion": [
+ [
+ "million",
+ 0.59422
+ ],
+ [
+ "billion",
+ 0.24349
+ ],
+ ...
+ ],
+ "milion:1": [
+ [
+ "billion",
+ 0.65934
+ ],
+ [
+ "million",
+ 0.26185
+ ],
+ ...
+ ]
+ }
+}
+```
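Consuming the response in code is straightforward with the standard library. The payload below mirrors the example response (with the elided candidates dropped); repeated tokens appear to be disambiguated with a `:1` suffix in `suggestion_score`:

```python
import json

# Pick the top-scoring suggestion for every flagged token in the API
# response. The payload is a trimmed copy of the example response.
payload = json.loads("""
{
  "success": true,
  "suggestion_score": {
    "milion":   [["million", 0.59422], ["billion", 0.24349]],
    "milion:1": [["billion", 0.65934], ["million", 0.26185]]
  }
}
""")

best = {
    token: max(candidates, key=lambda c: c[1])[0]
    for token, candidates in payload["suggestion_score"].items()
}
print(best)  # → {'milion': 'million', 'milion:1': 'billion'}
```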
+
+## Task List
+
+- [ ] use cython for part of the code to improve performance ([#39](https://github.com/R1j1t/contextualSpellCheck/issues/39))
+- [ ] Improve metric for candidate selection ([#40](https://github.com/R1j1t/contextualSpellCheck/issues/40))
+- [ ] Add examples for other languages ([#41](https://github.com/R1j1t/contextualSpellCheck/issues/41))
+- [ ] Update the logic of misspell identification (OOV) ([#44](https://github.com/R1j1t/contextualSpellCheck/issues/44))
+- [ ] better candidate generation (solved by [#44](https://github.com/R1j1t/contextualSpellCheck/issues/44)?)
+- [ ] add metric by testing on datasets
+- [ ] Improve documentation
+- [ ] Improve logging in code
+- [ ] Add support for Real Word Error (RWE) (Big Task)
+- [ ] add multi mask out capability
+
+<details><summary>Completed Task</summary>
+<p>
+
+- [x] specify maximum edit distance for `candidateRanking`
+- [x] allow user to specify the BERT model
+- [x] Include transformers deTokenizer to get better suggestions
+- [x] dependency version in setup.py ([#38](https://github.com/R1j1t/contextualSpellCheck/issues/38))
+
+</p>
+</details>
+
+## Support and contribution
+
+If you like the project, please ⭑ it and show your support! If you feel the current behaviour is not as expected, feel free to raise an [issue](https://github.com/R1j1t/contextualSpellCheck/issues). If you can help with any of the tasks above, please open a [PR](https://github.com/R1j1t/contextualSpellCheck/pulls) with the necessary changes to documentation and tests.
+
+## Cite
+
+If you are using contextualSpellCheck in your academic work, please consider citing the library using the BibTeX entry below:
+
+```bibtex
+@misc{Goel_Contextual_Spell_Check_2021,
+  author = {Goel, Rajat},
+  doi = {10.5281/zenodo.4642379},
+  month = {3},
+  title = {{Contextual Spell Check}},
+  url = {https://github.com/R1j1t/contextualSpellCheck},
+  year = {2021}
+}
+```
+
+
+
+## Reference
+
+Below are some of the projects/work I referred to while developing this package:
+
+1. Explosion AI. *Architecture*. May 2020. URL: https://spacy.io/api
+2. Monojit Choudhury et al. "How difficult is it to develop a perfect spell-checker? A cross-linguistic analysis through complex network approach". In: arXiv preprint physics/0703198 (2007).
+3. Jacob Devlin et al. *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding*. 2019. arXiv:1810.04805 [cs.CL].
+4. Hugging Face. *Fast Coreference Resolution in spaCy with Neural Networks*. May 2020. URL: https://github.com/huggingface/neuralcoref
+5. Ines. *Chapter 3: Processing Pipelines*. May 2020. URL: https://course.spacy.io/en/chapter3
+6. Eric Mays, Fred J. Damerau, and Robert L. Mercer. "Context based spelling correction". In: Information Processing & Management 27.5 (1991), pp. 517–522.
+7. Peter Norvig. *How to Write a Spelling Corrector*. May 2020. URL: http://norvig.com/spell-correct.html
+8. Yifu Sun and Haoming Jiang. *Contextual Text Denoising with Masked Language Models*. 2019. arXiv:1910.14080 [cs.CL].
+9. Thomas Wolf et al. "Transformers: State-of-the-Art Natural Language Processing". In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics, Oct. 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6
+
+[1]: <http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=52A3B869596656C9DA285DCE83A0339F?doi=10.1.1.146.4390&rep=rep1&type=pdf>
+
+
+%package -n python3-contextualSpellCheck
+Summary: Contextual spell correction using BERT (bidirectional representations)
+Provides: python-contextualSpellCheck
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-contextualSpellCheck
+Contextual spell correction for spaCy pipelines using BERT (bidirectional
+representations). The package corrects out-of-vocabulary (non-word) spelling
+errors using the surrounding sentence context. For full documentation, see
+the project page: https://github.com/R1j1t/contextualSpellCheck
+
+
+%package help
+Summary: Development documents and examples for contextualSpellCheck
+Provides: python3-contextualSpellCheck-doc
+%description help
+# spellCheck
+<a href="https://github.com/R1j1t/contextualSpellCheck"><img src="https://user-images.githubusercontent.com/22280243/82138959-2852cd00-9842-11ea-918a-49b2a7873ef6.png" width="276" height="120" align="right" /></a>
+
+Contextual word checker for better suggestions
+
+[![license](https://img.shields.io/github/license/r1j1t/contextualSpellCheck)](https://github.com/R1j1t/contextualSpellCheck/blob/master/LICENSE)
+[![PyPI](https://img.shields.io/pypi/v/contextualSpellCheck?color=green)](https://pypi.org/project/contextualSpellCheck/)
+[![Python-Version](https://img.shields.io/badge/Python-3.6+-green)](https://github.com/R1j1t/contextualSpellCheck#install)
+[![Downloads](https://pepy.tech/badge/contextualspellcheck/week)](https://pepy.tech/project/contextualspellcheck)
+[![GitHub contributors](https://img.shields.io/github/contributors/r1j1t/contextualSpellCheck)](https://github.com/R1j1t/contextualSpellCheck/graphs/contributors)
+[![Help Wanted](https://img.shields.io/badge/Help%20Wanted-Task%20List-violet)](https://github.com/R1j1t/contextualSpellCheck#task-list)
+[![DOI](https://zenodo.org/badge/254703118.svg)](https://zenodo.org/badge/latestdoi/254703118)
+
+## Types of spelling mistakes
+
+It is essential to understand that identifying whether a candidate is a spelling error is a big task.
+
+> Spelling errors are broadly classified as non- word errors (NWE) and real word errors (RWE). If the misspelt string is a valid word in the language, then it is called an RWE, else it is an NWE.
+>
+> -- [Monojit Choudhury et. al. (2007)][1]
+
+This package currently focuses on Out of Vocabulary (OOV) word or non-word error (NWE) correction using BERT model. The idea of using BERT was to use the context when correcting OOV. To improve this package, I would like to extend the functionality to identify RWE, optimising the package, and improving the documentation.
+
+## Install
+
+The package can be installed using [pip](https://pypi.org/project/contextualSpellCheck/). You would require python 3.6+
+
+```bash
+pip install contextualSpellCheck
+```
+
+## Usage
+
+**Note:** For use in other languages check [`examples`](https://github.com/R1j1t/contextualSpellCheck/tree/master/examples) folder.
+
+### How to load the package in spacy pipeline
+
+```python
+>>> import contextualSpellCheck
+>>> import spacy
+>>> nlp = spacy.load("en_core_web_sm")
+>>>
+>>> ## We require NER to identify if a token is a PERSON
+>>> ## also require parser because we use `Token.sent` for context
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
+>>> contextualSpellCheck.add_to_pipe(nlp)
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'contextual spellchecker']
+>>>
+>>> doc = nlp('Income was $9.4 milion compared to the prior year of $2.7 milion.')
+>>> doc._.outcome_spellCheck
+'Income was $9.4 million compared to the prior year of $2.7 million.'
+```
+
+Or you can add to spaCy pipeline manually!
+
+```python
+>>> import spacy
+>>> import contextualSpellCheck
+>>>
+>>> nlp = spacy.load("en_core_web_sm")
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
+>>> # You can pass the optional parameters to the contextualSpellCheck
+>>> # eg. pass max edit distance use config={"max_edit_dist": 3}
+>>> nlp.add_pipe("contextual spellchecker")
+<contextualSpellCheck.contextualSpellCheck.ContextualSpellCheck object at 0x1049f82b0>
+>>> nlp.pipe_names
+['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'contextual spellchecker']
+>>>
+>>> doc = nlp("Income was $9.4 milion compared to the prior year of $2.7 milion.")
+>>> print(doc._.performed_spellCheck)
+True
+>>> print(doc._.outcome_spellCheck)
+Income was $9.4 million compared to the prior year of $2.7 million.
+```
+
+After adding `contextual spellchecker` in the pipeline, you use the pipeline normally. The spell check suggestions and other data can be accessed using [extensions](#Extensions).
+
+### Using the pipeline
+
+```python
+>>> doc = nlp(u'Income was $9.4 milion compared to the prior year of $2.7 milion.')
+>>>
+>>> # Doc Extention
+>>> print(doc._.contextual_spellCheck)
+True
+>>> print(doc._.performed_spellCheck)
+True
+>>> print(doc._.suggestions_spellCheck)
+{milion: 'million', milion: 'million'}
+>>> print(doc._.outcome_spellCheck)
+Income was $9.4 million compared to the prior year of $2.7 million.
+>>> print(doc._.score_spellCheck)
+{milion: [('million', 0.59422), ('billion', 0.24349), (',', 0.08809), ('trillion', 0.01835), ('Million', 0.00826), ('%', 0.00672), ('##M', 0.00591), ('annually', 0.0038), ('##B', 0.00205), ('USD', 0.00113)], milion: [('billion', 0.65934), ('million', 0.26185), ('trillion', 0.05391), ('##M', 0.0051), ('Million', 0.00425), ('##B', 0.00268), ('USD', 0.00153), ('##b', 0.00077), ('millions', 0.00059), ('%', 0.00041)]}
+>>>
+>>> # Token Extension
+>>> print(doc[4]._.get_require_spellCheck)
+True
+>>> print(doc[4]._.get_suggestion_spellCheck)
+'million'
+>>> print(doc[4]._.score_spellCheck)
+[('million', 0.59422), ('billion', 0.24349), (',', 0.08809), ('trillion', 0.01835), ('Million', 0.00826), ('%', 0.00672), ('##M', 0.00591), ('annually', 0.0038), ('##B', 0.00205), ('USD', 0.00113)]
+>>>
+>>> # Span Extension
+>>> print(doc[2:6]._.get_has_spellCheck)
+True
+>>> print(doc[2:6]._.score_spellCheck)
+{$: [], 9.4: [], milion: [('million', 0.59422), ('billion', 0.24349), (',', 0.08809), ('trillion', 0.01835), ('Million', 0.00826), ('%', 0.00672), ('##M', 0.00591), ('annually', 0.0038), ('##B', 0.00205), ('USD', 0.00113)], compared: []}
+```
+
+## Extensions
+
+To make usage easy, `contextual spellchecker` provides custom spaCy extensions that your code can consume, making it easier to get the desired data. contextualSpellCheck provides extensions at the `doc`, `span`, and `token` levels. The tables below summarise them.
+
+### `spaCy.Doc` level extensions
+
+| Extension | Type | Description | Default |
+|------------------------------|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
+| doc._.contextual_spellCheck  | `Boolean`                       | Checks whether contextualSpellCheck is added as an extension                                                                                                      | `True`  |
+| doc._.performed_spellCheck   | `Boolean`                       | Checks whether contextualSpellCheck identified any misspellings and performed a correction                                                                        | `False` |
+| doc._.suggestions_spellCheck | `{Spacy.Token:str}`             | If corrections are performed, returns the mapping of each misspelled token (`spaCy.Token`) to its suggested word (`str`)                                          | `{}`    |
+| doc._.outcome_spellCheck     | `str`                           | The corrected sentence (`str`)                                                                                                                                    | `""`    |
+| doc._.score_spellCheck       | `{Spacy.Token:List(str,float)}` | If corrections are identified, returns the mapping of each misspelled token (`spaCy.Token`) to its suggested words (`str`) and the probability of each correction | `None`  |
+
+### `spaCy.Span` level extensions
+| Extension | Type | Description | Default |
+|-------------------------------|---------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
+| span._.get_has_spellCheck     | `Boolean`                       | Checks whether contextualSpellCheck identified any misspellings and performed a correction in this span                                                                                     | `False`              |
+| span._.score_spellCheck       | `{Spacy.Token:List(str,float)}` | If corrections are identified, returns the mapping of each misspelled token (`spaCy.Token`) to its suggested words (`str`) and the probability of each correction, for tokens in this `span` | `{spaCy.Token: []}`  |
+
+### `spaCy.Token` level extensions
+
+| Extension | Type | Description | Default |
+|-----------------------------------|-----------------|-------------------------------------------------------------------------------------------------------------|---------|
+| token._.get_require_spellCheck    | `Boolean`       | Checks whether contextualSpellCheck identified a misspelling and performed a correction on this `token` | `False` |
+| token._.get_suggestion_spellCheck | `str`           | If a correction is performed, returns the suggested word (`str`)                                        | `""`    |
+| token._.score_spellCheck          | `[(str,float)]` | If corrections are identified, returns the suggested words (`str`) and the probability (`float`) of each correction | `[]`    |
+
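+The `score_spellCheck` extensions return lists of `(candidate, probability)` pairs. As a sketch, a small helper (hypothetical, not part of the library) can rank such a list and keep only a high-confidence suggestion:
+
+```python
+def top_suggestion(score_list, min_prob=0.5):
+    """Return the best (word, probability) pair from a token's
+    score_spellCheck list, or None when no candidate clears min_prob."""
+    if not score_list:
+        return None
+    best = max(score_list, key=lambda pair: pair[1])
+    return best if best[1] >= min_prob else None
+
+# Scores as returned by token._.score_spellCheck for "milion" above
+scores = [("million", 0.59422), ("billion", 0.24349), (",", 0.08809)]
+print(top_suggestion(scores))                # ('million', 0.59422)
+print(top_suggestion(scores, min_prob=0.7))  # None
+```
+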
+## API
+
+At present, there is a simple GET API to get you started. You can run the app locally and play with it.
+
+Query: use the endpoint `http://127.0.0.1:5000/?query=YOUR-QUERY`.
+Note: your browser will handle the URL encoding.
+
+```
+GET Request: http://localhost:5000/?query=Income%20was%20$9.4%20milion%20compared%20to%20the%20prior%20year%20of%20$2.7%20milion.
+```
+
+Response:
+
+```json
+{
+ "success": true,
+ "input": "Income was $9.4 milion compared to the prior year of $2.7 milion.",
+ "corrected": "Income was $9.4 million compared to the prior year of $2.7 million.",
+ "suggestion_score": {
+ "milion": [
+ [
+ "million",
+ 0.59422
+ ],
+ [
+ "billion",
+ 0.24349
+ ],
+ ...
+ ],
+ "milion:1": [
+ [
+ "billion",
+ 0.65934
+ ],
+ [
+ "million",
+ 0.26185
+ ],
+ ...
+ ]
+ }
+}
+```
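+
+From Python, the same request can be built with the standard library. A minimal sketch, assuming the app is running locally on port 5000 (the helper name `build_query_url` is illustrative, not part of the library):
+
+```python
+from urllib.parse import quote
+
+def build_query_url(text, host="http://127.0.0.1:5000"):
+    """Percent-encode the query text and build the GET URL for the
+    locally running spell-check app."""
+    return f"{host}/?query={quote(text)}"
+
+url = build_query_url(
+    "Income was $9.4 milion compared to the prior year of $2.7 milion."
+)
+print(url)
+
+# With the app running, the JSON response can then be fetched:
+# import json, urllib.request
+# response = json.load(urllib.request.urlopen(url))
+# print(response["corrected"])
+```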
+
+## Task List
+
+- [ ] use cython for part of the code to improve performance ([#39](https://github.com/R1j1t/contextualSpellCheck/issues/39))
+- [ ] Improve metric for candidate selection ([#40](https://github.com/R1j1t/contextualSpellCheck/issues/40))
+- [ ] Add examples for other languages ([#41](https://github.com/R1j1t/contextualSpellCheck/issues/41))
+- [ ] Update the logic of misspell identification (OOV) ([#44](https://github.com/R1j1t/contextualSpellCheck/issues/44))
+- [ ] better candidate generation (solved by [#44](https://github.com/R1j1t/contextualSpellCheck/issues/44)?)
+- [ ] add metric by testing on datasets
+- [ ] Improve documentation
+- [ ] Improve logging in code
+- [ ] Add support for Real Word Error (RWE) (Big Task)
+- [ ] add multi mask out capability
+
+<details><summary>Completed Task</summary>
+<p>
+
+- [x] specify maximum edit distance for `candidateRanking`
+- [x] allow user to specify bert model
+- [x] Include transformers deTokenizer to get better suggestions
+- [x] dependency version in setup.py ([#38](https://github.com/R1j1t/contextualSpellCheck/issues/38))
+
+</p>
+</details>
+
+## Support and contribution
+
+If you like the project, please ⭑ it and show your support! If you feel the current behaviour is not as expected, feel free to raise an [issue](https://github.com/R1j1t/contextualSpellCheck/issues). If you can help with any of the tasks above, please open a [PR](https://github.com/R1j1t/contextualSpellCheck/pulls) with the necessary changes to documentation and tests.
+
+## Cite
+
+If you are using contextualSpellCheck in your academic work, please consider citing the library using the BibTeX entry below:
+
+```bibtex
+@misc{Goel_Contextual_Spell_Check_2021,
+author = {Goel, Rajat},
+doi = {10.5281/zenodo.4642379},
+month = {3},
+title = {{Contextual Spell Check}},
+url = {https://github.com/R1j1t/contextualSpellCheck},
+year = {2021}
+}
+```
+
+
+
+## Reference
+
+Below are some of the projects and works I referred to while developing this package:
+
+1. Explosion AI. Architecture. May 2020. URL: https://spacy.io/api.
+2. Monojit Choudhury et al. "How difficult is it to develop a perfect spell-checker? A cross-linguistic analysis through complex network approach". In: arXiv preprint physics/0703198 (2007).
+3. Jacob Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. arXiv: 1810.04805 [cs.CL].
+4. Hugging Face. Fast Coreference Resolution in spaCy with Neural Networks. May 2020. URL: https://github.com/huggingface/neuralcoref.
+5. Ines. Chapter 3: Processing Pipelines. May 2020. URL: https://course.spacy.io/en/chapter3.
+6. Eric Mays, Fred J Damerau, and Robert L Mercer. "Context based spelling correction". In: Information Processing & Management 27.5 (1991), pp. 517–522.
+7. Peter Norvig. How to Write a Spelling Corrector. May 2020. URL: http://norvig.com/spell-correct.html.
+8. Yifu Sun and Haoming Jiang. Contextual Text Denoising with Masked Language Models. 2019. arXiv: 1910.14080 [cs.CL].
+9. Thomas Wolf et al. "Transformers: State-of-the-Art Natural Language Processing". In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics, Oct. 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.
+
+[1]: <http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=52A3B869596656C9DA285DCE83A0339F?doi=10.1.1.146.4390&rep=rep1&type=pdf>
+
+
+%prep
+%autosetup -n contextualSpellCheck-0.4.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-contextualSpellCheck -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..b3ea094
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+3a060695fddd9b2fb0865e312f0c9f43 contextualSpellCheck-0.4.3.tar.gz