Contextual word checker for better suggestions
[](https://github.com/R1j1t/contextualSpellCheck/blob/master/LICENSE)
[](https://pypi.org/project/contextualSpellCheck/)
[](https://github.com/R1j1t/contextualSpellCheck#install)
[](https://pepy.tech/project/contextualspellcheck)
[](https://github.com/R1j1t/contextualSpellCheck/graphs/contributors)
[](https://github.com/R1j1t/contextualSpellCheck#task-list)
[](https://zenodo.org/badge/latestdoi/254703118)
## Types of spelling mistakes
It is essential to understand that identifying whether a candidate is a spelling error is a big task.
> Spelling errors are broadly classified as non-word errors (NWE) and real word errors (RWE). If the misspelt string is a valid word in the language, then it is called an RWE, else it is an NWE.
>
> -- [Monojit Choudhury et al. (2007)][1]
This package currently focuses on out-of-vocabulary (OOV), i.e. non-word error (NWE), correction using the BERT model. The idea behind using BERT is to take the surrounding context into account when correcting an OOV word. Going forward, I would like to extend the package to identify RWE, optimise it, and improve the documentation.
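To make the idea concrete, a plain masked language model can already be used this way: mask the misspelt token, let BERT propose in-context replacements, and keep only the candidates that are close to the original string. The sketch below is only an illustration of that idea (the model name, `top_k`, and similarity cut-off are assumptions), not the package's internal implementation.

```python
import difflib
from transformers import pipeline

# Illustration only: mask the OOV token and ask a masked LM for in-context candidates.
fill_mask = pipeline("fill-mask", model="bert-base-cased")  # model choice is an assumption

sentence = "Income was $9.4 milion compared to the prior year."
misspelt = "milion"
masked = sentence.replace(misspelt, fill_mask.tokenizer.mask_token, 1)

# Keep only suggestions that look like the original string -- a stand-in for the
# package's edit-distance based candidate ranking.
for candidate in fill_mask(masked, top_k=20):
    suggestion = candidate["token_str"].strip()
    if difflib.SequenceMatcher(None, misspelt, suggestion).ratio() > 0.8:
        print(suggestion, round(candidate["score"], 4))
```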
## Install
The package can be installed using [pip](https://pypi.org/project/contextualSpellCheck/). You will need Python 3.6+.
```bash
pip install contextualSpellCheck
```
## Usage
**Note:** For usage in other languages, check the [`examples`](https://github.com/R1j1t/contextualSpellCheck/tree/master/examples) folder.
### How to load the package in the spaCy pipeline
```python
>>> import contextualSpellCheck
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>>
>>> ## We require NER to identify if a token is a PERSON
>>> ## also require parser because we use `Token.sent` for context
>>> nlp.pipe_names
['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
>>> contextualSpellCheck.add_to_pipe(nlp)
>>> nlp.pipe_names
['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'contextual spellchecker']
>>>
>>> doc = nlp('Income was $9.4 milion compared to the prior year of $2.7 milion.')
>>> doc._.outcome_spellCheck
'Income was $9.4 million compared to the prior year of $2.7 million.'
```
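Besides `outcome_spellCheck`, the component registers a few more extension attributes. The snippet below continues the session above; the attribute names follow the same `*_spellCheck` pattern but are shown here as a sketch, so treat the project's Extensions documentation as the authoritative reference.

```python
>>> # Continuing the session above. Attribute names assumed from the same
>>> # `*_spellCheck` pattern; see the project's Extensions docs for the full list.
>>> doc._.performed_spellCheck
True
>>> print(doc._.suggestions_spellCheck)  # per-token mapping of misspellings to suggestions
{milion: 'million', milion: 'million'}
```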
Or you can add it to the spaCy pipeline manually!
```python
>>> import spacy
>>> import contextualSpellCheck
>>>
>>> nlp = spacy.load("en_core_web_sm")
>>> nlp.pipe_names
['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
>>> # You can pass optional parameters to contextualSpellCheck,
>>> # e.g. to set the maximum edit distance use config={"max_edit_dist": 3}
>>> nlp.add_pipe("contextual spellchecker")
<contextualSpellCheck.contextualSpellCheck.ContextualSpellCheck object at 0x...>
>>> nlp.pipe_names
['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'contextual spellchecker']
```
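To actually set that option, spaCy 3's `add_pipe` accepts a `config` dict. A minimal sketch, assuming only the `max_edit_dist` key mentioned in the comment above:

```python
>>> # Sketch: pass component settings via spaCy's `config` argument.
>>> # "max_edit_dist" is the parameter referenced in the comment above;
>>> # other config keys are not shown here.
>>> nlp_custom = spacy.load("en_core_web_sm")
>>> _ = nlp_custom.add_pipe("contextual spellchecker", config={"max_edit_dist": 3})
>>> doc = nlp_custom("Income was $9.4 milion compared to the prior year of $2.7 milion.")
>>> doc._.outcome_spellCheck
'Income was $9.4 million compared to the prior year of $2.7 million.'
```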
## Task List

- [x] specify maximum edit distance for `candidateRanking`
- [x] allow user to specify bert model
- [x] Include transformers deTokenizer to get better suggestions
- [x] dependency version in setup.py ([#38](https://github.com/R1j1t/contextualSpellCheck/issues/38))