author     CoprDistGit <infra@openeuler.org>  2023-05-31 04:24:08 +0000
committer  CoprDistGit <infra@openeuler.org>  2023-05-31 04:24:08 +0000
commit     bf59a527964d6833eef525f885506fe7d3bffb50 (patch)
tree       89e0ecbc6354d755c1f200725e2b69f9f4c7f73e
parent     b38233e209ea19c378a59eec2652096c0b8d4709 (diff)

automatic import of python-neattext

-rw-r--r--  .gitignore            1
-rw-r--r--  python-neattext.spec  1179
-rw-r--r--  sources               1
3 files changed, 1181 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..6076671 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/neattext-0.1.3.tar.gz
diff --git a/python-neattext.spec b/python-neattext.spec
new file mode 100644
index 0000000..736162e
--- /dev/null
+++ b/python-neattext.spec
@@ -0,0 +1,1179 @@
+%global _empty_manifest_terminate_build 0
+Name: python-neattext
+Version: 0.1.3
+Release: 1
+Summary: Neattext - a simple NLP package for cleaning text
+License: MIT
+URL: https://github.com/Jcharis/neattext
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/49/cd/b1224076d1b370a0172643f96155f97017ac5aba2c8b56f2999ba4061056/neattext-0.1.3.tar.gz
+BuildArch: noarch
+
+
+%description
+# neattext
+NeatText: a simple NLP package for cleaning textual data and text preprocessing.
+Simplifying Text Cleaning For NLP & ML
+
+[![Build Status](https://travis-ci.org/Jcharis/neattext.svg?branch=master)](https://travis-ci.org/Jcharis/neattext)
+
+[![GitHub license](https://img.shields.io/github/license/Jcharis/neattext)](https://github.com/Jcharis/neattext/blob/master/LICENSE)
+
+#### Problem
++ Cleaning of unstructured text data
++ Reducing noise (special characters, stopwords)
++ Reducing the repetition of writing the same code for text preprocessing
+
+#### Solution
++ Convert the already-known solutions for cleaning text into a reusable package
+
+#### Docs
++ Check out the full docs [here](https://jcharis.github.io/neattext/)
+
+#### Installation
+```bash
+pip install neattext
+```
+
+### Usage
++ The OOP Way (Object-Oriented Way)
++ NeatText offers 5 main classes for working with text data:
+ - TextFrame: a frame-like object for cleaning text
+ - TextCleaner: remove or replace specific unwanted text
+ - TextExtractor: extract the unwanted text data (emails, numbers, URLs, emojis, etc.)
+ - TextMetrics: word stats and metrics
+ - TextPipeline: combine multiple functions in a pipeline
+
+### Overall Components of NeatText
+![](images/neattext_features_jcharistech.png)
+
+### Using TextFrame
++ Keeps the text as a `TextFrame` object, which allows us to do more with our text.
++ It inherits the benefits of TextCleaner and TextMetrics out of the box, plus some additional features for handling text data.
++ This is the simplest way to preprocess text with this library; alternatively, you can use the other classes directly.
+
+
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx = nt.TextFrame(text=mytext)
+>>> docx.text
+"This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>>
+>>> docx.describe()
+Key Value
+Length : 73
+vowels : 21
+consonants: 34
+stopwords: 4
+punctuations: 8
+special_char: 8
+tokens(whitespace): 10
+tokens(words): 14
+>>>
+>>> docx.length
+73
+>>> # Scan the percentage of noise (unclean data) in the text
+>>> docx.noise_scan()
+{'text_noise': 19.17808219178082, 'text_length': 73, 'noise_count': 14}
+>>>
+>>> docx.head(16)
+'This is the mail'
+>>> docx.tail()
+>>> docx.count_vowels()
+>>> docx.count_stopwords()
+>>> docx.count_consonants()
+>>> docx.nlongest()
+>>> docx.nshortest()
+>>> docx.readability()
+```
+#### Basic NLP Tasks (Tokenization, Ngrams, Text Generation)
+```python
+>>> docx.word_tokens()
+>>>
+>>> docx.sent_tokens()
+>>>
+>>> docx.term_freq()
+>>>
+>>> docx.bow()
+```
+#### Basic Text Preprocessing
+```python
+>>> docx.normalize()
+'this is the mail example@gmail.com ,our website is https://example.com 😊.'
+>>> docx.normalize(level='deep')
+'this is the mail examplegmailcom our website is httpsexamplecom '
+
+>>> docx.remove_puncts()
+>>> docx.remove_stopwords()
+>>> docx.remove_html_tags()
+>>> docx.remove_special_characters()
+>>> docx.remove_emojis()
+>>> docx.fix_contractions()
+```
+
+##### Handling Files with NeatText
++ Read txt file directly into TextFrame
+```python
+>>> import neattext as nt
+>>> docx_df = nt.read_txt('file.txt')
+```
++ Alternatively you can instantiate a TextFrame and read a text file into it
+```python
+>>> import neattext as nt
+>>> docx_df = nt.TextFrame().read_txt('file.txt')
+```
+
+##### Chaining Methods on TextFrame
+```python
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."
+>>> from neattext import TextFrame
+>>> docx = TextFrame(t1)
+>>> result = docx.remove_emails().remove_urls().remove_emojis()
+>>> print(result)
+'This is the mail ,our WEBSITE is and it will cost $100 to subscribe.'
+```
+
+
+
+#### Clean Text
++ Clean text by removing emails, numbers, stopwords, emojis, etc.
++ A simplified function for cleaning text: specify what to clean from a text by passing True/False options
+```python
+>>> from neattext.functions import clean_text
+>>>
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>>
+>>> clean_text(mytext)
+'mail example@gmail.com ,our website https://example.com .'
+```
++ You can remove punctuations, stopwords, urls, emojis, multiple_whitespaces, etc. by setting the corresponding options to True.
+
++ You can choose to remove or keep punctuations by setting `puncts` to True or False respectively
+
+```python
+>>> clean_text(mytext,puncts=True)
+'mail example@gmailcom website https://examplecom '
+>>>
+>>> clean_text(mytext,puncts=False)
+'mail example@gmail.com ,our website https://example.com .'
+>>>
+>>> clean_text(mytext,puncts=False,stopwords=False)
+'this is the mail example@gmail.com ,our website is https://example.com .'
+>>>
+```
++ You can also remove the other unneeded items in the same way
+```python
+>>> clean_text(mytext,stopwords=False)
+'this is the mail example@gmail.com ,our website is https://example.com .'
+>>>
+>>> clean_text(mytext,urls=False)
+'mail example@gmail.com ,our website https://example.com .'
+>>>
+>>> clean_text(mytext,urls=True)
+'mail example@gmail.com ,our website .'
+>>>
+
+```
+
+#### Removing Punctuations [A Very Common Text Preprocessing Step]
++ You can remove the most common punctuations, such as full stops, commas, exclamation marks and question marks, by setting most_common=True, which is the default
++ Alternatively, you can remove all known punctuations from a text by setting most_common=False.
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don't forget the email when you enter !!!!!"
+>>> docx = nt.TextFrame(mytext)
+>>> docx.remove_puncts()
+TextFrame(text="This is the mail example@gmailcom our WEBSITE is https://examplecom 😊 Please dont forget the email when you enter ")
+
+>>> docx.remove_puncts(most_common=False)
+TextFrame(text="This is the mail examplegmailcom our WEBSITE is httpsexamplecom 😊 Please dont forget the email when you enter ")
+```
+
+#### Removing Stopwords [A Very Common Text Preprocessing Step]
++ You can remove stopwords from a text by specifying the language. The default language is English.
++ Supported languages include English (en), Spanish (es), French (fr), Russian (ru), Yoruba (yo) and German (de)
+
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don't forget the email when you enter !!!!!"
+>>> docx = nt.TextFrame(mytext)
+>>> docx.remove_stopwords(lang='en')
+TextFrame(text="mail example@gmail.com ,our WEBSITE https://example.com 😊. forget email enter !!!!!")
+```
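+
+As a follow-up usage sketch (an assumption: the Spanish stopword list behaves like the English one above, and the example sentence is made up for illustration), you can pass any of the other supported language codes:
+```python
+>>> import neattext as nt
+>>> es_text = "Este es el correo example@gmail.com y nuestra WEB es https://example.com"
+>>> es_docx = nt.TextFrame(es_text)
+>>> es_docx.remove_stopwords(lang='es')  # expected to drop common Spanish stopwords such as "este", "es", "el"
+```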
+
+
+#### Remove Emails, Numbers, Phone Numbers, Dates, BTC Addresses, Visa Card Numbers, etc.
+```python
+>>> print(docx.remove_emails())
+>>> 'This is the mail ,our WEBSITE is https://example.com 😊.'
+>>>
+>>> print(docx.remove_stopwords())
+>>> 'This mail example@gmail.com ,our WEBSITE https://example.com 😊.'
+>>>
+>>> print(docx.remove_numbers())
+>>> docx.remove_phone_numbers()
+>>> docx.remove_btc_address()
+```
+
+
+#### Remove Special Characters
+```python
+>>> docx.remove_special_characters()
+```
+
+#### Remove Emojis
+```python
+>>> print(docx.remove_emojis())
+>>> 'This is the mail example@gmail.com ,our WEBSITE is https://example.com .'
+```
+
+
+#### Remove Custom Pattern
++ You can also specify your own custom pattern, in case you cannot find what you need among the built-in functions, using the `remove_custom_pattern()` function
+```python
+>>> import neattext.functions as nfx
+>>> ex = "Last !RT tweeter multiple &#7777"
+>>>
+>>> nfx.remove_custom_pattern(ex,r'&#\d+')
+'Last !RT tweeter multiple '
+
+
+
+```
+
+#### Replace Emails,Numbers,Phone Numbers
+```python
+>>> docx.replace_emails()
+>>> docx.replace_numbers()
+>>> docx.replace_phone_numbers()
+```
+
+#### Chain Multiple Methods
+```python
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."
+>>> from neattext import TextCleaner
+>>> docx = TextCleaner(t1)
+>>> result = docx.remove_emails().remove_urls().remove_emojis()
+>>> print(result)
+'This is the mail ,our WEBSITE is and it will cost $100 to subscribe.'
+
+```
+
+### Using TextExtractor
++ To extract emails, phone numbers, numbers, urls and emojis from text
+```python
+>>> from neattext import TextExtractor
+>>> docx = TextExtractor()
+>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx.extract_emails()
+>>> ['example@gmail.com']
+>>>
+>>> docx.extract_emojis()
+>>> ['😊']
+```
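+
+The header above also mentions phone numbers, numbers and urls; a hedged sketch of those extractors follows (the method names `extract_urls` and `extract_numbers` are assumed here by analogy with `extract_emails`, and are not shown elsewhere in this README):
+```python
+>>> docx.text = "Call 233-200-100 or visit https://example.com for 2 free tickets"
+>>> docx.extract_urls()     # assumed method name
+>>> docx.extract_numbers()  # assumed method name
+```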
+
+
+### Using TextMetrics
++ To find word stats such as counts of vowels, consonants and stopwords
+```python
+>>> from neattext import TextMetrics
+>>> docx = TextMetrics()
+>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx.count_vowels()
+>>> docx.count_consonants()
+>>> docx.count_stopwords()
+>>> docx.word_stats()
+>>> docx.memory_usage()
+```
+
+### Usage
++ The MOP Way (Method/Function-Oriented Way)
+
+```python
+>>> from neattext.functions import clean_text,extract_emails
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
+>>> clean_text(t1,puncts=True,stopwords=True)
+>>>'this mail examplegmailcom website httpsexamplecom'
+>>> extract_emails(t1)
+>>> ['example@gmail.com']
+```
+
++ Alternatively you can also use this approach
+```python
+>>> import neattext.functions as nfx
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
+>>> nfx.clean_text(t1,puncts=True,stopwords=True)
+>>>'this mail examplegmailcom website httpsexamplecom'
+>>> nfx.extract_emails(t1)
+>>> ['example@gmail.com']
+```
+
+### Explainer
++ Explain an emoji, look one up by name, or resolve a unicode codepoint
+ - emoji_explainer()
+ - emojify()
+ - unicode_2_emoji()
+
+
+```python
+>>> from neattext.explainer import emojify
+>>> emojify('Smiley')
+>>> '😃'
+```
+
+```python
+>>> from neattext.explainer import emoji_explainer
+>>> emoji_explainer('😃')
+>>> 'SMILING FACE WITH OPEN MOUTH'
+```
+
+```python
+>>> from neattext.explainer import unicode_2_emoji
+>>> unicode_2_emoji('0x1f49b')
+ 'FLUSHED FACE'
+```
+
+### Usage
++ The Pipeline Way
+
+```python
+>>> from neattext.pipeline import TextPipeline
+>>> t1 = """This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. This is visa 4111 1111 1111 1111 and bitcoin 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 with mastercard 5500 0000 0000 0004. Send it to PO Box 555, KNU"""
+
+>>> from neattext.functions import remove_emails,remove_numbers,remove_emojis  # assumed module for the step functions
+>>> p = TextPipeline(steps=[remove_emails,remove_numbers,remove_emojis])
+>>> p.fit(t1)
+'This is the mail ,our WEBSITE is https://example.com . This is visa and bitcoin BvBMSEYstWetqTFnAumGFgxJaNVN with mastercard . Send it to PO Box , KNU'
+
+```
++ Check For steps and named steps
+```python
+>>> p.steps
+>>> p.named_steps
+```
+
++ Alternatively, you can apply the same cleaning steps without a pipeline.
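+
+A minimal sketch of that functional alternative (assuming the step functions `remove_emails`, `remove_numbers` and `remove_emojis` are importable from `neattext.functions`, as in the pipeline steps above):
+```python
+>>> import neattext.functions as nfx
+>>> result = t1
+>>> for fn in [nfx.remove_emails, nfx.remove_numbers, nfx.remove_emojis]:
+...     result = fn(result)
+...
+>>> result
+```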
+
+
+
+
+### Documentation
+Please read the [documentation](https://github.com/Jcharis/neattext/wiki) for more information on what neattext does and how to use it for your needs. You can also check
+out the docs page [here](https://jcharis.github.io/neattext/)
+
+
+### More Features To Add
++ basic nlp task
++ currency normalizer
+
+#### Acknowledgements
++ Inspired by packages like `clean-text` by Johannes Filter and `textify` by JCharisTech
+
+
+#### NB
++ Contributions are welcome
++ If you notice a bug, please let us know.
++ Thanks a lot
+
+
+#### By
++ Jesse E. Agbe (JCharis)
++ Jesus Saves @JCharisTech
+
+
+
+
+
+%package -n python3-neattext
+Summary: Neattext - a simple NLP package for cleaning text
+Provides: python-neattext
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-neattext
+# neattext
+NeatText: a simple NLP package for cleaning textual data and text preprocessing.
+Simplifying Text Cleaning For NLP & ML
+
+[![Build Status](https://travis-ci.org/Jcharis/neattext.svg?branch=master)](https://travis-ci.org/Jcharis/neattext)
+
+[![GitHub license](https://img.shields.io/github/license/Jcharis/neattext)](https://github.com/Jcharis/neattext/blob/master/LICENSE)
+
+#### Problem
++ Cleaning of unstructured text data
++ Reducing noise (special characters, stopwords)
++ Reducing the repetition of writing the same code for text preprocessing
+
+#### Solution
++ Convert the already-known solutions for cleaning text into a reusable package
+
+#### Docs
++ Check out the full docs [here](https://jcharis.github.io/neattext/)
+
+#### Installation
+```bash
+pip install neattext
+```
+
+### Usage
++ The OOP Way (Object-Oriented Way)
++ NeatText offers 5 main classes for working with text data:
+ - TextFrame: a frame-like object for cleaning text
+ - TextCleaner: remove or replace specific unwanted text
+ - TextExtractor: extract the unwanted text data (emails, numbers, URLs, emojis, etc.)
+ - TextMetrics: word stats and metrics
+ - TextPipeline: combine multiple functions in a pipeline
+
+### Overall Components of NeatText
+![](images/neattext_features_jcharistech.png)
+
+### Using TextFrame
++ Keeps the text as a `TextFrame` object, which allows us to do more with our text.
++ It inherits the benefits of TextCleaner and TextMetrics out of the box, plus some additional features for handling text data.
++ This is the simplest way to preprocess text with this library; alternatively, you can use the other classes directly.
+
+
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx = nt.TextFrame(text=mytext)
+>>> docx.text
+"This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>>
+>>> docx.describe()
+Key Value
+Length : 73
+vowels : 21
+consonants: 34
+stopwords: 4
+punctuations: 8
+special_char: 8
+tokens(whitespace): 10
+tokens(words): 14
+>>>
+>>> docx.length
+73
+>>> # Scan the percentage of noise (unclean data) in the text
+>>> docx.noise_scan()
+{'text_noise': 19.17808219178082, 'text_length': 73, 'noise_count': 14}
+>>>
+>>> docx.head(16)
+'This is the mail'
+>>> docx.tail()
+>>> docx.count_vowels()
+>>> docx.count_stopwords()
+>>> docx.count_consonants()
+>>> docx.nlongest()
+>>> docx.nshortest()
+>>> docx.readability()
+```
+#### Basic NLP Tasks (Tokenization, Ngrams, Text Generation)
+```python
+>>> docx.word_tokens()
+>>>
+>>> docx.sent_tokens()
+>>>
+>>> docx.term_freq()
+>>>
+>>> docx.bow()
+```
+#### Basic Text Preprocessing
+```python
+>>> docx.normalize()
+'this is the mail example@gmail.com ,our website is https://example.com 😊.'
+>>> docx.normalize(level='deep')
+'this is the mail examplegmailcom our website is httpsexamplecom '
+
+>>> docx.remove_puncts()
+>>> docx.remove_stopwords()
+>>> docx.remove_html_tags()
+>>> docx.remove_special_characters()
+>>> docx.remove_emojis()
+>>> docx.fix_contractions()
+```
+
+##### Handling Files with NeatText
++ Read txt file directly into TextFrame
+```python
+>>> import neattext as nt
+>>> docx_df = nt.read_txt('file.txt')
+```
++ Alternatively you can instantiate a TextFrame and read a text file into it
+```python
+>>> import neattext as nt
+>>> docx_df = nt.TextFrame().read_txt('file.txt')
+```
+
+##### Chaining Methods on TextFrame
+```python
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."
+>>> from neattext import TextFrame
+>>> docx = TextFrame(t1)
+>>> result = docx.remove_emails().remove_urls().remove_emojis()
+>>> print(result)
+'This is the mail ,our WEBSITE is and it will cost $100 to subscribe.'
+```
+
+
+
+#### Clean Text
++ Clean text by removing emails, numbers, stopwords, emojis, etc.
++ A simplified function for cleaning text: specify what to clean from a text by passing True/False options
+```python
+>>> from neattext.functions import clean_text
+>>>
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>>
+>>> clean_text(mytext)
+'mail example@gmail.com ,our website https://example.com .'
+```
++ You can remove punctuations, stopwords, urls, emojis, multiple_whitespaces, etc. by setting the corresponding options to True.
+
++ You can choose to remove or keep punctuations by setting `puncts` to True or False respectively
+
+```python
+>>> clean_text(mytext,puncts=True)
+'mail example@gmailcom website https://examplecom '
+>>>
+>>> clean_text(mytext,puncts=False)
+'mail example@gmail.com ,our website https://example.com .'
+>>>
+>>> clean_text(mytext,puncts=False,stopwords=False)
+'this is the mail example@gmail.com ,our website is https://example.com .'
+>>>
+```
++ You can also remove the other unneeded items in the same way
+```python
+>>> clean_text(mytext,stopwords=False)
+'this is the mail example@gmail.com ,our website is https://example.com .'
+>>>
+>>> clean_text(mytext,urls=False)
+'mail example@gmail.com ,our website https://example.com .'
+>>>
+>>> clean_text(mytext,urls=True)
+'mail example@gmail.com ,our website .'
+>>>
+
+```
+
+#### Removing Punctuations [A Very Common Text Preprocessing Step]
++ You can remove the most common punctuations, such as full stops, commas, exclamation marks and question marks, by setting most_common=True, which is the default
++ Alternatively, you can remove all known punctuations from a text by setting most_common=False.
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don't forget the email when you enter !!!!!"
+>>> docx = nt.TextFrame(mytext)
+>>> docx.remove_puncts()
+TextFrame(text="This is the mail example@gmailcom our WEBSITE is https://examplecom 😊 Please dont forget the email when you enter ")
+
+>>> docx.remove_puncts(most_common=False)
+TextFrame(text="This is the mail examplegmailcom our WEBSITE is httpsexamplecom 😊 Please dont forget the email when you enter ")
+```
+
+#### Removing Stopwords [A Very Common Text Preprocessing Step]
++ You can remove stopwords from a text by specifying the language. The default language is English.
++ Supported languages include English (en), Spanish (es), French (fr), Russian (ru), Yoruba (yo) and German (de)
+
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don't forget the email when you enter !!!!!"
+>>> docx = nt.TextFrame(mytext)
+>>> docx.remove_stopwords(lang='en')
+TextFrame(text="mail example@gmail.com ,our WEBSITE https://example.com 😊. forget email enter !!!!!")
+```
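+
+As a follow-up usage sketch (an assumption: the Spanish stopword list behaves like the English one above, and the example sentence is made up for illustration), you can pass any of the other supported language codes:
+```python
+>>> import neattext as nt
+>>> es_text = "Este es el correo example@gmail.com y nuestra WEB es https://example.com"
+>>> es_docx = nt.TextFrame(es_text)
+>>> es_docx.remove_stopwords(lang='es')  # expected to drop common Spanish stopwords such as "este", "es", "el"
+```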
+
+
+#### Remove Emails, Numbers, Phone Numbers, Dates, BTC Addresses, Visa Card Numbers, etc.
+```python
+>>> print(docx.remove_emails())
+>>> 'This is the mail ,our WEBSITE is https://example.com 😊.'
+>>>
+>>> print(docx.remove_stopwords())
+>>> 'This mail example@gmail.com ,our WEBSITE https://example.com 😊.'
+>>>
+>>> print(docx.remove_numbers())
+>>> docx.remove_phone_numbers()
+>>> docx.remove_btc_address()
+```
+
+
+#### Remove Special Characters
+```python
+>>> docx.remove_special_characters()
+```
+
+#### Remove Emojis
+```python
+>>> print(docx.remove_emojis())
+>>> 'This is the mail example@gmail.com ,our WEBSITE is https://example.com .'
+```
+
+
+#### Remove Custom Pattern
++ You can also specify your own custom pattern, in case you cannot find what you need among the built-in functions, using the `remove_custom_pattern()` function
+```python
+>>> import neattext.functions as nfx
+>>> ex = "Last !RT tweeter multiple &#7777"
+>>>
+>>> nfx.remove_custom_pattern(ex,r'&#\d+')
+'Last !RT tweeter multiple '
+
+
+
+```
+
+#### Replace Emails,Numbers,Phone Numbers
+```python
+>>> docx.replace_emails()
+>>> docx.replace_numbers()
+>>> docx.replace_phone_numbers()
+```
+
+#### Chain Multiple Methods
+```python
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."
+>>> from neattext import TextCleaner
+>>> docx = TextCleaner(t1)
+>>> result = docx.remove_emails().remove_urls().remove_emojis()
+>>> print(result)
+'This is the mail ,our WEBSITE is and it will cost $100 to subscribe.'
+
+```
+
+### Using TextExtractor
++ To extract emails, phone numbers, numbers, urls and emojis from text
+```python
+>>> from neattext import TextExtractor
+>>> docx = TextExtractor()
+>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx.extract_emails()
+>>> ['example@gmail.com']
+>>>
+>>> docx.extract_emojis()
+>>> ['😊']
+```
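+
+The header above also mentions phone numbers, numbers and urls; a hedged sketch of those extractors follows (the method names `extract_urls` and `extract_numbers` are assumed here by analogy with `extract_emails`, and are not shown elsewhere in this README):
+```python
+>>> docx.text = "Call 233-200-100 or visit https://example.com for 2 free tickets"
+>>> docx.extract_urls()     # assumed method name
+>>> docx.extract_numbers()  # assumed method name
+```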
+
+
+### Using TextMetrics
++ To find word stats such as counts of vowels, consonants and stopwords
+```python
+>>> from neattext import TextMetrics
+>>> docx = TextMetrics()
+>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx.count_vowels()
+>>> docx.count_consonants()
+>>> docx.count_stopwords()
+>>> docx.word_stats()
+>>> docx.memory_usage()
+```
+
+### Usage
++ The MOP Way (Method/Function-Oriented Way)
+
+```python
+>>> from neattext.functions import clean_text,extract_emails
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
+>>> clean_text(t1,puncts=True,stopwords=True)
+>>>'this mail examplegmailcom website httpsexamplecom'
+>>> extract_emails(t1)
+>>> ['example@gmail.com']
+```
+
++ Alternatively you can also use this approach
+```python
+>>> import neattext.functions as nfx
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
+>>> nfx.clean_text(t1,puncts=True,stopwords=True)
+>>>'this mail examplegmailcom website httpsexamplecom'
+>>> nfx.extract_emails(t1)
+>>> ['example@gmail.com']
+```
+
+### Explainer
++ Explain an emoji, look one up by name, or resolve a unicode codepoint
+ - emoji_explainer()
+ - emojify()
+ - unicode_2_emoji()
+
+
+```python
+>>> from neattext.explainer import emojify
+>>> emojify('Smiley')
+>>> '😃'
+```
+
+```python
+>>> from neattext.explainer import emoji_explainer
+>>> emoji_explainer('😃')
+>>> 'SMILING FACE WITH OPEN MOUTH'
+```
+
+```python
+>>> from neattext.explainer import unicode_2_emoji
+>>> unicode_2_emoji('0x1f49b')
+ 'FLUSHED FACE'
+```
+
+### Usage
++ The Pipeline Way
+
+```python
+>>> from neattext.pipeline import TextPipeline
+>>> t1 = """This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. This is visa 4111 1111 1111 1111 and bitcoin 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 with mastercard 5500 0000 0000 0004. Send it to PO Box 555, KNU"""
+
+>>> from neattext.functions import remove_emails,remove_numbers,remove_emojis  # assumed module for the step functions
+>>> p = TextPipeline(steps=[remove_emails,remove_numbers,remove_emojis])
+>>> p.fit(t1)
+'This is the mail ,our WEBSITE is https://example.com . This is visa and bitcoin BvBMSEYstWetqTFnAumGFgxJaNVN with mastercard . Send it to PO Box , KNU'
+
+```
++ Check For steps and named steps
+```python
+>>> p.steps
+>>> p.named_steps
+```
+
++ Alternatively, you can apply the same cleaning steps without a pipeline.
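+
+A minimal sketch of that functional alternative (assuming the step functions `remove_emails`, `remove_numbers` and `remove_emojis` are importable from `neattext.functions`, as in the pipeline steps above):
+```python
+>>> import neattext.functions as nfx
+>>> result = t1
+>>> for fn in [nfx.remove_emails, nfx.remove_numbers, nfx.remove_emojis]:
+...     result = fn(result)
+...
+>>> result
+```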
+
+
+
+
+### Documentation
+Please read the [documentation](https://github.com/Jcharis/neattext/wiki) for more information on what neattext does and how to use it for your needs. You can also check
+out the docs page [here](https://jcharis.github.io/neattext/)
+
+
+### More Features To Add
++ basic nlp task
++ currency normalizer
+
+#### Acknowledgements
++ Inspired by packages like `clean-text` by Johannes Filter and `textify` by JCharisTech
+
+
+#### NB
++ Contributions are welcome
++ If you notice a bug, please let us know.
++ Thanks a lot
+
+
+#### By
++ Jesse E. Agbe (JCharis)
++ Jesus Saves @JCharisTech
+
+
+
+
+
+%package help
+Summary: Development documents and examples for neattext
+Provides: python3-neattext-doc
+%description help
+# neattext
+NeatText: a simple NLP package for cleaning textual data and text preprocessing.
+Simplifying Text Cleaning For NLP & ML
+
+[![Build Status](https://travis-ci.org/Jcharis/neattext.svg?branch=master)](https://travis-ci.org/Jcharis/neattext)
+
+[![GitHub license](https://img.shields.io/github/license/Jcharis/neattext)](https://github.com/Jcharis/neattext/blob/master/LICENSE)
+
+#### Problem
++ Cleaning of unstructured text data
++ Reducing noise (special characters, stopwords)
++ Reducing the repetition of writing the same code for text preprocessing
+
+#### Solution
++ Convert the already-known solutions for cleaning text into a reusable package
+
+#### Docs
++ Check out the full docs [here](https://jcharis.github.io/neattext/)
+
+#### Installation
+```bash
+pip install neattext
+```
+
+### Usage
++ The OOP Way (Object-Oriented Way)
++ NeatText offers 5 main classes for working with text data:
+ - TextFrame: a frame-like object for cleaning text
+ - TextCleaner: remove or replace specific unwanted text
+ - TextExtractor: extract the unwanted text data (emails, numbers, URLs, emojis, etc.)
+ - TextMetrics: word stats and metrics
+ - TextPipeline: combine multiple functions in a pipeline
+
+### Overall Components of NeatText
+![](images/neattext_features_jcharistech.png)
+
+### Using TextFrame
++ Keeps the text as a `TextFrame` object, which allows us to do more with our text.
++ It inherits the benefits of TextCleaner and TextMetrics out of the box, plus some additional features for handling text data.
++ This is the simplest way to preprocess text with this library; alternatively, you can use the other classes directly.
+
+
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx = nt.TextFrame(text=mytext)
+>>> docx.text
+"This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>>
+>>> docx.describe()
+Key Value
+Length : 73
+vowels : 21
+consonants: 34
+stopwords: 4
+punctuations: 8
+special_char: 8
+tokens(whitespace): 10
+tokens(words): 14
+>>>
+>>> docx.length
+73
+>>> # Scan the percentage of noise (unclean data) in the text
+>>> docx.noise_scan()
+{'text_noise': 19.17808219178082, 'text_length': 73, 'noise_count': 14}
+>>>
+>>> docx.head(16)
+'This is the mail'
+>>> docx.tail()
+>>> docx.count_vowels()
+>>> docx.count_stopwords()
+>>> docx.count_consonants()
+>>> docx.nlongest()
+>>> docx.nshortest()
+>>> docx.readability()
+```
+#### Basic NLP Tasks (Tokenization, Ngrams, Text Generation)
+```python
+>>> docx.word_tokens()
+>>>
+>>> docx.sent_tokens()
+>>>
+>>> docx.term_freq()
+>>>
+>>> docx.bow()
+```
+#### Basic Text Preprocessing
+```python
+>>> docx.normalize()
+'this is the mail example@gmail.com ,our website is https://example.com 😊.'
+>>> docx.normalize(level='deep')
+'this is the mail examplegmailcom our website is httpsexamplecom '
+
+>>> docx.remove_puncts()
+>>> docx.remove_stopwords()
+>>> docx.remove_html_tags()
+>>> docx.remove_special_characters()
+>>> docx.remove_emojis()
+>>> docx.fix_contractions()
+```
+
+##### Handling Files with NeatText
++ Read txt file directly into TextFrame
+```python
+>>> import neattext as nt
+>>> docx_df = nt.read_txt('file.txt')
+```
++ Alternatively you can instantiate a TextFrame and read a text file into it
+```python
+>>> import neattext as nt
+>>> docx_df = nt.TextFrame().read_txt('file.txt')
+```
+
+##### Chaining Methods on TextFrame
+```python
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."
+>>> from neattext import TextFrame
+>>> docx = TextFrame(t1)
+>>> result = docx.remove_emails().remove_urls().remove_emojis()
+>>> print(result)
+'This is the mail ,our WEBSITE is and it will cost $100 to subscribe.'
+```
+
+
+
+#### Clean Text
++ Clean text by removing emails, numbers, stopwords, emojis, etc.
++ A simplified function for cleaning text: specify what to clean from a text by passing True/False options
+```python
+>>> from neattext.functions import clean_text
+>>>
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>>
+>>> clean_text(mytext)
+'mail example@gmail.com ,our website https://example.com .'
+```
++ You can remove punctuations, stopwords, urls, emojis, multiple_whitespaces, etc. by setting the corresponding options to True.
+
++ You can choose to remove or keep punctuations by setting `puncts` to True or False respectively
+
+```python
+>>> clean_text(mytext,puncts=True)
+'mail example@gmailcom website https://examplecom '
+>>>
+>>> clean_text(mytext,puncts=False)
+'mail example@gmail.com ,our website https://example.com .'
+>>>
+>>> clean_text(mytext,puncts=False,stopwords=False)
+'this is the mail example@gmail.com ,our website is https://example.com .'
+>>>
+```
++ You can also remove the other unneeded items in the same way
+```python
+>>> clean_text(mytext,stopwords=False)
+'this is the mail example@gmail.com ,our website is https://example.com .'
+>>>
+>>> clean_text(mytext,urls=False)
+'mail example@gmail.com ,our website https://example.com .'
+>>>
+>>> clean_text(mytext,urls=True)
+'mail example@gmail.com ,our website .'
+>>>
+
+```
+
+#### Removing Punctuations [A Very Common Text Preprocessing Step]
++ You can remove the most common punctuations, such as full stops, commas, exclamation marks and question marks, by setting most_common=True, which is the default
++ Alternatively, you can remove all known punctuations from a text by setting most_common=False.
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don't forget the email when you enter !!!!!"
+>>> docx = nt.TextFrame(mytext)
+>>> docx.remove_puncts()
+TextFrame(text="This is the mail example@gmailcom our WEBSITE is https://examplecom 😊 Please dont forget the email when you enter ")
+
+>>> docx.remove_puncts(most_common=False)
+TextFrame(text="This is the mail examplegmailcom our WEBSITE is httpsexamplecom 😊 Please dont forget the email when you enter ")
+```
+
+#### Removing Stopwords [A Very Common Text Preprocessing Step]
++ You can remove stopwords from a text by specifying the language. The default language is English.
++ Supported languages include English (en), Spanish (es), French (fr), Russian (ru), Yoruba (yo) and German (de)
+
+```python
+>>> import neattext as nt
+>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don't forget the email when you enter !!!!!"
+>>> docx = nt.TextFrame(mytext)
+>>> docx.remove_stopwords(lang='en')
+TextFrame(text="mail example@gmail.com ,our WEBSITE https://example.com 😊. forget email enter !!!!!")
+```
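+
+As a follow-up usage sketch (an assumption: the Spanish stopword list behaves like the English one above, and the example sentence is made up for illustration), you can pass any of the other supported language codes:
+```python
+>>> import neattext as nt
+>>> es_text = "Este es el correo example@gmail.com y nuestra WEB es https://example.com"
+>>> es_docx = nt.TextFrame(es_text)
+>>> es_docx.remove_stopwords(lang='es')  # expected to drop common Spanish stopwords such as "este", "es", "el"
+```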
+
+
+#### Remove Emails, Numbers, Phone Numbers, Dates, BTC Addresses, Visa Card Numbers, etc.
+```python
+>>> print(docx.remove_emails())
+>>> 'This is the mail ,our WEBSITE is https://example.com 😊.'
+>>>
+>>> print(docx.remove_stopwords())
+>>> 'This mail example@gmail.com ,our WEBSITE https://example.com 😊.'
+>>>
+>>> print(docx.remove_numbers())
+>>> docx.remove_phone_numbers()
+>>> docx.remove_btc_address()
+```
+
+
+#### Remove Special Characters
+```python
+>>> docx.remove_special_characters()
+```
+
+#### Remove Emojis
+```python
+>>> print(docx.remove_emojis())
+>>> 'This is the mail example@gmail.com ,our WEBSITE is https://example.com .'
+```
+
+
+#### Remove Custom Pattern
++ You can also specify your own custom pattern, in case you cannot find what you need among the built-in functions, using the `remove_custom_pattern()` function
+```python
+>>> import neattext.functions as nfx
+>>> ex = "Last !RT tweeter multiple &#7777"
+>>>
+>>> nfx.remove_custom_pattern(ex,r'&#\d+')
+'Last !RT tweeter multiple '
+
+
+
+```
+
+#### Replace Emails,Numbers,Phone Numbers
+```python
+>>> docx.replace_emails()
+>>> docx.replace_numbers()
+>>> docx.replace_phone_numbers()
+```
+
+#### Chain Multiple Methods
+```python
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."
+>>> from neattext import TextCleaner
+>>> docx = TextCleaner(t1)
+>>> result = docx.remove_emails().remove_urls().remove_emojis()
+>>> print(result)
+'This is the mail ,our WEBSITE is and it will cost $100 to subscribe.'
+
+```
+
+### Using TextExtractor
++ To extract emails, phone numbers, numbers, urls and emojis from text
+```python
+>>> from neattext import TextExtractor
+>>> docx = TextExtractor()
+>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx.extract_emails()
+>>> ['example@gmail.com']
+>>>
+>>> docx.extract_emojis()
+>>> ['😊']
+```
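+
+The header above also mentions phone numbers, numbers and urls; a hedged sketch of those extractors follows (the method names `extract_urls` and `extract_numbers` are assumed here by analogy with `extract_emails`, and are not shown elsewhere in this README):
+```python
+>>> docx.text = "Call 233-200-100 or visit https://example.com for 2 free tickets"
+>>> docx.extract_urls()     # assumed method name
+>>> docx.extract_numbers()  # assumed method name
+```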
+
+
+### Using TextMetrics
++ To find word stats such as counts of vowels, consonants and stopwords
+```python
+>>> from neattext import TextMetrics
+>>> docx = TextMetrics()
+>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."
+>>> docx.count_vowels()
+>>> docx.count_consonants()
+>>> docx.count_stopwords()
+>>> docx.word_stats()
+>>> docx.memory_usage()
+```
+
+### Usage
++ The MOP Way (Method/Function-Oriented Way)
+
+```python
+>>> from neattext.functions import clean_text,extract_emails
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
+>>> clean_text(t1,puncts=True,stopwords=True)
+>>>'this mail examplegmailcom website httpsexamplecom'
+>>> extract_emails(t1)
+>>> ['example@gmail.com']
+```
+
++ Alternatively you can also use this approach
+```python
+>>> import neattext.functions as nfx
+>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."
+>>> nfx.clean_text(t1,puncts=True,stopwords=True)
+>>>'this mail examplegmailcom website httpsexamplecom'
+>>> nfx.extract_emails(t1)
+>>> ['example@gmail.com']
+```
+
+### Explainer
++ Explain an emoji, look one up by name, or resolve a unicode codepoint
+ - emoji_explainer()
+ - emojify()
+ - unicode_2_emoji()
+
+
+```python
+>>> from neattext.explainer import emojify
+>>> emojify('Smiley')
+>>> '😃'
+```
+
+```python
+>>> from neattext.explainer import emoji_explainer
+>>> emoji_explainer('😃')
+>>> 'SMILING FACE WITH OPEN MOUTH'
+```
+
+```python
+>>> from neattext.explainer import unicode_2_emoji
+>>> unicode_2_emoji('0x1f49b')
+ 'FLUSHED FACE'
+```
+
+### Usage
++ The Pipeline Way
+
+```python
+>>> from neattext.pipeline import TextPipeline
+>>> t1 = """This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. This is visa 4111 1111 1111 1111 and bitcoin 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 with mastercard 5500 0000 0000 0004. Send it to PO Box 555, KNU"""
+
+>>> from neattext.functions import remove_emails,remove_numbers,remove_emojis  # assumed module for the step functions
+>>> p = TextPipeline(steps=[remove_emails,remove_numbers,remove_emojis])
+>>> p.fit(t1)
+'This is the mail ,our WEBSITE is https://example.com . This is visa and bitcoin BvBMSEYstWetqTFnAumGFgxJaNVN with mastercard . Send it to PO Box , KNU'
+
+```
++ Check For steps and named steps
+```python
+>>> p.steps
+>>> p.named_steps
+```
+
++ Alternatively, you can apply the same cleaning steps without a pipeline.
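+
+A minimal sketch of that functional alternative (assuming the step functions `remove_emails`, `remove_numbers` and `remove_emojis` are importable from `neattext.functions`, as in the pipeline steps above):
+```python
+>>> import neattext.functions as nfx
+>>> result = t1
+>>> for fn in [nfx.remove_emails, nfx.remove_numbers, nfx.remove_emojis]:
+...     result = fn(result)
+...
+>>> result
+```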
+
+
+
+
+### Documentation
+Please read the [documentation](https://github.com/Jcharis/neattext/wiki) for more information on what neattext does and how to use is for your needs.You can also check
+out our readthedocs page [here](https://jcharis.github.io/neattext/)
+
+
+### More Features To Add
++ basic nlp task
++ currency normalizer
+
+#### Acknowledgements
++ Inspired by packages like `clean-text` from Johannes Fillter and `textify` by JCharisTech
+
+
+#### NB
++ Contributions Are Welcomed
++ Notice a bug, please let us know.
++ Thanks A lot
+
+
+#### By
++ Jesse E.Agbe(JCharis)
++ Jesus Saves @JCharisTech
+
+
+
+
+
+%prep
+%autosetup -n neattext-0.1.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-neattext -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..3ecb3f2
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+43dc7e1df9d75a1fa0ef42c6339a91a1 neattext-0.1.3.tar.gz