API reference
This page gives an overview of all neattext objects, functions and methods.
Main Class
General Functions
For TextCleaner : remove
- remove_emails
- remove_numbers
- remove_phone_numbers
- remove_urls
- remove_special_characters
- remove_emojis
- remove_stopwords
- remove_terms_in_bracket
- remove_accents
For TextCleaner : replace
- replace_emails
- replace_numbers
- replace_phone_numbers
- replace_urls
- replace_special_characters
- replace_emojis
- replace_stopwords
For TextExtractor : extract
- extract_emails
- extract_numbers
- extract_phone_numbers
- extract_urls
- extract_special_characters
- extract_emojis
- extract_stopwords
- extract_terms_in_bracket
For TextMetrics : word statistics and counts
For TextPipeline : combine neattext functions in a pipeline
The neattext TextFrame API is a frame-like class for cleaning text. It inherits all the methods of TextCleaner, so it can be used to remove or replace emails, numbers, phone numbers, special characters, emojis, etc.
It can also read a text file and process its contents.
TextFrame(text=None)
The neattext TextCleaner API is a class for cleaning text by either removing or replacing emails, numbers, phone numbers, special characters, emojis, etc.
TextCleaner(text=None)
The neattext TextExtractor API is a class for extracting, as a list, emails, numbers, phone numbers, special characters, emojis, etc. In other words, it extracts the same things that you would want cleaned or removed. TextExtractor also inherits all the methods of the TextCleaner class.
TextExtractor(text=None)
The neattext TextMetrics API is a class for computing word statistics and counts on a given text.
TextMetrics(text=None)
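As a rough illustration of the kind of word statistics and counts meant here, the sketch below computes length, token, vowel and consonant counts with the standard library only. The function names and the returned dictionary keys are hypothetical and are not taken from neattext.

```python
import string

# ASCII-only letter sets; accented letters are ignored in this sketch.
VOWELS = set("aeiou")
CONSONANTS = set(string.ascii_lowercase) - VOWELS


def count_vowels(text):
    """Count ASCII vowels, ignoring case."""
    return sum(1 for ch in text.lower() if ch in VOWELS)


def count_consonants(text):
    """Count ASCII consonants, ignoring case."""
    return sum(1 for ch in text.lower() if ch in CONSONANTS)


def word_stats(text):
    """Return a small dictionary of counts for the given text."""
    tokens = text.split()  # whitespace tokenization
    return {
        "length": len(text),
        "tokens": len(tokens),
        "vowels": count_vowels(text),
        "consonants": count_consonants(text),
    }
```

The counts are deliberately simple; a real metrics class may tokenize differently or handle non-ASCII letters.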
The neattext TextPipeline API is a class for combining or chaining several text cleaning functions into a single pipeline that implements the fit method on a given text.
TextPipeline(steps=[fxns])
TextCleaner remove methods
- remove_emails: Clean text by using custom regex to remove emails
- remove_numbers: Clean text by using custom regex to remove numbers and digits
- remove_phone_numbers: Clean text by using custom regex to remove phone numbers
- remove_urls: Clean text by using custom regex to remove URLs and websites
- remove_special_characters: Clean text by using custom regex to remove special characters
- remove_emojis: Clean text by using custom regex to remove emojis and Unicode representing emojis
- remove_stopwords: Clean text by using custom regex to remove English stopwords
- remove_dates: Clean text by using custom regex to remove dates
- remove_terms_in_bracket: Clean text by using custom regex to remove terms in the specified bracket, either [] or {}
- remove_accents: Clean text by removing diacritics and accents from a given text
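The remove methods above are described as regex-based. The sketch below shows that general approach with the standard library. The patterns and function names are illustrative assumptions and are not neattext's own, which may match differently.

```python
import re
import unicodedata

# Illustrative patterns only; a production pattern would be stricter.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
NUMBER_RE = re.compile(r"\d+")


def strip_emails(text):
    """Delete anything that looks like an email address."""
    return EMAIL_RE.sub("", text)


def strip_numbers(text):
    """Delete runs of digits."""
    return NUMBER_RE.sub("", text)


def strip_accents(text):
    """Decompose characters (NFKD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Removal leaves the surrounding whitespace untouched, so a follow-up step that collapses repeated spaces is often useful.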
TextCleaner replace methods
- replace_emails: Processes text by using custom regex to replace emails
- replace_numbers: Processes text by using custom regex to replace numbers and digits
- replace_phone_numbers: Processes text by using custom regex to replace phone numbers
- replace_urls: Processes text by using custom regex to replace URLs and websites
- replace_special_characters: Processes text by using custom regex to replace special characters
- replace_emojis: Processes text by using custom regex to replace emojis and Unicode representing emojis
- replace_stopwords: Processes text by using custom regex to replace English stopwords
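Replacing differs from removing only in what is substituted for each match. A minimal sketch, assuming a placeholder token is written in place of every match; the function names, the `replace_with` parameter and the `<URL>` and `<NUMBER>` tokens are hypothetical choices for this example.

```python
import re

# Illustrative patterns; neattext's own regexes may differ.
URL_RE = re.compile(r"https?://\S+")
NUMBER_RE = re.compile(r"\d+")


def mask_urls(text, replace_with="<URL>"):
    """Substitute a placeholder for each http(s) URL."""
    return URL_RE.sub(replace_with, text)


def mask_numbers(text, replace_with="<NUMBER>"):
    """Substitute a placeholder for each run of digits."""
    return NUMBER_RE.sub(replace_with, text)
```

Keeping a placeholder instead of deleting the match preserves the fact that something was there, which can matter for downstream models.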
TextExtractor Methods
- extract_emails: Works on text by using custom regex to extract emails
- extract_numbers: Works on text by using custom regex to extract numbers and digits
- extract_phone_numbers: Works on text by using custom regex to extract phone numbers
- extract_urls: Works on text by using custom regex to extract URLs and websites
- extract_special_characters: Works on text by using custom regex to extract special characters
- extract_emojis: Works on text by using custom regex to extract emojis and Unicode representing emojis
- extract_stopwords: Works on text by using custom regex to extract English stopwords
- extract_terms_in_bracket: Works on text by using custom regex to extract terms inside brackets ([] or {})
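Extraction returns the matches as a list instead of editing the text. The sketch below shows this with `re.findall`. The patterns, function names and the `bracket_form` parameter are assumptions made for illustration, and returning the inner term without its brackets is a choice of this example.

```python
import re

# Illustrative patterns; neattext's own regexes may differ.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BRACKET_PATTERNS = {
    "square": re.compile(r"\[([^\]]*)\]"),
    "curly": re.compile(r"\{([^}]*)\}"),
}


def find_emails(text):
    """Return every email-like substring, in order of appearance."""
    return EMAIL_RE.findall(text)


def find_terms_in_bracket(text, bracket_form="square"):
    """Return the terms found inside [] or {} (brackets excluded)."""
    if bracket_form not in BRACKET_PATTERNS:
        raise ValueError("bracket_form must be 'square' or 'curly'")
    return BRACKET_PATTERNS[bracket_form].findall(text)
```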
TextFrame Basics
Reads a text file and creates a TextFrame for cleaning text
Gives a basic description of the text, which includes length, tokens, vowel and consonant count, etc.
Returns the first N characters in a text
Returns the last N characters in a text
Normalizes a text at one of two levels (shallow or deep), removing unwanted noise to generate a clean text
Generates word tokens from the supplied text using whitespace or words
Generates sentence tokens from the supplied text
Generates N-grams from the supplied text
Generate a simple bag of words of text
Returns a list of unique tokens/values in a text
Returns the count/number of unique tokens in a text
Returns the amount of memory/bytes used by a given text
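Several of the TextFrame basics listed above map onto short standard-library operations. The following sketch shows one plausible reading of them as plain functions. The names, defaults and return types are hypothetical and do not describe TextFrame's actual methods.

```python
import sys
from collections import Counter


def head(text, n=5):
    """First n characters of the text."""
    return text[:n]


def tail(text, n=5):
    """Last n characters of the text."""
    return text[-n:] if n > 0 else ""


def word_tokens(text):
    """Whitespace tokenization."""
    return text.split()


def ngrams(tokens, n=2):
    """All contiguous runs of n tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text):
    """Token -> count mapping over lowercased whitespace tokens."""
    return Counter(word_tokens(text.lower()))


def unique_tokens(text):
    """Distinct tokens in order of first appearance."""
    return list(dict.fromkeys(word_tokens(text)))


def memory_usage(text):
    """Size in bytes of the Python string object."""
    return sys.getsizeof(text)
```

The number of unique tokens is then simply `len(unique_tokens(text))`.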
TextPipeline
Fit a group of neattext functions on a given text
Returns all the functions/steps in a given TextPipeline
Transform a given text using a group of neattext functions to a cleaner text
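The pipeline idea described above, a list of functions applied in order to a text, can be sketched in a few lines. `SimplePipeline`, `get_steps` and the helper functions are hypothetical names for this example, and having `fit` return the cleaned text is an assumption, not a statement about TextPipeline's behavior.

```python
import re


def drop_digits(text):
    """Remove runs of digits."""
    return re.sub(r"\d+", "", text)


def collapse_spaces(text):
    """Squeeze repeated whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


class SimplePipeline:
    """Apply a list of text -> text functions in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def get_steps(self):
        """Return the functions in the order they will run."""
        return list(self.steps)

    def transform(self, text):
        """Feed the output of each step into the next one."""
        for step in self.steps:
            text = step(text)
        return text

    def fit(self, text):
        """Assumed here to behave like transform."""
        return self.transform(text)


pipe = SimplePipeline(steps=[str.lower, drop_digits, collapse_spaces])
cleaned = pipe.fit("Call 911  NOW")
```

Order matters: collapsing spaces last cleans up the gaps left by the earlier removal steps.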