API reference

This page gives an overview of all neattext objects, functions, and methods.

Main Class

General Functions

For TextCleaner: remove

For TextCleaner: replace

For TextExtractor: extract

For TextMetrics: word statistics and counts

For TextPipeline: combine neattext functions in a pipeline

TextFrame

The neattext TextFrame API is a frame-like class useful for cleaning text. It inherits all the methods of TextCleaner, so it can be used for removing or replacing emails, numbers, phone numbers, special characters, emojis, etc.

It can also read a text file and process its contents into cleaner output.

TextFrame(text=None)

TextCleaner

The neattext TextCleaner API is a class useful for cleaning text by either removing or replacing emails, numbers, phone numbers, special characters, emojis, etc.

TextCleaner(text=None)

TextExtractor

The neattext TextExtractor API is a class useful for extracting emails, numbers, phone numbers, special characters, emojis, etc. as a list. In other words, it extracts the same things that you would otherwise want cleaned or removed. When you use the TextExtractor class, you also inherit all the methods of the TextCleaner class.

TextExtractor(text=None)

TextMetrics

The neattext TextMetrics API is a class useful for computing word statistics and counts for a given text.

TextMetrics(text=None)

TextPipeline

The neattext TextPipeline API is a class useful for combining or chaining several text cleaning functions into a single pipeline that implements the fit method on a given text.

TextPipeline(steps=[fxns])

TextCleaner remove methods

remove_emails

Clean text by using custom regex to remove emails
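As a rough illustration of regex-based removal, the stand-alone sketch below deletes email-like substrings using only Python's `re` module. The pattern and the name `remove_emails_sketch` are assumptions made for this example; neattext's own regex may differ.

```python
import re

# Hypothetical pattern for illustration only; not neattext's actual regex.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def remove_emails_sketch(text: str) -> str:
    """Delete every substring that matches EMAIL_PATTERN."""
    return EMAIL_PATTERN.sub("", text)


cleaned = remove_emails_sketch("Contact admin@example.com for help")
```

The surrounding white-space is left untouched in this sketch, so a removed email can leave a double space behind.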

remove_numbers

Clean text by using custom regex to remove numbers and digits

remove_phone_numbers

Clean text by using custom regex to remove phone numbers

remove_urls

Clean text by using custom regex to remove URLs and websites

remove_special_characters

Clean text by using custom regex to remove special characters

remove_emojis

Clean text by using custom regex to remove emojis and unicode representing emojis

remove_stopwords

Clean text by using custom regex to remove English stopwords

remove_dates

Clean text by using custom regex to remove dates

remove_terms_in_bracket

Clean text by using custom regex to remove terms inside the specified bracket type, either [] or {}

remove_accents

Clean text by removing diacritics and accents from a given text
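One common way to strip diacritics is Unicode decomposition followed by dropping the combining marks. The sketch below shows that technique with the standard `unicodedata` module; whether neattext uses the same approach is an assumption, and `remove_accents_sketch` is a hypothetical name.

```python
import unicodedata


def remove_accents_sketch(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


plain = remove_accents_sketch("café naïve résumé")
```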

TextCleaner replace methods

replace_emails

Processes text by using custom regex to replace emails

replace_numbers

Processes text by using custom regex to replace numbers and digits
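Replacement differs from removal only in that each match is swapped for a placeholder instead of being deleted. A minimal sketch, assuming a simple digit pattern and a hypothetical `replace_with` parameter (neither is taken from neattext's source):

```python
import re

# Hypothetical pattern: one or more consecutive digits.
NUMBER_PATTERN = re.compile(r"\d+")


def replace_numbers_sketch(text: str, replace_with: str = "<NUMBER>") -> str:
    """Replace each run of digits with a placeholder token."""
    # A function is used as the replacement so that backslashes in
    # replace_with are treated literally rather than as regex templates.
    return NUMBER_PATTERN.sub(lambda match: replace_with, text)


masked = replace_numbers_sketch("Order 66 shipped in 3 days")
```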

replace_phone_numbers

Processes text by using custom regex to replace phone numbers

replace_urls

Processes text by using custom regex to replace URLs and websites

replace_special_characters

Processes text by using custom regex to replace special characters

replace_emojis

Processes text by using custom regex to replace emojis and unicode representing emojis

replace_stopwords

Processes text by using custom regex to replace English stopwords

TextExtractor Methods

extract_emails

Works on text by using custom regex to extract emails
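Extraction returns the matches as a list rather than modifying the text. The sketch below shows the idea with `re.findall`; the pattern and the function name are assumptions for illustration, not neattext's implementation.

```python
import re

# Hypothetical pattern for illustration only; not neattext's actual regex.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails_sketch(text: str) -> list:
    """Return all email-like substrings, in order of appearance."""
    return EMAIL_PATTERN.findall(text)


found = extract_emails_sketch("Write to a@b.io or c.d@e.org")
```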

extract_numbers

Works on text by using custom regex to extract numbers and digits

extract_phone_numbers

Works on text by using custom regex to extract phone numbers

extract_urls

Works on text by using custom regex to extract URLs and websites

extract_special_characters

Works on text by using custom regex to extract special characters

extract_emojis

Works on text by using custom regex to extract emojis and unicode representing emojis

extract_stopwords

Works on text by using custom regex to extract English stopwords

extract_terms_in_bracket

Works on text by using custom regex to extract terms inside brackets ([] or {})
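A possible way to select the bracket type is a lookup from bracket style to regex. In this sketch the `bracket_form` parameter, the pattern table, and the choice to return the terms without their brackets are all assumptions; neattext's behavior may differ.

```python
import re

# Hypothetical mapping from a bracket style to a capturing pattern.
BRACKET_PATTERNS = {
    "[]": re.compile(r"\[([^\]]*)\]"),
    "{}": re.compile(r"\{([^}]*)\}"),
}


def extract_terms_in_bracket_sketch(text: str, bracket_form: str = "[]") -> list:
    """Return the contents of every bracket pair of the chosen style."""
    return BRACKET_PATTERNS[bracket_form].findall(text)


sample = "See [note 1] and {extra} and [note 2]"
square_terms = extract_terms_in_bracket_sketch(sample, "[]")
curly_terms = extract_terms_in_bracket_sketch(sample, "{}")
```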

TextFrame Basics

read_text

Reads a text file and creates a TextFrame for cleaning text

describe

Gives a basic description of the text, which includes length, tokens, vowel and consonant counts, etc.
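To make the listed statistics concrete, here is a small stand-alone sketch that computes them with plain Python. The dictionary keys and the ASCII-only vowel set are assumptions; the actual output format of `describe` is not specified on this page.

```python
VOWELS = "aeiou"


def describe_sketch(text: str) -> dict:
    """Compute a few basic statistics about a text."""
    lowered = text.lower()
    return {
        "length": len(text),                      # number of characters
        "tokens": len(text.split()),              # white-space separated tokens
        "vowels": sum(ch in VOWELS for ch in lowered),
        "consonants": sum(ch.isalpha() and ch not in VOWELS for ch in lowered),
    }


summary = describe_sketch("Hello World")
```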

head

Returns the first N characters in a text

tail

Returns the last N characters in a text
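Both operations amount to string slicing. A minimal sketch with hypothetical helper names (the default values and argument names used by neattext are not shown here):

```python
def head_sketch(text: str, n: int) -> str:
    """Return the first n characters of text."""
    return text[:n]


def tail_sketch(text: str, n: int) -> str:
    """Return the last n characters of text."""
    # text[-n:] would return the whole string when n == 0, so compute
    # the start index explicitly.
    return text[max(len(text) - n, 0):]


first = head_sketch("neattext cleans text", 8)
last = tail_sketch("neattext cleans text", 4)
```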

normalize

Normalizes a text at one of two levels (shallow or deep), removing noise to generate a clean text

word_tokens

Generates word tokens from the supplied text, using white-space or words

sent_tokens

Generates sentence tokens from the supplied text

ngrams

Generates N-grams from the supplied text
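An N-gram is a sliding window of N consecutive tokens. The sketch below builds word-level N-grams from white-space tokens; the function name, the default of `n=2`, and the tuple output are assumptions for illustration.

```python
def ngrams_sketch(text: str, n: int = 2) -> list:
    """Return all runs of n consecutive white-space tokens as tuples."""
    tokens = text.split()
    # There are len(tokens) - n + 1 windows; none if the text is too short.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = ngrams_sketch("the quick brown fox", n=2)
```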

bow

Generate a simple bag of words of text
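A simple bag of words maps each token to the number of times it occurs. A minimal sketch using `collections.Counter`, assuming lowercased white-space tokens (the tokenization used by neattext may differ):

```python
from collections import Counter


def bow_sketch(text: str) -> Counter:
    """Count occurrences of each lowercased white-space token."""
    return Counter(text.lower().split())


bag = bow_sketch("The cat and the hat")
```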

unique

Returns a list of unique tokens/values in a text

nunique

Returns the count/number of unique tokens in a text
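The two methods are closely related: one returns the distinct tokens, the other their count. The sketch below keeps first-seen order with `dict.fromkeys`; the ordering and the helper names are assumptions.

```python
def unique_sketch(text: str) -> list:
    """Return distinct white-space tokens in order of first appearance."""
    return list(dict.fromkeys(text.split()))


def nunique_sketch(text: str) -> int:
    """Return the number of distinct white-space tokens."""
    return len(unique_sketch(text))


distinct = unique_sketch("a b a c b")
distinct_count = nunique_sketch("a b a c b")
```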

memory_usage

Returns the amount of memory/bytes used by a given text
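There is more than one reasonable definition of "memory used by a text". The sketch below shows two, both from the standard library: the size of the Python string object and the length of its UTF-8 encoding. Which of these (if either) neattext reports is not stated here.

```python
import sys

text = "neattext"

# Size of the Python str object itself, including interpreter overhead.
object_bytes = sys.getsizeof(text)

# Number of bytes needed to store the text as UTF-8.
encoded_bytes = len(text.encode("utf-8"))
```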

TextPipeline Methods

TextPipeline.fit

Fit a group of neattext functions on a given text

named_steps

Returns all the functions/steps in a given TextPipeline

TextPipeline.transform

Transform a given text using a group of neattext functions to a cleaner text
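The pipeline idea can be sketched as a list of functions applied one after another, each receiving the previous step's output. `PipelineSketch` and its two step functions are hypothetical and are not the neattext TextPipeline class; in particular, making `transform` behave the same as `fit` is an assumption of this sketch.

```python
import re


def strip_digits(text: str) -> str:
    """Remove runs of digits."""
    return re.sub(r"\d+", "", text)


def collapse_spaces(text: str) -> str:
    """Collapse repeated white-space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


class PipelineSketch:
    """Apply a sequence of text functions in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    @property
    def named_steps(self):
        # Report the steps by function name, in execution order.
        return [step.__name__ for step in self.steps]

    def fit(self, text: str) -> str:
        # Feed the output of each step into the next one.
        for step in self.steps:
            text = step(text)
        return text

    # Assumption: transform applies the same steps as fit.
    transform = fit


pipeline = PipelineSketch(steps=[strip_digits, collapse_spaces])
result = pipeline.fit("Room 101 is  open")
```

Step order matters: running `collapse_spaces` before `strip_digits` would leave the gap created by the removed digits in place.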