Human beings are arguably the most advanced species on Earth, and much of our success lies in our ability to communicate and share information. This is where language comes in. Human languages are remarkably complex and diverse: there are roughly 6,500 languages in the world (estimates vary).
According to industry estimates, only 21% of data is available in a structured form. Data is generated continuously as we speak, tweet, and send messages on WhatsApp, Facebook, and other platforms, and most of it is text, which is highly unstructured. To extract meaningful insights from this data, we use techniques such as text analysis and Natural Language Processing.
What is Text Mining and NLP?
Text mining (also called text analytics) is the process of deriving meaningful information from natural language text. It usually involves structuring the input text, deriving patterns from the structured data, and interpreting the output.
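As a minimal illustration of those three steps, the sketch below structures a few sentences into word lists, derives a simple pattern (term frequencies), and prints the most frequent terms for interpretation. The sample documents and the tiny stopword list are invented for this example and are not from any real dataset.

```python
import re
from collections import Counter

# Hypothetical customer reviews, invented for this illustration.
documents = [
    "The battery life of this phone is great.",
    "Great camera, but the battery drains fast.",
    "Battery life could be better; the screen is great.",
]

# A tiny, hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "of", "this", "is", "but", "be", "could"}


def structure(text):
    """Step 1: turn free text into a list of lowercase content words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


# Step 2: derive a pattern, here how often each term occurs across all documents.
term_counts = Counter()
for doc in documents:
    term_counts.update(structure(doc))

# Step 3: interpret the output; frequent terms hint at recurring topics.
print(term_counts.most_common(3))
```

With real data you would normally add proper tokenization, stemming or lemmatization, and a fuller stopword list, which are among the NLP steps described later in this post.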
Natural Language Processing (NLP), on the other hand, is a branch of artificial intelligence and computer science that deals with human languages.
As the name suggests, it is used to process human language. Its applications include voice recognition, text-to-text tasks, text-to-speech, speech-to-text, and speech-to-speech.
The two go hand in hand: text analytics relies on NLP to process natural language so that the text can then be used for data analysis.
Real-world applications of NLP
Some applications of NLP are:
Sentiment analysis (e.g., analyzing the sentiment of Twitter and Facebook posts, a very popular use case)
Chatbots (e.g., customer care chats)
Speech recognition (e.g., Google Assistant, Siri, Alexa)
Machine translation (e.g., Google Translate, which translates text from one language to another)
Spell checking (e.g., Grammarly)
Information extraction (e.g., pulling information out of documents or websites)
Advertisement matching (recommending ads based on our search history): this is one of the coolest applications of NLP. Many of the ads we see in our browsers are matched to our browsing history with the help of NLP.
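Sentiment analysis can be done in many ways; one of the simplest is a lexicon-based scorer, sketched below purely as an illustration. The word scores in LEXICON and the one-word negation rule are hypothetical choices for this example; production systems typically rely on much larger curated lexicons or trained models.

```python
import re

# Hypothetical mini lexicon mapping words to polarity scores.
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Label text as positive, negative, or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        # Flip the polarity when the previous word is a negation, e.g. "not good".
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone and the camera is great"))
print(sentiment("The service was not good"))
```

Words missing from the lexicon score zero, so a sentence with no opinion words comes out as neutral.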
NLP has two main components:
Natural Language Understanding (NLU) - mapping natural language input into useful internal representations and analyzing different aspects of the language.
Natural Language Generation (NLG) - producing meaningful phrases and sentences in natural language from an internal representation.
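To make the two directions concrete, here is a toy, rule-based sketch: understand() maps a request to a structured representation (NLU), and generate() turns that representation back into a sentence (NLG). The intent name get_weather, the single regex rule, and the forecast value are all hypothetical; real assistants use trained models rather than one hand-written pattern.

```python
import re


def understand(utterance):
    """NLU: map a request to a structured representation (toy rule, hypothetical intent name)."""
    match = re.search(r"\bweather\b.*\bin ([A-Z][a-z]+)", utterance)
    if match:
        return {"intent": "get_weather", "city": match.group(1)}
    return {"intent": "unknown"}


def generate(meaning, forecast):
    """NLG: turn the structured representation back into a sentence."""
    if meaning["intent"] == "get_weather":
        return f"The weather in {meaning['city']} is {forecast} today."
    return "Sorry, I did not understand that."


meaning = understand("What is the weather like in Paris?")
print(meaning)
print(generate(meaning, "sunny"))
```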
Natural Language Processing steps
Common steps in an NLP pipeline include:
Tokenization - splitting the input text into smaller units such as words, phrases, or sentences, called tokens
Stemming - stripping (and sometimes replacing) suffixes to reduce a word to a root form called the stem; the stem is not always a valid word (e.g., a stemmer may reduce "studies" to "studi")
Lemmatization - using a vocabulary and morphological analysis to return the dictionary form of a word, called the lemma (e.g., "studies" becomes "study")
POS tagging - POS stands for "parts of speech", the grammatical category of a word (noun, verb, adjective, etc.). A POS tag indicates how a word functions, both in meaning and grammatically, within a sentence
Named entity recognition - detecting named entities such as person names, company names, locations, or quantities
Chunking - grouping individual tokens into larger units called chunks (e.g., grouping "the new office" into a noun phrase)
Python has a library called NLTK, which stands for "Natural Language Toolkit", and it is widely used for natural language processing and text analysis.
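The steps listed above can be sketched in plain Python to show what each one does. This is a toy illustration, not NLTK's implementation: the suffix rules, the lemma dictionary, the POS lexicon, and the gazetteer are small hypothetical tables made up for this example. NLTK provides real versions of these steps (for instance PorterStemmer, WordNetLemmatizer, pos_tag, ne_chunk, and RegexpParser), some of which need extra data downloads.

```python
import re

# ---- Tokenization (regex-based sketch) ----
def sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Return words (letters, digits, apostrophes) and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", sentence)


# ---- Stemming: crude suffix rules; the stem need not be a real word ----
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]  # toy rule set


def stem(word):
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if suffix == "s" and w.endswith("ss"):
            continue  # keep words like "class" intact
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)] + replacement
    return w


# ---- Lemmatization: dictionary lookup of the base (dictionary) form ----
LEMMAS = {"studies": "study", "studying": "study", "visited": "visit", "works": "work", "better": "good"}


def lemmatize(word):
    w = word.lower()
    return LEMMAS.get(w, w)


# ---- POS tagging: look each word up in a small hypothetical lexicon ----
TAGS = {
    "alice": "NNP", "works": "VBZ", "at": "IN", "google": "NNP", "she": "PRP",
    "visited": "VBD", "the": "DT", "new": "JJ", "office": "NN", "in": "IN",
    "london": "NNP", ".": ".",
}


def pos_tag(tokens):
    # Unknown words default to "NN" (noun), a common baseline guess.
    return [(t, TAGS.get(t.lower(), "NN")) for t in tokens]


# ---- Named entity recognition: gazetteer (list) lookup ----
GAZETTEER = {"alice": "PERSON", "google": "ORGANIZATION", "london": "LOCATION"}


def named_entities(tokens):
    return [(t, GAZETTEER[t.lower()]) for t in tokens if t.lower() in GAZETTEER]


# ---- Chunking: noun phrase = optional determiner, adjectives, then nouns ----
def noun_phrase_chunks(tagged):
    # Encode each tag as one character so a regex can match tag sequences.
    codes = "".join(
        "D" if tag == "DT" else "J" if tag == "JJ" else "N" if tag.startswith("NN") else "O"
        for _, tag in tagged
    )
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in re.finditer(r"D?J*N+", codes)
    ]


text = "Alice works at Google. She visited the new office in London."
for sentence in sentence_tokenize(text):
    tokens = word_tokenize(sentence)
    tagged = pos_tag(tokens)
    print(tokens)
    print([stem(t) for t in tokens], [lemmatize(t) for t in tokens])
    print(tagged)
    print(named_entities(tokens), noun_phrase_chunks(tagged))
```

Note the contrast between the two normalization steps: stem("studies") gives the non-word "studi", while lemmatize("studies") looks up the dictionary form "study".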
To conclude, I wanted to try writing a very simple program that autocorrects text using a ready-made NLP library in Python.
Here, I use the TextBlob library: I pass the text that needs to be corrected to TextBlob, store the resulting object in a variable, and then call its correct() method, as shown below:
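```python
from textblob import TextBlob

# Wrap the misspelled sentence in a TextBlob object
text = TextBlob("this is a vry beutful plac")

# correct() returns a new TextBlob with spelling corrections applied
print(text.correct())
```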
The output of the correct() function on the original text:
Output: this is a very beautiful place
This short snippet autocorrects the misspelled words in the text.
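According to TextBlob's documentation, its spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below is a minimal, independent version of that idea, not TextBlob's actual code: for each word it generates candidate strings one or two edits away and picks the most frequent word found in a corpus. The tiny CORPUS and the helper names (edits1, edits2, known, correction, correct_text) are hypothetical and chosen for illustration, so this version only "knows" a handful of words.

```python
import re
from collections import Counter

# Hypothetical miniature corpus; a real corrector counts words in a large text collection.
CORPUS = "this is a very beautiful place. this place is very quiet. a beautiful view is a rare thing."
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit away: delete, transpose, replace, or insert a letter."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away (generated lazily)."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(candidates):
    """Keep only candidates that appear in the corpus."""
    return {w for w in candidates if w in WORDS}


def correction(word):
    """Prefer the word itself, then known words 1 edit away, then 2 edits away; pick the most frequent."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORDS[w])


def correct_text(text):
    return " ".join(correction(w) for w in text.lower().split())


print(correct_text("this is a vry beutful plac"))
```

Here "vry" and "plac" are one edit away from "very" and "place", while "beutful" needs two insertions to become "beautiful", which is why the sketch also checks candidates two edits away.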
I hope this blog covers the basics of NLP and is helpful for anyone who wants to start learning it.