NLP: OCR, Language Translation, Text to Speech, Speech to Text

OCR-Optical Character Recognition

We often need to extract text from pictures. The easyocr library makes this job easy.

This library supports many languages, including English, Hindi, French, Chinese, Kannada, and Malayalam.

It has three main components: feature extraction (ResNet, a transfer learning model), sequence labeling (LSTM), and decoding (CTC).

For more details, you can go through its documentation.

pip install easyocr

Language Translation

Install the Google translator library for translation between languages.

pip install google_trans_new

Text to Speech

Install the Google Text-to-Speech library.

pip install gTTS

Import the following libraries.

from google_trans_new import google_translator  
from gtts import gTTS
from IPython.display import Audio
import matplotlib.pyplot as plt
import cv2
from pylab import rcParams
from IPython.display import Image
import easyocr

We are currently using OCR for English and Hindi, so the reader is created with these two languages. You can add more language codes as needed.

reader = easyocr.Reader(['en', 'hi'])

I selected a random picture from Google as my input image. It contains both English and Hindi text, so we can check both languages.

rcParams['figure.figsize'] = 8, 16
file_name = "../input/hinditoenglish/HtE.png"

output = reader.readtext(file_name)

This gives:

[([[89, 0], [399, 0], [399, 59], [89, 59]],
  'Hidi to Egliah',
 ([[14, 44], [477, 44], [477, 135], [14, 135]],
  'Story Traalation',
 ([[229, 121], [336, 121], [336, 163], [229, 163]],
  'मेले से एक',
 ([[128, 128], [198, 128], [198, 154], [128, 154]],
 ([[328, 130], [414, 130], [414, 160], [328, 160]],
  'सुंदर गाय',
 ([[90, 132], [130, 132], [130, 156], [90, 156]], 'एक', 0.928795337677002),
 ([[196, 132], [236, 132], [236, 156], [196, 156]], 'एक', 0.9367656111717224),
 ([[90, 154], [422, 154], [422, 184], [90, 184]],
  'खरीदा और वह अपने गांव को लौट रहा',
 ([[89, 177], [411, 177], [411, 218], [89, 218]],
  'था रास्ते में उसे एक डाकू ने देख लिया',
 ([[90, 208], [396, 208], [396, 238], [90, 238]],
  'वह गाय को लेना चाहता था इसलिए',
 ([[90, 234], [368, 234], [368, 264], [90, 264]],
  'व किसान के पास गया और कहा',
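Each complete entry in this output is a tuple of (box corners, text, confidence), as the two 'एक' entries above show. As a minimal sketch, assuming we only want to keep detections above some confidence, the entries can be filtered like this. The helper name filter_by_confidence and the 0.93 threshold are my own choices for illustration, not part of easyocr.

```python
# Two complete entries copied from the output above: (box, text, confidence)
results = [
    ([[90, 132], [130, 132], [130, 156], [90, 156]], 'एक', 0.928795337677002),
    ([[196, 132], [236, 132], [236, 156], [196, 156]], 'एक', 0.9367656111717224),
]


def filter_by_confidence(results, min_confidence):
    """Keep only the entries whose confidence is at least min_confidence."""
    return [entry for entry in results if entry[2] >= min_confidence]


# 0.93 is an arbitrary threshold chosen for illustration
confident = filter_by_confidence(results, 0.93)
for box, text, confidence in confident:
    print(text, round(confidence, 3))
```

In practice you would pass the full list returned by reader.readtext and pick a threshold that suits your images.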

Select one detected text and draw a boundary around it using its coordinates, to check that the correct text is being selected.

cordinates = output[4][0]
x_min, y_min = [int(min(idx)) for idx in zip(*cordinates)]
x_max, y_max = [int(max(idx)) for idx in zip(*cordinates)]
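image = cv2.imread(file_name)
# draw the bounding rectangle on the image before displaying it
cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 0, 255), 2)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))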
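The min/max step can be tried on its own without an image. Here is a small sketch, assuming the four corner points come in the same [x, y] format as the readtext output. The helper name bounding_box is hypothetical, and the corner values are copied from one of the entries in the output above.

```python
def bounding_box(points):
    """Return (x_min, y_min, x_max, y_max) for a list of [x, y] corner points."""
    xs, ys = zip(*points)  # separate the x values from the y values
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))


# Corner points copied from one entry of the readtext output above
corners = [[328, 130], [414, 130], [414, 160], [328, 160]]
x_min, y_min, x_max, y_max = bounding_box(corners)
print(x_min, y_min, x_max, y_max)
```

Taking the minimum and maximum of each axis also works when the detected box is slightly tilted, because the result is always the smallest upright rectangle that contains all four corners.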

Now let's draw boundaries around every detected text.

bounds = reader.readtext(file_name, add_margin=0.55, width_ths=0.7, link_threshold=0.8, decoder='beamsearch',blocklist='=-')

It gives the output as:

[([[68, 0], [418, 0], [418, 78], [68, 78]],
  ' Hindi to Engliah',
 ([[0, 9], [477, 9], [477, 170], [0, 170]],
  'Stry Traulalion',
 ([[76, 112], [428, 112], [428, 174], [76, 174]],
  'एक किसान एक मेले से एक सुंदर गाय',
 ([[78, 142], [434, 142], [434, 196], [78, 196]],
  'खरीदा औ़र वह अपने गांव को लौट रहा',
 ([[75, 165], [425, 165], [425, 231], [75, 231]],
  'था रास्ते में उसे एक डाकूने देखलिया',
 ([[78, 196], [408, 196], [408, 250], [78, 250]],
  'वहू गाय को लेना चाहर्ता था इसलिए',
 ([[77, 221], [381, 221], [381, 264], [77, 264]],
  'वकिसान के पास गया और कहा',

import PIL
from PIL import ImageDraw
im = PIL.Image.open('../input/hinditoenglish/HtE.png')

def draw_boxes(image, bounds, color='yellow', width=2):
    draw = ImageDraw.Draw(image)
    for bound in bounds:
        p0, p1, p2, p3 = bound[0]
        draw.line([*p0, *p1, *p2, *p3, *p0], fill=color, width=width)
    return image

draw_boxes(im, bounds)
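To see what draw.line receives in draw_boxes, the flattening step can be run by itself. This is only an illustrative sketch: the helper name outline_points is hypothetical, and the box is the first one from the output above.

```python
def outline_points(box):
    """Flatten four [x, y] corners into a closed path: x0, y0, ..., x3, y3, x0, y0."""
    p0, p1, p2, p3 = box
    # repeating p0 at the end closes the outline back to the starting corner
    return [*p0, *p1, *p2, *p3, *p0]


# First box from the readtext output above
box = [[68, 0], [418, 0], [418, 78], [68, 78]]
path = outline_points(box)
print(path)
```

Without the final p0 the line would stop at the fourth corner and leave one side of the box open.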

To extract only the text from an image, without coordinates or confidence values, pass detail=0.

text_list = reader.readtext(file_name, add_margin=0.55, width_ths=0.7, link_threshold=0.8, decoder='beamsearch',blocklist='=-', detail=0)

The output of this code is as below:

[' Hindi to Engliah',
 'Stry Traulalion',
 'एक किसान एक मेले से एक सुंदर गाय',
 'खरीदा औ़र वह अपने गांव को लौट रहा',
 'था रास्ते में उसे एक डाकूने देखलिया',
 'वहू गाय को लेना चाहर्ता था इसलिए',
 'वकिसान के पास गया और कहा']

Separating English and Hindi text

text_hi=text_list[2] + " "+text_list[3] + " "+text_list[4] + " "+text_list[5] + " "+text_list[6] 

text_en=text_list[0] + " "+text_list[1] 
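Picking the items by index works for this picture, but the positions will change for other images. As a rough alternative sketch, the lines can be separated by checking whether they contain characters from the Devanagari Unicode block (U+0900 to U+097F). The helper names is_devanagari and split_by_script are my own, and this assumes each detected line is mostly in one script.

```python
def is_devanagari(text):
    """Return True if any character is in the Devanagari block (U+0900 to U+097F)."""
    return any('\u0900' <= ch <= '\u097f' for ch in text)


def split_by_script(lines):
    """Join the non-Devanagari lines and the Devanagari lines into two strings."""
    english = [line.strip() for line in lines if not is_devanagari(line)]
    hindi = [line.strip() for line in lines if is_devanagari(line)]
    return " ".join(english), " ".join(hindi)


# text_list copied from the OCR output above
text_list = [' Hindi to Engliah',
             'Stry Traulalion',
             'एक किसान एक मेले से एक सुंदर गाय',
             'खरीदा औ़र वह अपने गांव को लौट रहा',
             'था रास्ते में उसे एक डाकूने देखलिया',
             'वहू गाय को लेना चाहर्ता था इसलिए',
             'वकिसान के पास गया और कहा']

text_en, text_hi = split_by_script(text_list)
print(text_en)
print(text_hi)
```

Note that this version also strips the leading space that OCR put in front of the first line.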

Text is converted to speech in a Hindi accent if we select the language as 'hi'.

ta_tts = gTTS(text_hi, lang='hi')
ta_tts.save('trans.mp3')
Audio('trans.mp3', autoplay=True)

Text is converted to speech in an English accent if we select the language as 'en'.

ta_tts = gTTS(text_en, lang='en')
ta_tts.save('trans.mp3')
Audio('trans.mp3', autoplay=True)

You can play around with different languages and have fun.

If we want to translate Hindi into English, we can do it with the Google translator as below:

translator = google_translator()
text_en=translator.translate(text_hi, lang_tgt='en')

This now gives:

A farmer bought a beautiful cow from a fair and he was returning to his village, on the way he wanted to take a cow to see a robber, so went to the farmer and said 

You can do the same for translating into different languages.

Speech Recognition- Speech to Text

Converting speech from a microphone to text needs a library called PyAudio.

PyAudio provides Python bindings for PortAudio, the cross-platform audio input/output library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms.

PyAudio is required only if we want to use microphone input; without it, the Microphone class raises an error.

pip install PyAudio   # or conda install PyAudio

Install SpeechRecognition, a full-featured and easy-to-use Python speech recognition library.

pip install SpeechRecognition

Import the following libraries for the speech recognizer.

import speech_recognition as sr
import pyaudio

Run this code, speak something, and see it converted to text.

# NOTE: this requires PyAudio because it uses the Microphone class

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:          # use the default microphone as the audio source
    audio = r.listen(source)             # listen for the first phrase and extract it into audio data

try:
    # recognize speech using Google Speech Recognition
    print("You said " + r.recognize_google(audio))
except sr.UnknownValueError:             # speech is unintelligible
    print("Could not understand audio")

This functionality can be used for many things, such as running a voice search on Google or YouTube.
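As a rough sketch of that idea, the recognized text can be turned into a search URL with the standard library. The helper name build_search_url and the SEARCH_ENDPOINTS table are hypothetical names of mine, and the sample query is just a placeholder for whatever r.recognize_google returns.

```python
from urllib.parse import urlencode

# Search pages and the query parameter each one expects
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "youtube": ("https://www.youtube.com/results", "search_query"),
}


def build_search_url(spoken_text, site="google"):
    """Build a search URL for the recognized text on the chosen site."""
    base_url, param = SEARCH_ENDPOINTS[site]
    # urlencode escapes spaces and special characters in the query
    return base_url + "?" + urlencode({param: spoken_text})


# Placeholder text standing in for the speech recognizer's result
url = build_search_url("easyocr tutorial", site="youtube")
print(url)
```

Passing the resulting URL to webbrowser.open from the standard library would then open the search in the default browser.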

These are some of the most heavily used functions in NLP these days, and libraries make them very easy to use. Anyone is welcome to contribute to these libraries and make them better for the community.

They can be used in chatbots to support different languages, to add speech functionality, and even to retrieve text from images for further processing or to extract information from it.
