OCR: Optical Character Recognition
We often need to extract text from pictures, and the easyocr library makes this job easy.
It supports many languages, including English, Hindi, French, Chinese, Kannada, and Malayalam.
Text detection uses the CRAFT algorithm, and the recognition model has three main components: feature extraction (ResNet), sequence labeling (LSTM), and decoding (CTC).
For more details, see its documentation.
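To make the decoding step concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not EasyOCR's code: the alphabet and the per-frame labels are made up for illustration, and index 0 is assumed to be the CTC blank. Greedy decoding takes the most likely label at each time step, collapses consecutive repeats, and then drops blanks, which is why a blank between two identical labels keeps both.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence the way greedy CTC decoding does."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only if it differs from the previous frame's label
        # (collapsing repeats) and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical alphabet: index 0 is the CTC blank "-".
alphabet = ["-", "h", "i", "n", "d"]
# Hypothetical best label per time step (the argmax of the network output).
frames = [1, 1, 0, 2, 2, 3, 0, 0, 4, 2]
print("".join(alphabet[i] for i in ctc_greedy_decode(frames)))  # hindi
```

EasyOCR also offers beam-search decoders, selected with the decoder argument of readtext, which keep several candidate sequences instead of only the single best label per frame.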
pip install easyocr
Language Translation
Install google_trans_new, a Python wrapper around Google Translate, for translating between languages:
pip install google_trans_new
Text to Speech
Install gTTS (Google Text-to-Speech):
pip install gTTS
Import the following libraries:
from google_trans_new import google_translator
from gtts import gTTS
from IPython.display import Audio
import matplotlib.pyplot as plt
import cv2
from pylab import rcParams
from IPython.display import Image
import easyocr
We are using OCR for English and Hindi here, so we pass these two language codes to the reader; add more codes to recognize other languages.
reader=easyocr.Reader(['hi','en'])
As input, I picked a random image from Google that contains both English and Hindi text.
rcParams['figure.figsize'] = 8, 16
file_name = "../input/hinditoenglish/HtE.png"
Image(file_name)
output = reader.readtext(file_name)
output
This gives:
[([[89, 0], [399, 0], [399, 59], [89, 59]],
'Hidi to Egliah',
0.05413757637143135),
([[14, 44], [477, 44], [477, 135], [14, 135]],
'Story Traalation',
0.12739677727222443),
([[229, 121], [336, 121], [336, 163], [229, 163]],
'मेले से एक',
0.11104584485292435),
([[128, 128], [198, 128], [198, 154], [128, 154]],
'किसान',
0.47020599246025085),
([[328, 130], [414, 130], [414, 160], [328, 160]],
'सुंदर गाय',
0.1349133998155594),
([[90, 132], [130, 132], [130, 156], [90, 156]], 'एक', 0.928795337677002),
([[196, 132], [236, 132], [236, 156], [196, 156]], 'एक', 0.9367656111717224),
([[90, 154], [422, 154], [422, 184], [90, 184]],
'खरीदा और वह अपने गांव को लौट रहा',
0.010415691882371902),
([[89, 177], [411, 177], [411, 218], [89, 218]],
'था रास्ते में उसे एक डाकू ने देख लिया',
0.0011817723279818892),
([[90, 208], [396, 208], [396, 238], [90, 238]],
'वह गाय को लेना चाहता था इसलिए',
0.009780121967196465),
([[90, 234], [368, 234], [368, 264], [90, 264]],
'व किसान के पास गया और कहा',
0.018567530438303947)]
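Each entry in this list is a tuple of the bounding box (four [x, y] corner points), the recognized text, and a confidence score. Low scores, such as those on the misread English title, flag unreliable reads. A minimal sketch of filtering on that score, using a few entries copied from the output above; the helper name confident_texts and the 0.4 threshold are arbitrary assumptions, not part of EasyOCR.

```python
# A few (box, text, confidence) tuples copied from the readtext() output above.
results = [
    ([[89, 0], [399, 0], [399, 59], [89, 59]], 'Hidi to Egliah', 0.05413757637143135),
    ([[128, 128], [198, 128], [198, 154], [128, 154]], 'किसान', 0.47020599246025085),
    ([[90, 132], [130, 132], [130, 156], [90, 156]], 'एक', 0.928795337677002),
    ([[196, 132], [236, 132], [236, 156], [196, 156]], 'एक', 0.9367656111717224),
]


def confident_texts(results, min_confidence=0.4):
    """Return the text of every detection whose confidence is at least min_confidence."""
    return [text for _box, text, confidence in results if confidence >= min_confidence]


print(confident_texts(results))  # ['किसान', 'एक', 'एक']
```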
Pick one detection and draw a rectangle around it using its coordinates, to check that the right text was selected.
coordinates = output[4][0]  # box of the fifth detection ('सुंदर गाय')
x_min, y_min = [int(min(idx)) for idx in zip(*coordinates)]
x_max, y_max = [int(max(idx)) for idx in zip(*coordinates)]
image = cv2.imread(file_name)
cv2.rectangle(image,(x_min,y_min),(x_max,y_max),(255,0,0),2)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
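The min/max step converts EasyOCR's four-corner polygon, which may be slanted, into the axis-aligned rectangle that cv2.rectangle expects. A minimal sketch of the same logic as a reusable helper; box_to_rect is a hypothetical name, not part of EasyOCR or OpenCV.

```python
def box_to_rect(box):
    """Convert four [x, y] corner points into (x_min, y_min, x_max, y_max)."""
    xs, ys = zip(*box)  # separate the x and y coordinates of the corners
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))


# Box of output[4] ('सुंदर गाय') from the readtext() output above.
print(box_to_rect([[328, 130], [414, 130], [414, 160], [328, 160]]))  # (328, 130, 414, 160)

# A hypothetical slanted box: the rectangle still encloses every corner.
print(box_to_rect([[10, 20], [50, 15], [52, 30], [12, 35]]))  # (10, 15, 52, 35)
```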
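Now try drawing boundaries around every text region. The extra readtext arguments tune detection and recognition: add_margin extends each bounding box in all directions, width_ths is the maximum horizontal distance at which neighboring boxes are merged, link_threshold sets the confidence needed to link neighboring characters, decoder='beamsearch' replaces the default greedy decoder, and blocklist lists characters the recognizer must not output.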
bounds = reader.readtext(file_name, add_margin=0.55, width_ths=0.7, link_threshold=0.8, decoder='beamsearch',blocklist='=-')
bounds
It gives the output as:
[([[68, 0], [418, 0], [418, 78], [68, 78]],
' Hindi to Engliah',
0.08436790853738785),
([[0, 9], [477, 9], [477, 170], [0, 170]],
'Stry Traulalion',
0.14103898406028748),
([[76, 112], [428, 112], [428, 174], [76, 174]],
'एक किसान एक मेले से एक सुंदर गाय',
0.19112536311149597),
([[78, 142], [434, 142], [434, 196], [78, 196]],
'खरीदा औ़र वह अपने गांव को लौट रहा',
0.07050563395023346),
([[75, 165], [425, 165], [425, 231], [75, 231]],
'था रास्ते में उसे एक डाकूने देखलिया',
0.0004027755931019783),
([[78, 196], [408, 196], [408, 250], [78, 250]],
'वहू गाय को लेना चाहर्ता था इसलिए',
0.031241701915860176),
([[77, 221], [381, 221], [381, 264], [77, 264]],
'वकिसान के पास गया और कहा',
0.02784109301865101)]
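Compare this with the first output: with the larger add_margin and width_ths, the separate words 'एक', 'किसान', 'एक', 'मेले से एक', and 'सुंदर गाय' were merged into one line. The idea is that neighboring boxes on the same line are joined when the horizontal gap between them is small relative to the text height. Below is a simplified sketch of that idea, assuming the boxes are already known to be on one line and are given as (x_min, y_min, x_max, y_max); EasyOCR's own grouping also compares box heights and vertical centers, so this is an illustration, not its implementation. The name merge_line_boxes is hypothetical.

```python
def merge_line_boxes(rects, width_ths=0.5):
    """Merge same-line rectangles (x_min, y_min, x_max, y_max) separated by small gaps.

    Simplified illustration: a box is merged into its left neighbour when the
    horizontal gap is below width_ths times the neighbour's current height.
    """
    merged = []
    for rect in sorted(rects):  # left to right (tuples sort by x_min first)
        if merged:
            x0, y0, x1, y1 = merged[-1]
            gap = rect[0] - x1  # negative when the boxes overlap
            if gap < width_ths * (y1 - y0):
                # Grow the previous box so it covers both boxes.
                merged[-1] = (x0, min(y0, rect[1]), max(x1, rect[2]), max(y1, rect[3]))
                continue
        merged.append(rect)
    return merged


# Word boxes of the first Hindi line, taken from the first readtext() output.
first_line = [(90, 132, 130, 156), (128, 128, 198, 154), (196, 132, 236, 156),
              (229, 121, 336, 163), (328, 130, 414, 160)]
print(merge_line_boxes(first_line))  # [(90, 121, 414, 163)]
```

Raising width_ths lets boxes with wider gaps merge, at the risk of joining words that belong to different columns.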
import PIL.Image
from PIL import ImageDraw
im = PIL.Image.open(file_name)  # same image as above
def draw_boxes(image, bounds, color='yellow', width=2):
    # Draw each detected quadrilateral as a closed outline
    draw = ImageDraw.Draw(image)
    for bound in bounds:
        p0, p1, p2, p3 = bound[0]
        draw.line([*p0, *p1, *p2, *p3, *p0], fill=color, width=width)
    return image
draw_boxes(im, bounds)
To extract only the text, without coordinates or confidence scores, pass detail=0:
text_list = reader.readtext(file_name, add_margin=0.55, width_ths=0.7, link_threshold=0.8, decoder='beamsearch',blocklist='=-', detail=0)
text_list
The output is:
[' Hindi to Engliah',
'Stry Traulalion',
'एक किसान एक मेले से एक सुंदर गाय',
'खरीदा औ़र वह अपने गांव को लौट रहा',
'था रास्ते में उसे एक डाकूने देखलिया',
'वहू गाय को लेना चाहर्ता था इसलिए',
'वकिसान के पास गया और कहा']
Separating English and Hindi text
text_hi = " ".join(text_list[2:7])  # Hindi lines
text_en = " ".join(text_list[0:2])  # English lines
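Picking lines by index works for this one image but breaks as soon as the OCR returns lines in a different order. A sketch of a more general approach, assuming each line is written in a single script: Hindi uses the Devanagari Unicode block (U+0900 to U+097F), so any line containing a character from that block is treated as Hindi. The helper name is_devanagari is hypothetical, and unlike the code above, this version also strips stray leading and trailing spaces.

```python
def is_devanagari(text):
    """Return True if text contains at least one character from the Devanagari block."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


# detail=0 output copied from above.
text_list = [' Hindi to Engliah',
             'Stry Traulalion',
             'एक किसान एक मेले से एक सुंदर गाय',
             'खरीदा औ़र वह अपने गांव को लौट रहा',
             'था रास्ते में उसे एक डाकूने देखलिया',
             'वहू गाय को लेना चाहर्ता था इसलिए',
             'वकिसान के पास गया और कहा']

# Route each line by script, removing stray spaces such as the one in ' Hindi to Engliah'.
text_hi = " ".join(t.strip() for t in text_list if is_devanagari(t))
text_en = " ".join(t.strip() for t in text_list if not is_devanagari(t))
print(text_en)  # Hindi to Engliah Stry Traulalion
```

Lines that mix both scripts would need a finer, per-word split.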
Selecting the language 'hi' converts the text to speech with a Hindi accent:
ta_tts=gTTS(text_hi, lang='hi')
ta_tts.save('trans.mp3')
Audio('trans.mp3' , autoplay=True)
Selecting 'en' converts the text to speech with an English accent:
ta_tts=gTTS(text_en, lang='en')
ta_tts.save('trans.mp3')
Audio('trans.mp3' , autoplay=True)
Try this with other languages too.
To translate the Hindi text into English, use google_translator as below:
translator = google_translator()
text_en=translator.translate(text_hi, lang_tgt='en')
print(text_en)
This now gives:
A farmer bought a beautiful cow from a fair and he was returning to his village, on the way he wanted to take a cow to see a robber, so went to the farmer and said
To translate into other languages, change lang_tgt to the target language code.
Speech Recognition: Speech to Text
Converting speech from a microphone to text requires a library called PyAudio.
PyAudio provides Python bindings for PortAudio, the cross-platform audio input/output library. With PyAudio, you can use Python to play and record audio on a variety of platforms.
PyAudio is required only for microphone input; without it, creating sr.Microphone() raises an error.
pip install PyAudio # or conda install PyAudio
Install SpeechRecognition, a full-featured and easy-to-use Python library for performing speech recognition:
pip install SpeechRecognition
Import the following libraries for the speech recognizer:
import speech_recognition as sr
import pyaudio
Run the following code, say something, and watch it get converted to text.
# NOTE: this requires PyAudio because it uses the Microphone class
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:  # use the default microphone as the audio source
    audio = r.listen(source)  # listen for the first phrase and extract it into audio data

# recognize speech using Google Speech Recognition
try:
    print("You said " + r.recognize_google(audio))
except sr.UnknownValueError:  # speech is unintelligible
    print("Could not understand audio")
except sr.RequestError as e:  # the service was unreachable or returned an error
    print("Could not request results: {0}".format(e))
This functionality can be used in many applications, such as voice search on Google or YouTube.
OCR, translation, text-to-speech, and speech recognition are heavily used in NLP today, and these libraries make them easy to use. Anyone is welcome to contribute to the libraries to improve them for the community.
They can be used in chatbots to support different languages, add speech functionality, and even extract text from images for further processing or information retrieval.