Making a simple and fast chatbot in 10 minutes

In the real world, response time matters a lot for a chatbot. Whether it serves the travel industry, banks, or doctors, a chatbot that is meant to really help customers should respond quickly, about as fast as a customer care representative would in conversation.

Besides response time, it is also important to understand the main purpose of the chatbot. Different industries cannot share the same chatbot, because each has its own goals and its own corpus to reply from.

While transformers are good at producing a suitable reply, they may take time to respond. Where response time is a concern, other approaches can be applied instead, including rule-based systems that return an appropriate reply to the question asked.

How many times have you contacted a travel agency about a refund for tickets booked last year during the lockdown? I am sure getting an apt reply was far from reality.

Now let’s make a simple chatbot. First, install these packages:

pip install nltk
pip install newspaper3k
pip install scikit-learn

The newspaper3k package offers several features:

  1. Multi-threaded article download framework
  2. News URL identification
  3. Text extraction from HTML
  4. Top image extraction from HTML
  5. All image extraction from HTML
  6. Keyword extraction from text
  7. Summary extraction from text
  8. Author extraction from text
  9. Google trending terms extraction
  10. Works in 10+ languages (English, German, Arabic, Chinese, …)

Import libraries as below:

#import libraries
from newspaper import Article
import random
import nltk
import string
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

I have already covered CountVectorizer in my earlier blog posts. In short, it converts a collection of texts into a matrix of token counts.

Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:

sklearn.metrics.pairwise.cosine_similarity(X, Y=None, dense_output=True)

Parameters

X: {ndarray, sparse matrix} of shape (n_samples_X, n_features). Input data.

Y: {ndarray, sparse matrix} of shape (n_samples_Y, n_features), default=None. Input data. If None, the output will be the pairwise similarities between all samples in X.

dense_output: bool, default=True. Whether to return dense output even when the input is sparse. If False, the output is sparse if both input arrays are sparse.

Returns

kernel matrix: ndarray of shape (n_samples_X, n_samples_Y)
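To make the "normalized dot product" concrete, here is a minimal sketch that uses only the standard library. The helper names `bag_of_words` and `cosine` and the sample sentences are made up for illustration. The whitespace tokenizer is a simplification of what CountVectorizer does, so the numbers will not always match scikit-learn exactly.

```python
import math
from collections import Counter

def bag_of_words(sentence):
    """Count how often each lowercase, whitespace-separated token occurs."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity of two token-count mappings: dot(a, b) / (|a| * |b|)."""
    # Counter returns 0 for tokens missing from b, so unmatched tokens add nothing
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical question and candidate sentences
question = bag_of_words("what causes kidney disease")
candidates = [
    "Kidney disease causes fluid to build up",
    "Exercise regularly and eat well",
]
scores = [cosine(question, bag_of_words(s)) for s in candidates]
print(scores)
```

The first candidate shares three tokens with the question and gets the higher score, while the second shares none and scores zero. The chatbot relies on the same idea to pick the sentence closest to the user's question.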

import numpy as np
import warnings
warnings.filterwarnings('ignore')

Tokenization is already explained in an earlier blog post. Here we take the data from a healthcare website:

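#Download the sentence tokenizer data needed by article.nlp() and nltk.sent_tokenize
#(recent nltk releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('punkt', quiet=True)

#Download and parse the article, then use its text as the corpus
article=Article("https://www.mayoclinic.org/diseases-conditions/chronic-kidney-disease/symptoms-causes/syc-20354521")
article.download()
article.parse()
article.nlp()
corpus=article.text
print(corpus)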

#tokenization
text=corpus
sentence_list=nltk.sent_tokenize(text) #A list of sentences

#Print the list of sentences
print(sentence_list)

Once the corpus is ready, think about things a user or customer may ask or say that have no relation to the content we have.

It can be a greeting, a message of gratitude, or a goodbye. The team needs to brainstorm such messages and their responses.

I tried to cover a few here.

Greeting bot response

#Random response to greeting
def greeting_response(text):
    text=text.lower()

    #Bot greetings
    bot_greetings=["howdy","hi","hola","hey","hello"]

    #User greetings
    user_greetings=["wassup","howdy","hi","hola","hey","hello"]

    #Reply with a random greeting if any word of the input is a greeting
    for word in text.split():
        if word in user_greetings:
            return random.choice(bot_greetings)

Gratitude bot response

#Random response to gratitude
def gratitude_response(text):
    text=text.lower()

    #Bot gratitude replies
    bot_gratitude=["Glad to help","You are most welcome", "Pleasure to be of help"]

    #User gratitude phrases, lowercase so they can match the lowercased input
    user_gratitude=["thankyou so much","thankyou","thank you","grateful"]

    #Match phrases as substrings, because "thank you" spans two words
    for phrase in user_gratitude:
        if phrase in text:
            return random.choice(bot_gratitude)
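The text above also mentions goodbye messages, which the two functions do not cover. A possible handler in the same style might look like the sketch below. The function name `farewell_response` and both word lists are assumptions made for illustration, not part of the original code.

```python
import random

def farewell_response(text):
    """Return a random goodbye if the input contains a farewell word, else None."""
    text = text.lower()

    # Bot farewells (illustrative wording, not from the original post)
    bot_farewells = ["Bye, take care", "Goodbye", "See you later"]

    # User farewells (illustrative list)
    user_farewells = ["bye", "goodbye", "exit", "quit"]

    for word in text.split():
        # Strip surrounding punctuation so "bye!" still matches
        if word.strip(".,!?") in user_farewells:
            return random.choice(bot_farewells)
    return None

print(farewell_response("ok bye!"))
print(farewell_response("what is chronic kidney disease"))
```

A chat loop could check this handler first and stop the conversation when it returns something other than None.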

Sorting list

#Return the indices of list_var ordered by value, highest value first
def index_sort(list_var):
    length=len(list_var)
    list_index=list(range(0,length))
    x=list_var
    for i in range(length):
        for j in range(length):
            if x[list_index[i]]>x[list_index[j]]:
                #swap
                temp=list_index[i]
                list_index[i]=list_index[j]
                list_index[j]=temp

    return list_index
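The same ranking can probably be written more compactly with Python's built-in `sorted`. The sketch below uses a hypothetical name, `rank_indices`, and is meant as an alternative, not as the function used in the rest of this post. For tied scores the order of indices may differ from `index_sort`.

```python
def rank_indices(scores):
    """Return the indices of scores ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

scores = [0.1, 0.9, 0.5]
print(rank_indices(scores))  # [1, 2, 0]
```

Index 1 comes first because it holds the highest score (0.9), followed by index 2 (0.5) and index 0 (0.1).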

The chatbot response function uses cosine similarity between the user input and the predefined sentences to pick a reply.

#Create the bot's response
def bot_response(user_input):
    user_input=user_input.lower()
    #Add the user input as the last sentence so it is vectorized with the corpus
    sentence_list.append(user_input)
    bot_response=""
    cm=CountVectorizer().fit_transform(sentence_list)
    #Compare the user input (last row) with every sentence
    similarity_scores=cosine_similarity(cm[-1],cm)
    similarity_scores_list=similarity_scores.flatten()
    index=index_sort(similarity_scores_list)
    #Drop the first index, which is expected to be the user input itself
    index=index[1:]
    response_flag=0
    j=0
    for i in range(len(index)):
        if similarity_scores_list[index[i]]>