EDA concepts (Data understanding & Visualization)

Updated: Dec 22, 2020

Exploratory Data Analysis (EDA) is crucial to the success of any data science project. So, let’s try to understand what EDA is all about. It is an approach to analyzing and understanding the various aspects of the data. Through EDA, we should come to understand the relationships between the features and be able to draw conclusions or gather insights about the data.

So, what is the purpose of doing EDA on any dataset?


The purpose of performing EDA on a dataset is to check that the data is clean: that it has no redundancies (such as duplicate records) and no missing or null values. We also need to identify the significant features and remove unnecessary noise that could hamper the accuracy of our conclusions when we build the model. Before moving on to the more complex stages of the data processing lifecycle, we need a proper interpretation of the dataset.
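To make these checks concrete, here is a minimal sketch of a data-quality summary in pandas. The `students` table, its column names and the `quality_report` helper are made up for illustration; they are not taken from the Student dataset used later in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical records; the columns are illustrative only.
students = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "male"],
    "math_score": [72.0, 65.0, 65.0, np.nan, 88.0],
    "reading_score": [80.0, 70.0, 70.0, 91.0, np.nan],
})

def quality_report(df):
    """Count the basic problems EDA looks for: missing values and duplicate rows."""
    return {
        "rows": len(df),
        "missing_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }

report = quality_report(students)
print(report)
```

A report like this only detects problems; deciding how to fix them is a separate step.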

The whole process of Exploratory Data Analysis involves the following steps:


Step 1: Understand the data

Step 2: Clean the data of irregularities

Step 3: Analyze the relationship between the features
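The three steps above can be sketched in pandas roughly as follows. This is only an outline under stated assumptions: the table is made up, and filling missing numbers with the column median is a placeholder choice rather than a recommended cleaning strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical records; the columns are illustrative only.
students = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "male", "female"],
    "math_score": [72.0, 65.0, 65.0, np.nan, 88.0, 54.0],
    "reading_score": [80.0, 70.0, 70.0, 91.0, 85.0, 60.0],
})

# Step 1: understand the data (size, column types, summary statistics).
print(students.shape)
print(students.dtypes)
print(students.describe())

# Step 2: clean the data (drop exact duplicate rows, then fill missing
# numeric values with the column median as a simple placeholder).
cleaned = students.drop_duplicates().copy()
numeric_cols = cleaned.select_dtypes(include="number").columns
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())

# Step 3: analyze the relationship between the numeric features.
correlations = cleaned[numeric_cols].corr()
print(correlations)
```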


The image below shows where EDA fits in the process of any data science project:


Python code implementation to understand EDA:

Let’s go through an example that performs EDA on the Student dataset. You can find this dataset at the following Kaggle link.


Please read the inline comments in each code cell to understand the implementation.

Note: In another blog, I will include an example of handling null values and explain how to clean the data.

Note: If you are interested in learning more about Seaborn Heatmap, please check out this link.
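As a rough illustration of a heatmap, the sketch below plots the correlation matrix of a small made-up table of scores. The column names and values are assumptions for the example, not the contents of the Student dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical scores; illustrative only.
scores = pd.DataFrame({
    "math_score": [72, 65, 90, 47, 88, 54],
    "reading_score": [80, 70, 95, 57, 85, 60],
    "writing_score": [78, 72, 93, 44, 80, 62],
})

# Pairwise correlation between the numeric columns.
corr = scores.corr()

# annot=True writes each coefficient inside its cell; vmin/vmax pin the color scale to [-1, 1].
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation between scores")
```

Cells close to 1 or -1 point to strongly related features, which is often the first thing to look for in this plot.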


Note: If you are interested in learning more about Seaborn Pairplot, please check out this link.


Note: If you are interested in learning more about Seaborn Relplot, please check out this link.
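A relplot shows the relationship between two numeric columns, optionally split by a third. The sketch below assumes a made-up table with hypothetical column names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical records; illustrative only.
students = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "reading_score": [80, 70, 95, 57, 85, 60],
    "writing_score": [78, 72, 93, 44, 80, 62],
})

# Scatter plot of writing score against reading score, colored by gender.
grid = sns.relplot(
    data=students,
    x="reading_score",
    y="writing_score",
    hue="gender",
    kind="scatter",
)
```

Passing `kind="line"` instead would draw a line plot with the same arguments.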


Note: If you are interested in learning more about Seaborn Distplot, please check out this link.





Note: If you are interested in learning more about Seaborn Catplot, please check out this link.
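A catplot compares a numeric column across the values of a categorical one. Here is a minimal sketch, again on a made-up table with hypothetical column names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical records; illustrative only.
students = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "math_score": [72, 65, 90, 47, 88, 54],
})

# One box per gender showing the spread of math scores.
grid = sns.catplot(data=students, x="gender", y="math_score", kind="box")
```

Other values of `kind`, such as `"bar"` or `"violin"`, give different views of the same comparison.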



I hope you enjoyed learning the data understanding and data visualization concepts of EDA.

If you wish to try this example and execute it yourself, please use the following link to open the Kaggle notebook.


Thanks for reading!


© Numpy Ninja.