First, let's see what Exploratory Data Analysis (EDA) means.
EDA is like "interviewing" the data. The data analyst gets to know the data and learns the interesting things it has to say. Analysts should explore the data for potential research questions before jumping into confirming answers with hypothesis testing and inferential statistics.
EDA involves the following steps:
Classifying variables as continuous, categorical, etc.
Summarizing variables using descriptive statistics.
Visualizing variables using charts.
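To make the three steps concrete, here is a minimal sketch in Python on a tiny made-up dataset, using only the standard library. The dataset, the helper names (classify, summarize, text_bar_chart) and the rule "numbers are quantitative, everything else is categorical" are illustrative assumptions, not a standard method. The text bar chart is a crude stand-in for a real plotting library.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical toy dataset: each key is a variable, each list holds one value per observation.
data = {
    "vegan_diet": ["Y", "N", "N", "Y", "N"],
    "household_size": [3, 1, 4, 2, 3],
}

def classify(values):
    """Step I (rule of thumb, assumed): numeric values are quantitative, anything else is categorical."""
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"
    return "categorical"

def summarize(values, kind):
    """Step II: descriptive statistics suited to the variable's kind."""
    if kind == "quantitative":
        return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}
    # For categorical variables, count how often each level occurs.
    return dict(Counter(values))

def text_bar_chart(counts):
    """Step III: a crude text 'chart' with one row of '#' per level, levels sorted."""
    return "\n".join(f"{level:>4} | {'#' * n}" for level, n in sorted(counts.items()))

report = {}
for name, values in data.items():
    kind = classify(values)
    report[name] = (kind, summarize(values, kind))
    print(name, report[name])
    print(text_bar_chart(Counter(values)))
```

The numeric rule is deliberately naive: a variable such as a zip code is stored as numbers but is really categorical, which is why the classification step still needs the analyst's judgment.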
Now, let's explore each step in detail.
I. Classifying variables.
What are variables?
Variables are characteristics that vary across observations. Each variable provides different information about an observation. Classifying variables gives us distinctions that are helpful in our analysis, for example in deciding which summary statistics and charts are appropriate for each variable.
Classifying variables is somewhat arbitrary and built on rules of thumb rather than hard-and-fast criteria.
Categorical (Qualitative) variables describe a quality or characteristic of an observation. A typical question answered by categorical variables is "Which kind/type?". They are often represented by non-numeric values.
Binary Variables - these variables can take only two levels, often stated as yes/no responses. Some examples are: * Married? (Y/N) * Sex (F/M) * Vegan diet (Y/N)
Nominal Variables - qualitative variables with more than two levels and no intrinsic ordering between the levels. Some examples are: * Country of origin * Favorite color * Favorite travel destination
Ordinal Variables - these variables take more than two levels, and there is an intrinsic ordering between the levels. Some examples are: * Beverage size (small, medium, large) * Class year (freshman, sophomore, junior, senior)
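The three categorical subtypes can be told apart with a small rule-of-thumb helper. This is a hypothetical sketch: the function name classify_categorical and its order parameter are assumptions made for illustration. The key point it encodes is that an ordering between levels cannot be inferred from the raw values alone, so the analyst has to supply it.

```python
def classify_categorical(values, order=None):
    """Hypothetical helper: label a categorical variable as binary, nominal, or ordinal.

    `order` is an assumed, analyst-supplied list giving the intrinsic ordering of
    the levels; without it, a variable with more than two levels is treated as nominal.
    """
    levels = set(values)
    if len(levels) < 2:
        raise ValueError("a categorical variable needs at least two levels")
    if len(levels) == 2:
        return "binary"
    if order is not None:
        # Make sure the supplied ordering covers every observed level.
        missing = levels - set(order)
        if missing:
            raise ValueError(f"levels missing from order: {sorted(missing)}")
        return "ordinal"
    return "nominal"

size_order = ["small", "medium", "large"]
sizes = ["large", "small", "medium", "small"]

print(classify_categorical(["Y", "N", "Y"]))                # binary
print(classify_categorical(["Canada", "Japan", "Peru"]))    # nominal
print(classify_categorical(sizes, order=size_order))        # ordinal
# Because ordinal levels have an order, sorting by it is meaningful.
print(sorted(sizes, key=size_order.index))                  # ['small', 'small', 'medium', 'large']
```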
Quantitative variables describe a measurable quantity of an observation. A typical question answered by quantitative variables is "How much?" or "How many?". They are represented by numeric values.
Continuous Variables - these variables can take any value within a range, so there are infinitely many possible values between any two values. Some examples are: * Height * Surface area
Discrete Variables - these variables can take only countable values, so there is a finite number of possible values between any two values. Some examples are: * Number of individuals in a household * Total number of students in a classroom
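One rough heuristic for separating discrete from continuous variables is to check whether every recorded value is a whole number. The sketch below (the helper name classify_quantitative is hypothetical) applies that assumption, and its last case shows why this stays a rule of thumb: heights rounded to whole centimeters look discrete even though height is continuous.

```python
def classify_quantitative(values):
    """Hypothetical rule of thumb: all whole numbers suggests counts (discrete),
    any fractional value suggests measurements (continuous)."""
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"

household_sizes = [3, 1, 4, 2, 5]           # counts of people
heights_m = [1.62, 1.75, 1.80, 1.68]        # measured heights in meters
heights_cm_rounded = [162, 175, 180, 168]   # the same kind of measurement, rounded

print(classify_quantitative(household_sizes))     # discrete
print(classify_quantitative(heights_m))           # continuous
print(classify_quantitative(heights_cm_rounded))  # discrete, although height is continuous
```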
II. Summarizing variables using descriptive statistics.
III. Visualizing variables using charts.