Data Analysis is a process of defining problem statement, gathering, cleaning and analyzing data and then creating visuals with data to make decisions. Let’s discuss more about these phases.
Step 1 – Ask
To solve any problem, we have to understand it first.
Ask is a very first step of Data Analysis. In this phase Define the Problem that you are trying to solve. To Define the problem, Ask SMART questions,
S – Specific – Question should have context and it should be Open-Ended to get all relevant information.
M -Measurable – The Question should give you Measurable Answers which helps to Analyze data.
A – Action-Oriented - Question should help you to define an Action plan
R- Relevant – Question should be Relevant to the problem you are trying to solve.
T- Time-bound – Answers should come from Time-bound studies.
After understanding the Problem, do understand Stakeholder’s Expectation regarding that problem. Target the problem and avoid any distractions.
Step 2 – Prepare
After defining correct Problem, here decide what data need to be collected and organized to solve the problem.
To collect right data, consider followings-
1. Select Right Type of Data
2. Determine Time Frame
3. If Data is already available then choose Data Source otherwise decide how to collect the data.
4. Decide what data and how much data to use
Step 3: Process
One need to Process the data to clean it for any inaccuracies, possible errors. This includes,
1. Duplicate data- multiple entries of same info.
2. Outdated data – when Data is too old it becomes irrelevant and out of date info.
3. Incomplete Data – Data not present for mandatory fields.
4. Incorrect/Inaccurate Data – e.g. -Recording Fake info, data entry errors.
5. Inconsistent data - where same type of data recorded in different formats. E.g. – Date format varies as per country so there is a possibility that for a worldwide organization maintain there data as per country format.
Determining Data Cleaning Tool –
Data Cleaning tool should be finalized as per the size of your data set
o If you have small set of data go for Spreadsheets/MS Excel.
o When you have a very big data set, then go for SQL.
Here are some Data cleaning functions using EXCEL –
Removing Duplicate rows
Finding and Replacing Text
Changing CASE of data column
Removing SPACES, non-printing characters –
a. TRIM () - Removes the 7-bit ASCII space character (value 32) from text.
b. SUBSTITUTE () - to replace the higher value Unicode characters (values 127, 129, 141, 143, 144, 157, and 160) with the 7-bit ASCII characters for which the TRIM and CLEAN functions were designed.
c. CLEAN ()- Removes the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31) from text.
d. Find and Replace – Find a pattern of string and replace as necessary
6. Fixing Numbers and Number Signs –
a. Convert the Data type of a column from Text to Number
b. Define Decimals as per the requirement
c. Currency conversion signs – Use correct currency as per the country
d. FIXED () – Round the decimal positions of a number
e. TEXT() – Convert Number to TEXT format
f. VALUE () - Converts a text string that represents a number to a number
7. Fixing Date and Time –
a. Define a consistent DATE FORMAT for the dataset – such as DD-MM-YYYY or MM/DD/YYYY or 2-digit Year etc.
b. TIME FORMAT – Define consistent time format for your dataset
c. Check the data type of column storing Date info – make sure it is in DATE format.
d. DATE () - Returns the sequential serial number that represents a particular date. If the cell format was General before the function was entered, the result is formatted as a date.
e. DATEVALUE () - Converts a date represented by text to a serial number.
f. TIME ()- Returns the decimal number for a particular time. If the cell format was General before the function was entered, the result is formatted as a date.
g. TIMEVALUE () - Returns the decimal number of the time represented by a text string. The decimal number is a value ranging from 0 (zero) to 0.99999999, representing the times from 0:00:00 (12:00:00 AM) to 23:59:59 (11:59:59 P.M.).
8. Merging Columns – Depending as per data sometimes we need to Merge Two or more columns. e.g. – Combining First Name, LAST name columns together to make a FULL Name.
Following functions help to combine data from two or more columns –
9. Splitting Data from One column to multiple columns –
e.g. – If Address info is recorded in one single column and for better analysis we need to breakdown this info as City, Country, zip code then we can use following functions –
o Using Delimiter one can spilt the data in multiple columns
o Convert Text to Columns Wizard available in EXCEL
o SEARCH and
o LEN functions to split a name column into two or more columns
10. To Analyze data, it should be in Proper order. For this one can use TRANSPOSE () function in EXCEL. It returns a vertical range of cells as a horizontal range, or vice versa
11. Data Reconciling – Excel offers following functions for data reconciliation -
a. LOOKUP ()
Data Cleaning Functions in SQL –
SQL also has many of the EXCEL functions. SQL Stands strong when you have to sort and filter data. Following are the clauses used for sorting and filtering –
o Inner Query
Once Data Cleaning is done,
· Verify the alignment problem statement with the goal of your project
· Verify that data can solve the problem
· Document the data cleaning process
Step 4: Analyze –
To analyze the data to find insights,
· Data need to be Organized, Formatted, sorted, and transformed correctly.
· For this one may have to
o perform calculations,
o combine data from different resource and
o create new tables with your results.
While Analyzing data focus on End User who is going use this Data Analysis Report.
Step 5: Share
Meaningful Visuals are best way to Share the Insights you found after Analyzing the data.
Graphs or Dashboards are best ways to visualize the data findings.
While working on enticing visuals, keep in mind that Data Visuals should be easy to understand and should lead to Data driven Decision-making
Do tell a Data story with your visuals, use Legends, Labels where they are necessary.
Do not overcrowd the dashboard with too much information keep it simple but yet super strong which will help to make a conclusion.
Step 6: Act
In this phase,
· provide your recommendation with the help of your data analysis findings.
· Do help Stakeholders with defining Action Plan.
· If it needs to be an iterative process then define new timelines
· Do take a feedback and think that how you can utilize that next time.
For effective Data Analysis we need Structured Thinking to Understand the Problem, Get Relevant data, find patterns work across the goal and come with insights to solve the problem.