Tips to excel in EXCEL as a Data Analyst
The ability to analyze data quickly and accurately has become of paramount importance. Excel with its extraordinarily broad range of features and capabilities is one of the most widely used programs for doing this analysis and visualization. As a Data Analyst we can consider Excel as the stepping stone for the Analyst Journey. Excel is so popular because it is packed with features and functions that can be used to clean, aggregate, pivot, and graph data.
Image:Photo by Rubaitul Azad on Unsplash
Many new features like Power Pivot, get and transform, and DAX has added to the latest version of Excel. In this blog, the topics I am covering are based on Excel's old versions but it is available in the latest version too.. it will be easily available for everyone and can try it with me.
4.Pivot Table And Charts
I am using 19 records from the sample-superstore dataset for my explanations.
1. Conditional Formatting
It is very importantto learn how to use conditional formatting features in Excel.It will help us to highlight cells that contain particular text,above average , and we can define the rule for highlighting the cell.
So let’s see a few uses of conditional formatting through Examples
1.To find the product name that contains phone
We can already see that cells got highlighted that are having text 'phone'
To clear the highlight from the entire sheet or selected cell
2.Top 10 product that make a high profit
3.To find details from the region 'west'
Conditional formatting allows you to highlight or hide cells based on a rule you specify. Apply the rules to one cell or multiple cells in the same worksheet. It is useful for highlighting duplicate data.
2. Remove Duplicates
Often the chance of having duplicates in the dataset is more,So it will be good to remove them . For that first use Conditional formatting to highlight the duplicate cells .Then go the option Data>>Remove Duplicates
We have the option to select column wise ,if we want to remove duplicate from particular columns.
MATCH can be used when you need the position of a value in a range instead of the value itself.
This is the syntax for MATCH:
=MATCH(lookup_value, lookup_array, [match_type])
lookup_value:value to be searched
lookup_array:in which range we have to search
When writing the function, it is important to know which match type to use. Although it is optional and defaults to 1, the available options are -1, 0, or 1.
-1: Finds the smallest value that is greater than or equal to lookup_value.
0: Finds the first value that is exactly equal to lookup_value.
1: Finds the largest value that is less than or equal to lookup_value.
Now let’s work on it
My task is to check if a particular order id is there in our dataset
As you can see even though same order id was present for 7 times.Match function returns the relative position of the orderid it appears for first time in the range we selected.
How many values are null, you have a better understanding of how to approach them. For example, if a lot of values are null you should drop the column. If few values are null you should impute a value to fill the null. COUNTBLANK counts the number of empty cells in a range.
The syntax :=COUNTBLANK(range)
Now let’s check our function with an example for explanation purpose I made 3 blanks in the column segment
The DAYS360 function returns the number of days between two dates based on a 360-day year (twelve 30-day months), which is used in some accounting calculations. Use this function to help compute payments if your accounting system is based on twelve 30-day months.
The DAYS360 function syntax has the following arguments:
Start_date, end_date : The two dates between which you want to know the number of days. If start_date occurs after end_date, the DAYS360 function returns a negative number. Dates should be entered by using the DATE function, or derived from the results of other formulas or functions.
Method : Optional, A logical value that specifies whether to use the U.S. or European method in the calculation.
Let's find out how many days it takes to ship after ordering. So we need to count the days between the shipping date,order date
The RANK function orders a number by its size relative to other values in a list and returns the desired rank. That means the rank of the number would be its position if the list becomes sorted by ascending or descending order
The RANK function syntax has the following arguments:
Number : The number whose rank you want to find.
Ref : An array of, or a reference to, a list of numbers. Non-numeric values in ref
Order: Optional. A number specifying how to rank number.
So carefully check the next picture, to verify the rank I got through the function is correct or Not. I have done one thing, sort the sales column in descending order and counted it ..so the answer is correct ..6.
But there is big disadvantage for the Rank function RANK gives duplicate numbers the same rank, but cumulatively counts. That means if two values are rank four, the next rank will be six, not five (1,2,3,4,4,6).
Xlookup is a latest addition to the function,it killed the functions VLOOKUP,HLOOKUP,INDEX,MATCH.
The XLOOKUP syntax is as follows:
=XLOOKUP(lookup_value, lookup_array, return_array, [if_not_found],[match_mode], [search_mode])
Like VLOOKUP or INDEX/MATCH, there are 3 mandatory arguments. The last 3, which are in square brackets, are optional arguments:
lookup_value: this is the value you are searching for
lookup_array: this is the range where you are searching for it
return_array: this is the range with the result you want
[if_not_found]: instead of # N/A, you can specify what should be returned in case no match is found.
[match_mode]: like with VLOOKUP or INDEX/MATCH, use 0 for an exact match (# N/A will be returned if no match is found). But the exact match is the default value, so basically most of the time you won't need to use this argument! You can also use -1 to return the next smaller value if no match is found, or 1 to return the next larger value if no match is found. It is also possible to use 2, which is a wildcard match with special meaning for characters "*", "?" and "~".
· [search_mode]: Use 1 to search starting with the first item, and -1 to search starting from the last item.
It can search vertically, Horizontally, so it covers the properties of HLOOKUP,VLOOKUP.
value: The argument that is checked for an error.
value_if_eror: The value to return if the formula evaluates to an error. The following error types are evaluated: #N/A, #VALUE!, #REF!, #DIV/0!, #NUM!, #NAME?, or #NULL
4.Pivot Table and Pivot Chart
This is a very wide topic and there is so much to learn .There are some steps ,we have to follow before we make a pivot table and chart
Study the dataset
Check for blank header
Check for blank cell
Check for merged cells
Pivot Table helps use to a tool to explore and summarize large amounts of data, and analyze related totals, even if we don’t know much about using functions in Excel.We will able to
Summarize data by categories and subcategories.
Filter, group, sort and conditionally format different subsets of data so that you can focus on the most relevant information.
Rotate rows to columns or columns to rows (which is called "pivoting") to view different summaries of the source data.
Subtotal and aggregate numeric data in the spreadsheet.
Expand or collapse the levels of data and drill down to see the details behind any total.
Now let’s create a pivot table
Go to Insert>>Pivot Table
2.Select the Range
3. Just drag and drop the required field in the areas we need .The area where you work with the fields of your summary report is called PivotTable Field List. It is located in the right-hand part of the worksheet and divided into the header and body sections:
The Field Section contains the names of the fields that you can add to your table. The filed names correspond to the column names of your source table.
The Layout Section contains the Report Filter area, Column Labels, Row Labels area, and the Values area. Here you can arrange and re-arrange the fields of your table.
By default, Excel adds the fields to the Layout section in the following way:
Non-numeric fields are added to the Row Labels area;
Numeric fields are added to the Values area;
Date and time re added to the Column Labels area.
So if I want to see Profit ,state and ship wise, I have placed the State in Row and Ship mode in column.Then I added the Category as Filter .
We have the option to change the Summarize value to count,Average ,Max,Min,Product
PivotCharts complement PivotTables by adding visualizations to the summary data in a PivotTable, and allow you to easily see comparisons, patterns, and trends. We can choose any chart for showing the Pattern.
Excel is a great tool to do import, explore, clean, analyze, and visualize your data. Compare to other Visualization Tool, Excel is less price . It is a very useful tool for Business Analyst,Data Analyst people to get rough details of data.Excel allows you to examine and interpret data in a variety of ways. The information could come from several different places. A variety of formats and conversions are available for the data. Conditional Formatting, Ranges, Tables, Text functions, Date functions, Time functions, financial functions, Subtotals, Quick Analysis, Formula and other Excel commands, functions, all be used to analyze it.