Analyse Data in Excel with Power Query and a Pivot Table
What is Power Query?
A self-service tool named Power Query can Extract, Transform, and Load with perfection. Power Query tool in an ETL that works as an add-in for Excel. Using this tool will make you able to use different sources for the EXTRACTION of data, manipulation of the data into a type that fulfills your needs, and then LOADS into Excel.
Cleaning the data from the “waste” such as repetitions, extra columns or rows, extra spaces, or unprintable characters, etc. is included.
Make the right sequence of the data such as correct case fill in the blanks, number as text, disassemble the text, add the right table header, divide the data into components, etc.
You can make changes in the tables by putting them in your favorite view, for instance, filter, sort, transpose, change in the order of columns, add totals, transpose, expand cross-tables in flat ones, and rollback.
What is Power Pivot?
An in-memory data modeling element that offers highly constricted data storage along with exceptionally fast calculation is known as Power Pivot. You can use it as a part of Excel besides making the data model in an Excel workbook.
Using the Power Pivot data model, you can make data visualization in the Excel workbooks conveniently. For Microsoft Excel, Power Pivot is used as an add-in that expects to do a few other tasks. In the case of importing and processing of Power Query, the Power Pivot is used specifically for the intricate analysis of vast amounts of data.
Analyse Data in Excel
Now I will start with how to analyse data and Excel spreadsheets using Power Query and Pivot Tables. It will use an example of a data set to examine the popularity and length of songs by their release year. The analysis will use Spotify data, which you can access and download from the Kaggle website.
Step 1: Use Power Query Editor to Inspect Data
Before you can analyse your data in Excel, make sure to inspect it first. This will allow you to understand the structure of the data set and the usefulness of each record. You can then select and limit the data that you need to load into your spreadsheet.
To inspect and transform your csv or Excel file, load the data into Power Query first. To import a csv file:
Create a new Excel workbook and click on the Data tab.
Click the Get Data icon, then select From File, and finally From Text/CSV.
Choose the file you want to import. You will see a window with a preview of the data.
Click on the Transform Data button to load the data into the Power Query editor.
Step 1: Loading a csv file into power Query
We can divide our data set into two parts: categorical and quantitative. In our example, the categorical data will consist of id, artist, or track name. Similarly, the quantitative data will consist of numerical values such as tempo, volume, or duration.
Step 2: Handling Large Excel Files in Power Query
Analyzing big data sets in Excel can be overwhelming, so we want to limit the size of the imported data. Our goal is to examine tracks released in or after the year 1960 and to analyse their popularity and duration. Consequently, we can cut our spreadsheet’s size by only keeping those records, which we are interested in.
First, let’s limit the number of columns. After loading the data into the Power Query editor, go to the home tab and click on Choose Columns. We want to analyse metrics such as duration, popularity, and tempo, so tick those columns. Also, keep the data that will allow you to identify and describe tracks, so select id, name, year, and artists.
Step 2: Choosing Columns in Power Query Editor
Since our analysis will focus only on data released after 1960, I will exclude any records released before that year. To filter the row selections, click on the filter arrow next to the header name. Then, select Number Filters and click on Greater Than or Equal To…. Finally, type the value 1960 in the Filter Rows and click OK.
Step 2.1: Filtering Data in Power Query Editor
To load the data into Excel, click on the Close & Load icon at the upper left corner of the Power Query editor. Your spreadsheet will now populate the data with a table containing the limited number of rows and columns. Head to the Data tab to explore how you can further filter, group and sort your Excel’s data table.
Step 3: Transforming and Converting Data with Power Query
Instead of using Excel formulas to calculate new values, you can add a custom column in Power Query. Our goal is to use the existing data and convert it into new records. Calculating new data in Power Query will make the analysis faster and our spreadsheet more robust.
To add a new column in your existing Power Query, go to Excel’s Query tab and click on the Edit button. This will take you back to the Power Query editor. Then, click on the Custom Column, which you will find in the Add Column tab.
Our goal is to convert the ‘Duration_ms’ column from milliseconds to minutes. Go to the Available Columns area, where you will see a list of all available column names. Then, double-click on the ‘duration_ms’ field to add it to the Custom column formula box and type in ‘/ 60000’. Lastly, type ‘Duration_mins’ in the new column name box and click OK.
Step 3: Creating a Custom Column in Power Query Editor
Before moving on, we want to create one more custom column with unique descriptive data for each track:
Click on the Add Column icon again and double click on the ‘artists’ field.
Type in ‘&” “&’ and double click on the ‘name’ field. This will combine text from the two columns and separate them by a space. (In the Custom Column Formula box, enter an ampersand (&) after the first column that you inserted. The ampersand (&) operator is used to combine values in Power Query in the same way as in Excel.)
Type ‘artists+track’ in the new column name box to label the new record and click OK.
Step 3.1: Creating a Custom Column in Power Query Editor with joining Two columns
Go back to the home tab and click on the Close & Load icon. You should see an Excel table refreshed with the new data.
Step 4: Set Up a Pivot Table in Excel
Pivot tables allow you to quickly analyse your spreadsheets, aggregate values, and group them into categories.
To create a new Pivot Table in Excel:
Go to your spreadsheet and select all data in the table.
Click on the Pivot Table icon in the Insert tab of the Excel menu.
Select the New Worksheet option in the Create PivotTable window and then click OK.
Step 3.2: Inserting Pivot Tables
You will now see a spreadsheet with an empty pivot table to the left and PivotTable Fields to the right.
Using our data example, we want to examine the track names by their release date. To display year values, drag the year from the PivotTable fields to the Rows area. You will find the Rows area at the lower-right corner, below the list of the available fields. Then, drag the ‘artists+name’ and place it below the ‘year’ field.
To change the Pivot Table’s layout, click on the Design tab at the top of the Excel workbook and click on the Report Layout. Then, select the Show in Tabular Form option to change the Pivot Table’s layout.
Your pivot table should now display rows containing year values and corresponding artists’ and tracks’ names to their right. Adjust the width of the columns to fit the Pivot Table onto the screen.
Step 4: Setting up a Pivot Table Layout
To preserve the column width setup, right-click on the PivotTable, select Pivot Table Options… and then untick the Autofit column widths on the update checkbox.
Step 4.1: Setup Column Width
Step 5: Analysing Excel Data with Pivot Tables
To complete the data analysis, go to the PivotTable Fields area and drag the data into the Values area below. Our task is to examine the popularity of each track, so we will use the Popularity field as the measure. Hence, drag the field into the Values area. Notice the ‘Sum of Popularity’ column added in the Pivot Table. You may need to adjust the column widths to fit the table onto the screen.
We need to display the numbers as averages, so let’s change the value settings. Go back to the Values area and click on the Sum of popularity field. Then, choose Value Field Settings… and you will see a Summarize Values By window. Finally, select Average from the list of options and click OK.
Our goal is only to analyse the most popular tracks for every release year. To limit the number, click on the filter arrow next to the ‘artists+track’ header. Then, select Value Filters and click on the Top 10… In the Top 10 Filter window, change the value from ’10’ to ‘1’.
Step 5: Using Top 10 Value Filter in Pivot Table
Finally, go to the Design tab, click on the Subtotals icon, and select Do Not Show Subtotals. Then, click on the Grand Totals icon and select Off for Rows and Columns. You will now see a list of only the most popular tracks grouped by their release year.
Step 6: Aggregating Data in Pivot Table
One advantage of using a Pivot Table to analyse Excel data is the ease of grouping values. As an example, we will group our year values into ten-year intervals. We will then examine the most popular songs by the decade in which they were released.
To aggregate the data, right-click on one of the values in the ‘Year’ column and select the Group… option. Then, in the Grouping window, type in ‘1960’ as your starting value and ‘2020’ as the end. To aggregate the year values into ten-year intervals, type in ’10’ in the by field. Lastly, click OK, and you will see the tracks data collapsed into ten-year buckets, for instance, ‘1960–69’.
Step 6: Grouping Pivot Table Numbers by Range
Let’s expand our analysis to include the five most popular songs from every decade:
Click on the filter arrow next to the ‘artist+track’ header and select Top 10… from the Value Filters menu.
Change the value from 1 to 5 in the Top Filter window and click OK.
Sort the values in descending order to show the most popular tracks on the top of each period.
Step 7: Analyse the Results and Cleanse Data
Our initial data analysis suggests that the most popular songs are tracks related to winter tunes. That doesn’t seem right. After looking at the source data description, we realised that the popularity measure was calculated in December. As a result, the overall number of downloads and the calculation of popularity were skewed towards holiday tunes. To rectify that, we want to exclude the data from our analysis.
We don’t want to remove any data from the Excel table, so instead, let’s use the Pivot Table’s functionalities. To clean and filter out data, drag the ‘name’ field into the Filters area. Then, click on the arrow next to the ‘All’ value in cell B1. You will see the list of unique values from the name column.
Tick the Select Multiple items checkbox and then click on the Search field at the top of the box. Type in ‘snow’ and then untick the (Select All Search Results) checkbox. As a result, you will now see the names of all tracks containing ‘snow’ in their titles. Untick the Add current selection to filter checkbox and then click OK. Finally, repeat the same step for other search terms such as ‘Christmas’, ‘sleigh’ until your name list looks clean.
Step 7: Filtering and Excluding Data in a Pivot Table
Step 8: Using a Pivot Chart to Visualise Data
Pivot Table not only helps to analyse Excel data, but it also allows you to visualise it. So, let’s expand our data analysis and examine how the average duration of songs evolved over the decades. To save time, we will use the Pivot Table created in the previous steps as a template.
First, remove some of the values and fields from the Pivot Table. Drag and remove the ‘artists+track’ field from the rows area. Then, remove the Average of popularity field in the Values area. Consequently, the Pivot Table should now be showing a list of decades only.
To display average duration values, drag the ‘duration_mins’ field into the Values area. The ‘duration_mins’ was the Power Query’s custom column we created earlier. Then, click on the field name and choose Value Field Settings. Finally, select Average from the list of options in the Summarize value field by window.
To insert a chart, go to the PivotChart Design tab and click the PivotChart icon. I have chosen the horizontal bar chart to visualise and quickly compare the data over time. Right-click on the chart to change the layout, style or add new elements. Lastly, untick the gridlines checkbox in the View tab to remove any background distractions.
Step 8: Analysing Excel Data with Pivot Chart
So, this is my last step to show analysing data in EXCEL using power query and pivot table.