top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Correlation Matters! - Understanding Correlation and Calculating Pearson Correlation Coefficient in Power BI


For anyone working with data, it is crucial to have a strong understanding of statistical concepts. One of such concepts is Correlation – in this guide we will get a grasp of what is Correlation, Correlation Coefficient, and its significance. We will also learn how to calculate the correlation coefficient using DAX in power BI.


What is Correlation?

Correlation is the statistical measure that shows the relationship between quantities. It is used to identify the association between variables and to understand whether or not, and how strongly, they are linearly related. It is symmetrical in nature meaning correlation between A and B is same as correlation between B and A.  Correlation can be applied to a wide range of datasets, and it can be helpful in predicting how things will relate to each other.


It is important to remember the famous quote  “Correlation does not mean Causation”.  If two variables are correlated, it does not imply that change in one variable will cause a change in another variable. Correlation only assesses the relationships between variables, and there may be multiple factors that contribute to that relationship.


Types of Correlation -

·  Positive Correlation – When the two variables change in the same direction meaning when one variable increases the other also increases or when one decreases the other also decreases. The relationship line has a positive slope.

·  Negative Correlation – When the variables change in opposite directions meaning when one variable increases the other decreases. The relationship line has a negative slope.

·  No Correlation – In this scenario, there is no relevant linear relation between variables.


What is Correlation Coefficient?


Correlation Coefficient is a mathematical measure which helps us to determine the strength of the relationship between variables. The range of Correlation Coefficient is between -1 and 1. We can interpret the coefficient values as shown in below-

Now that we have an idea about what is correlation and correlation coefficient let’s take a look at  the mathematical equation used to calculate the Pearson correlation coefficient-




Where r – the correlation coefficient

∑ - Sigma symbol is used to represent a sum of variables (eg.  x1+x2+x3..)

xi– values of x variable in the dataset

yi – values of y variable in the dataset

x̄ -  mean(average) of x

ȳ - mean(average) of y


 

How to calculate correlation coefficient in Power BI?


There are two methods to calculate Correlation Coefficient in Power BI. Let us take an in-depth look at each of these methods. To understand the two methods, we will consider a sample dataset, containing columns height(kg), weight(m) and patient_ID, to calculate the correlation coefficient between height and weight.


Method 1:- Using DAX


How to Calculate Correlation Coefficient using DAX?

Now let’s dig into the implementation of the above formula using DAX. First, we will have to understand the equivalent of each symbol present in the above formula in DAX.

·       ∑  Sigma symbol – the equivalent of sigma is SUM or SUMX in DAX.

·       x̄ (mu x bar) – the equivalent of x̄ is AVERAGE of variable x

·       ȳ (mu y bar) - the equivalent of ȳ is AVERAGE of variable y

·       square root – the equivalent is SQRT in DAX


Generalized DAX code for the Pearson correlation coefficient formula-

Correlation_coefficient =

var mean_X =calculate(AVERAGE(TableName[xi]))
var mean_Y=calculate(AVERAGE(TableName[yi]))
//numerator
var numerator = SUMX('TableName',( [xi]- mean_X)*([yi]- mean_Y))
//denominator
var denominator = SQRT(SUMX('TableName',([xi]-mean_X)^2)*SUMX('TableName',([yi]- mean_Y)^2))

return
divide( numerator, denominator)

Implementing the above to create correlation coefficient measure using DAX-

Correlation_coefficient = 
var mean_X =calculate(AVERAGE(SampleDataset[Weight (KG)]))
var mean_Y=calculate(AVERAGE(SampleDataset[HEIGHT (M)]))

//numerator
var numerator  =  SUMX('SampleDataset',( [Weight (KG)]- mean_X)*([HEIGHT (M)]- mean_Y))

//denominator
var denominator=  SQRT(SUMX('SampleDataset',([Weight (KG)]- mean_X)^2)*SUMX('SampleDataset',([HEIGHT (M)]- mean_Y)^2))

return
divide( numerator, denominator)

Visualization in Power BI

 

In Power BI, correlation can be visually represented using a Scatter Plot. In correlation we always deal with paired scores so it doesn’t matter which variable goes in X-axis and which goes in Y-axis as the values of the two variables will be taken together to make the plot.

Using the above calculated measure – Correlation_coefficient we got the coefficient as 0.59 which shows that there is moderate positive correlation between Height and Weight of patients in our sample dataset.


Method 2: Using Power BI Quick Measures

1.  Import your data into Power BI Desktop 

2. Click on the Table Tools tab and then click on the Quick measure icon

3.  Scroll down and select Correlation coefficient from the list of calculation types. Then choose the category and the two measures you want to calculate the correlation coefficient for.     



Category – Put in the category we need to find the relation over. We can add unique id column as category if present in the dataset.

Measure X – The first variable for which we need to calculate the correlation coefficient.

Measure Y - Second variable for which we need to calculate the correlation coefficient.

Using the sample dataset we can add the Weight in Measure X; Height in Measure Y and Patient_ID in Category.


4. After filling in the above values click on Add to get the correlation coefficient measure – ‘Average of Weight (KG) and Average of HEIGHT (M) correlation for Patient_ID’


5. Use Scatter plot to show the relationship between Height and Weight and add the measure ‘Average of Weight (KG) and Average of HEIGHT (M) correlation for Patient_ID’ in a card to get the correlation coefficient value.


Visualization in Power BI

 


Summary

Correlation is a vital statistical tool in data analysis and data science techniques.

It allows the analyst to clearly see if there is a relationship between variables. Correlation analysis can be used for predicting the value of one variable given the value of another variable and can also be used to reduce the range of uncertainty.

 

Thank you for reading and look forward to your feedback!



205 views0 comments

Recent Posts

See All
bottom of page