Image:Photo by krakenimages on Unsplash
In the old version of Tableau ,when we combine data tables in the Canvas it will be using Joins .But starting in Tableau version 2020.2 ,they introduced the Data model. So let's first see what is a Data model. Then we can dig more to Joins ,Relatioship,Blending.
Data Model
Once we add the tables to the canvas of the Data source Page ,Data Model is created. It has two layers. The top-level layer is the logical layer of the data source and second Layer is the Physical Layer.
Image: from Tableau.com
The default view that you see in the Data Source page canvas is the logical layer of the data source. You combine data in the logical layer using relationships .
Image:noodle connection represents the Relatioship(source:Tableau.com)
You combine data between tables at the physical layer using Joins and unions. Each logical table contains at least one physical table in this layer .when you double-click a logical table the physical layer will be opened, there we will add the Joins .
Image:Venn Diagram represents the Joins(source:Tableau.com)
So we can say Relationship lives in the logical layer and Joins lives in the physical layer.
Now let’s see more on Joins.
JOINS
Joins are defined in the physical layer on the Data source page .Once it is defined it will impact all the sheets using that Data Source .It will make the tables to merge in to single table.So sometimes we will miss the unmatched values and duplicates values will be created.So when we create a granularity for View , it will based on the merged table.
so let's see key points for Joins
1 .It is displayed using Venn Diagram.
2 .Tables are merged into a single table
3.We need to select which type of Join we need .
4.Sometimes unmatched values will be missing
Steps to do Joins in the Tableau
1. Connect to a data source file. I am using the sample- superstore dataset for explaining the details.
place the Orders table in the Canvas.
As you can see in the box,it says a logical table Orders is there in the canvas, and by double click it to see physical table.
2.Double click on the orders
Now you can see that ,we have come to the Physical layer of the Data Model .When I drag the People table from the left pane to the canvas ,then Venn diagram symbol comes between orders table and people table can be seen.
When you click on the Venn diagram,a new window will be opened where we can select the join we want.
So please see the below image
It clearly says that 2 physical tables define the logical table orders ,but using the inner join won’t be much help always as it will merge tables, and fields that are common in both tables will be show and ,mismatched data will be omitted .But this won’t be helpful in every situation
Let's take a scenario, imagine the field names are the same. But values are different ,eg:custid with values 1 to 10 but the custid in the other table has values as cust1,cust2 cust3…cust10.so how can we join these two tables with the field custid?? the solution is using ‘create join calculation’ option
Click on the Venn diagram>>
This will solve that problem.
But there are some limitations for using Joins :
Duplicate rows data
For different join types require different data sources.
To solve these problems ..Relationship is introduced. So now let's move to the next Topic.
Relationship
The key point for the Relationship is:
The original table will be retained, So their level of detail and domain will be kept.
It is represented using flexible noodles between tables
All rows and columns from tables will be available for use.
Appropriate Joins are automatically created.
Duplicate won't be there and keeps the unmatched values.
Relationships can be based on calculated fields. You can specify how fields are used in the relationships that should be compared by using operators when you define the relationship.
Blending
Now we understand about Joins and Relationship.So what is Blending ? Why do we need it ?If you want to combine data from published data sources, blends are currently your only option. Blends query each data source independently, the results are aggregated to the appropriate level, then the results are presented visually together in the view.
Key points for Blend:
It won't combine the data .separate query is sent to both Data source by the Tableau and their result are left joined.
Blends only support left joins, while relationships support full outer joins.
Blends can be customized on a per-sheet basis
The order in which fields are used determines which data source is the primary versus the secondary data source.
The Primary Data source is indicated with a blue check mark, any secondary data sources and fields from secondary data sources have an orange check mark.
6.Linking fields can be automatically determined based on shared field names or the blend relationship can be manually created.
7. Data blending behaves similarly to a left join, which may result in missing data from the secondary data source.
8. Asterisks (*) may appear. This indicates multiple dimension values in a single mark, because data blending takes aggregated results and combines them in the view
So we can see that there is a blend sign in the data pane of secondary source,it is because that field name define the relationship between primary source and secondary source .Red blend sign indicate the link is connected ,but if there is a cut andcolorr is a blue sign then it means the link is broken.So we need to make the link between the field again. Automatically field names will be linked ,but if the field names are different or we want to create link manually then we have the option to define the relationship between them,
For that go to Data>>Edit Blend relationships
Conclusion:
It is recommended to use relationships as your first approach to combining your data because it makes your Data preparation and Analysis easier and more intuitive. Use joins only when you absolutely need to. Because join will merge the table and unmatched data will be omitted. Blending is used when we want to connect with published sources.
Comments