Sankey diagrams are powerful tools for visualizing the flow and connections between different variables or elements. While there are many different types of Sankey diagrams, ranging from simple to complex, we will focus on creating a simple straightforward Sankey diagram to illustrate the distribution of profits across categories over time.

**History of Sankey: **

Matthew Henry Phineas Riall Sankey was the first person to use this type of diagram in 1898, leading to its name, the Sankey diagram. Initially, he used this diagram to illustrate the energy efficiency of a steam engine. Over time, the Sankey diagram has become a standard model in numerous fields, including science and engineering. In recent times, sankey diagrams have become a valuable tool in data analytics and widely used in various industries. Today we will see how to use a sankey diagram for our analysis in the retail sector.

**Sankey diagram: An example**

**Main components of Sankey:**

**Nodes:**represent our dimensions, order date in years and categories.**Flows:**are the connections between nodes.**Directionality:**indicates the path of the flow from one node to another.

**Purpose:** Creating a Sankey diagram to visually represent the yearly profit flow by category.

**Tools**: We will use tableau public to create this visual

**Dataset**: Download sample superstore dataset and upload it into tableau and only work with orders table as it consists of all the columns we need to create our visual.

**What we need:**

**Data densification:**We need to use a technique called data densification to create our chart. Among many ways to densify the data, we will use union which is self joining the dataset to itself.Drag the orders table on Data Source Canvas.

Drag the orders table again on top of the frist where the union tool tip pops up.

Drop it and we have the union

We now have two datasets, one for the nodes and the other for flow. Let's work on the horizontal data that exists in our view. For this, we are going to densify data. Data densification is also known as padding. We need to create a calculated field to densify the data.

**Calculated field, padding: **

Go to sheet 1. Create a calculated field, padding.

We created a value of 1 for our orderdate column and value of 49 for our category column, now we will fill in data in between these two columns.

We now have two numbers 1 and 49, we need to get more data points in between these two values. So we need to create a bin to create these extra marks. We will take bin size of 1 for our visual

Now instead of just having two values, we now have values from 1 through 49.

To check it, bring this field to the columns, click on the small triangle next to this field in the row, and make sure you check the show missing value option. Then you will see all the points form 1 through 49.

We successfully densified our data

**Calculated field, t:** this evenly spaces all of our marks across the view.

To see this in action, bring t to columns, padded bin to details, and make t calculate compute using our bin. Change from automatic to circle in the marks pane. You will see the marks for every single bin. See the picture below. This is the essence of data densification and data padding

Now we need to specify where these horizontal data points start and where they end. We will create some calculated field,two rank calculations, one for each dimension, sigmoid and curve calculations.

**Calculated field, Rank1:** gives you percent of total across the entire data

**Calculated field, Rank2:** we are going to duplicate that and rename it to rank2

**Calculated field, Sigmoid function:** is a mathematical function with an S-shaped curve. This will give us the curve that we need in our visual.

**Calculated field,Curve:** we have rank1 which is our starting point on the left side, and then we have the difference of rank1 and 2 which gives the flip(if you don’t do this we have plain horizontal lines going from one end to the other) and then multiplying with sigmoid gives us the nice curves.

We are done with the calculation part of creating our visual, lets see them in action.

we already have our t in column, padding(bin) in details and made t calculate compute using bin. Previously I did this step to see padding(bin) in action.

Drag the curve into rows, we have 2 points at this stage.

Add Order Date into details as well as category.

Choose rank1, specific dimension, and select all three dimensions, year of order date, category and padding(bin). You can move them top to bottom by clicking on each one and moving up or down.Please follow this order as it is important for our flow from one end to the other, don’t close this window yet.

Choose rank2, specific dimension, and select all three dimensions, but change the order this time, first category, year of order date and then padding(bin), don’t close this window yet.

Choose t, and only check padding(bin)

Close the window, you will have this visual

Cosmetic work: we are now going to format our visual

Get rid of the grid lines, zero lines

To get rid of the extra dots at the end of each curve, we will edit our horizontal axis, change the start to -5 and end to 5

Similarly we will edit out vertical axis,

Hide the headers by right clicking on the axis and uncheck the show header option.

In marks pane, change circles to line

Now we have to specify the path, so, let's do that by bring the padding(bin) to path

Next we need to size our lines by profit, but we don’t have this value for all the data points we created through densification technique, so we will create one last calculated field, profit Size

Bring the profit size to size, notice the change in the sizes of each curve. You can adjust the size of these curves as per your liking.

You can add category to color or order date to color depending on your analysis. Add category to color.

If you want to have your dimensions (category wise profits and profits by order date in years) graphs along with your sankey, create them.

Here is for order date:

Here is for category:

You need to separately create these two bar charts and arrange it in the dashboard for a completed look. Sometimes you need to reverse the order of your categories in your dimensions to match the sankey flow. To achieve this match, add category into detail and choose descending. Also our sankey starts at year 2021, so I reversed the order for the orderdate too.

For example, to reverse the order for category:

To get the completed look, get to the dashboard and arrange the three charts next to each other, edit/format them accordingly.

**Final completed look of our sankey diagram:**

Order date(years) Yearly profit flow by category Category

**Conclusion: **

As you can see, sankey diagram is very informative and allows you to see the category wise profit flow over time. If you take the first three nodes from top, they represent 2021 profit flow and can easily identify the max profit is from technology and least is from furniture. So sankey is very useful visualization tool to show the flow from one dimension to another dimension. Congratulations on creating your first sankey diagram. Hope you will use this visualization tool in your future analysis.

## コメント