Seaborn is one of Python's most powerful data visualization libraries. It is built on top of Matplotlib and designed to work with pandas data structures, and it offers a wide range of plots for deriving meaningful insights from data. When you need to see the distribution of individual variables and also explore the relationship between two continuous variables, a joint plot does both in one figure.
What is a Joint Plot:
A joint plot is a Seaborn visualization that combines a bivariate plot (for example, a scatter plot or a kernel density estimate) with a univariate marginal distribution for each variable. It lets the user examine the relationship between two variables and the individual distribution of each one at the same time.
Salient Features of Joint Plot:
The image below shows what each section of a joint plot refers to and explains its features.
Joint Plot Syntax and Parameters:
The basic syntax is:
sns.jointplot(x='variable for the x-axis', y='variable for the y-axis', data=dataset, kind='type of plot')
The x, y, data, and kind arguments are passed inside the parentheses. Each parameter is explained below.
x and y: the variables to plot on the x-axis and y-axis, respectively.
data: the input data structure, typically a pandas DataFrame.
kind: the type of plot to draw. Seaborn offers six kinds: scatter, kde, hist, hex, reg, and resid. These are discussed in detail in later sections.
Additional parameters can be added to this call to enhance the visualization, explore how the data is distributed across a categorical variable, and control visual properties. These are all explained in the coming sections.
Basic Requisites:
Step 1: Import Seaborn along with pandas and Matplotlib. Seaborn is built on top of Matplotlib and works with pandas data structures.
Step 2: Load a dataset. Seaborn provides ready-made example datasets for learning purposes, loaded with sns.load_dataset(). In this case, the 'healthexp' dataset is used. It has 274 rows and 4 columns and contains life expectancy and health spending in US dollars for different countries from 1970 to 2020. It also has a categorical variable named 'Country'.
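Steps 1 and 2 could look like the following sketch. The real data comes from sns.load_dataset('healthexp'), which downloads the file on first use; so that the snippet also runs offline, it builds a small stand-in frame instead. The name demo and every value in it are made up for illustration, and only the column names follow the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# With network access, the real dataset is loaded like this:
# healthexp = sns.load_dataset("healthexp")

# Made-up stand-in for healthexp (assumption: same column names only).
rng = np.random.default_rng(0)
years = np.arange(1970, 2021)
demo = pd.DataFrame({
    "Year": np.tile(years, 2),
    "Country": np.repeat(["A", "B"], len(years)),
    "Spending_USD": np.tile(200 + 80.0 * (years - 1970), 2),
    "Life_Expectancy": np.tile(70 + 0.2 * (years - 1970), 2)
    + np.repeat([0.0, 3.0], len(years))
    + rng.normal(0, 0.5, 2 * len(years)),
})

print(demo.shape)  # (102, 4)
print(demo.head())
```

The later examples in this article use healthexp; replacing demo with the real frame should work unchanged because the column names match.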
Plotting Basic Distribution Joint Plot:
The plot style can be customized by calling sns.set_style('darkgrid').
Seaborn offers five background styles: 'darkgrid', 'whitegrid', 'dark', 'white', and 'ticks'. Each gives the plot a different look, and you can pick whichever suits you. Here, darkgrid is selected.
Now, let's look at two ways to draw a basic joint plot.
One way is sns.jointplot(x='Life_Expectancy', y='Year', data=healthexp)
Another way is sns.jointplot(x=healthexp.Life_Expectancy, y=healthexp.Year)
Both methods yield the same result.
If kind is not specified, the default is 'scatter'.
Changing the Color, Size and Shape of the Chart:
Color: As mentioned before, additional parameters can enhance the visualization by changing the color and background and by adjusting the size to your requirements.
The color parameter applies a single color to both the scatter plot in the central area and the marginal histograms. Seaborn supports many colors and palettes; only two color settings are discussed here.
Size and Shape:
In addition, the size and shape of the joint plot can be changed. A joint plot is square by default, and the 'height' parameter sets the side length of the whole figure in inches. The 'ratio' parameter is the ratio of the central plot's height to the marginal plots' height; for example, with ratio=5 the central plot is five units tall and the marginal histograms are one unit tall. The 'space' parameter controls the spacing between the joint and marginal axes. If these are not specified, sns.jointplot() uses its defaults: height=6, ratio=5, space=0.2. Varying them adjusts the joint plot to your needs.
Joint Plot Using a Categorical Variable:
So far, we have visualized the scatter distribution between two numerical variables, 'Life_Expectancy' and 'Year'. The data table also contains a categorical variable, 'Country', so let us see how it affects the distribution.
The 'hue' parameter lets us bring in this additional categorical variable.
Specifying a hue variable in sns.jointplot() draws the marginal univariate plots as one conditional distribution per category, overlaid on the same axes, so the categories can be compared directly. This shows how the marginal distributions vary across groups.
In the central plot, each category gets its own color, so hue shows how both the bivariate relationship and the marginal distributions differ across levels of the categorical variable. Note that hue is supported for the scatter, kde, and hist kinds, but not for hex, reg, or resid.
Kinds of Plots:
Seaborn offers six kinds of joint plot: scatter, kde, hist, hex, reg, and resid. Each is discussed below.
1. Regression (`kind='reg'`): This plot type displays a scatter plot in the main axes, along with a linear regression line fitted to the data. The marginal axes show univariate histograms. This combined view lets you assess both the bivariate relationship (regression line) and the individual variable distributions simultaneously.
2. Scatter Plot (`kind='scatter'`): This is the default plot type. It draws a scatter plot in the central area to show the relationship between the two variables, with univariate histograms on the marginal axes (or kernel density estimates (KDEs) when hue is set).
3. Kernel Density Estimation (`kind='kde'`): This plot type displays a bivariate kernel density estimate in the main axes, along with univariate KDEs on the marginal axes. It helps visualize the joint and marginal distributions of the two variables.
3.1. KDE Plot with Numerical Variables:
A kernel density plot is drawn from two numerical variables by passing kind='kde'.
Adding fill=True (a Boolean, not the string 'True') fills the density contours with color, which makes the plot easier to read.
3.2. KDE Plots Can Be Combined with Scatter Plots:
In this chart, a KDE plot with contour levels, which is a type of bivariate distribution visualization, is overlaid on the scatter plot of a Seaborn joint plot.
First, a joint plot with the desired x and y variables is created as a scatter plot and assigned to a variable:
g = sns.jointplot(x='Life_Expectancy', y='Year', data=healthexp, kind='scatter', color='purple', height=4, ratio=3, space=0.1)
Then the plot_joint method of that variable is used to draw a kdeplot on the central axes:
g.plot_joint(sns.kdeplot, color='green', levels=5)
Here g is our variable (any name can be used), and the kdeplot is overlaid on the main scatter plot. A different color is used for clarity. We can call this a 'bivariate KDE plot'.
This plot provides a combined view which is useful for understanding the overall density distribution along with individual points and also to identify the potential clusters or patterns in data.
3.3. KDE Plots in Combination with a Categorical Variable:
A KDE plot can also include a categorical variable through the 'hue' parameter. The graph below demonstrates this.
The syntax is:
sns.jointplot(x='Life_Expectancy', y='Year', data=healthexp, hue='Country', kind='kde', fill=True, height=12, ratio=5, space=0.2)
3.4. KDE Plots in Combination with Both a Categorical Variable and a Scatter Plot:
As in section 3.2, the scatter plot with the categorical variable is first assigned to a variable (any name can be used), and that variable is then used to overlay a kdeplot.
The syntax is:
p = sns.jointplot(x='Life_Expectancy', y='Year', data=healthexp, kind='scatter', hue='Country', color='purple', height=5, ratio=4, space=0.1)
p.plot_joint(sns.kdeplot, levels=5)
4. Histogram (`kind='hist'`): This plot type shows a bivariate histogram in the main axes, along with univariate histograms on the marginal axes. It groups the data into rectangular bins and displays the counts. A darker color indicates a greater number of data points in that bin.
The bivariate histogram is a way to visualize the joint distribution and to identify patterns and correlations between the two variables from the count of data points in each rectangular bin.
5. Hexbin (`kind='hex'`): Similar to the histogram plot, but it uses hexagonal bins instead of rectangular bins to display the bivariate distribution in the main axes. The marginal axes show univariate histograms. The color intensity of each hexagonal bin represents the density of data points in that area: the darker the color, the higher the density.
6. Residual (`kind='resid'`): This plot type is related to the regression plot, but instead of the fitted line it shows the residuals (the differences between the observed and predicted values) of the linear regression. The marginal axes show univariate histograms.
Changing the Key Parameters in the Center Plot:
The following parameters customize the appearance and behavior of the scatter plot and of the regression line fitted to the data.
marker, ci, and order are set through joint_kws (joint keywords). The joint_kws parameter is a dictionary of additional keyword arguments that is passed on to the plotting function behind the central plot, whichever kind is chosen (scatter, kde, hist, hex, reg, or resid). marker, ci, and order are options of the regression plot (kind='reg').
Changing the Key Parameters in the Marginal Axis:
In the same way, the marginal axes can be customized by passing a dictionary of keyword arguments to marginal_kws.
Conclusion:
The jointplot function in Seaborn is a versatile tool for visualizing the relationship between two variables along with their distributions. Various parameters allow customizing the appearance, incorporating categorical variables, and overlaying multiple plot types for deeper insights into the data.