When we look at an image, what strikes us most: the design, the pattern, the details, or the color? Color reliably holds a viewer's attention, so it plays a vital role in visualization. In data visualization, color is key to conveying the message the data carries.
One of the best ways to visualize data in a 2-dimensional space is a heatmap, where data values are expressed as color intensity or density. In other words, a heatmap is a 2-dimensional data visualization technique that represents the magnitude of data as color. In general, the higher the value, the more intense the color; the lower the value, the lighter or more subtle the color. The name “heatmap” comes from the use of different shades or colors to represent the magnitude or density of data values, which mimics the way heat is visualized, as in thermal imaging, where hotter objects appear in warmer colors. Heatmaps are useful for identifying patterns, trends, and correlations in large datasets.
History:
The history of heatmaps dates back to 1873, when Toussaint Loua used a shading matrix to visualize social statistics across the districts of Paris. Many accountants and mathematicians later followed this approach.
The idea of joining cluster trees to the rows and columns of a data matrix originated with Robert Ling in 1973. Later, with the help of computer programming, heatmaps evolved and were adopted across industries, emerging as a useful tool in data analytics.
Types of Heatmaps:
Heatmaps can be classified by criteria such as purpose, data type, and visualization technique. Common types are listed below.
· Geographical Heatmaps: These visualize spatial data on a geographical map, either as the density of points on the map or as shaded regions over a particular geographical area.
· Matrix Heatmaps: These are popular because they show relationships between variables, as in correlation matrices and confusion matrices.
· Time-series Heatmaps: These show changes over time, as in calendar and temporal heatmaps, and are generally drawn with color gradients.
· Hierarchical Heatmaps: These visualize hierarchical relationships in data, as in clustered heatmaps with dendrograms.
· Image Heatmaps: These maps enhance visualizations of images or photographs.
· Statistical Heatmaps: These maps show statistical distributions and relationships.
The choice among these types depends on the nature of the data and the insights sought. Experimenting with different types of heatmaps can reveal hidden patterns and relationships between variables.
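As one illustration of the matrix type, a confusion matrix can be drawn as a heatmap. The sketch below is a minimal example under stated assumptions: the class labels and predictions are invented for illustration, and the names actual, predicted, and confusion are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical true and predicted labels for a three-class classifier.
actual = pd.Series(
    ["cat", "dog", "cat", "bird", "dog", "cat", "bird", "dog"], name="Actual"
)
predicted = pd.Series(
    ["cat", "dog", "dog", "bird", "dog", "cat", "cat", "dog"], name="Predicted"
)

# Cross-tabulate the labels: rows are actual classes, columns are predicted classes.
confusion = pd.crosstab(actual, predicted)

# Each count becomes a color; annot=True also prints the count inside the cell.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_title("Confusion matrix as a heatmap")
fig.tight_layout()
# In a notebook or interactive session, call plt.show() here to display the figure.
```

Darker cells on the diagonal indicate classes that were mostly predicted correctly, while colored cells off the diagonal point to confusions between classes.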
Heatmaps in Python:
Many Python libraries, such as Matplotlib, Seaborn, Plotly, and Bokeh, offer heatmaps. Of these, Seaborn is often the most convenient choice because of its simplicity: concise code, attractive defaults, pandas integration, clustering support, and cell annotations for correlations. Matplotlib offers more flexibility and customization options, but Seaborn’s simplicity and attractive default styles make it popular for creating heatmaps in exploratory data analysis and for visualizing correlations or patterns in tabular data.
Let's dive into heatmaps with Seaborn in Python.
Prerequisites:
Install Seaborn, then import NumPy, pandas, and Matplotlib alongside it. Seaborn is built on Matplotlib and integrates closely with pandas.
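A typical setup might look like the following sketch. It assumes installation through pip; other package managers work as well. The aliases np, pd, plt, and sns are common conventions rather than requirements.

```python
# Install once from a shell (not from inside Python):
#   pip install seaborn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Print the installed versions to confirm that the imports worked.
print("seaborn", sns.__version__)
print("pandas", pd.__version__)
print("numpy", np.__version__)
```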
Data:
One can create arrays and convert them to data frames, or load an existing dataset as a data frame, for heatmap generation. Here, we use the preexisting dataset “car_crashes”, which is available through the Seaborn library, for analysis and visualization using a heatmap. This dataset provides information about car crashes in the states of the United States. It has 8 columns and 51 rows (the 50 states plus the District of Columbia); 7 of the columns are numerical.
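Both routes can be sketched briefly. The example below builds a small data frame from a NumPy array; the values are random and purely illustrative, and only the column names are borrowed from car_crashes. The commented line shows how the real dataset would be loaded, which needs an internet connection the first time it is used.

```python
import numpy as np
import pandas as pd

# Route 1: create an array and convert it to a data frame.
# The numbers are synthetic and are NOT the real car_crashes values.
rng = np.random.default_rng(0)
values = rng.uniform(0, 25, size=(6, 3))
df = pd.DataFrame(values, columns=["total", "speeding", "alcohol"])
df["abbrev"] = ["AL", "AK", "AZ", "AR", "CA", "CO"]  # one text column

print(df.shape)   # (rows, columns)
print(df.dtypes)  # shows which columns are numeric

# Route 2: load the built-in dataset instead (downloads on first use):
# import seaborn as sns
# crashes = sns.load_dataset("car_crashes")
```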
Plotting a Heatmap:
Before computing correlations and plotting a heatmap, one may have to preprocess the data, for example by grouping with groupby, handling missing values, and aggregating.
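These preprocessing steps can be sketched with pandas alone. The frame below is hypothetical: the region column and all values are invented for illustration and are not part of car_crashes. Filling gaps with the column median is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records: two rows per region, with one missing value.
raw = pd.DataFrame({
    "region": ["East", "East", "West", "West", "South", "South"],
    "total": [18.0, 20.0, 12.0, np.nan, 22.0, 24.0],
    "speeding": [6.0, 8.0, 4.0, 5.0, 9.0, 7.0],
})

# Handle missing values: fill each gap with that column's median.
numeric_cols = ["total", "speeding"]
clean = raw.copy()
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

# Group and aggregate: one row per region holding the mean of each column.
by_region = clean.groupby("region")[numeric_cols].mean()
print(by_region)
```

The resulting by_region frame is fully numeric, so it could be passed directly to sns.heatmap.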
The basic syntax for a heatmap is
sns.heatmap(dataset)
where dataset is the numeric data frame you want to analyze.
To plot the correlation coefficients between variables instead, pass the correlation matrix:
sns.heatmap(dataset.corr())
sns.heatmap also accepts other parameters that serve various purposes; they are discussed further below.
In this dataset, we want the correlation coefficients between the numerical variables, plotted as a heatmap.
When we call sns.heatmap(data.corr()) on a pandas DataFrame, pandas calculates the Pearson correlation coefficient between every pair of columns, and Seaborn displays the results as a color-coded matrix, making it easy to see the strength and direction of each correlation. Non-numeric columns, such as the state abbreviation in this dataset, have to be left out; in recent pandas versions this can be done with data.corr(numeric_only=True).
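A minimal sketch of this step follows. So that it runs offline, it uses a synthetic stand-in for the dataset: the values are random, and only the column names are borrowed from car_crashes. To use the real data, replace the frame with sns.load_dataset("car_crashes").

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for car_crashes (random values, NOT the real data).
rng = np.random.default_rng(42)
total = rng.uniform(5, 25, size=51)
data = pd.DataFrame({
    "total": total,
    "speeding": total * 0.3 + rng.normal(0, 1, size=51),
    "alcohol": total * 0.3 + rng.normal(0, 1, size=51),
    "ins_premium": rng.uniform(600, 1300, size=51),
    "abbrev": [f"S{i:02d}" for i in range(51)],  # text column, like state codes
})

# Keep only the numeric columns, then compute pairwise Pearson correlations.
corr = data.select_dtypes("number").corr()

# Draw the correlation matrix as a color-coded grid.
fig, ax = plt.subplots()
sns.heatmap(corr, ax=ax)
# In a notebook or interactive session, call plt.show() here to display the figure.
```

Selecting the numeric columns with select_dtypes is one way to exclude the text column that works across pandas versions.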
To reverse the color palette, we introduce the cmap parameter.
Introducing color palette:
Seaborn offers a variety of predefined color palettes to customize the appearance of a plot. These palettes come in different types (qualitative, sequential, and diverging), each suited to different kinds of data and visualization needs: qualitative palettes for categorical data, sequential palettes for ordered data, and diverging palettes for data with a meaningful midpoint. One can customize them as well. The “cmap” parameter specifies the colormap for the heatmap. A colormap maps data values to colors in a plot, helping to represent the data visually. Matplotlib’s default colormap is “viridis”, whereas sns.heatmap uses a Seaborn default that depends on whether the center parameter is set; either way, one can specify a different colormap through cmap, choosing from the wide range available.
One can also use a diverging colormap that blends between two colors, for example from red to blue, which is named “RdBu”; appending “_r”, as in “RdBu_r”, reverses the order.
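The effect of reversing a colormap can be seen by drawing the same matrix twice. This is a sketch on synthetic random data (the column names are borrowed from car_crashes, the values are not); the side-by-side layout is just one way to compare the two.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic numeric data standing in for the real dataset.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(51, 4)),
    columns=["total", "speeding", "alcohol", "ins_premium"],
)
corr = data.corr()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# "RdBu" runs from red (low values) to blue (high values).
sns.heatmap(corr, cmap="RdBu", ax=axes[0])
axes[0].set_title("cmap='RdBu'")

# Appending "_r" reverses the colormap: blue (low values) to red (high values).
sns.heatmap(corr, cmap="RdBu_r", ax=axes[1])
axes[1].set_title("cmap='RdBu_r'")

fig.tight_layout()
# In a notebook or interactive session, call plt.show() here to display the figure.
```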
Adding annotations, and setting values for Heatmap:
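In the above heatmap, the value 0 is not mapped to the midpoint color. The vmin and vmax parameters fix this: they set the data values that anchor the two ends of the colormap, so setting vmin=-1 and vmax=1 for a correlation matrix places 0 at the center of a diverging colormap.
These parameters allow one to customize the color scale of the heatmap.
The annot parameter determines whether the numeric value is displayed in each cell of the heatmap, and annot_kws passes keyword arguments that control the font of those annotations. The map is still small; we will see how to increase its size in the next step.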
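The following sketch combines these parameters on a correlation matrix. The data is synthetic, and the specific choices (two decimal places, font size 9) are illustrative rather than recommended values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic numeric data standing in for the real dataset.
rng = np.random.default_rng(1)
data = pd.DataFrame(
    rng.normal(size=(51, 4)),
    columns=["total", "speeding", "alcohol", "ins_premium"],
)
corr = data.corr()

fig, ax = plt.subplots()
sns.heatmap(
    corr,
    cmap="RdBu_r",
    vmin=-1,                # lowest value on the color scale
    vmax=1,                 # highest value; with vmin=-1, 0 sits at the midpoint
    annot=True,             # write each coefficient inside its cell
    fmt=".2f",              # show two decimal places
    annot_kws={"size": 9},  # font settings passed to the cell text
    ax=ax,
)
# In a notebook or interactive session, call plt.show() here to display the figure.
```

Because correlations always lie between -1 and 1, pinning the scale to that range keeps colors comparable across different heatmaps.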
Further styling:
Further styling includes adding borders between cells, making each cell square, and adjusting tick marks and labels to improve visual appeal. Here, the cells are made square with the square parameter, and x and y labels are added with an adjusted font.
Further variations in color and font can be explored.
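One possible combination of these styling options is sketched below on synthetic data. The figure size, border width, label text, and rotation angles are illustrative assumptions, not values from the original plot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic numeric data standing in for the real dataset.
rng = np.random.default_rng(2)
data = pd.DataFrame(
    rng.normal(size=(51, 4)),
    columns=["total", "speeding", "alcohol", "ins_premium"],
)
corr = data.corr()

# A larger figure gives the cells and annotations more room.
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(
    corr,
    cmap="RdBu_r",
    vmin=-1,
    vmax=1,
    annot=True,
    fmt=".2f",
    linewidths=0.5,     # width of the border drawn between cells
    linecolor="white",  # color of that border
    square=True,        # force every cell to be square
    ax=ax,
)

# Axis labels with an adjusted font size, plus rotated tick labels.
ax.set_xlabel("Variable", fontsize=12)
ax.set_ylabel("Variable", fontsize=12)
ax.tick_params(axis="x", labelrotation=45)
ax.tick_params(axis="y", labelrotation=0)

fig.tight_layout()
# In a notebook or interactive session, call plt.show() here to display the figure.
```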
Advantages and Disadvantages of Heatmaps:
Advantages: Heatmaps in Seaborn are super handy for visualizing complex data in a compact, visually appealing way. They tap into our brain's ability to process colors effectively, making patterns and correlations pop out. This is especially useful for spotting highly correlated or uncorrelated features in a dataset when its correlation matrix is drawn as a heatmap. The best part is that one can customize them by changing color palettes, adding annotations, and tweaking labels. Additionally, Seaborn allows clustering and reordering of rows and columns based on similarity, revealing hidden relationships in data.
Disadvantages: But heatmaps aren't perfect. Different people perceive colors differently, so there's a risk of misinterpreting the data. When dealing with massive datasets, heatmaps can get cluttered and hard to make sense of, especially if the data is not properly clustered or reordered. Representing categorical variables in heatmaps is also a pain, since one might need to preprocess or encode the data first. Overplotting is another issue: when there are many similar data points, they can blend together, making it tough to distinguish individual values. Finally, interpreting heatmaps can be a challenge, especially for non-experts or people unfamiliar with the data. Clear labeling, annotations, and color scale explanations are needed to communicate findings effectively.
Practical applications:
Heatmaps are widely used in geographical data analysis, website analytics, business intelligence, and sales analysis.
Conclusion:
The journey of heatmaps from simple, hand-drawn maps to sophisticated, computer-generated visualizations reflects broader trends in technology and data science. Today, heatmaps are essential tools in a wide array of fields, continuing to evolve with advancements in data processing and visualization technologies. Understanding their history helps us appreciate their versatility and power in making data comprehensible and actionable.