A Quick Interactive Tool to Visualize Multiple Variable Correlations

Published Nov 27, 2017. Last updated Dec 01, 2017.

The Thought

Analysts are often handed a large collection of variables and need to understand how they interact, for example to predict the response of one variable from another. A common first step is to see which variables are linearly correlated and to model those relationships using a linear model such as a simple regression.
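
As a minimal sketch of that first step, the snippet below checks the correlation of one pair of variables and then fits a simple regression to it. It assumes the built-in mtcars data set and its wt and mpg columns purely for illustration; neither is part of the application described in this post.

```r
# Assumption: mtcars (shipped with base R) is used purely for illustration;
# wt (weight) and mpg (fuel economy) are columns of that data set.
r <- cor(mtcars$wt, mtcars$mpg, method = "pearson")

# Simple linear regression of mpg on wt
fit <- lm(mpg ~ wt, data = mtcars)
r_squared <- summary(fit)$r.squared

# For a one-predictor regression, R-squared is the squared correlation
isTRUE(all.equal(r^2, r_squared))
```

This is why a correlation screen is a reasonable way to shortlist candidate pairs before fitting simple linear models.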

Below is a simple tool for discovering which variables in a data set are most strongly linearly correlated. The code is a small Shiny application that uses the baseball data set from the corrgram library. It calculates the Pearson correlation coefficient of each variable with every other variable in the data set and takes the absolute value, so the plotted values range from zero to one. An interactive heatmap is then generated using the d3heatmap library. Color reflects the correlation “strength”: the cells along the diagonal, where each variable is correlated with itself, are the darkest and have a coefficient of 1.
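
The correlation step can be tried on its own before wiring it into Shiny. The sketch below is an illustration under one assumption: it swaps in the built-in mtcars data set for baseball so that it runs without installing corrgram.

```r
# Assumption: mtcars (shipped with base R) stands in for the baseball data
# so the snippet runs without installing corrgram.
num_data <- mtcars[sapply(mtcars, is.numeric)]

# Absolute Pearson correlation of every variable with every other variable
cr <- abs(cor(num_data, method = "pearson", use = "complete.obs"))

# Each variable is perfectly correlated with itself, so the diagonal is all 1s
all(abs(diag(cr) - 1) < 1e-12)

# The matrix is symmetric: cor(x, y) equals cor(y, x)
isSymmetric(cr)

# One row and one column per numeric variable
dim(cr)
```

Because the matrix is symmetric, the heatmap mirrors itself across the diagonal, and only one triangle carries unique information.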

The Code

##################################################################################
#
# Description: Interactive Correlations Heatmap
#
# Location: N/A
#
# Program name: N/a
# 
# Source code: v1.0
#
# Author: Jason Watts
#
# Sys.info: SnowWhite and the 88 Dwarfs
#
# Computational Framework: Microsoft R Open version: >=3.4.2
#
# Web Framework: Shiny >=1.0.5
#
# Analytics Dashboard Framework: N/A
#
# Plotting and Graphics: d3heatmap
# 
# License: Private with Open Source components. Open Source components require credits with distribution.  
##################################################################################
# Load Required Libraries

library(d3heatmap)
library(shinythemes)
library(shiny)
library(corrgram)

# UI
ui <- fluidPage(
  theme = shinytheme("superhero"),
  h1("Correlation Heatmap"),
  selectInput("palette", "Palette", c("YlOrRd", "RdYlBu", "Greens", "Blues")),
  checkboxInput("cluster", "Apply Clustering"),
  checkboxInput("scale", "Apply Scale"),
  h3("This interactive heatmap is a tool to help visualize correlations between variables. Each cell shows the absolute Pearson correlation for a pair of variables, ranging from zero to one, with one being the strongest correlation. The cells along the diagonal compare each variable with itself and therefore have the highest correlation. Use your mouse to zoom in on a region of the heatmap and explore the various baseball variables below."),
  wellPanel(
    h1("Heatmap"),
    d3heatmapOutput("heatmap", width = "100%", height = "700px")
  )
)
# Server
server <- function(input, output, session) {

  data(baseball) # Load the baseball data set from the corrgram package

  # Keep only the numeric columns, since cor() requires numeric input
  baseball <- baseball[sapply(baseball, is.numeric)]

  # Absolute Pearson correlation coefficient for every pair of variables;
  # rows containing missing values are dropped (use = "complete.obs")
  cr_baseball <- abs(cor(baseball, method = "pearson", use = "complete.obs"))

  # Heatmap generation and rendering
  output$heatmap <- renderD3heatmap({
    d3heatmap(
      if (input$scale) scale(cr_baseball) else cr_baseball,
      revC = FALSE, scale = "column", theme = "",
      k_row = 3, k_col = 3, colors = input$palette,
      dendrogram = if (input$cluster) "both" else "none",
      show_grid = TRUE
    )
  })

}
shinyApp(ui, server)
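
For a quick non-interactive check of what the heatmap shows, the strongest pairs can also be listed directly from the correlation matrix. This is a sketch rather than part of the application: it assumes the built-in mtcars data set as a stand-in for baseball, and the names `pairs`, `var1`, `var2`, and `abs_r` are hypothetical.

```r
# Assumption: mtcars stands in for the baseball data used in the app above.
cr <- abs(cor(mtcars, method = "pearson", use = "complete.obs"))

# Keep each pair once: blank out the diagonal and the lower triangle
cr[lower.tri(cr, diag = TRUE)] <- NA

# Reshape the matrix into one row per variable pair
pairs <- as.data.frame(as.table(cr), stringsAsFactors = FALSE)
names(pairs) <- c("var1", "var2", "abs_r")

# Drop the blanked cells and sort from strongest to weakest correlation
pairs <- pairs[!is.na(pairs$abs_r), ]
pairs <- pairs[order(-pairs$abs_r), ]

# Show the five most strongly correlated pairs
head(pairs, 5)
```

Replacing mtcars with the numeric columns of baseball should give the same kind of ranking for the data shown in the heatmap.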

The Product

I have the code running on my personal server for those who would like to test the application and play with the interactive heatmap.

Shiny HeatMap Demo
