A Quick Interactive Tool to Visualize Multiple Variable Correlations
The Thought
Analysts are often handed a large collection of variables and asked to understand how they interact, typically so that one variable can be used to predict the response of another. A common first step is to see which variables are linearly correlated and then model those relationships using a linear model such as a simple regression.
Below is a simple tool for discovering which variables in a data set are most strongly linearly correlated. The code is a small Shiny application that uses the baseball data set from the corrgram library. It calculates the Pearson correlation coefficient of each numeric variable with every other numeric variable and takes the absolute value, so the values range from zero to one. An interactive heatmap is then generated using the d3heatmap library. With clustering turned off, each variable's correlation with itself runs along the diagonal centerline of the heatmap; these cells have a coefficient of 1 and are the darkest. Elsewhere, the change in color reflects the correlation “strength” of each pair of variables.
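The calculation behind the heatmap can also be checked without Shiny. The following is a minimal sketch, not the application's code: it assumes the built-in mtcars data as a stand-in for baseball so that it runs with base R alone, and the names num, cr, and pairs are illustrative. It builds the same kind of absolute Pearson correlation matrix and then ranks the distinct variable pairs from strongest to weakest.

```r
# Assumption: mtcars (ships with base R) stands in for the baseball data
# so the sketch runs without extra packages.
df <- mtcars

# Keep only numeric columns; Pearson correlation needs numeric input.
num <- df[sapply(df, is.numeric)]

# Absolute Pearson correlations, using only rows with no missing values.
cr <- abs(cor(num, method = "pearson", use = "complete.obs"))

# Collect each distinct off-diagonal pair once (upper triangle only).
idx <- which(upper.tri(cr), arr.ind = TRUE)
pairs <- data.frame(
  var1  = rownames(cr)[idx[, "row"]],
  var2  = colnames(cr)[idx[, "col"]],
  abs_r = cr[idx],
  stringsAsFactors = FALSE
)

# Sort from strongest to weakest absolute correlation.
pairs <- pairs[order(-pairs$abs_r), ]
head(pairs, 5)
```

A ranked table like this is a handy companion to the heatmap: the heatmap gives the overall picture, while the table names the specific pairs worth modeling first.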
The Code
##################################################################################
#
# Description: Interactive Correlations Heatmap
#
# Location: N/A
#
# Program name: N/A
#
# Source code: v1.0
#
# Author: Jason Watts
#
# Sys.info: SnowWhite and the 88 Dwarfs
#
# Computational Framework: Microsoft R Open version: >=3.4.2
#
# Web Framework: Shiny >=1.0.5
#
# Analytics Dashboard Framework: N/A
#
# Plotting and Graphics: d3heatmap
#
# License: Private with Open Source components. Open Source components require credits with distribution.
##################################################################################
# Load Required Libraries
library(d3heatmap)
library(shinythemes)
library(shiny)
library(corrgram)
# UI
ui <- fluidPage(
  theme = shinytheme("superhero"),
  h1("Correlation Heatmap"),
  selectInput("palette", "Palette", c("YlOrRd", "RdYlBu", "Greens", "Blues")),
  checkboxInput("cluster", "Apply Clustering"),
  checkboxInput("scale", "Apply Scale"),
  h3("The interactive heatmap is a tool to help visualize correlations between variables. The cells along the diagonal show each variable's correlation with itself and have the highest degree of correlation. Use your mouse to zoom in and explore the various correlations. Correlations range from zero to one, with one being the highest degree of correlation. Explore the various baseball variables below."),
  wellPanel(
    h1("Heatmap"),
    d3heatmapOutput("heatmap", width = "100%", height = "700px")
  )
)
# Server
server <- function(input, output, session) {
  # Load the baseball data set from the corrgram package
  data(baseball)
  # Keep only numeric columns (drops factors and any other non-numeric columns)
  baseball <- data.frame(baseball[sapply(baseball, is.numeric)])
  # Absolute Pearson correlation coefficients, using complete observations only
  cr_baseball <- abs(cor(baseball, method = "pearson", use = "complete.obs"))
  # Heatmap generation and rendering
  output$heatmap <- renderD3heatmap({
    d3heatmap(
      # "Apply Scale" standardizes the matrix with scale() first;
      # scale = "column" then has d3heatmap scale each column as well
      if (input$scale) scale(cr_baseball) else cr_baseball,
      revC = FALSE, scale = "column",
      theme = "", k_row = 3, k_col = 3, colors = input$palette,
      dendrogram = if (input$cluster) "both" else "none", show_grid = TRUE
    )
  })
}
shinyApp(ui, server)
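Once the heatmap points to a strongly correlated pair, the follow-up step mentioned at the start, a simple regression, might look like the sketch below. It is an illustration under assumptions rather than part of the application: mtcars again stands in for the baseball data, and wt and mpg are a hypothetical pair chosen for the example.

```r
# Assumption: mtcars stands in for the baseball data, and wt/mpg are a
# hypothetical pair picked out from a correlation heatmap.
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line.
coef(fit)

# For a one-predictor linear model, R-squared equals the squared Pearson r.
r <- cor(mtcars$mpg, mtcars$wt)
all.equal(summary(fit)$r.squared, r^2)
```

Because the heatmap plots absolute correlations, it hides the direction of each relationship. The sign of the fitted slope recovers it.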
The Product
I have the code running on my personal server for those who would like to test the application and play with the interactive heatmap.