Summary of “Data Visualization: A Practical Introduction” by Kieran Healy (2018)

Introduction

Kieran Healy’s book “Data Visualization: A Practical Introduction” is a practical guide to understanding and producing effective data visualizations. Aimed at practitioners in data analytics, it emphasizes actionable advice and worked examples, mainly in the R programming language with the ggplot2 package.

Chapter 1: The Grammar of Graphics and ggplot2

Key Points

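  1. Grammar of Graphics: Healy introduces the grammar of graphics, the idea that forms the backbone of the ggplot2 package. This “grammar” describes any statistical graphic in terms of its components: the data, the aesthetic mappings from variables to visual properties, the geometric objects drawn, and the scales, coordinate system, and facets that organize them.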

  2. ggplot2 Basics: Healy explains the basic building blocks: ggplot(), aes(), and the geom_* functions. ggplot() initializes the plot, aes() maps variables in the data to aesthetic properties such as position, color, and size, and geom_* functions add layers that draw the data as points, lines, bars, and so on.
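To see how the grammar maps onto code, the sketch below writes each component on its own line, using R's built-in mtcars data. The coord_cartesian() and theme_grey() calls simply restate ggplot2's defaults; they are included only to show where each component lives and can normally be left out.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                     # data: a built-in data frame
            mapping = aes(x = wt, y = mpg)) +  # aesthetics: wt -> x position, mpg -> y position
  geom_point() +                               # geometry: one point per row
  scale_x_continuous(name = "Weight (1000 lbs)") +  # scale: how wt is shown on the x axis
  coord_cartesian() +                          # coordinate system: the default, made explicit
  theme_grey()                                 # theme: non-data appearance, also the default
```

Printing p draws the plot. Each + either adds a layer or replaces one component, which is what lets a plot be built up piece by piece.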

Actionable Advice

  • Action 1: Start every ggplot2 visualization with ggplot(data_frame, aes(x, y)), which declares the data and the default aesthetic mappings.
  • Action 2: Employ different geom_* functions such as geom_point() for scatter plots or geom_bar() for bar charts to add layers to your plot.

Example

Healy provides examples such as creating a basic scatter plot:

R
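library(ggplot2)
# Map car weight to the x axis and fuel economy to the y axis, then draw one point per car
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()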

Chapter 2: Data Preparation

Key Points

  1. Tidying Data: Healy stresses the importance of tidy data, in which each variable forms a column, each observation forms a row, and each type of observational unit forms a table.
  2. Data Transformation: dplyr functions such as filter() (keep matching rows), mutate() (add or change columns), group_by() (define groups), and summarize() (collapse each group to summary values) handle most data manipulation.
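A minimal tidying sketch with tidyr's pivot_longer(), assuming a small hypothetical table in which years are stored as column headers, a common untidy layout. The data frame, its country labels, and its values are invented purely for illustration.

```r
library(tidyr)

# Hypothetical wide table: the variable "year" is hidden in the column names
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 18),
  check.names = FALSE
)

# pivot_longer() moves the year headers into their own column,
# giving one row per country-year observation
tidy <- pivot_longer(wide, cols = -country,
                     names_to = "year", values_to = "value")
tidy
```

After reshaping, year is an ordinary variable that can be mapped to an aesthetic. It comes back as character, so convert it with as.integer() if a numeric axis is wanted.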

Actionable Advice

  • Action 3: Make sure your data are tidy before plotting, using tidyr and dplyr to reshape and clean them.
  • Action 4: Standardize your transformation pipeline on dplyr verbs so that data are manipulated consistently.

Example

An example of transforming data with a dplyr pipeline, computing the average fuel economy of six-cylinder cars in kilometers per liter:

R
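library(dplyr)
mtcars %>%
  filter(cyl == 6) %>%                 # keep six-cylinder cars only
  mutate(kmpl = mpg * 0.425144) %>%    # 1 mile per US gallon = 0.425144 km per liter
  summarize(avg_kmpl = mean(kmpl))     # average the new column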
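The pipeline above returns a single average. Adding group_by() before summarize() computes the same statistic separately for each group. A short sketch; the output column names n and avg_kmpl are illustrative choices.

```r
library(dplyr)

by_cyl <- mtcars %>%
  mutate(kmpl = mpg * 0.425144) %>%          # convert miles per US gallon to km per liter
  group_by(cyl) %>%                          # one group per cylinder count
  summarize(n = n(), avg_kmpl = mean(kmpl))  # one summary row per group
by_cyl
```

The result has one row per cylinder count, a tidy summary table that can be passed straight to ggplot().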

Chapter 3: Creating Effective Visualizations

Key Points

  1. Understanding Your Audience: Tailoring your visuals based on the audience’s level of statistical knowledge.
  2. Choosing the Right Chart: The author discusses when to use bar charts, line charts, scatter plots, and other chart types, depending on the kind of data and the message you intend to convey.
  3. Design Principles: Healy draws on Edward Tufte’s principles: maximize the data-ink ratio, avoid chartjunk, and keep the design simple.

Actionable Advice

  • Action 5: Identify the target audience and adjust the complexity of your visualizations accordingly.
  • Action 6: Choose an appropriate chart type based on your data and the story you want to tell.
  • Action 7: Simplify your charts by removing unnecessary elements to maintain a high data-ink ratio.
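As a rough illustration of matching chart type to data, the sketch below pairs a time series with a line chart and two continuous variables with a scatter plot. It uses the economics dataset that ships with ggplot2 and R's built-in mtcars; the pairings are common conventions rather than fixed rules.

```r
library(ggplot2)

# Change over time: a line chart emphasizes the trend
trend <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = NULL, y = "Unemployed (thousands)") +
  theme_minimal()

# Relationship between two continuous variables: a scatter plot
relation <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()
```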

Example

An example of avoiding chartjunk:

R
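library(ggplot2)
# stat_summary() draws one bar per cylinder group at its mean mpg; geom_bar(stat = "identity")
# would stack every car's mpg in a group, giving meaningless summed heights
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun = mean, geom = "bar") +
  theme_minimal()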

Chapter 4: Customizing Plots

Key Points

  1. Themes and Styles: Customizing ggplot2 themes with theme() improves both the readability and the look of a plot.
  2. Labels and Annotations: Adding and customizing titles, labels, and annotations to make your plot more informative.

Actionable Advice

  • Action 8: Define your custom styling once, as a saved theme() object or through theme_set(), and apply it to every plot so styles stay consistent.
  • Action 9: Improve plot readability by adding meaningful titles (ggtitle()), labels (labs()), and annotations (annotate()).
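Applying the same styling across plots is easiest with theme_set(), which changes the session's default theme and returns the previous one. The sketch below also uses ggtitle() and annotate() as in Action 9; the title wording, the label text, and its position are arbitrary choices for illustration.

```r
library(ggplot2)

# theme_set() changes the default theme for every plot drawn afterwards in this session
old <- theme_set(theme_minimal(base_size = 12))

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Heavier cars tend to get fewer miles per gallon") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  # annotate() adds a one-off mark whose position is given directly, not mapped from the data
  annotate("text", x = 5.2, y = 12.5, label = "Heaviest cars", hjust = 1)

# Later, theme_set(old) restores the previous default theme
```

Because the default theme is applied when a plot is drawn, p picks up theme_minimal() only while that default is in effect.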

Example

Healy illustrates enhancing a plot’s appearance using custom themes and labels:

R
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Scatter plot of MPG vs Weight",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon",
       color = "Cylinders") +   # legend title for the color mapping
  theme_minimal()

Chapter 5: Advanced Techniques

Key Points

  1. Faceting: Split data by one or more variables using facet_wrap() or facet_grid() to create subplots within a single visualization for comparison.
  2. Complex Geometries: Specialized geometries such as geom_boxplot(), geom_violin(), and geom_density() show the shape of a distribution rather than a single summary value.

Actionable Advice

  • Action 10: Utilize faceting (facet_wrap() or facet_grid()) to compare subgroups within your data.
  • Action 11: For distributions and categorical data, explore advanced geometries provided by ggplot2.
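A sketch of both ideas, again on mtcars: facet_grid() for a two-way grid of panels, and geom_density() for comparing distributions across groups. Choosing am and gear as the faceting variables and alpha = 0.4 for the fill transparency are illustrative assumptions.

```r
library(ggplot2)

# facet_grid(rows ~ cols): one panel per combination of transmission
# (am: 0 = automatic, 1 = manual) and number of forward gears
grid_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ gear)

# geom_density(): a smoothed distribution of mpg for each cylinder count
dens_plot <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.4) +
  labs(fill = "Cylinders")
```

facet_grid() draws every row-column combination, so combinations with no cars appear as empty panels; facet_wrap() would omit them.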

Example

Healy shows how to use faceting and box plots:

R
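library(ggplot2)
# One panel per number of forward gears; within each, a box plot of mpg by cylinder count
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  facet_wrap(~gear) +
  theme_light()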

Chapter 6: Communicating Results

Key Points

  1. Storytelling with Data: The narrative aspect of presenting data visualizations, turning data into compelling stories.
  2. Interactivity: Interactive visualization tools, such as plotly, engage the audience and let them explore the data for themselves.

Actionable Advice

  • Action 12: Frame your visualizations within a broader narrative to convey your insights effectively.
  • Action 13: Utilize interactive tools like plotly to allow users to explore the data in a more engaging manner.
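One way to act on Action 12 within a single static chart is to let the labels carry the narrative: a title that states the finding, a subtitle with context, and a caption crediting the source. A minimal sketch; the label wording is illustrative, and the fitted straight line is an added assumption rather than part of the examples above.

```r
library(ggplot2)

story <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # linear trend, no confidence band
  labs(
    title = "Fuel economy falls as weight rises",       # the finding
    subtitle = "32 car models, 1973-74",                # context
    caption = "Data: mtcars, from Motor Trend (1974)",  # source
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal()
```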

Example

Creating an interactive plot with plotly:

R
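library(ggplot2)
library(plotly)
# ggplot2 does not draw the extra "text" aesthetic, but ggplotly() uses it in hover tooltips
p <- ggplot(mpg, aes(x = displ, y = hwy, text = model)) +
  geom_point()
ggplotly(p)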

Conclusion

Kieran Healy’s “Data Visualization: A Practical Introduction” is a highly effective resource for anyone looking to master the art and science of data visualization using R and ggplot2. The book is rich with practical advice and concrete examples that can be directly applied to real-world data visualization scenarios.

In short, Healy’s key lessons are that effective data visualization begins with a solid understanding of the principles and grammar of graphics, depends on thorough data preparation, and applies design principles aimed at readability and effectiveness. By following the actionable advice and using the methods and styles detailed in the book, readers can communicate complex data insights more clearly and accurately.
