Introduction to ggplot2

HES 505 Fall 2022: Session 23

Matt Williamson

Objectives

By the end of today you should be able to:

  • Understand the relationship between the Grammar of Graphics and ggplot syntax

  • Describe the various options for customizing ggplots and their syntactic conventions

  • Generate complicated plot layouts without additional pre-processing

Grammar of Graphics (Wilkinson 2005)

  • Grammar: A set of structural rules that help establish the components of a language

  • System and structure of language consist of syntax and semantics

  • Grammar of Graphics: a framework that allows us to concisely describe the components of any graphic

  • Follows a layered approach by using defined components to build a visualization

  • ggplot2 is a formal implementation in R

The ggplot2 hex logo.


{ggplot2} is a system for declaratively creating graphics,
based on “The Grammar of Graphics” (Wilkinson, 2005).

You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Advantages of {ggplot2}

  • consistent underlying “grammar of graphics” (Wilkinson 2005)
  • very flexible, layered plot specification
  • theme system for polishing plot appearance
  • lots of additional functionality thanks to extensions
  • active and helpful community

The Grammar of {ggplot2}


Component          Function               Explanation
Data               ggplot(data)           The raw data that you want to visualise.
Aesthetics         aes()                  Aesthetic mappings between variables and visual properties.
Geometries         geom_*()               The geometric shapes representing the data.
Statistics         stat_*()               The statistical transformations applied to the data.
Scales             scale_*()              Maps between the data and the aesthetic dimensions.
Coordinate System  coord_*()              Maps data into the plane of the data rectangle.
Facets             facet_*()              The arrangement of the data into a grid of plots.
Visual Themes      theme() and theme_*()  The overall visual defaults of a plot.
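
As a rough sketch of how these components stack, the plot below uses one function from each row of the table. The data frame `toy`, its columns, and the two hex colors are made up for illustration and are not part of the course data.

```r
library(ggplot2)

# Hypothetical toy data (not the bikes data used later in these slides)
set.seed(42)
toy <- data.frame(
  x   = rep(1:10, times = 2),
  grp = rep(c("a", "b"), each = 10)
)
toy$y <- 2 * toy$x + rnorm(20)

p <- ggplot(toy) +                                # data
  aes(x = x, y = y, color = grp) +                # aesthetics
  geom_point() +                                  # geometries
  stat_smooth(method = "lm", formula = y ~ x) +   # statistics
  scale_color_manual(
    values = c(a = "#28a87d", b = "#a8287d")      # scales
  ) +
  coord_cartesian() +                             # coordinate system
  facet_wrap(vars(grp)) +                         # facets
  theme_minimal()                                 # visual themes

# The built point layer records which facet panel each row lands in
table(layer_data(p, 1)$PANEL)
```

Each `+` adds one component, so any line except the first can be dropped or swapped without rewriting the rest.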

A Basic ggplot Example

The Data

Bike sharing counts in London, UK, powered by TfL Open Data

  • covers the years 2015 and 2016
  • includes weather data acquired from freemeteo.com
  • prepared by Hristo Mavrodiev for Kaggle
  • further modification by myself


Variable      Description                                          Class
date          Date encoded as `YYYY-MM-DD`                         date
day_night     `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am)     character
year          `2015` or `2016`                                     factor
month         `1` (January) to `12` (December)                     factor
season        `winter`, `spring`, `summer`, or `autumn`            factor
count         Sum of reported bikes rented                         integer
is_workday    `TRUE` being Monday to Friday and no bank holiday    logical
is_weekend    `TRUE` being Saturday or Sunday                      logical
is_holiday    `TRUE` being a bank holiday in the UK                logical
temp          Average air temperature (°C)                         double
temp_feel     Average feels like temperature (°C)                  double
humidity      Average air humidity (%)                             double
wind_speed    Average wind speed (km/h)                            double
weather_type  Most common weather type                             character
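
To make the variable table concrete, here is a minimal sketch of a data frame with the same column names and classes. The name `bikes_demo` and every value in it are invented for illustration; they are not rows from the real data set.

```r
# Hypothetical stand-in rows that follow the variable table above;
# all values are made up and are NOT the real TfL counts
bikes_demo <- data.frame(
  date         = as.Date(c("2015-01-04", "2015-01-04", "2015-07-06", "2015-07-06")),
  day_night    = c("day", "night", "day", "night"),
  year         = factor(rep("2015", 4), levels = c("2015", "2016")),
  month        = factor(c(1, 1, 7, 7), levels = 1:12),
  season       = factor(c("winter", "winter", "summer", "summer"),
                        levels = c("winter", "spring", "summer", "autumn")),
  count        = c(6000L, 2500L, 25000L, 9000L),
  is_workday   = c(FALSE, FALSE, TRUE, TRUE),
  is_weekend   = c(TRUE, TRUE, FALSE, FALSE),
  is_holiday   = c(FALSE, FALSE, FALSE, FALSE),
  temp         = c(2.5, 1.0, 21.0, 16.5),
  temp_feel    = c(0.5, -1.0, 21.0, 16.5),
  humidity     = c(90, 95, 55, 70),
  wind_speed   = c(8, 6, 15, 10),
  weather_type = c("cloudy", "clear", "clear", "clear")
)

# One class per column, matching the "Class" column of the table
col_classes <- sapply(bikes_demo, class)
col_classes
```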

ggplot2::ggplot()

The help page of the ggplot() function.

Data

ggplot(data = bikes)

Aesthetic Mapping


= link variables to graphical properties

  • positions (x, y)
  • colors (color, fill)
  • shapes (shape, linetype)
  • size (size)
  • transparency (alpha)
  • groupings (group)

Aesthetic Mapping

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)

aesthetics

aes() outside as component

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)


aes() inside, explicit matching

ggplot(data = bikes, mapping = aes(x = temp_feel, y = count))


aes() inside, implicit matching

ggplot(bikes, aes(temp_feel, count))


aes() inside, mixed matching

ggplot(bikes, aes(x = temp_feel, y = count))
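
The four spellings above should describe the same plot. A quick way to check, sketched here on a tiny made-up `bikes` data frame, is to add the same point layer to each one and compare the positions ggplot2 computes.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (made-up values, same column names)
bikes <- data.frame(
  temp_feel = c(2.5, 11, 19.5),
  count     = c(4000, 9000, 16000)
)

p1 <- ggplot(data = bikes) + aes(x = temp_feel, y = count)            # outside
p2 <- ggplot(data = bikes, mapping = aes(x = temp_feel, y = count))   # explicit
p3 <- ggplot(bikes, aes(temp_feel, count))                            # implicit
p4 <- ggplot(bikes, aes(x = temp_feel, y = count))                    # mixed

# Add a point layer and pull out the computed x/y positions
xy <- function(p) layer_data(p + geom_point())[, c("x", "y")]

all_same <- all(vapply(
  list(p2, p3, p4),
  function(p) isTRUE(all.equal(xy(p), xy(p1))),
  logical(1)
))
all_same
```

Implicit matching relies on argument order (`x` first, then `y`), so the explicit or mixed forms are usually easier to read back later.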

Geometries


= interpret aesthetics as graphical representations

  • points
  • lines
  • polygons
  • text labels

Geometries

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point()

Visual Properties of Layers

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5,
    shape = "X",
    stroke = 1,
    size = 4
  )

Setting vs Mapping of Visual Properties

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5
  )

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  )
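
A small sketch of the difference, using a made-up `bikes` stand-in: a value given outside `aes()` is used as is, while anything inside `aes()` is treated as data and passed through a scale. The third plot shows the common slip of putting a constant inside `aes()`.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (made-up values, same column names)
bikes <- data.frame(
  temp_feel = c(2, 5, 11, 14, 22, 25, 12, 9),
  count     = c(4000, 5000, 9000, 11000, 18000, 20000, 10000, 8000),
  season    = factor(rep(c("winter", "spring", "summer", "autumn"), each = 2),
                     levels = c("winter", "spring", "summer", "autumn"))
)
base <- ggplot(bikes, aes(x = temp_feel, y = count))

p_set  <- base + geom_point(color = "#28a87d")       # setting a constant
p_map  <- base + geom_point(aes(color = season))     # mapping a variable
p_oops <- base + geom_point(aes(color = "#28a87d"))  # constant mapped by mistake

set_cols  <- unique(layer_data(p_set)$colour)   # the literal hex value
map_cols  <- unique(layer_data(p_map)$colour)   # one palette color per season
oops_cols <- unique(layer_data(p_oops)$colour)  # a palette color, not the hex
```

In `p_oops` the string is treated as a one-level category, so the points get the first default palette color and a legend entry labelled with the hex code.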

Mapping Expressions

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = temp_feel > 20),
    alpha = .5
  )

Mapping Expressions

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear"),
    alpha = .5,
    size = 2
  )
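
An expression inside `aes()` is evaluated against the data, so it behaves like a temporary column. The sketch below, on a made-up `bikes` stand-in, compares the inline expression with a precomputed logical column; the column name `warm` is hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (made-up values, same column names)
bikes <- data.frame(
  temp_feel = c(2, 5, 11, 14, 22, 25, 12, 9),
  count     = c(4000, 5000, 9000, 11000, 18000, 20000, 10000, 8000)
)

# Expression evaluated on the fly inside aes()
p_expr <- ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_point(aes(color = temp_feel > 20))

# Same condition stored as a column first (`warm` is a made-up name)
bikes$warm <- bikes$temp_feel > 20
p_col <- ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_point(aes(color = warm))

cols_expr <- layer_data(p_expr)$colour
cols_col  <- layer_data(p_col)$colour
identical(cols_expr, cols_col)
```

The two versions differ only in the legend title, which defaults to the expression text in the first case and to the column name in the second.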

Mapping to Size

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    alpha = .5
  )

Setting a Constant Property

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )

Filter Data

library(dplyr)  # filter() below is dplyr::filter(), not stats::filter()

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )

Filter Data

ggplot(
    bikes %>% filter(!is.na(weather_type)),
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )

Local vs. Global Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  )

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  )

Adding More Layers

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Global Color Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Local Color Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )
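
The practical effect of global versus local color shows up in the smoothing layer. The sketch below uses simulated stand-in data (not the real `bikes` values) and counts how many separate fits `geom_smooth()` computes in each case; `n_fits()` is a helper defined only for this example.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (simulated values, same column names)
set.seed(1)
bikes <- data.frame(
  temp_feel = runif(80, min = 0, max = 30),
  season    = rep(c("winter", "spring", "summer", "autumn"), each = 20)
)
bikes$count <- 5000 + 400 * bikes$temp_feel + rnorm(80, sd = 1500)

# Global color: every layer inherits color = season
p_global <- ggplot(bikes, aes(x = temp_feel, y = count, color = season)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x)

# Local color: only the points are colored by season
p_local <- ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_point(aes(color = season), alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x)

# Count the groups the smoothing layer (layer 2) was fitted to
n_fits <- function(p) length(unique(layer_data(p, 2)$group))
n_fits(p_global)  # one fit per season
n_fits(p_local)   # a single fit across all points
```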

The `group` Aesthetic

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  ) +
  geom_smooth(
    aes(group = day_night),
    method = "lm"
  )

Set Both as Global Aesthetics

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Overwrite Global Aesthetics

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm",
    color = "black"
  )
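
A sketch of what the override does, again on simulated stand-in data: the `group` aesthetic decides how many lines are fitted, and a constant `color` set in the layer replaces the inherited color mapping for that layer only.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (simulated values, same column names)
set.seed(1)
bikes <- data.frame(
  temp_feel = runif(80, min = 0, max = 30),
  season    = rep(c("winter", "spring", "summer", "autumn"), each = 20),
  day_night = rep(c("day", "night"), times = 40)
)
bikes$count <- 5000 + 400 * bikes$temp_feel + rnorm(80, sd = 1500)

p <- ggplot(
    bikes,
    aes(x = temp_feel, y = count, color = season, group = day_night)
  ) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black")

points <- layer_data(p, 1)  # still colored by season
smooth <- layer_data(p, 2)  # grouped by day_night, drawn in black

length(unique(points$colour))  # one color per season
length(unique(smooth$group))   # one fitted line per day_night value
unique(smooth$colour)
```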

Statistical Layers

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = temp_feel, y = count)) +
  stat_smooth(geom = "smooth")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_smooth(stat = "smooth")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = season)) +
  stat_count(geom = "bar")

ggplot(bikes, aes(x = season)) +
  geom_bar(stat = "count")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = date, y = temp_feel)) +
  stat_identity(geom = "point")

ggplot(bikes, aes(x = date, y = temp_feel)) +
  geom_point(stat = "identity")
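
The pairs above are two spellings of the same layer: every geom has a default stat and every stat has a default geom. As a minimal check on made-up data, the computed counts should match whichever side the layer is written from.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` with unequal season counts
bikes <- data.frame(
  season = factor(
    c("winter", "winter", "winter", "spring",
      "summer", "summer", "autumn", "autumn"),
    levels = c("winter", "spring", "summer", "autumn")
  )
)

from_stat <- layer_data(ggplot(bikes, aes(x = season)) + stat_count(geom = "bar"))
from_geom <- layer_data(ggplot(bikes, aes(x = season)) + geom_bar(stat = "count"))

from_stat$count  # rows per season, in factor-level order
isTRUE(all.equal(from_stat$count, from_geom$count))
```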

Facets


= split variables to multiple panels

Facets are also known as:

  • small multiples
  • trellis graphs
  • lattice plots
  • conditioning

Wrapped Facets

g <-
  ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .3,
    show.legend = FALSE
  )
g +
  facet_wrap(
    vars(day_night)
  )

Wrapped Facets

g +
  facet_wrap(
    ~ day_night
  )
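
Both notations should split the data the same way. The sketch below checks this on a made-up `bikes` stand-in by comparing the panel each point is assigned to.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (made-up values, same column names)
bikes <- data.frame(
  temp_feel = c(2, 5, 11, 14, 22, 25, 12, 9),
  count     = c(4000, 2000, 9000, 5000, 18000, 9000, 10000, 4000),
  day_night = rep(c("day", "night"), times = 4)
)

g <- ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_point()

with_vars    <- layer_data(g + facet_wrap(vars(day_night)))
with_formula <- layer_data(g + facet_wrap(~ day_night))

table(with_vars$PANEL)                             # points per panel
identical(with_vars$PANEL, with_formula$PANEL)     # same assignment either way
```

`vars()` is often the more convenient form inside functions, since it accepts expressions the same way `aes()` does.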

Scales


= translate between variable ranges and property ranges

  • feels-like temperature  ⇄  x
  • reported bike shares  ⇄  y
  • season  ⇄  color
  • year  ⇄  shape

Scales

The scale_*() components control the properties of all the
aesthetic dimensions mapped to the data.


Consequently, there are scale_*() functions for all aesthetics such as:

  • positions via scale_x_*() and scale_y_*()

  • colors via scale_color_*() and scale_fill_*()

  • sizes via scale_size_*() and scale_radius()

  • shapes via scale_shape_*() and scale_linetype_*()

  • transparency via scale_alpha_*()

The placeholder (*) can be filled with a suffix such as:

  • continuous(), discrete(), reverse(), log10(), sqrt(), date() for positions

  • continuous(), discrete(), manual(), gradient(), gradient2(), brewer() for colors

  • continuous(), discrete(), manual(), ordinal(), area(), date() for sizes

  • continuous(), discrete(), manual(), ordinal() for shapes

  • continuous(), discrete(), manual(), ordinal(), date() for transparency
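
As an illustration of how a suffix picks the scale type, the sketch below combines a continuous x scale, a log10 y scale, and a manual color scale. The data, axis title, breaks, and hex colors are all made up for this example.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (made-up values, same column names)
bikes <- data.frame(
  temp_feel = c(2, 5, 11, 14, 22, 25, 12, 9),
  count     = c(4000, 5000, 9000, 11000, 18000, 20000, 10000, 8000),
  season    = factor(rep(c("winter", "spring", "summer", "autumn"), each = 2),
                     levels = c("winter", "spring", "summer", "autumn"))
)
season_cols <- c(winter = "#3c89d9", spring = "#1ec99b",
                 summer = "#f7b01b", autumn = "#a26e7c")

p <- ggplot(bikes, aes(x = temp_feel, y = count, color = season)) +
  geom_point() +
  scale_x_continuous(name = "Feels-like temperature (°C)",
                     breaks = seq(0, 30, by = 10)) +
  scale_y_log10() +                          # transforms count before plotting
  scale_color_manual(values = season_cols)   # one named color per level

built <- layer_data(p)
head(built[, c("x", "y", "colour")])
```

The built data holds `log10(count)` in `y` and the manual hex codes in `colour`, which shows that scales act on the data before anything is drawn.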

Continuous vs. Discrete in {ggplot2}

Continuous:
quantitative or numerical data

  • height (continuous)
  • weight (continuous)
  • age (continuous or discrete)
  • counts (discrete)

Discrete:
qualitative or categorical data

  • species (nominal)
  • sex (nominal)
  • study site (nominal or ordinal)
  • age group (ordinal)

Aesthetics + Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point()

Aesthetics + Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_date() +
  scale_y_continuous() +
  scale_color_discrete()
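
The explicit scales above are the ones ggplot2 would be expected to pick on its own for a Date, a number, and a factor. A hedged check on made-up data is to build the plot with and without them and compare the result.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (made-up values, same column names)
bikes <- data.frame(
  date   = as.Date("2015-01-01") + seq(0, 600, by = 100),
  count  = c(4000, 9000, 20000, 8000, 5000, 12000, 22000),
  season = factor(c("winter", "spring", "summer", "autumn",
                    "winter", "spring", "summer"),
                  levels = c("winter", "spring", "summer", "autumn"))
)

implicit <- ggplot(bikes, aes(x = date, y = count, color = season)) +
  geom_point()

explicit <- implicit +
  scale_x_date() +
  scale_y_continuous() +
  scale_color_discrete()

same_build <- isTRUE(all.equal(layer_data(implicit), layer_data(explicit)))
same_build
```

Writing a scale out explicitly only matters once you pass it arguments such as `name`, `breaks`, or `limits`.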

Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_continuous() +  # mismatch: date is a Date, so the matching scale is scale_x_date()
  scale_y_continuous() +
  scale_color_discrete()

Coordinate Systems


= interpret the position aesthetics

  • linear coordinate systems: preserve the geometrical shapes
    • coord_cartesian()
    • coord_fixed()
    • coord_flip()
  • non-linear coordinate systems: likely change the geometrical shapes
    • coord_polar()
    • coord_map() and coord_sf()
    • coord_trans()
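
A coordinate system changes how positions are drawn, not what is computed from the data. The sketch below, on made-up data, builds the same bar chart with a linear and a non-linear coordinate system and compares the bar heights computed in each case.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` with unequal season counts
bikes <- data.frame(
  season = factor(
    c("winter", "winter", "winter", "spring",
      "summer", "summer", "autumn", "autumn"),
    levels = c("winter", "spring", "summer", "autumn")
  )
)

base <- ggplot(bikes, aes(x = season)) +
  geom_bar()

flipped <- base + coord_flip()    # linear: bars stay rectangles, axes swap
polar   <- base + coord_polar()   # non-linear: bars become wedges

counts_flipped <- layer_data(flipped)$count
counts_polar   <- layer_data(polar)$count
isTRUE(all.equal(counts_flipped, counts_polar))
```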

Cartesian Coordinate System

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  coord_cartesian()

Cartesian Coordinate System

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  coord_cartesian(
    ylim = c(NA, 15000)
  )

Changing Limits

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  coord_cartesian(
    ylim = c(NA, 15000)
  )

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  scale_y_continuous(
    limits = c(NA, 15000)
  )
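
The two calls look alike but behave differently: `coord_cartesian()` zooms into the finished plot, while scale limits remove out-of-range values before the boxplot statistics are computed. The sketch below uses made-up counts and an explicit lower limit of 0 (an assumption, in place of the `NA` used above) to make the difference visible in the medians.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes`; three summer values exceed 15000
bikes <- data.frame(
  season = factor(rep(c("winter", "summer"), each = 5),
                  levels = c("winter", "summer")),
  count  = c(2000, 4000, 6000, 8000, 10000,
             8000, 12000, 16000, 20000, 24000)
)

p <- ggplot(bikes, aes(x = season, y = count)) +
  geom_boxplot()

# Zoom: all rows are kept, only the visible window changes
zoomed <- layer_data(p + coord_cartesian(ylim = c(0, 15000)))

# Scale limits: rows above 15000 are dropped (ggplot2 warns about this)
clipped <- suppressWarnings(
  layer_data(p + scale_y_continuous(limits = c(0, 15000)))
)

zoomed$middle   # medians from the full data: 6000 and 16000
clipped$middle  # summer median falls to 10000 after dropping three rows
```

For summaries such as boxplots or smooths, zooming with `coord_cartesian()` is usually the safer choice because the statistics still describe all of the data.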