library(PAGES)

The datasauRus and tidyverse packages

This exercise is largely built around the datasauRus package (Locke and D’Agostino McGowan 2018). In addition, I will again use two packages from the tidyverse collection: ggplot2 (Wickham et al. 2020; Wickham 2016) for plotting and dplyr (Wickham et al. 2021) for data manipulation.

library(datasauRus)
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

Data

The datasaurus_dozen dataset from the datasauRus package (Locke and D’Agostino McGowan 2018) contains one categorical variable, dataset, which identifies subsets of the data. Each subset, in turn, contains an x and a y variable.
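A quick way to confirm this structure is to list the column names and count the rows per subset. This is only a minimal sketch; the object name subset_sizes is my own choice and is not part of the package.

```r
library(datasauRus)
library(dplyr, warn.conflicts = FALSE)

# Column names of the full data: the grouping variable plus x and y
names(datasaurus_dozen)

# Number of observations in each subset (one row per level of `dataset`)
subset_sizes <- count(datasaurus_dozen, dataset)
subset_sizes
```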

Model datasauRus

First, we model each subset with a conventional least-squares linear regression, using geom_smooth() with the argument method set to "lm". The fitted lines show that all subsets can be described by more-or-less similar models.

ggplot(data = datasaurus_dozen) +
  geom_smooth(mapping = aes(x = x, y = y, colour = dataset), method = "lm") +
  theme_classic() +
  theme(legend.position = "none") +
  facet_wrap(facets = vars(dataset), ncol = 3)
#> `geom_smooth()` using formula 'y ~ x'

The theme() and theme_classic() functions in this construction only control certain visual aspects of the plots; they are of no further relevance here.
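To put numbers on the claim that the subsets yield similar models, the regression can also be fitted outside the plot. The sketch below assumes that calling lm() on each subset is an acceptable stand-in for what geom_smooth(method = "lm") does internally; the object name model_coefs is hypothetical.

```r
library(datasauRus)
library(dplyr, warn.conflicts = FALSE)

# Fit y ~ x separately within each subset and keep the two coefficients
model_coefs <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(
    intercept = coef(lm(y ~ x))[["(Intercept)"]],
    slope     = coef(lm(y ~ x))[["x"]]
  )

model_coefs
```

If the intercepts and slopes in this table are close to each other across subsets, that is the numerical counterpart of the near-identical lines in the plot.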

Visualize datasauRus

Plotting the raw data points instead of the fitted lines shows how different the subsets really are. These plots are a variant of Anscombe's quartet, named after the statistician Francis Anscombe, and demonstrate the importance of graphing data before analysing it.

ggplot(data = datasaurus_dozen) +
  geom_point(mapping = aes(x = x, y = y, colour = dataset)) +
  theme_void() +
  theme(legend.position = "none") +
  facet_wrap(facets = vars(dataset), ncol = 3)
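The contrast between the scatter plots and the numbers can be made explicit by computing a few summary statistics per subset. The following is a hedged sketch along the lines of the usual datasauRus demonstration; the object name summary_stats and the column names are my own.

```r
library(datasauRus)
library(dplyr, warn.conflicts = FALSE)

# Means, standard deviations and correlation of x and y for each subset
summary_stats <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    cor_xy = cor(x, y)
  )

summary_stats
```

Comparing this table with the plots shows why summary statistics alone are not enough to characterise a dataset.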

References

Locke, Steph, and Lucy D’Agostino McGowan. 2018. datasauRus: Datasets from the Datasaurus Dozen. https://CRAN.R-project.org/package=datasauRus.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2020. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
