The PAGES package is the supporting material for a webinar designed for the PAGES Early-Career Network (ECN). The goal is to give some useful pointers for exploring geological data, in particular stratigraphic occurrences, using RStudio and packages from the tidyverse.

R for Data Science (R4DS)

This class is modelled after Hadley Wickham and Garrett Grolemund’s R4DS, with the examples augmented by cases from geology.

Book cover (Wickham and Grolemund 2016)

Credits

The construction of the R (R Core Team 2021) package PAGES and associated documentation was aided by the packages devtools (Wickham, Hester, and Chang 2021), roxygen2 (Wickham, Danenberg, et al. 2020), knitr (Xie 2021, 2014, 2015), rmarkdown (Allaire et al. 2021; Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020), and bibtex (Francois 2020), and by the superb guidance in the book R Packages: Organize, Test, Document, and Share Your Code by Wickham (2015). In addition, this package relies on a set of external packages from the tidyverse, including dplyr (Wickham et al. 2021), tidyr (Wickham 2021), tibble (Müller and Wickham 2021), readr (Wickham and Hester 2020), and magrittr (Bache and Wickham 2020). Plots are made with ggplot2 (Wickham, Chang, et al. 2020; Wickham 2016), and thematic (Sievert, Schloerke, and Cheng 2021) is used for a consistent design in the presentation.

The package marelac (Soetaert and Petzoldt 2020) is used for chemical data and transformations, and the package datasauRus (Locke and D’Agostino McGowan 2018) is used to illustrate the value of plotting data.

Installation

You can install the released version of PAGES from GitHub with:

# Install PAGES from GitHub:
# install.packages("devtools")
devtools::install_github("MartinSchobben/PAGES", build_vignettes = TRUE)

Usage

Load PAGES with library().

library(PAGES)

Data sources

The study of the Triassic–Jurassic (~201 million years before present) boundary sections of Bonenburg (Germany) and Kuhjoch (Austria) by Schobben et al. (2019) is used as the example material for this course. The lazy-loaded datasets are:

- kuhjoch - Kuhjoch is a palynological dataset in which the counts have been summed for spores, pollen, aquatic and terrestrial elements.
- bonenburg - Bonenburg is a geochemical dataset containing elemental analyser total organic carbon (TOC) and total nitrogen (TN); XRF element data for aluminium (Al), potassium (K) and sodium (Na); and the carbon isotope composition of TOC (del13Ctoc).

Raw datasets (kuhjoch_raw.csv and bonenburg_raw.csv) can be easily accessed with the PAGES_example() function and a call to the readr function read_csv().

readr::read_csv(PAGES_example("kuhjoch_raw.csv"))
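PAGES_example() presumably returns the full path of a raw file shipped with the package, which read_csv() then reads. The sketch below shows how such a path helper is commonly written. The function name example_path() and the extdata directory are assumptions for illustration and are not taken from the PAGES source. The readr package serves as a stand-in here because it ships mtcars.csv in its own extdata directory.

```r
# Hypothetical helper (not part of PAGES): return the full path of a file
# stored in the extdata directory of an installed package.
example_path <- function(file, package) {
  # mustWork = TRUE raises an error instead of returning "" for a missing file
  system.file("extdata", file, package = package, mustWork = TRUE)
}

# Stand-in demonstration with a CSV file that ships with readr
path <- example_path("mtcars.csv", package = "readr")

# col_types = cols() lets readr guess the column types without printing them
cars <- readr::read_csv(path, col_types = readr::cols())
dim(cars)
```

Raising an error for a missing file is a deliberate choice in this sketch: a misspelled file name fails immediately, rather than passing an empty path on to read_csv().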

Render presentation slides

To render the presentation slides:

Exercises

Details regarding the exercises and live programming during the webinar can be found in the package vignettes (called with vignette()).
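As a minimal sketch, the vignettes of an installed package can be listed before one is opened. The helper name list_vignettes() is hypothetical and only wraps the base R function vignette(). The vignettes are only available if they were built during installation, as requested by build_vignettes = TRUE in the installation command.

```r
# Hypothetical helper: list the vignettes of an installed package as a
# character matrix with the vignette name (Item) and its title.
list_vignettes <- function(package) {
  vignette(package = package)$results[, c("Item", "Title"), drop = FALSE]
}

# Guarded so the example also runs where PAGES is not installed
if (requireNamespace("PAGES", quietly = TRUE)) {
  print(list_vignettes("PAGES"))
  # A single vignette can then be opened by its name (the Item column):
  # vignette("<name>", package = "PAGES")
}
```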

Tidy format

The lazy-loaded datasets are provided in a tidy format. See the data-raw directory in the GitHub repository for details on the data processing.

head(bonenburg)
#> # A tibble: 6 x 11
#>   section strat strat2 sampleid height CaCO3    TN del13Ctoc TOCcfb  Na_Al  K_Al
#>   <chr>   <chr> <chr>     <dbl>  <dbl> <dbl> <dbl>     <dbl>  <dbl>  <dbl> <dbl>
#> 1 Bonenb… Cont… Pre-e…        0   3.01 13.3   0.06     -27.5   1.16 0.0314 0.216
#> 2 Bonenb… Cont… Pre-e…      100   3.95  3.84  0.07     -27.3   0.96 0.0250 0.211
#> 3 Bonenb… Cont… Pre-e…      150   4.43  5.86  0.07     -27     1.25 0.0197 0.224
#> 4 Bonenb… Cont… Pre-e…      200   4.94 12.8   0.07     -27.8   1.52 0.0231 0.236
#> 5 Bonenb… Cont… Pre-e…      250   5.25  3.34  0.09     -27.6   2.45 0.0330 0.243
#> 6 Bonenb… Cont… Pre-e…      275   5.68  9.91  0.06     -27     1.19 0.0201 0.251

Besides the wide-format data, similarly named datasets with the suffix _long are provided. These are used to generate, for example, multi-proxy stratigraphic plots for initial data exploration.

library(ggplot2)

# One panel per measurement, each plotted against stratigraphic height
ggplot(data = bonenburg_long) +
  geom_point(mapping = aes(x = value, y = height)) +
  facet_grid(cols = vars(measurement), scales = "free_x") +
  theme_classic()

Bonenburg geochemical data traversing the Triassic-Jurassic boundary.
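The plot relies on the columns measurement and value of the long-format data. The sketch below shows one way such a table can be derived from wide data with tidyr::pivot_longer(). It uses three rows and a subset of columns copied from the head(bonenburg) output. The object names bonenburg_wide and bonenburg_long_sketch are made up for this example, and the actual processing in data-raw may differ.

```r
library(tibble)
library(tidyr)

# Three rows copied from the head(bonenburg) output, with a subset of columns
bonenburg_wide <- tribble(
  ~sampleid, ~height, ~TN,  ~del13Ctoc, ~TOCcfb,
  0,         3.01,    0.06, -27.5,      1.16,
  100,       3.95,    0.07, -27.3,      0.96,
  150,       4.43,    0.07, -27.0,      1.25
)

# Stack the measurement columns: one row per sample and measurement
bonenburg_long_sketch <- pivot_longer(
  bonenburg_wide,
  cols = c(TN, del13Ctoc, TOCcfb),
  names_to = "measurement",
  values_to = "value"
)

bonenburg_long_sketch
```

Each sample now occupies one row per measurement, which is the shape facet_grid() needs to draw one panel per proxy.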

Further reading

Data science with R
Hadley Wickham & Garrett Grolemund 2016 R for Data Science

General statistics with R
Peter Dalgaard 2008 Introduction to statistics with R

Regression with R
John Fox & Sanford Weisberg 2018 An R companion to applied regression

Mixed effect models with R
Alain Zuur et al. 2008 Mixed Effects Models and Extensions in Ecology with R

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2021. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Bache, Stefan Milton, and Hadley Wickham. 2020. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.
Dalgaard, Peter. 2008. Introduction to statistics with R. Edited by J Chambers, D Hand, and W. Hardle. Springer. https://doi.org/10.1201/9780429341830-12.
Fox, John, and Sanford Weisberg. 2018. An R companion to applied regression. Sage publications.
Francois, Romain. 2020. Bibtex: Bibtex Parser. https://github.com/romainfrancois/bibtex.
Locke, Steph, and Lucy D’Agostino McGowan. 2018. datasauRus: Datasets from the Datasaurus Dozen. https://CRAN.R-project.org/package=datasauRus.
Müller, Kirill, and Hadley Wickham. 2021. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Schobben, Martin, Julia Gravendyck, Franziska Mangels, Ulrich Struck, Robert Bussert, Wolfram M. Kürschner, Dieter Korn, P. Martin Sander, and Martin Aberhan. 2019. “A comparative study of total organic carbon-δ13C signatures in the Triassic–Jurassic transitional beds of the Central European Basin and western Tethys shelf seas.” Newsletters on Stratigraphy 52 (4): 461–86. https://doi.org/10.1127/nos/2019/0499.
Sievert, Carson, Barret Schloerke, and Joe Cheng. 2021. Thematic: Unified and Automatic Theming of Ggplot2, Lattice, and Base R Graphics. https://CRAN.R-project.org/package=thematic.
Soetaert, Karline, and Thomas Petzoldt. 2020. Marelac: Tools for Aquatic Sciences. https://CRAN.R-project.org/package=marelac.
Wickham, Hadley. 2015. R Packages: Organize, Test, Document, and Share Your Code. O’Reilly Media, Inc. https://r-pkgs.org/.
———. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2021. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2020. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Peter Danenberg, Gábor Csárdi, and Manuel Eugster. 2020. Roxygen2: In-Line Documentation for R. https://CRAN.R-project.org/package=roxygen2.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Garrett Grolemund. 2016. R for data science: import, tidy, transform, visualize, and model data. O’Reilly Media, Inc. https://r4ds.had.co.nz/index.html.
Wickham, Hadley, and Jim Hester. 2020. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, Jim Hester, and Winston Chang. 2021. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2021. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.
Zuur, Alain F., Elena N. Ieno, Neil J. Walker, Anatoly A. Saveliev, and Graham M. Smith. 2008. Mixed Effects Models and Extensions in Ecology with R. https://doi.org/10.4324/9780429201271-2.