HES 505 Fall 2022: Session 4
Matt Williamson
Describe the basic components of functions
Introduce the apply and map family of functions
Practice designing functions for repetitive tasks
A class of R object (functions can be called inside other functions)
rg <- paste("The range of mpg is", sum(mean(mtcars$mpg), sd(mtcars$mpg)), "-", sum(mean(mtcars$mpg), -sd(mtcars$mpg)))
rg
[1] "The range of mpg is 26.1175730520891 - 14.0636769479109"
A self-contained (i.e., modular) piece of code that performs a specific task
Allows powerful customization and extension of R
Copy-and-paste and repetitive typing are prone to errors
Evocative names and modular code make your analysis more tractable
Update in one place!
If you are copy-and-pasting more than 2x, consider a function!
Sketch out the steps in the algorithm (pseudocode!)
Develop working code for each step
Anonymize
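The three steps above can be sketched with a small example. This is only an illustration: the task (standardizing a column) and the names `standardize` and `x` are assumptions, not part of the original slides.

```r
# Step 1 (pseudocode): subtract the mean, then divide by the standard deviation.

# Step 2: working code for one specific column.
mpg_scaled <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# Step 3: anonymize by replacing the specific column with a generic argument.
# `standardize` and `x` are hypothetical names chosen for this sketch.
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# The same code now works on any numeric vector.
hp_scaled <- standardize(mtcars$hp)
```

Because the logic lives in one place, a later change (for example, handling missing values) only has to be made inside `standardize`.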
Provide the data that the function will work on
Provide other arguments that control the details of the computation (often with defaults)
Called by name or position (names should be descriptive)
Same As
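A minimal sketch of how arguments behave, assuming a hypothetical function `summarize_range` with one data argument and two detail arguments that have defaults. The function and its argument names are illustrative only.

```r
# `summarize_range` is a hypothetical function for illustration.
# x      : the data the function works on
# n_sd   : how many standard deviations to use (default 1)
# digits : rounding of the result (default 2)
summarize_range <- function(x, n_sd = 1, digits = 2) {
  center <- mean(x)
  spread <- n_sd * sd(x)
  round(c(lower = center - spread, upper = center + spread), digits)
}

# Calling by name and calling by position give the same result.
by_name <- summarize_range(x = mtcars$mpg, n_sd = 2, digits = 1)
by_position <- summarize_range(mtcars$mpg, 2, 1)

# Arguments with defaults can be left out entirely.
defaults <- summarize_range(mtcars$mpg)
```

Naming the arguments makes the call longer but easier to read, which is one reason descriptive argument names matter.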
The body of the function appears between the {}
This is where the function does its work
Default is to return the value of the last expression evaluated
Can use return() to return an earlier value
Can use list to return multiple values
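The return rules above can be sketched as follows. The function name `describe` is hypothetical and chosen only for this example.

```r
# `describe` is a hypothetical function for illustration.
describe <- function(x) {
  if (!is.numeric(x)) {
    return(NA)  # return() exits early with this value
  }
  # The last expression evaluated is returned by default;
  # a list lets the function hand back several values at once.
  list(mean = mean(x), sd = sd(x))
}

out <- describe(mtcars$mpg)
out$mean                      # access one element of the returned list
not_numeric <- describe(c("a", "b"))
```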
A note on the Environment
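One point that likely belongs under this heading, shown here as an assumption rather than the original slide content: objects created inside a function live in that function's own environment, while names the function cannot find locally are looked up in the environment where the function was defined. The names `rescale`, `doubled`, and `scale_factor` are hypothetical.

```r
scale_factor <- 10  # defined in the calling (global) environment

# `rescale` is a hypothetical function for illustration.
rescale <- function(x) {
  doubled <- x * 2          # exists only while the function is running
  doubled * scale_factor    # not found locally, so R looks in the enclosing environment
}

result <- rescale(1:3)

# `doubled` was never created outside the function.
doubled_exists <- exists("doubled")
```

Relying on outside objects such as `scale_factor` works, but passing them in as arguments usually makes a function easier to reuse and test.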
Another tool for reducing code duplication
Iteration for when you need to repeat the same task on different columns or datasets
Imperative iteration uses loops (for and while)
Functional iteration combines functions with the apply family to break computational challenges into independent pieces.
Use counters (for) or conditionals (while) to repeat a set of tasks
3 key components
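A minimal sketch of both loop types. The slide does not list the three components, so this example assumes the common breakdown of output, sequence, and body.

```r
# for loop: a counter steps through a known sequence.
col_means <- numeric(ncol(mtcars))       # 1. output: allocate space first
names(col_means) <- names(mtcars)
for (i in seq_along(mtcars)) {           # 2. sequence: what to loop over
  col_means[i] <- mean(mtcars[[i]])      # 3. body: the work done each time
}

# while loop: a condition decides when to stop.
total <- 0
n <- 0
while (total < 100) {
  n <- n + 1
  total <- total + n
}
```

The `for` loop suits cases where the number of repetitions is known ahead of time; the `while` loop suits cases where it is not.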
apply family
Vectorized functions that eliminate explicit for loops
Differ by the class they work on and the output they return
apply, lapply are most common; extensions for parallel processing (e.g., parallel::mclapply)
apply family
apply for matrices and arrays (data frames are coerced to a matrix first)
Args: X for the data, MARGIN for how the function will be applied (1 = rows, 2 = columns), FUN for your function, ... for other arguments to the function
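A short sketch of the arguments described above, using three columns of `mtcars`. The choice of columns and summary functions is only for illustration.

```r
# A numeric matrix built from three mtcars columns.
m <- as.matrix(mtcars[, c("mpg", "hp", "wt")])

# MARGIN = 2 applies FUN to each column.
col_max <- apply(m, MARGIN = 2, FUN = max)

# MARGIN = 1 applies FUN to each row.
row_sum <- apply(m, MARGIN = 1, FUN = sum)

# Extra arguments after FUN are passed through `...` to the function.
col_q90 <- apply(m, 2, quantile, probs = 0.9)
```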
apply family
lapply for lists (either input or output)
Args: X for the data, FUN for your function, ... for other arguments to the function
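A minimal sketch of `lapply`, assuming the list is built by splitting `mpg` by number of cylinders. The grouping is only an example.

```r
# split() returns a list with one numeric vector per cylinder count.
by_cyl <- split(mtcars$mpg, mtcars$cyl)

# lapply() applies FUN to each list element and always returns a list.
mean_by_cyl <- lapply(by_cyl, mean)

# Extra arguments are passed through `...` to the function.
trimmed_by_cyl <- lapply(by_cyl, mean, trim = 0.1)
```

Since the output is always a list, a follow-up step such as `unlist()` is often needed when a plain vector is wanted.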
map family
Similar to apply, but more consistent input/output
All take a vector for input
Difference is based on the output you expect
Integrates with tidyverse
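A brief sketch of how the expected output type determines which `map` function to use. It assumes the purrr package is installed; the choice of columns is only for illustration.

```r
library(purrr)  # assumes purrr is installed

cols <- mtcars[, c("mpg", "hp", "wt")]

# Same input each time; the function name states the output type.
as_list <- map(cols, mean)            # list
as_dbl  <- map_dbl(cols, mean)        # double vector
as_chr  <- map_chr(cols, class)       # character vector
as_lgl  <- map_lgl(cols, is.numeric)  # logical vector
```

The typed variants raise an error when a result does not match the promised type, which makes problems easier to catch early.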
map family
map(): output is a list
map_int(): output is an integer vector
map_lgl(): output is a logical vector
map_dbl(): output is a double vector
map_chr(): output is a character vector
map_df(), map_dfr(), map_dfc(): output is a data frame (r and c specify how to combine the data)
Transparency vs. speed
Testing
Moving forward