The Simulator: An R package for better simulations

Jacob Bien

What is the simulator?

The simulator is an R package that makes it easier to write simulation studies. Code written with the simulator is succinct, highly readable, and easily shared with others. The simulator makes it easy to reuse code, which saves time and facilitates reproducibility.

How does it work?

The simulator takes care of all the “infrastructure” of a simulation.

This allows you to focus exclusively on the interesting problem-specific aspects of the simulation. The simulator takes a “modular approach” to writing simulations. There are three types of modules that need to be defined for a simulation:

1) Models: How is the data generated? What are the underlying parameters?

2) Methods: What is done to the data?

3) Metrics: How should the methods’ outputs be evaluated?

These user-defined objects are then “plugged in” to the simulator. The modular approach also lets you add, remove, or mix and match components (for example, add a method or remove a metric) with minimal code changes.
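To make the three roles concrete, here is a minimal base-R sketch that does not use the simulator at all. The names simulate_draws, ols, and sqr_err are hypothetical and chosen only for illustration; the final few lines are the kind of glue code that the simulator is meant to handle for you.

```r
set.seed(123)

# Model: how the data are generated (a hypothetical linear model).
simulate_draws <- function(x, beta, sigma, nsim) {
  mu <- as.numeric(x %*% beta)
  lapply(seq_len(nsim), function(i) mu + sigma * rnorm(nrow(x)))
}

# Method: what is done to the data (least squares without an intercept).
ols <- function(x, y) {
  list(beta = lm.fit(x, y)$coefficients)
}

# Metric: how a method's output is evaluated.
sqr_err <- function(beta_true, out) {
  mean((out$beta - beta_true)^2)
}

# Glue code: generate draws, run the method on each, evaluate the output.
n <- 50
p <- 5
x <- matrix(rnorm(n * p), n, p)
beta <- c(1, 1, 0, 0, 0)
draws <- simulate_draws(x, beta, sigma = 1, nsim = 100)
errs <- sapply(draws, function(y) sqr_err(beta, ols(x, y)))
mean(errs)
```

Swapping in a different method or metric only means replacing one function, which is the idea behind the modular approach described above.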

What does code written with the simulator look like?

Here's the code for a simulation comparing the lasso to ridge regression as we vary the dimension p:

library(simulator)
new_simulation(name = "lasso-ridge",
               label = "Compare lasso to ridge") %>%
  generate_model(make_lm, n = 50, p = list(20, 50, 80), vary_along = "p",
                 seed = 123) %>%
  simulate_from_model(nsim = 100) %>%
  run_method(list(lasso, ridge)) %>%
  evaluate(list(nnz, best_sqr_err))

The problem-specific parts (make_lm, lasso, ridge, nnz, best_sqr_err) are defined elsewhere. For example, lasso in the above code is a “Method object,” which was created with the following code:

library(glmnet)
lasso <- new_method(name = "lasso", label = "The Lasso",
                    method = function(model, draw) {
                      fit <- glmnet(x = model$x, y = draw)
                      list(beta = fit$beta, df = fit$df)
                    })
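The other pieces can be defined in a similar way. The following is only a sketch of what make_lm, nnz, and best_sqr_err might look like, modeled on the package's lasso vignette. The sparsity level k, the noise level sigma, and their default values are assumptions made for illustration and are not taken from this page.

```r
library(simulator)

# Hypothetical model constructor: a sparse linear model y = x %*% beta + noise.
# The arguments k and sigma (and their defaults) are assumptions.
make_lm <- function(n, p, k = 5, sigma = 1) {
  x <- matrix(rnorm(n * p), n, p)
  beta <- rep(c(1, 0), c(k, p - k))
  mu <- as.numeric(x %*% beta)
  new_model(name = "lm",
            label = sprintf("n = %s, p = %s", n, p),
            params = list(x = x, beta = beta, mu = mu, sigma = sigma,
                          n = n, p = p, k = k),
            simulate = function(mu, sigma, n, nsim) {
              y <- mu + sigma * matrix(rnorm(nsim * n), n, nsim)
              split(y, col(y))  # one response vector per random draw
            })
}

# Metric: number of nonzero coefficients at each point on the solution path.
nnz <- new_metric(name = "nnz", label = "number of nonzeros",
                  metric = function(model, out) {
                    colSums(as.matrix(out$beta) != 0)
                  })

# Metric: smallest mean squared error of beta along the solution path.
best_sqr_err <- new_metric(name = "best_sqr_err",
                           label = "best squared error",
                           metric = function(model, out) {
                             min(colMeans(as.matrix(out$beta - model$beta)^2))
                           })
```

Each metric receives the model and the list returned by a method, so nnz and best_sqr_err rely on the method returning an element named beta, as lasso does above.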

After running the simulation, we can make plots or tables as follows:

sim <- load_simulation("lasso-ridge")
plot_eval(sim, "best_sqr_err")
tabulate_eval(sim, "best_sqr_err")

This and other examples are shown in greater detail in the vignettes for the package.

How do I get the simulator?

The simulator is available on CRAN, meaning that one can type

install.packages("simulator")

from within R. For the most up-to-date version of the code (and to see the code itself), one can go to the simulator's GitHub page.

Where can I learn more?

You can read about the simulator in the paper The Simulator: An Engine to Streamline Simulations.

You can hear about the simulator by watching my useR!2016 talk.

You can get started with the simulator by following this vignette.

You can see examples of the simulator in the context of some of the most famous statistical methods:

1) Lasso vignette: Explains basics, including the magrittr pipe and making plots and tables. Also demonstrates some more advanced features such as writing method extensions (such as refitting the result of the lasso or performing cross-validation).

2) James-Stein vignette: Shows how to step into specific parts of the simulation for troubleshooting your code.

3) Elastic net vignette: Shows how we can work with a sequence of methods that are identical except for a parameter that varies.

4) Benjamini-Hochberg vignette: Shows how we can load a preexisting simulation and add more random draws without having to rerun anything. It also shows how one can have multiple simulation objects that point to overlapping sets of results.
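As a taste of the elastic net item above, here is a rough sketch of building a sequence of methods that differ only in one parameter. It is loosely modeled on that vignette; the helper name make_elastic_net, the naming scheme, and the grid of alpha values are assumptions made for illustration.

```r
library(simulator)

# Hypothetical helper: returns an elastic net Method for a given alpha.
# glmnet is only needed when the method is actually run, so it is
# referenced as glmnet::glmnet inside the function body.
make_elastic_net <- function(alpha) {
  new_method(name = sprintf("en%s", alpha),
             label = sprintf("en(%s)", alpha),
             settings = list(alpha = alpha),
             method = function(model, draw, alpha) {
               fit <- glmnet::glmnet(x = model$x, y = draw, alpha = alpha)
               list(beta = fit$beta, df = fit$df)
             })
}

# Six methods, from ridge (alpha = 0) to the lasso (alpha = 1).
list_of_elastic_nets <- lapply(seq(0, 1, length.out = 6), make_elastic_net)
```

A list like this can then be passed to run_method in the same way as list(lasso, ridge) in the example earlier on this page.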