As part of the Embracing Heterogeneity project, we created an R package, “lsasim”, to facilitate the simulation of data from large-scale educational assessments. The first stable version of lsasim was released on March 1, 2017 through the Comprehensive R Archive Network (CRAN). A fully functional development version is also available on GitHub.

lsasim currently has four areas of functionality:

  1. simulating discrete and continuous background questionnaire data from known marginals and correlations
  2. flexibly generating item parameters from combinations of item response models
  3. assigning items to blocks and booklets using default spiraling designs or user-specified designs
  4. simulating item responses based on matrix sampling designs.
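To make areas 2 through 4 concrete, the following is a minimal sketch in Python of a spiraled block and booklet design with item responses that are missing by design. It does not use or reproduce the lsasim API. All function names, the two-parameter logistic (2PL) response model, the parameter ranges, and the design sizes (12 items, 4 blocks, 2 blocks per booklet) are assumptions chosen for illustration.

```python
import math
import random


def generate_items(n_items, rng):
    """Draw hypothetical 2PL item parameters (ranges are illustrative assumptions)."""
    return [
        {"item": i + 1, "a": rng.uniform(0.5, 1.5), "b": rng.uniform(-2.0, 2.0)}
        for i in range(n_items)
    ]


def spiral_blocks(n_items, n_blocks):
    """Deal items to blocks in rotation: item 1 -> block 0, item 2 -> block 1, ..."""
    blocks = [[] for _ in range(n_blocks)]
    for i in range(n_items):
        blocks[i % n_blocks].append(i + 1)
    return blocks


def spiral_booklets(n_blocks, blocks_per_booklet=2):
    """Booklet k holds blocks k, k+1, ... (wrapping), so each block appears equally often."""
    return [
        [(k + j) % n_blocks for j in range(blocks_per_booklet)]
        for k in range(n_blocks)
    ]


def simulate_responses(thetas, items, blocks, booklets, rng):
    """Simulate 0/1 responses; items outside a subject's booklet are None."""
    rows = []
    for subject, theta in enumerate(thetas):
        booklet = booklets[subject % len(booklets)]  # rotate booklets over subjects
        administered = {item_id for blk in booklet for item_id in blocks[blk]}
        row = {}
        for it in items:
            if it["item"] in administered:
                # 2PL probability of a correct response
                p = 1.0 / (1.0 + math.exp(-it["a"] * (theta - it["b"])))
                row[it["item"]] = 1 if rng.random() < p else 0
            else:
                row[it["item"]] = None  # missing by design
        rows.append(row)
    return rows


rng = random.Random(2017)
items = generate_items(12, rng)
blocks = spiral_blocks(12, 4)
booklets = spiral_booklets(4, 2)
thetas = [rng.gauss(0.0, 1.0) for _ in range(8)]  # latent proficiencies
responses = simulate_responses(thetas, items, blocks, booklets, rng)
```

In this sketch each simulated subject sees 6 of the 12 items. That is the defining feature of a matrix sampling design: no subject takes the full item pool, yet every item is administered to part of the sample.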

Thirty months after this first major release, lsasim 2.0.0 was made available on CRAN. This second major release brought in multiple features for users interested in generating data from background questionnaires.

The lsasim package enables researchers to generate data for conducting research on many aspects of large-scale assessments. In particular, lsasim can simulate correlated background questionnaire data measured on continuous, dichotomous, and ordinal scales. In addition, background variables can be specified to have varied relationships with underlying latent proficiencies. Item responses – which depend on the latent proficiency – can be generated from matrix sampling designs (Rutkowski, von Davier, Gonzalez, & Zhou, 2014) and exploratory IRT models. Although users can specify each part of the data-generating process, the package comes with population parameters from previous PISA and TIMSS assessments as well as functions that facilitate the generation of random population parameters. In the proposed paper, we provide an overview of the methodology – based in international assessment design, population modeling (Mislevy, Beaton, Kaplan, & Sheehan, 1992), and item response theory – used to generate each interrelated component of the data.
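One common way to give a dichotomous background variable both a known marginal proportion and a chosen relationship with latent proficiency is to threshold a correlated latent normal variable. The sketch below illustrates that idea in Python. It is an assumption-laden illustration, not a description of lsasim's internals: the function name `simulate_background`, the latent correlation `rho`, and the marginal proportion `p_yes` are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_background(n_obs, rho, p_yes, rng):
    """Return (theta, answer) pairs, where answer is a 0/1 background item.

    theta is a standard normal latent proficiency. The background item comes
    from a second standard normal variable z, correlated rho with theta, that
    is cut at the threshold giving P(answer == 1) = p_yes.
    """
    threshold = NormalDist().inv_cdf(1.0 - p_yes)
    data = []
    for _ in range(n_obs):
        theta = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # z is standard normal with correlation rho to theta
        z = rho * theta + math.sqrt(1.0 - rho ** 2) * noise
        data.append((theta, 1 if z > threshold else 0))
    return data


rng = random.Random(42)
sample = simulate_background(20000, rho=0.5, p_yes=0.3, rng=rng)

share_yes = sum(answer for _, answer in sample) / len(sample)
yes_thetas = [theta for theta, answer in sample if answer == 1]
no_thetas = [theta for theta, answer in sample if answer == 0]
mean_yes = sum(yes_thetas) / len(yes_thetas)
mean_no = sum(no_thetas) / len(no_thetas)
```

With a positive `rho`, respondents who answer 1 have a higher average proficiency than those who answer 0, while the share answering 1 stays close to `p_yes`. Adding more thresholds to the same latent variable would give an ordinal item.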

We are currently preparing the next feature release of lsasim, which will add new capabilities to the software. In the meantime, a couple of bug-fix patch releases have made their way to CRAN, which is why the latest released version of lsasim is 2.0.2.



Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29(2), 133–161.

Rutkowski, L., von Davier, M., Gonzalez, E., & Zhou, Y. (2014). Assessment design for international large-scale assessment. In Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Boca Raton, FL: Chapman & Hall/CRC Press.