This page shows all of the projects that are currently part of the Embracing Heterogeneity Project.

1. Response Time in ILSA

Given the formidable scope of PISA, creative assessment designs are employed to ensure sufficient content coverage and to ensure that assessed groups are measured with appropriate precision. To that end, PISA uses a sophisticated assessment design in which each individual student is administered only a small number of the total possible items, yet all items are administered within each of the reporting groups. In large-scale assessment, this approach is often referred to as multiple-matrix sampling (Shoemaker, 1973). In multiple-matrix sampling, test administration is arranged so that achievement distributions are estimated with sufficient precision (Mislevy, Beaton, Kaplan, & Sheehan, 1992; von Davier & Sinharay, 2014). The process used to estimate achievement is highly specialized and is outside the scope of the current proposal (Adams & Wu, 2007; Mislevy, 1991; von Davier & Sinharay, 2014). However, it is important to note that achievement estimation relies on an item response theory (IRT) model combined with a latent regression model, which usually includes information from the student background questionnaire. This approach leverages information about groups of test takers and permits inferences at the population and subpopulation level. As we describe subsequently, one possible additional source of information is response time data.
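The idea behind multiple-matrix sampling can be illustrated with a small sketch. The Python code below is not the PISA design; it assumes a hypothetical pool of 70 items split into 7 blocks, with each booklet holding two adjacent blocks in a cyclic arrangement and booklets rotated across 700 students. All function names and numbers are illustrative assumptions.

```python
import random
from collections import Counter


def build_booklets(n_blocks, blocks_per_booklet=2):
    """Cyclic design (an assumption): booklet b holds blocks b, b+1, ... (mod n_blocks)."""
    return [
        [(b + k) % n_blocks for k in range(blocks_per_booklet)]
        for b in range(n_blocks)
    ]


def assign_items(n_students, n_items, n_blocks, blocks_per_booklet=2, seed=1):
    """Return, for each student, the sorted list of item indices administered."""
    rng = random.Random(seed)
    items = list(range(n_items))
    # Split the item pool into blocks of (nearly) equal size.
    blocks = [items[i::n_blocks] for i in range(n_blocks)]
    booklets = build_booklets(n_blocks, blocks_per_booklet)
    # Rotate booklets across students from a random starting booklet.
    start = rng.randrange(n_blocks)
    assignment = []
    for s in range(n_students):
        booklet = booklets[(start + s) % n_blocks]
        assignment.append(sorted(i for b in booklet for i in blocks[b]))
    return assignment


# Hypothetical sizes, chosen only so the arithmetic is easy to follow.
assignment = assign_items(n_students=700, n_items=70, n_blocks=7)

# Each student sees only a fraction of the pool ...
items_per_student = {len(a) for a in assignment}
# ... yet every item is administered, and equally often.
exposure = Counter(i for a in assignment for i in a)

print(items_per_student)        # {20}
print(len(exposure))            # 70
print(set(exposure.values()))   # {200}
```

In this toy design each student answers 20 of the 70 items, while every item is answered by 200 students. That is the property the paragraph above describes: sparse data per student, full coverage per group.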

One consequence of the computer-based assessment (CBA) platform adopted by PISA is that process data (e.g., response times and keystrokes) are easily harvested as part of the data collection process. Yet operational procedures in PISA currently use response time data only in limited ways, primarily as a means of exploratory analysis (OECD, 2017). Relatively recent methodological innovations, however, make it possible to model response times explicitly (van der Linden, 2007; van der Linden, Klein Entink, & Fox, 2010). In the current paper, we investigate whether including timing data in models for item parameter estimation offers any advantage in accuracy over currently used methods. Further, we consider whether including timing data in the latent regression improves the precision of estimated achievement distributions. We rely on a simulation study to answer our research questions. We first simulate data according to operationally observed conditions in PISA using the R (R Core Team, 2018) package lsasim (Matta, Rutkowski, Rutkowski, Liaw, & Mughogho, 2017). We then estimate item parameters using cirt, which allows for the inclusion of response times. Finally, we estimate achievement distributions in TAM (Robitzsch, Kiefer, & Wu, 2017). Item parameter and achievement estimates are compared to known population values. The simulation is supplemented with an empirical example from PISA 2015.
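To make the joint modeling of responses and response times concrete, the following Python sketch generates data from a simplified hierarchical setup in the spirit of van der Linden (2007). It is not the study's simulation, which uses lsasim, cirt, and TAM in R. The sketch assumes a two-parameter logistic response model (chosen here for simplicity), a lognormal response time model in which log time is normal with mean beta_i - tau_j and standard deviation 1/alpha_i, and correlated ability (theta) and speed (tau). All parameter ranges, the correlation of -0.3, and the speed standard deviation of 0.4 are hypothetical values.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_persons=2000, n_items=20, rho=-0.3, speed_sd=0.4, seed=42):
    """Simulate item responses and log response times (all values hypothetical)."""
    rng = random.Random(seed)
    a = [rng.uniform(0.8, 1.6) for _ in range(n_items)]      # item discrimination
    b = [rng.gauss(0.0, 1.0) for _ in range(n_items)]        # item difficulty
    alpha = [rng.uniform(1.5, 2.5) for _ in range(n_items)]  # time discrimination
    beta = [rng.gauss(4.0, 0.3) for _ in range(n_items)]     # time intensity (log seconds)

    theta, tau, responses, log_times = [], [], [], []
    for _ in range(n_persons):
        # Ability and speed drawn from a bivariate normal with correlation rho.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        th = z1
        ta = speed_sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        row_x, row_t = [], []
        for i in range(n_items):
            # 2PL probability of a correct response.
            p = 1.0 / (1.0 + math.exp(-a[i] * (th - b[i])))
            row_x.append(1 if rng.random() < p else 0)
            # Lognormal response time: log T ~ N(beta_i - tau_j, 1 / alpha_i^2).
            row_t.append(rng.gauss(beta[i] - ta, 1.0 / alpha[i]))
        theta.append(th)
        tau.append(ta)
        responses.append(row_x)
        log_times.append(row_t)
    return {"theta": theta, "tau": tau, "responses": responses,
            "log_times": log_times, "beta": beta}


data = simulate()
# The sample correlation between ability and speed should be close to rho.
r = pearson(data["theta"], data["tau"])
print(round(r, 2))
```

Because ability and speed are correlated in this setup, response times carry information about ability beyond the scored responses. That shared information is what a joint model can exploit when estimating item parameters or conditioning the latent regression.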

Accurate achievement estimates have clear implications for policy and interpretation. As such, this research can potentially improve overall and subpopulation achievement estimates in international assessments, such as PISA and related studies. Further, this study offers one possibility for using the rich process data that computer-based assessments collect.


2. Multistage Testing