Degree Granting Department

Educational Measurement and Research

Major Professor

John Ferron, Ph.D.


Item Response Theory, Equating to a calibrated item bank, Multidimensionality, Fixed Parameter Calibration, Stocking and Lord linking


This study examined the feasibility of using Rasch true score preequating under violated model assumptions and nonequivalent populations. Dichotomous item responses were simulated using a compensatory two-dimensional (2D) three-parameter logistic (3PL) Item Response Theory (IRT) model. The Rasch model was used to calibrate difficulty parameters using two methods: Fixed Parameter Calibration (FPC) and separate calibration with the Stocking and Lord linking (SCSL) method. A criterion equating function was defined by equating true scores calculated with the generated 2D 3PL IRT item and ability parameters, using random groups equipercentile equating. True score preequating to FPC and SCSL calibrated item banks was compared with identity equating and Levine's linear true score equating in terms of equating bias and bootstrap standard errors of equating (SEE) (Kolen & Brennan, 2004).
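As an illustration of the data generation step described above, the following is a minimal sketch of simulating dichotomous responses from a compensatory 2D 3PL model. It is not the study's simulation code. The function name `simulate_2d_3pl`, the sample sizes, the parameter distributions, the correlation `rho` between dimensions, and the constant guessing value of 0.2 are all assumptions chosen for illustration, and the logit is written in slope-intercept form without a scaling constant.

```python
import numpy as np

def simulate_2d_3pl(theta, a, d, c, rng):
    """Simulate 0/1 responses under a compensatory 2D 3PL model.

    theta: (n_examinees, 2) abilities
    a:     (n_items, 2) discriminations, one per dimension
    d:     (n_items,) intercepts (larger values mean easier items)
    c:     (n_items,) lower asymptotes (pseudo-guessing)
    """
    # Compensatory: the two dimensions add in the logit, so a high
    # ability on one dimension can offset a low ability on the other.
    logit = theta @ a.T + d
    p = c + (1.0 - c) / (1.0 + np.exp(-logit))
    # Draw each response as a Bernoulli trial with probability p.
    return (rng.random(p.shape) < p).astype(int)

# Hypothetical generating values (assumptions, not the study's design).
rng = np.random.default_rng(2024)
n_examinees, n_items = 1000, 40
rho = 0.5  # assumed correlation between the two ability dimensions
cov = np.array([[1.0, rho], [rho, 1.0]])
theta = rng.multivariate_normal(np.zeros(2), cov, size=n_examinees)
a = rng.uniform(0.5, 1.5, size=(n_items, 2))
d = rng.normal(0.0, 1.0, size=n_items)
c = np.full(n_items, 0.2)

responses = simulate_2d_3pl(theta, a, d, c, rng)
raw_scores = responses.sum(axis=1)  # number-correct score per examinee
```

Setting `c` to zero and giving every item the same discrimination on a single dimension would reduce the generating model to the Rasch case, which is one way to see which assumptions the simulated data violate.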

Results showed that preequating was robust to simulated 2D 3PL data and to nonequivalent item discriminations; however, true score equating was not robust to guessing or to the interaction of guessing and nonequivalent item discriminations. Equating bias due to guessing was most marked at the low end of the score scale. Equating an easier new form to a more difficult base form produced negative bias. Nonequivalent item discriminations interacted with guessing to magnify the bias and to extend the range of the bias toward the middle of the score distribution. Forms that were very easy relative to the ability of the examinees also produced substantial error at the low end of the score scale. Accumulating item parameter error in the item bank increased the SEE across five forms. Rasch true score preequating produced less equating error than Levine's true score linear equating in all simulated conditions.
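To make the true score equating step concrete, the sketch below shows one way Rasch true score equating can be carried out once the difficulties of both forms are on a common scale: for each new-form raw score, find the ability at which the new form's test characteristic curve equals that score, then evaluate the base form's curve at the same ability. This is an illustrative sketch, not the procedure or software used in the study. The function names, the bisection solver, the difficulty values, and the convention of mapping the zero and perfect scores to the endpoints of the base-form scale are all assumptions. It shows only the equating transformation itself, not the bias relative to a criterion.

```python
import numpy as np

def rasch_tcc(theta, b):
    """Expected raw score on a form at ability theta under the Rasch model."""
    return float(np.sum(1.0 / (1.0 + np.exp(-(theta - b)))))

def solve_theta(score, b, lo=-15.0, hi=15.0, tol=1e-8):
    """Bisection for the theta at which rasch_tcc(theta, b) equals score."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rasch_tcc(mid, b) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def true_score_equate(b_new, b_base):
    """Map each new-form raw score to its base-form true score equivalent."""
    k_new, k_base = len(b_new), len(b_base)
    equated = np.empty(k_new + 1)
    # Scores of 0 and k_new have no finite theta under the Rasch model;
    # mapping them to the scale endpoints is a convention assumed here.
    equated[0] = 0.0
    equated[k_new] = float(k_base)
    for x in range(1, k_new):
        theta_x = solve_theta(x, b_new)
        equated[x] = rasch_tcc(theta_x, b_base)
    return equated

# Hypothetical difficulties: the new form is uniformly 0.3 logits easier.
b_base = np.linspace(-2.0, 2.0, 20)
b_new = b_base - 0.3
table = true_score_equate(b_new, b_base)
```

In this example every interior new-form score maps to a lower base-form equivalent, as expected when the new form is easier. Because the conversion depends only on the banked difficulties, it can be built before the new form is administered, which is what makes preequating possible.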

FPC with Bigsteps performed as well as separate calibration with the Stocking and Lord linking method. These results support earlier findings, suggesting that Rasch true score preequating can be used in the presence of guessing if accuracy is required near the mean of the score distribution, but not if accuracy is required at very low or very high scores.