An Empirical Comparison of the Effect of Missing Data on Type I Error and Statistical Power of the Likelihood Ratio Test for Differential Item Functioning: An Item Response Theory Approach using the Graded Response Model
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Educational Measurement and Research
Jeffrey D. Kromrey, Ph.D.
Bárbara C. Cruz, Ed.D.
Eun-Sook Kim, Ph.D.
James A. Duplass, Ph.D.
Attitude Assessment, Civics Education, Invariance, Polytomous Item, Validity
In educational research, missing data arise when examinees omit or do not reach an item, which creates an item nonresponse problem. Using a simulation approach, this study compared the performance of six methods for treating item nonresponse in the context of differential item functioning (DIF), alongside complete-data analyses. The effect of missing data on the Type I error and statistical power of the Likelihood Ratio test for DIF detection in small scales was examined in the context of Item Response Theory (IRT-LR), using polytomous, Likert-type data and the graded response model. The effects of ability distribution, sample size, number of items, proportion of missing observations, and proportion of missing items on Type I error rates and empirical power of the IRT-LR DIF test were examined under full information maximum likelihood (FIML), multiple imputation (MI), person mean substitution (PMS), single regression substitution (SRS), relative mean substitution (RMS), and listwise deletion missing data methods. Type I error rates were very consistent across nominal levels and factors under each missing data method. Among the missing data methods examined, the FIML and PMS methods had Type I error rates comparable to the rejection rates for complete data. Although MI is considered a “state-of-the-art” missing data method, in this study MI and SRS were the least effective missing data methods (i.e., both had inflated rejection rates across all conditions). Similarly, listwise deletion has been described as one of the most ineffective methods; however, with large samples, the data loss due to listwise deletion might not be a problem if other conditions are also present, such as a small proportion of missing observations and a small number of items or variables. Along with complete data and FIML, the PMS method had adequate Type I error control under both nominal levels examined.
MI and SRS had the smallest proportions of conditions meeting Bradley’s criteria for robustness at both levels of significance examined; when alpha was .01, none of the simulation conditions for these methods met the criteria, so they were excluded from the power analyses at that significance level. Results of the power analyses were consistent across nominal levels, factors, and missing data methods. Consistent with theory, sample size and proportion of missing observations were the factors affecting the performance of the IRT-LR test for DIF detection across all missing data methods.
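As an illustration of the robustness screening mentioned in the abstract, the sketch below computes an empirical rejection rate from simulated p-values and checks it against Bradley's liberal criterion, under which the rate should fall between 0.5α and 1.5α. The abstract does not state which of Bradley's criteria was applied, so the liberal interval is an assumption here, and the function names and p-values are hypothetical.

```python
def bradley_interval(alpha, factor=0.5):
    """Return (lower, upper) bounds around alpha.

    factor=0.5 gives Bradley's liberal interval [0.5*alpha, 1.5*alpha];
    this choice is an assumption, not taken from the dissertation.
    """
    return (1 - factor) * alpha, (1 + factor) * alpha


def rejection_rate(p_values, alpha):
    """Proportion of replications in which the test rejects at level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def meets_bradley(rate, alpha, factor=0.5):
    """True if the empirical rejection rate lies inside the interval."""
    lower, upper = bradley_interval(alpha, factor)
    return lower <= rate <= upper


# Hypothetical p-values from replications of a no-DIF condition.
p_values = [0.62, 0.03, 0.48, 0.91, 0.27, 0.74, 0.11, 0.55, 0.36, 0.88,
            0.19, 0.67, 0.42, 0.95, 0.08, 0.51, 0.33, 0.79, 0.24, 0.60]
rate = rejection_rate(p_values, alpha=0.05)
print(rate, meets_bradley(rate, alpha=0.05))
```

A condition whose rate falls outside the interval would be treated as lacking adequate Type I error control and, as in the study, left out of the power comparison at that significance level.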
Scholar Commons Citation
Rodriguez De Gil, Patricia, "An Empirical Comparison of the Effect of Missing Data on Type I Error and Statistical Power of the Likelihood Ratio Test for Differential Item Functioning: An Item Response Theory Approach using the Graded Response Model" (2015). Graduate Theses and Dissertations.