Graduation Year

2011

Document Type

Dissertation

Degree

Ph.D.

Degree Granting Department

Psychology

Major Professor

Walter Borman

Keywords

Cut Score, Modified Angoff, Performance Standard, Rater Fatigue, Test Development

Abstract

The primary purpose of this study was to evaluate the effectiveness of stratified item sampling for reducing the number of items needed in Modified Angoff standard-setting studies. Representative subsets of items were extracted from each of 30 full-length tests based on content weights, item difficulty, and item discrimination. Cut scores obtained from subsets of various sizes were compared with the full-length test cut score as a measure of generalizability. Applied sampling results indicated that 50% of the full-length test was sufficient to obtain cut scores within one standard error of estimate (SEE) of the full-length test standard, and 70% was sufficient to obtain standards within one percentage point of the full-length test standard. A theoretical sampling procedure indicated that 35% of the full-length test is required to reliably obtain a standard within one SEE of the full-length standard, and 65% is required to fall within one percentage point. The effects of test length, panelist group size, and interrater reliability on the feasibility of stratified item sampling were also examined; however, these standard-setting characteristics were not significant predictors of subset generalizability in this study.
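The comparison described above can be illustrated with a minimal sketch. It assumes the common Modified Angoff convention in which each panelist rates, for every item, the probability that a minimally competent candidate answers it correctly, and the cut score is the mean rating expressed as a percentage of the items rated. The stratification scheme here is hypothetical and is not the study's procedure: draws are allocated to content areas in proportion to their share of the test, then spread across each area's difficulty range. Item discrimination is not used; it could be added as a further stratification variable. The function names and the synthetic data are illustrative only, not results from the study.

```python
import random
from collections import defaultdict


def angoff_cut_percent(ratings, items):
    """Modified Angoff cut score as a percentage of the items in `items`.

    ratings[p][i] is panelist p's estimated probability (0-1) that a
    minimally competent candidate answers item i correctly.
    """
    items = list(items)
    # Average each item's rating across panelists, then average across items.
    per_item = [sum(r[i] for r in ratings) / len(ratings) for i in items]
    return 100.0 * sum(per_item) / len(per_item)


def stratified_subset(content, difficulty, fraction, seed=0):
    """Hypothetical stratified draw of roughly `fraction` (0 < fraction <= 1) of the items.

    Each content area receives draws in proportion to its size (a stand-in
    for content weights); within an area, items are sorted by difficulty,
    split into contiguous bands, and one item is drawn from each band.
    """
    rng = random.Random(seed)
    by_area = defaultdict(list)
    for i, area in enumerate(content):
        by_area[area].append(i)
    chosen = []
    for idx in by_area.values():
        k = max(1, round(fraction * len(idx)))
        idx.sort(key=lambda i: difficulty[i])  # easiest to hardest
        n = len(idx)
        # k non-empty, non-overlapping difficulty bands (k <= n guarantees non-empty).
        bands = [idx[j * n // k:(j + 1) * n // k] for j in range(k)]
        chosen.extend(rng.choice(band) for band in bands)
    return sorted(chosen)


if __name__ == "__main__":
    gen = random.Random(42)
    n_items, n_panelists = 60, 8
    content = [i % 3 for i in range(n_items)]             # synthetic: three content areas
    difficulty = [gen.random() for _ in range(n_items)]   # synthetic item p-values
    ratings = [[gen.uniform(0.4, 0.9) for _ in range(n_items)]
               for _ in range(n_panelists)]               # synthetic Angoff ratings
    full = angoff_cut_percent(ratings, range(n_items))
    for frac in (0.35, 0.5, 0.7):
        sub = stratified_subset(content, difficulty, frac)
        cut = angoff_cut_percent(ratings, sub)
        print(f"{frac:.0%} ({len(sub)} items): subset {cut:.2f}% "
              f"vs full {full:.2f}% (diff {cut - full:+.2f} points)")
```

Expressing both cut scores as percentages puts the subset and full-length standards on the same scale, so a criterion such as "within one percentage point" reduces to checking the absolute difference. A criterion based on the SEE would additionally require an estimate of that error, which this sketch does not compute.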
