Graduation Year


Document Type




Degree Granting Department

Educational Measurement and Research

Major Professor

Jeffrey D. Kromrey


Multilevel Modeling, Observational Studies, Propensity Scores, Simulation Research


Propensity score analysis has been used to minimize selection bias in observational studies so that causal relationships can be identified. A propensity score is an estimate of an individual's probability of being placed in a treatment group given a set of covariates. Propensity score analysis uses this estimate to create balanced groups, akin to those of a randomized experiment. This study used Monte Carlo methods to examine the appropriateness of using propensity score methods to achieve balance between groups on observed covariates and to reproduce treatment effect estimates in multilevel studies. Specifically, it examined the extent to which four propensity score estimation models and three propensity score conditioning methods produced balanced samples and reproduced treatment effects with clustered data. One single-level logistic model and three multilevel models were investigated. The conditioning methods were (a) covariance adjustment, (b) matching, and (c) stratification. The design factors investigated were (a) level-1 sample size, (b) level-2 sample size, (c) level-1 covariate relationship to treatment, (d) level-2 covariate relationship to treatment, (e) level-1 covariate relationship to outcome, (f) level-2 covariate relationship to outcome, and (g) population effect size. The results suggest that the degree to which propensity score analyses can create balanced groups and reproduce treatment effect estimates with clustered data depends largely on the propensity score estimation model and conditioning method selected. Overall, the single-level logistic and random intercepts models fared slightly better than the more complex multilevel models, while covariance adjustment and matching tended to be more stable than stratification in balancing groups. Additionally, the results indicate that propensity score analysis should not be conducted with small samples.
Finally, this study did not identify an estimation model or conditioning method that was consistently able to create adequately balanced groups and reproduce treatment effect estimates.
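
As an illustration of the workflow the abstract describes, the following minimal sketch simulates clustered data, estimates propensity scores with a single-level logistic model, and conditions on them by stratification. It is not the study's simulation code: the function names, sample sizes, coefficients (0.8), true effect (0.5), number of strata (5), and the gradient-ascent fitting routine are all assumptions chosen only to keep the example short and self-contained.

```python
import math
import random


def simulate(n_clusters=50, n_per_cluster=40, effect=0.5, seed=1):
    """Simulate two-level data; all parameter values here are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_clusters):
        w = rng.gauss(0, 1)    # level-2 (cluster) covariate
        u = rng.gauss(0, 0.5)  # cluster random intercept in the outcome
        for _ in range(n_per_cluster):
            x = rng.gauss(0, 1)  # level-1 (individual) covariate
            p = 1 / (1 + math.exp(-(0.8 * x + 0.8 * w)))  # true propensity
            t = 1 if rng.random() < p else 0
            y = effect * t + x + w + u + rng.gauss(0, 1)
            rows.append({"x": x, "w": w, "t": t, "y": y})
    return rows


def fit_logistic(rows, steps=300, lr=1.0):
    """Single-level logistic regression of t on x and w (gradient ascent)."""
    b = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(steps):
        g = [0.0, 0.0, 0.0]
        for r in rows:
            z = b[0] + b[1] * r["x"] + b[2] * r["w"]
            resid = r["t"] - 1 / (1 + math.exp(-z))
            g[0] += resid
            g[1] += resid * r["x"]
            g[2] += resid * r["w"]
        b = [bk + lr * gk / n for bk, gk in zip(b, g)]
    return b


def mean(values):
    return sum(values) / len(values)


def raw_diff(rows, var):
    """Unadjusted treated-minus-control mean difference in `var`."""
    treated = [r[var] for r in rows if r["t"] == 1]
    control = [r[var] for r in rows if r["t"] == 0]
    return mean(treated) - mean(control)


def stratified_diff(rows, var, n_strata=5):
    """Size-weighted mean difference in `var` within propensity score strata."""
    ordered = sorted(rows, key=lambda r: r["ps"])
    size = len(ordered) // n_strata
    total, used = 0.0, 0
    for s in range(n_strata):
        end = (s + 1) * size if s < n_strata - 1 else len(ordered)
        chunk = ordered[s * size:end]
        treated = [r[var] for r in chunk if r["t"] == 1]
        control = [r[var] for r in chunk if r["t"] == 0]
        if not treated or not control:
            continue  # skip strata that lack one of the groups
        total += len(chunk) * (mean(treated) - mean(control))
        used += len(chunk)
    return total / used


TRUE_EFFECT = 0.5
rows = simulate(effect=TRUE_EFFECT)
b0, b1, b2 = fit_logistic(rows)
for r in rows:
    r["ps"] = 1 / (1 + math.exp(-(b0 + b1 * r["x"] + b2 * r["w"])))

raw_x = raw_diff(rows, "x")            # covariate imbalance before conditioning
strat_x = stratified_diff(rows, "x")   # imbalance after stratification
naive = raw_diff(rows, "y")            # unadjusted treatment effect estimate
adjusted = stratified_diff(rows, "y")  # stratified treatment effect estimate

print(f"x imbalance: raw={raw_x:.3f}, stratified={strat_x:.3f}")
print(f"effect estimate: naive={naive:.3f}, stratified={adjusted:.3f}")
```

Comparing the raw and stratified differences in `x` gives a rough sense of how much balance the conditioning step recovers. A multilevel estimation model, matching, or covariance adjustment would replace `fit_logistic` or `stratified_diff` respectively, and this sketch does not attempt them.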