Doctor of Philosophy (PhD)
Public Health Sciences
College of Graduate Studies
In studying the association between count outcomes and covariates using Poisson regression, the requirement that the mean and variance of the response be equal within each covariate pattern is often not met in real datasets. This violation of equidispersion (overdispersion, when the variance exceeds the mean) can lead to invalid inference unless appropriate alternative models are used. There is currently no comprehensive and definitive assessment of the different methods for dealing with overdispersion, nor is there a standard approach for determining the level of overdispersion at which statistical intervention becomes necessary. The problem can be further complicated by missing covariate data in count outcome models. In this dissertation we (1) compared the performance of different statistical models for dealing with overdispersion, (2) determined an appropriate threshold for σp, the ratio of the Pearson chi-squared goodness-of-fit statistic to its degrees of freedom, above which statistical intervention is necessary to address overdispersion, (3) developed a latent transition multiple imputation (LTMI) approach for handling missing time-varying categorical covariates in count outcome models, and (4) compared the performance of LTMI with complete case analysis (CCA) and latent class multiple imputation (LCMI) for addressing missing time-varying categorical covariates in the presence of overdispersion. Latent class assignment was determined using both SAS software and random-effects modeling, and missing observations were imputed using predictive mean matching multiple imputation. We used extensive simulation studies to assess the performance of the proposed methods across a variety of overdispersion and missingness scenarios, and we demonstrated the application of the proposed models and methods with real data examples.
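Predictive mean matching (PMM) fills each missing value by copying an observed value from a "donor" case whose model-predicted mean is close to that of the incomplete case. The sketch below is a minimal single-pass illustration under stated assumptions: one fully observed numeric predictor, a straight-line model fitted by ordinary least squares, and k nearest donors. The names `ols_fit` and `pmm_impute` are hypothetical. Proper multiple imputation would also draw the regression coefficients from their posterior distribution and repeat the process m times; this sketch omits both steps and is not the dissertation's LTMI or LCMI procedure.

```python
import random


def ols_fit(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(target, predictor, k=3, rng=None):
    """One predictive-mean-matching pass (hypothetical helper, single imputation).

    target:    list of values, with None marking missing entries.
    predictor: fully observed list of the same length.
    Each None is replaced by the observed target value of a donor drawn at
    random from the k observed cases whose predicted means are closest.
    """
    rng = rng or random.Random()
    observed = [i for i, v in enumerate(target) if v is not None]
    # Fit the prediction model on complete cases only.
    a, b = ols_fit([predictor[i] for i in observed], [target[i] for i in observed])
    predicted = [a + b * z for z in predictor]
    imputed = list(target)
    for i, value in enumerate(target):
        if value is None:
            # Rank observed cases by distance between predicted means.
            donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
            imputed[i] = target[rng.choice(donors)]
    return imputed


# Usage: an integer-coded categorical covariate with two missing entries.
rng = random.Random(1)
z = [float(i) for i in range(10)]
x = [0, 0, 0, None, 1, 1, None, 2, 2, 2]
print(pmm_impute(x, z, k=2, rng=rng))
```

Because PMM only ever copies values that were actually observed, an integer-coded categorical covariate keeps valid category codes after imputation, which is one reason PMM is a reasonable choice for categorical covariates.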
We conclude that the negative binomial generalized linear mixed model (NB-GLMM) performs best overall for modeling overdispersed count data. Furthermore, as a general threshold, the simple Poisson model can be relied upon for both cross-sectional and longitudinal datasets when σp ≤ 1.2. LTMI outperforms CCA and LCMI in many scenarios, particularly when the percentage of missingness is higher and data are missing at random (MAR). Lastly, when both issues are considered jointly, NB-GLMM is preferable for addressing overdispersion and LTMI is preferable for imputing missing covariate observations.
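The dispersion ratio σp is the Pearson statistic, the sum over observations of (y − μ̂)² / μ̂, divided by the residual degrees of freedom (n minus the number of estimated parameters). The sketch below computes σp and applies the σp ≤ 1.2 rule stated above. As simplifying assumptions, it uses an intercept-only Poisson model (so μ̂ is the sample mean and one parameter is estimated) and generates overdispersed counts from a gamma-Poisson mixture, which gives negative binomial counts with variance μ + αμ². In a regression or GLMM setting, μ̂ would instead come from the fitted model. All function names are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n, mu, alpha, rng):
    """Counts with mean mu. alpha > 0: gamma-Poisson (negative binomial) counts
    with variance mu + alpha * mu**2; alpha == 0: plain Poisson counts."""
    counts = []
    for _ in range(n):
        # Gamma with shape 1/alpha and scale alpha*mu has mean mu, variance alpha*mu**2.
        lam = rng.gammavariate(1.0 / alpha, alpha * mu) if alpha > 0 else mu
        counts.append(poisson_draw(lam, rng))
    return counts


def pearson_dispersion(y, mu_hat, n_params):
    """sigma_p: Pearson chi-squared statistic divided by residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
    return chi2 / (len(y) - n_params)


def poisson_is_adequate(sigma_p, threshold=1.2):
    """Rule of thumb from the abstract: keep the simple Poisson model when sigma_p <= 1.2."""
    return sigma_p <= threshold


rng = random.Random(42)
for alpha in (0.0, 1.0):
    y = simulate_counts(5000, 4.0, alpha, rng)
    y_bar = sum(y) / len(y)  # intercept-only Poisson MLE of the mean
    s = pearson_dispersion(y, [y_bar] * len(y), n_params=1)
    print(alpha, round(s, 2), poisson_is_adequate(s))
```

With α = 0 the ratio should sit close to 1, while with α = 1 and μ = 4 the expected variance is 4 + 16 = 20, so σp should land far above the 1.2 cutoff and the rule flags the data for an overdispersion model such as the negative binomial.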
Payne, Elizabeth Holly, "Statistical Methods for Modeling Count Data with Overdispersion and Missing Time Varying Categorical Covariates" (2016). MUSC Theses and Dissertations. 409.
All rights reserved. Copyright is held by the author.