MUSC Theses and Dissertations

Statistical Methods for Modeling Count Data with Overdispersion and Missing Time Varying Categorical Covariates

Elizabeth Holly Payne, Medical University of South Carolina

Date of Award

2016

Embargo Period

8-1-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health Sciences

College

College of Graduate Studies

First Advisor

Mulugeta Gebregziabher

Second Advisor

Leonard Egede

Third Advisor

James Hardin

Fourth Advisor

Viswanathan Ramakrishnan

Fifth Advisor

Anbesaw Selassie

Abstract

When Poisson regression is used to study the association between count outcomes and covariates, the requirement that the mean and variance of the responses be equal for each covariate pattern is not always met in real datasets. This violation of equidispersion can lead to invalid inference unless proper alternative models are considered. There is currently no comprehensive and definitive assessment of the different methods of dealing with overdispersion, nor is there a standard approach for determining the threshold of overdispersion beyond which statistical intervention is necessary. The issue of overdispersion can be further complicated by the presence of missing covariate data in count outcome models. In this dissertation we have (1) compared the performance of different statistical models for dealing with overdispersion, (2) determined an appropriate threshold of the ratio of the Pearson chi-squared goodness-of-fit statistic to its degrees of freedom, σp, above which statistical intervention is necessary to address the overdispersion, (3) developed a latent transition multiple imputation (LTMI) approach for dealing with missing time-varying categorical covariates in count outcome models, and (4) compared the performance of LTMI with complete case analysis (CCA) and latent class multiple imputation (LCMI) in addressing missing time-varying categorical covariates in the presence of overdispersion. Latent class assignment was determined via both SAS software and random effect modeling, and missing observation imputation was performed using predictive mean matching multiple imputation methods. We utilized extensive simulation studies to assess the performance of the proposed methods across a variety of overdispersion and missingness scenarios. We further demonstrated the application of the proposed models and methods via real data examples.
We conclude that the negative binomial generalized linear mixed model (NB-GLMM) is superior overall for modeling count data characterized by overdispersion. Furthermore, as a general threshold, the simple Poisson model can be relied on for cross-sectional and longitudinal datasets when σp ≤ 1.2. LTMI methods outperform CCA and LCMI in many scenarios, particularly when there is a higher percentage of missingness and data are missing at random (MAR). Lastly, when both issues are considered jointly, NB-GLMM is preferable for addressing overdispersion and LTMI is preferable for imputing covariate observations.
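
As an illustration of the dispersion ratio σp described in the abstract, the following minimal sketch computes the Pearson chi-squared statistic divided by its residual degrees of freedom for a Poisson fit. It is not taken from the dissertation: the function name `pearson_dispersion`, the toy counts, and the use of an intercept-only model (where the fitted mean is simply the sample mean) are assumptions made for illustration only. The 1.2 cutoff is the threshold reported in the abstract.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-squared statistic divided by residual degrees of freedom.

    y        : observed counts
    mu       : fitted Poisson means (same length as y, all positive)
    n_params : number of estimated regression parameters
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    # Under a Poisson model the variance equals the mean, so each squared
    # residual is scaled by the fitted mean.
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / df


# Hypothetical toy data, for illustration only.
counts = [0, 1, 1, 2, 3, 5, 8, 12]

# Intercept-only Poisson model: the maximum likelihood fitted mean is the
# sample mean, so no model-fitting library is needed for this sketch.
mean = sum(counts) / len(counts)
fitted = [mean] * len(counts)

sigma_p = pearson_dispersion(counts, fitted, n_params=1)

# Threshold reported in the abstract: rely on simple Poisson when sigma_p <= 1.2.
poisson_adequate = sigma_p <= 1.2
```

With covariates in the model, `mu` would instead hold the fitted means from the Poisson regression and `n_params` the number of estimated coefficients; the ratio itself is computed the same way.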

Recommended Citation

Payne, Elizabeth Holly, "Statistical Methods for Modeling Count Data with Overdispersion and Missing Time Varying Categorical Covariates" (2016). MUSC Theses and Dissertations. 409.
https://medica-musc.researchcommons.org/theses/409
