Date of Award

2017

Embargo Period

8-1-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health Sciences

College

College of Graduate Studies

First Advisor

Mulugeta Gebregziabher

Second Advisor

Leonard E. Egede

Third Advisor

Viswanathan Ramakrishnan

Fourth Advisor

Lewis J. Frey

Fifth Advisor

Robert Neal Axon

Abstract

Healthcare outcomes research based on administrative data is frequently hindered by two important challenges: (1) accurate adjustment for disease burden and (2) effective management of missing data in key variables. Standard approaches exist for both problems, but these may contribute to biased results. For example, several well-established summary measures are used to adjust for disease burden, often without consideration of whether other methods could perform this task more accurately. Similarly, observations with missing values are often arbitrarily excluded, or the values are imputed without regard for the assumptions involved. Despite recent substantial gains in computing power, statistical approaches, and machine learning methods, no comprehensive effort has been made to develop an improved comorbidity index based on predictive performance comparisons of competing approaches. Likewise, recently developed machine learning approaches have shown promise in addressing missing data problems, but these have not been compared with parametric methods via a rigorous simulation study using large-dimensional data with the complete range of missingness types. This makes it difficult to assess the relative merits of each procedure. This work accomplished three broad aims: (1) Improved models for summarizing disease burden were developed by comparing the predictive performance of a wide variety of statistical and machine learning methods. (2) A new comorbidity summary score for predicting five-year mortality was developed. (3) A comprehensive comparison of machine learning and model-based multiple imputation methods was completed, both in simulations and through an application to real data. Several sensitivity analyses were also examined for variables whose values are missing not at random (MNAR). This work successfully demonstrated several new approaches for summarizing disease burden.
Each of the competing disease burden models in the first aim and the summary score from the second aim had superior predictive performance when compared to the Elixhauser index, a commonly used summary measure. This research also led to new applications of machine learning methods within the multiple imputation with chained equations (MICE) framework. Additionally, several MNAR sensitivity methods were adapted and applied to demonstrate that unbiased inference under MNAR may not be possible in some situations, even when the missingness mechanism is fully understood.

Rights

All rights reserved. Copyright is held by the author.
