Date of Award
Spring 3-24-2023
Embargo Period
4-1-2025
Document Type
Dissertation - MUSC Only
Degree Name
Doctor of Philosophy (PhD)
Department
Public Health Sciences
College
College of Graduate Studies
First Advisor
Valerie Durkalski
Second Advisor
Mulugeta Gebregziabher
Third Advisor
Patrick Mauldin
Fourth Advisor
Andrew Schreiner
Fifth Advisor
Bethany J. Wolf
Abstract
Datasets used in clinical research often contain repeated measures of multiple biomarkers of interest. Longitudinal analysis methods are generally designed for outcome variables that are also repeated measures, and many methods assume that the data are balanced, i.e., collected at the same equally spaced timepoints for each individual. In many observational studies, however, this is not the case. Datasets such as electronic health records are often large and complex, with a variety of lab values collected at infrequent and unequally spaced times, and the outcome is sometimes a single distal endpoint. The values of some biomarkers may be subject to a high level of variation due to acute conditions, normal fluctuations, or measurement error. In addition, variables are often correlated, and this correlation needs to be considered in a statistical analysis. Some existing methods fail to take this correlation into account and can also suffer from drawbacks such as long computation times and problems with convergence. The goal of this dissertation is to develop methodology that addresses these issues and is applicable in a variety of settings. In Aim 1, we developed a method to smooth data from multiple continuous variables collected over time using multivariate tensor product smoothing splines, so that the correlation between these variables can be taken into account in the smoothing process. Smoothing creates a balanced dataset with reduced noise, which we then examined for patterns using a robust fuzzy clustering algorithm. Our method resulted in better clustering accuracy than the use of individually smoothed variables. In Aim 2, we developed a method for predicting a binary distal outcome in a logistic model using fuzzy clusters as predictors, while taking into account the uncertainty in the clustering process.
The third aim was the development of an R package to provide an accessible tool so that these methods can be easily applied in future research. In this work, we also examined the use of a composite score calculated from a number of collected biomarkers and compared it to the multivariate analysis using the individual biomarkers. Such a composite variable can be calculated on either observed or smoothed values of each biomarker and could offer a simpler implementation and interpretation in some instances. We applied our methods in two settings: data collected for up to 7 days from acute liver failure (ALF) patients by the Acute Liver Failure Study Group (ALFSG; NCT00518440), and data from liver function tests and other lab work collected in a primary care setting at the Medical University of South Carolina. The methodology developed in this dissertation adds a valuable tool for utilizing multivariate longitudinal biomarker data when examining a distal binary outcome, one that can be applied in a variety of research settings.
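As a rough illustration of the fuzzy clustering idea mentioned in the abstract, the following Python sketch runs a standard fuzzy c-means algorithm on simulated, already-balanced trajectories of two biomarkers. It is not the dissertation's method or its R package: the robust fuzzy clustering algorithm and the multivariate tensor product smoothing step are not reproduced here, and the names `simulate_subjects` and `fuzzy_c_means`, the simulated data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def simulate_subjects(n_per_group=20, n_times=7, seed=1):
    """Hypothetical data: two biomarkers on a balanced daily grid, two latent groups."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_times)
    # Group A: flat trajectories near 1; group B: rising trajectories starting near 5.
    mean_a = np.concatenate([np.full(n_times, 1.0), np.full(n_times, 1.0)])
    mean_b = np.concatenate([5.0 + t, 5.0 + 2.0 * t])
    group_a = mean_a + rng.normal(0.0, 0.3, size=(n_per_group, 2 * n_times))
    group_b = mean_b + rng.normal(0.0, 0.3, size=(n_per_group, 2 * n_times))
    # Each row is one subject: both biomarker trajectories concatenated.
    return np.vstack([group_a, group_b])


def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard (non-robust) fuzzy c-means; returns memberships U and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each subject's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m  # fuzzifier m > 1 controls how soft the partition is
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


if __name__ == "__main__":
    X = simulate_subjects()
    U, centers = fuzzy_c_means(X, n_clusters=2)
    print(U[:3].round(3))  # membership degrees for the first three subjects
```

Each row of `U` holds one subject's degrees of membership across the clusters. In a setting like Aim 2, such membership degrees (rather than hard cluster labels) could plausibly enter a logistic model as predictors, although how the dissertation propagates clustering uncertainty into that model is not specified in the abstract.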
Recommended Citation
Livingston, Sherry Irene, "Multivariate Longitudinal Prognostic Factors: Improving Prediction and Association Modeling with a Single Binary Outcome Using Smoothing Splines and Composite Variables" (2023). MUSC Theses and Dissertations. 767.
https://medica-musc.researchcommons.org/theses/767
Rights
Copyright is held by the author. All rights reserved.