Date of Award

Spring 3-24-2023

Embargo Period

4-1-2025

Document Type

Dissertation - MUSC Only

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health Sciences

College

College of Graduate Studies

First Advisor

Valerie Durkalski

Second Advisor

Mulugeta Gebregziabher

Third Advisor

Patrick Mauldin

Fourth Advisor

Andrew Schreiner

Fifth Advisor

Bethany J. Wolf

Abstract

Datasets used in clinical research often contain repeated measures of multiple biomarkers of interest. Longitudinal analysis methods are generally designed for outcome variables that are also repeated measures, and many methods assume that the data are balanced, i.e., collected at the same equally spaced timepoints for each individual. In observational studies, however, this is often not the case. Datasets such as electronic health records are often large and complex, with a variety of lab values collected at infrequent and unequally spaced times, and the outcome is sometimes a single distal endpoint. The values of some biomarkers may also vary considerably because of acute conditions, normal fluctuations, or measurement error. In addition, the variables are often correlated, and this correlation needs to be considered in a statistical analysis. Some existing methods ignore this correlation and can also suffer from drawbacks such as long computation times and problems with convergence. The goal of this dissertation is to develop methodology that addresses these issues and is applicable in a variety of settings. In Aim 1, we developed a method to smooth data from multiple continuous variables collected over time using multivariate tensor product smoothing splines, so that the correlation between these variables is taken into account in the smoothing process. Smoothing produces a balanced dataset with reduced noise, which we then examined for patterns using a robust fuzzy clustering algorithm. Our method achieved better clustering accuracy than an approach using individually smoothed variables. In Aim 2, we developed a method for predicting a binary distal outcome in a logistic model that uses fuzzy clusters as predictors while taking into account the uncertainty in the clustering process.
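The abstract does not name the robust fuzzy clustering algorithm or describe how clustering uncertainty is propagated into the outcome model. As a rough illustration only, the sketch below swaps in standard (non-robust) fuzzy c-means to compute soft cluster memberships for simulated per-patient feature vectors, then uses those memberships as predictors in a plain logistic regression fitted by Newton-Raphson. It does not reproduce the dissertation's uncertainty adjustment, and the function names `fuzzy_c_means` and `fit_logistic` and the simulated data are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard (non-robust) fuzzy c-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix whose rows sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Cluster centers are membership-weighted means of the points.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # guard against a point sitting on a center
        # Membership update: u_ik is proportional to d2_ik ** (-1 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers from the final memberships.
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U


def fit_logistic(Z, y, n_iter=50, tol=1e-8):
    """Logistic regression with an intercept, fitted by Newton-Raphson."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - p)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.abs(step).max() < tol:
            break
    return beta


# Simulated example: two latent patient groups, 3 summary features each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)),
               rng.normal(4.0, 1.0, (100, 3))])
true_group = np.repeat([0, 1], 100)
y = rng.binomial(1, np.where(true_group == 1, 0.7, 0.3))

centers, U = fuzzy_c_means(X, n_clusters=2)
# Rows of U sum to 1, so drop one column to avoid collinearity with the intercept.
beta = fit_logistic(U[:, 1:], y)
print("intercept, membership coefficient:", beta)
```

Using the membership values rather than hard cluster labels keeps some of the information about how confidently each patient belongs to a cluster, which is the general motivation for fuzzy clusters as predictors; a full treatment of clustering uncertainty would require more than this plug-in approach.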
In Aim 3, we developed an R package that provides an accessible tool so that these methods can be easily applied in future research. In this work, we also examined a composite score calculated from a number of collected biomarkers and compared it with the multivariate analysis of the individual biomarkers. Such a composite variable can be calculated from either the observed or the smoothed values of each biomarker and may offer simpler implementation and interpretation in some instances. We applied our methods in two settings: data collected for up to 7 days from acute liver failure (ALF) patients by the Acute Liver Failure Study Group (ALFSG; NCT005184400), and liver function tests and other lab work collected in a primary care setting at the Medical University of South Carolina. The methodology developed in this dissertation adds a valuable tool, applicable in a variety of research settings, for using multivariate longitudinal biomarker data to examine a distal binary outcome.
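The abstract does not state how the composite score is formed. One simple construction, assumed here purely for illustration, standardizes each biomarker and averages the resulting z-scores within each observation. Because the function only sees a matrix of values, the same code applies whether the rows hold observed lab values or smoothed values at common timepoints. The function `composite_score` and its `signs` argument are hypothetical.

```python
import numpy as np


def composite_score(values, signs=None):
    """Hypothetical composite: the mean of per-biomarker z-scores.

    values: array of shape (n_observations, n_biomarkers); rows may be
            observed lab values or smoothed values at common timepoints.
    signs:  optional +1/-1 per biomarker so that a higher z-score always
            points in the same clinical direction.
    """
    values = np.asarray(values, dtype=float)
    mu = np.nanmean(values, axis=0)
    sd = np.nanstd(values, axis=0, ddof=1)
    z = (values - mu) / sd
    if signs is not None:
        z = z * np.asarray(signs, dtype=float)
    # nanmean lets an observation with a missing lab still receive a score.
    return np.nanmean(z, axis=1)


labs = np.array([[1.0, 50.0],
                 [2.0, 60.0],
                 [3.0, 70.0]])
print(composite_score(labs))
print(composite_score(labs, signs=[1, -1]))
```

Collapsing several biomarkers into one number trades away the correlation structure that the multivariate approach models explicitly, which is why the abstract frames the composite as a simpler alternative to compare against rather than a replacement.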

Rights

Copyright is held by the author. All rights reserved.

Available for download on Tuesday, April 01, 2025
