Date of Award
Spring 1-24-2025
Embargo Period
3-6-2025
Document Type
Thesis
Degree Name
Doctor of Philosophy (PhD)
Department
Cell and Molecular Biology and Pathobiology
Additional Department
Oral Health Sciences
College
College of Graduate Studies
First Advisor
Alexander Alekseyenko
Abstract
Healthcare is poised to comprise over a third of the market for big data by 2032, making it the dominant sector in this field (Insights 2024). This growth is driven by the increasing volume and variety of data sources, including electronic health records, payer records, research studies, personal health devices, and smartphones. The integration of such data holds transformative potential, enabling earlier disease diagnosis through biomarker identification, advancing preventive care by pinpointing risk factors, and enhancing hospital quality standards, among other benefits. Despite these promising applications, significant challenges arise in processing and analyzing high-dimensional biomedical data. This dissertation focuses on utilizing high-dimensional diffusion drift methods to address these challenges, with an emphasis on data derived from research studies and electronic health records. By developing and applying these advanced analytical approaches, this work aims to uncover actionable insights, demonstrate the utility of high-dimensional methods in handling complex healthcare datasets, and contribute to innovative applications in improving healthcare outcomes.
The initial segment of this work establishes a robust framework for analyzing high-dimensional data, with a primary focus on univariate and multivariate analytical methods. This framework is applied to evaluate the multi-omic profiles of patient-derived samples in relation to three distinct diseases: lupus, oral mucositis, and Fanconi anemia (Chapters 2-4). Particular emphasis is placed on the utility of distance-based multivariate analyses for characterizing disease states within high-dimensional datasets.
Succeeding sections of this study detail the derivation of key estimates under the MD3F framework, accompanied by a simulation study to evaluate the type I error rate and statistical power characteristics of the proposed approach (Chapter 5). Subsequently, we apply this method to three distinct microbiome datasets and one electronic health record dataset (Chapter 6). The findings from these applications demonstrate the utility of MD3F in elucidating multivariate trajectories within high-dimensional patient data and capturing variability between patient groups or at the individual level. Overall, our results indicate that MD3F is a robust framework for analyzing high-dimensional longitudinal datasets; however, further refinement and characterization are needed to enhance its applicability to biological contexts in future studies.
The following portion of this study explores the development of UNAFIED-8, a logistic regression model designed to predict a 2-year risk score for atrial fibrillation. The results demonstrate the model's effectiveness in predicting atrial fibrillation risk and its generalizability, which stems from its use of readily available electronic health record data. These findings highlight the potential of electronic health record data to support and enhance clinical care.
The concluding chapters outline future directions for this work, emphasizing the translational implications of the studies discussed herein and their potential to drive innovative healthcare outcomes. By identifying key avenues for further research, these chapters aim to highlight how the methodologies and findings can be adapted and implemented to improve clinical decision-making, patient outcomes, and overall healthcare delivery.
Recommended Citation
Zielinski, Jessica, "Comprehensive Analysis of High-Dimensional, Longitudinal Multi-Modal Molecular and Electronic Health Record Data" (2025). MUSC Theses and Dissertations. 1023.
https://medica-musc.researchcommons.org/theses/1023
Rights
Copyright is held by the author. All rights reserved.