Date of Award
Fall 11-18-2024
Embargo Period
12-5-2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Public Health Sciences
College
College of Graduate Studies
First Advisor
Graham Warren
Second Advisor
Jihad Obeid
Third Advisor
Alexander Alekseyenko
Fourth Advisor
Aaron Masino
Fifth Advisor
Xia Jing
Abstract
The management of brain metastases and their associated complications, such as radiation necrosis (RN), has long posed significant challenges in radiation oncology. Stereotactic radiosurgery (SRS) is a key treatment modality for patients with brain metastases, yet the complexity of metastatic disease and limitations in diagnostic coding systems, such as the International Classification of Diseases (ICD), often obstruct the accurate extraction of critical information from electronic health records (EHRs). This difficulty especially impacts the identification of primary tumor histology and the detection of post-treatment complications like RN, both of which are crucial for improving clinical outcomes and conducting robust observational research. Traditional methods, which rely on ICD-10 codes and manual chart reviews, tend to be inefficient and imprecise, limiting both the scale and timeliness of research.
This research aims to address these challenges by (1) developing advanced machine learning (ML) and natural language processing (NLP) algorithms to enhance the automated identification of primary tumor histologies in patients undergoing SRS, (2) creating an electronic phenotype for accurately detecting radiation necrosis in clinician notes, and (3) implementing and evaluating the RN electronic phenotype in a real-world dataset to show the practical utility of the algorithm. These methods focus on improving the extraction of clinical EHR data and the diagnostic precision of that extraction. Together, these aims contribute to a more streamlined, accurate framework for researching SRS for brain metastases and understanding treatment-related complications.
Aim 1 involved the creation of an ML and NLP algorithm to extract primary tumor histology from clinical records, overcoming the limitations of ICD-10-based classification systems. By accurately identifying the primary tumor type, this algorithm improved the precision and accuracy of identifying patient diagnoses in the EHR, enabling better patient stratification for observational research. We demonstrated that the NLP model greatly improved the classification F1-score over ICD-based classification. In addition, we demonstrated the ability to classify patients by subtype, a further improvement over ICD classification.
Aim 2 focused on developing an NLP model to detect instances of radiation necrosis in clinical notes. The algorithm analyzed unstructured clinical data, identifying key terms and patterns indicative of RN, facilitating RN detection from the EHR notes. After several failed attempts, including a more elaborate NLP model, we restricted the data to a more uniform corpus and used a simple model that achieved an F1-score of 0.92 for classifying RN.
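For readers less familiar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the dissertation's evaluation code; the function name and the confusion-matrix counts are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    Hypothetical helper for illustration; it is not taken from the dissertation.
    """
    # Precision: share of notes flagged as RN that truly describe RN.
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0

    # Recall: share of true RN notes that the classifier flagged.
    actual = true_pos + false_neg
    recall = true_pos / actual if actual else 0.0

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (NOT results from the study):
# 46 correctly flagged notes, 4 false alarms, 4 missed notes.
print(round(f1_from_counts(46, 4, 4), 2))  # 0.92
```

Because F1 penalizes both false alarms and missed cases, a classifier cannot reach a high score by flagging every note or by flagging almost none.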
Aim 3 focused on a real-world practical trial of the RN NLP model. We detail three separate attempts to improve the precision of the RN classifier while keeping recall high. Once the algorithm reached its limit, we investigated the remaining errors, which highlighted the difficult and often overlapping signs and symptoms. We demonstrated that the NLP algorithm reduced the volume of clinical notes requiring manual review by 90%. Although the notes identified by the algorithm were sometimes false positives, they frequently represented edge cases where the clinician was uncertain about the diagnosis. The NLP algorithm significantly outperformed traditional rule-based text searches, reducing the false positive rate and improving the efficiency of identifying potential RN cases for further review.
Through these specific aims, the study significantly improved the efficiency and accuracy of data extraction for large-scale research and clinical applications. It provided valuable insights into extracting electronic phenotypes for radiation necrosis in SRS patients, which may facilitate more informed patient care.
Recommended Citation
Fugal, Mario, "Natural Language Processing for Precise Identification of Primary Tumor Histology and Radiation Necrosis in Stereotactic Radiosurgery Patients" (2024). MUSC Theses and Dissertations. 979.
https://medica-musc.researchcommons.org/theses/979
Rights
Copyright is held by the author. All rights reserved.