Date of Award
2021
Embargo Period
8-1-2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Public Health Sciences
College
College of Graduate Studies
First Advisor
Bethany J. Wolf
Second Advisor
Dongjun Chung
Third Advisor
Andrew Lawson
Fourth Advisor
Paula S. Ramos
Fifth Advisor
Kelly J. Hunt
Sixth Advisor
Hang J. Kim
Abstract
Genome-wide association studies (GWAS) have successfully identified over two hundred thousand trait risk-associated genetic variants; however, several challenges remain. First, because of a phenomenon called polygenicity, a complex trait is associated with many single nucleotide polymorphisms (SNPs), each with a small or moderate effect size that is hard to detect with a limited sample size. Second, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with complex traits. In the first dissertation aim, we address these challenges by proposing a statistical approach called GPA-Tree. GPA-Tree integrates GWAS summary statistics and functional annotation information for a single trait within a unified framework. Specifically, by combining a decision tree algorithm with a hierarchical modeling framework, GPA-Tree simultaneously implements association mapping and identifies key combinations of functional annotations related to the trait risk-associated SNPs. We evaluate the proposed GPA-Tree approach using simulation studies and demonstrate that, in most scenarios, GPA-Tree shows greater area under the curve (AUC) and power than existing statistical approaches in detecting risk-associated SNPs, and greater accuracy in identifying the true combinations of functional annotations. We apply GPA-Tree to a systemic lupus erythematosus (SLE) GWAS and functional annotation data including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells, including but not limited to primary B cells, memory helper T cells, regulatory T cells, neutrophils, and CD8+ memory T cells. The second dissertation aim exploits the phenomenon called pleiotropy, the shared genetic basis among multiple traits, to improve statistical power to detect SNPs associated with one or more traits.
We extend GPA-Tree to develop Multi-GPA-Tree so that GWAS summary statistics for multiple traits and functional annotation information can be integrated within a unified framework. Specifically, by combining a multivariate decision tree algorithm with a hierarchical modeling framework, Multi-GPA-Tree simultaneously implements association mapping and identifies key combinations of functional annotations related to the SNPs associated with one or more traits. We evaluate the proposed Multi-GPA-Tree approach using simulation studies and demonstrate that, in most scenarios, Multi-GPA-Tree outperforms existing statistical approaches in detecting SNPs associated with one or more traits and identifies the true combinations of functional annotations with high accuracy. We use Multi-GPA-Tree to integrate GWAS data for two rheumatic diseases, SLE and rheumatoid arthritis (RA), and GWAS data for two inflammatory bowel diseases, Crohn’s disease (CD) and ulcerative colitis (UC), with GenoSkyline and GenoSkylinePlus annotations. The results from Multi-GPA-Tree highlight the dysregulation of blood immune cells in both joint analyses, including dysregulation of primary B cells for SLE and RA, and dysregulation of primary T regulatory cells for UC and CD. In the third dissertation aim, we develop the R package GPATree and the R Shiny app ShinyGPATree. The R package and Shiny app make the GPA-Tree and Multi-GPA-Tree approaches convenient and easily accessible to users. The package includes an example dataset and a vignette to facilitate step-by-step implementation of the proposed methods. In addition, the Shiny app allows interactive and dynamic investigation of association mapping results and functional annotation trees.
Recommended Citation
Khatiwada, Aastha, "Statistical Approaches for Functional Annotation Tree Guided Prioritization of Genome-wide Association Studies (GWAS) Results" (2021). MUSC Theses and Dissertations. 644.
https://medica-musc.researchcommons.org/theses/644
Rights
All rights reserved. Copyright is held by the author.