Date of Award

2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health Sciences

College

College of Graduate Studies

First Advisor

Bethany J. Wolf

Second Advisor

Dongjun Chung

Third Advisor

Andrew Lawson

Fourth Advisor

Paula S. Ramos

Fifth Advisor

Kelly J. Hunt

Sixth Advisor

Hang J. Kim

Abstract

Genome-wide association studies (GWAS) have successfully identified over two hundred thousand trait risk-associated genetic variants; however, several challenges remain. First, complex traits are polygenic: each is associated with many single nucleotide polymorphisms (SNPs) with small or moderate effect sizes, which are hard to detect with limited sample sizes. Second, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with complex traits. In the first dissertation aim, we address these challenges by proposing a statistical approach called GPA-Tree. GPA-Tree integrates GWAS summary statistics and functional annotation information for a single trait within a unified framework. Specifically, by combining a decision tree algorithm with a hierarchical modeling framework, GPA-Tree simultaneously implements association mapping and identifies key combinations of functional annotations related to the trait risk-associated SNPs. We evaluate the proposed GPA-Tree approach using simulation studies and demonstrate that, in most scenarios, GPA-Tree shows greater area under the curve (AUC) and power than existing statistical approaches in detecting risk-associated SNPs, and greater accuracy in identifying the true combinations of functional annotations. We apply GPA-Tree to a systemic lupus erythematosus (SLE) GWAS and functional annotation data, including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells, including but not limited to primary B cells, memory helper T cells, regulatory T cells, neutrophils, and CD8+ memory T cells.
The second dissertation aim exploits pleiotropy, the shared genetic basis among multiple traits, to improve statistical power to detect SNPs associated with one or more traits. We extend GPA-Tree to develop Multi-GPA-Tree so that GWAS summary statistics for multiple traits and functional annotation information can be integrated within a unified framework. Specifically, by combining a multivariate decision tree algorithm with a hierarchical modeling framework, Multi-GPA-Tree simultaneously implements association mapping and identifies key combinations of functional annotations related to the SNPs associated with one or more traits. We evaluate the proposed Multi-GPA-Tree approach using simulation studies and demonstrate that, in most scenarios, Multi-GPA-Tree outperforms existing statistical approaches in detecting SNPs associated with one or more traits and identifies the true combinations of functional annotations with high accuracy. We use Multi-GPA-Tree to integrate GWAS from two rheumatic diseases, SLE and rheumatoid arthritis (RA), and GWAS from two inflammatory bowel diseases, Crohn’s disease (CD) and ulcerative colitis (UC), with GenoSkyline and GenoSkylinePlus annotations. The results from Multi-GPA-Tree highlight the dysregulation of blood immune cells in both joint analyses, including dysregulation of primary B cells for SLE and RA, and dysregulation of primary T regulatory cells for UC and CD.
In the third dissertation aim, we develop the R package GPATree and the R Shiny app ShinyGPATree, which make the GPA-Tree and Multi-GPA-Tree approaches easily accessible to users. The package includes an example dataset and a vignette to facilitate step-by-step implementation of the proposed methods. In addition, the Shiny app allows interactive and dynamic investigation of association mapping results and functional annotation trees.

Rights

All rights reserved. Copyright is held by the author.
