Genomic prediction of complex human traits (e.g., height, cognitive ability, bone density) and disease risks (e.g., breast cancer, diabetes, heart disease, atrial fibrillation) has advanced considerably in recent years. Using data from the UK Biobank, predictors have been constructed with penalized algorithms that favor sparsity, i.e., that use as few genetic variants as possible. We analyze the specific genetic variants (SNPs) used in these predictors, whose number ranges from dozens to as many as thirty thousand. We find that the fraction of SNPs in or near genic regions varies widely by phenotype. For the majority of disease conditions studied, a large fraction of the variance is accounted for by SNPs outside coding regions. The state of these SNPs cannot be determined from exome-sequencing data. This suggests that exome data alone will miss much of the heritability for these traits; i.e., existing polygenic risk scores (PRS) cannot be computed from exome data alone. We also study the fraction of SNPs, and of variance, shared between pairs of predictors. The DNA regions used in the disease risk predictors constructed so far appear to be largely disjoint (with a few interesting exceptions), suggesting that individual genetic disease risks are largely uncorrelated. It therefore seems possible, in theory, for an individual to be a low-risk outlier in all conditions simultaneously.
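The "fraction of SNPs and of variance shared between pairs of predictors" can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the authors' procedure. It assumes a hypothetical predictor format mapping each SNP id to (effect size β, allele frequency p). It takes the variance contributed by one SNP to be 2p(1−p)β², the standard value under an additive model with Hardy-Weinberg equilibrium. It ignores linkage disequilibrium, so only identical SNP ids count as shared. The helper names `snp_variance` and `overlap` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical predictor format: SNP id -> (effect size beta, allele frequency p).
Predictor = Dict[str, Tuple[float, float]]


def snp_variance(beta: float, p: float) -> float:
    """Variance contributed by one SNP under an additive model and HWE: 2p(1-p)beta^2."""
    return 2.0 * p * (1.0 - p) * beta ** 2


def overlap(a: Predictor, b: Predictor) -> Tuple[float, float]:
    """Return (fraction of a's SNPs also used by b, fraction of a's variance on those SNPs).

    Assumption: SNPs are treated as independent (no LD), and only identical
    SNP ids count as shared; nearby correlated SNPs are NOT matched.
    """
    shared = a.keys() & b.keys()
    snp_frac = len(shared) / len(a)
    total_var = sum(snp_variance(beta, p) for beta, p in a.values())
    shared_var = sum(snp_variance(*a[s]) for s in shared)
    return snp_frac, shared_var / total_var


if __name__ == "__main__":
    a = {"rs1": (0.2, 0.5), "rs2": (0.1, 0.5), "rs3": (0.1, 0.1)}
    b = {"rs1": (0.05, 0.5), "rs4": (0.3, 0.3)}
    print(overlap(a, b))
```

The measure is asymmetric: `overlap(a, b)` describes how much of predictor `a` is shared with `b`, and `overlap(b, a)` gives the converse. In this toy example only one of `a`'s three SNPs is shared, yet that SNP carries most of `a`'s variance. This is why the SNP fraction and the variance fraction are worth reporting separately.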
Compressed sensing and high-dimensional statistical methods in complex trait genomics
Our goal is to test new computational methods for determining the genetic architecture of complex traits, including highly heritable conditions such as Type 1 Diabetes, Alzheimer's, and others. The techniques we plan to use have been the subject of intense recent activity in fields such as optimization, signal processing and machine learning, but have only just begun to be applied in genomics. The research will produce improved predictive models that use an individual's genome to identify those at high risk for certain diseases. It will also identify the many alleles associated with this risk. Early intervention with high-risk individuals may decrease incidence rates and reduce health care costs. Elaborating the underlying genetic architecture is important basic science and may lead to improved treatments (e.g., through drug development). We wish to obtain access to genomic data and phenotype data relevant to highly heritable disease conditions (e.g., Type 1 Diabetes), as well as complex traits such as height, BMI, and cognitive ability. Advanced computational algorithms will be used to study the genetic architecture of these traits. Analysis will be performed on high-performance computing clusters. We would like access to the full cohort (SNP genotypes) and several relevant phenotypes.
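A standard example of the compressed-sensing style, sparsity-favoring penalized regression referred to here and in the abstract above is the LASSO. The LASSO minimizes (1/2n)‖y − Xβ‖² + λ‖β‖₁. The sketch below is a minimal, pure-Python illustration using cyclic coordinate descent on simulated genotypes. It is not the authors' actual pipeline, and biobank-scale fits with hundreds of thousands of SNPs require optimized software. The names `soft_threshold`, `lasso_cd`, and `simulate` are hypothetical. The simulation makes deliberately simple assumptions: genotypes coded 0/1/2 at allele frequency 0.5, no linkage disequilibrium, a few causal SNPs with fixed effects, and Gaussian noise.

```python
import random
from typing import Dict, List, Tuple


def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of lam*|b|: shrink x toward zero by lam, clipping at exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 200, tol: float = 1e-8) -> List[float]:
    """Minimize (1/2n)||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centered (no intercept).
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(m)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP: no information, keep beta_j = 0
            # Correlation of SNP j with the partial residual that excludes SNP j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residual in sync with beta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break  # a full sweep changed nothing meaningfully: converged
    return beta


def simulate(n: int, m: int, causal: Dict[int, float],
             seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Toy data: genotypes 0/1/2 at allele frequency 0.5, no LD, noise sd 0.5."""
    rng = random.Random(seed)
    G = [[sum(rng.random() < 0.5 for _ in range(2)) for _ in range(m)]
         for _ in range(n)]
    y = [sum(b * row[j] for j, b in causal.items()) + rng.gauss(0.0, 0.5)
         for row in G]
    # Center genotype columns and the phenotype so no intercept is needed.
    means = [sum(row[j] for row in G) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in G]
    y_mean = sum(y) / n
    return X, [v - y_mean for v in y]


if __name__ == "__main__":
    X, y = simulate(400, 40, {3: 1.0, 17: -0.8, 30: 0.6})
    beta = lasso_cd(X, y, lam=0.1)
    print({j: round(b, 3) for j, b in enumerate(beta) if b != 0.0})
```

The soft-thresholding step sets a coefficient exactly to zero whenever its correlation with the partial residual falls below λ. That is what makes the resulting predictor sparse: in this toy setting it should retain only the causal SNPs, with effect sizes shrunk toward zero. Larger λ keeps fewer SNPs. In practice λ is usually chosen by validation on held-out individuals.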
Lead investigator: Stephen Hsu
Lead institution: Michigan State University