We test 26 polygenic predictors using tens of thousands of genetic siblings from the UK Biobank (UKB), for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have typically experienced similar environments during childhood and exhibit negligible population stratification relative to each other. The ability to predict differences in disease risk or complex trait values between siblings is therefore a strong test of genomic prediction in humans. We compare validation results obtained using non-sibling subjects to those obtained among siblings and find that most of the predictive power typically persists in between-sibling designs. For disease risk, we test the extent to which a higher polygenic risk score (PRS) identifies the affected sibling, and we also compute Relative Risk Reduction as a function of risk score threshold. For quantitative traits, we examine between-sibling differences in trait values as a function of predicted differences, and compare to performance in non-sibling pairs. Example results: given one sibling with a normal-range PRS (below the 84th percentile, i.e. < +1 SD) and one sibling with a high PRS (top few percentiles, i.e. > +2 SD), the predictors identify the affected sibling about 70-90% of the time across a variety of disease conditions, including Breast Cancer, Heart Attack, Diabetes, etc. The higher-PRS sibling is the case 55-65% of the time. For quantitative traits such as height, the predictor correctly identifies the taller sibling roughly 80 percent of the time when the (male) height difference is 2 inches or more.
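The disease-risk test described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `higher_prs_hit_rate`, the pair layout, and the parameters `normal_max` and `high_min` are hypothetical, and the sketch assumes that scores are already standardized to SD units and that only pairs with exactly one affected sibling are counted.

```python
from typing import Iterable, Optional, Tuple

# Each sibling is (PRS in SD units, affected?). This layout is an assumption
# made for illustration; it is not taken from the study.
Sibling = Tuple[float, bool]


def higher_prs_hit_rate(
    pairs: Iterable[Tuple[Sibling, Sibling]],
    normal_max: Optional[float] = None,
    high_min: Optional[float] = None,
) -> Optional[float]:
    """Fraction of discordant pairs in which the higher-PRS sibling is the case.

    normal_max: if given, keep only pairs whose lower score is below it.
    high_min:   if given, keep only pairs whose higher score is above it.
    Returns None when no pair qualifies.
    """
    hits = 0
    total = 0
    for (score_a, case_a), (score_b, case_b) in pairs:
        # Skip concordant pairs (both or neither affected) and tied scores.
        if case_a == case_b or score_a == score_b:
            continue
        if score_a > score_b:
            high_score, high_case, low_score = score_a, case_a, score_b
        else:
            high_score, high_case, low_score = score_b, case_b, score_a
        # Optional selection: one normal-range sibling, one high-risk sibling.
        if normal_max is not None and low_score >= normal_max:
            continue
        if high_min is not None and high_score <= high_min:
            continue
        total += 1
        if high_case:
            hits += 1
    return hits / total if total else None
```

Calling the function with `normal_max=1.0` and `high_min=2.0` corresponds to the "normal-range versus high PRS" selection in the example results; calling it without thresholds scores every discordant pair.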
Compressed sensing and high-dimensional statistical methods in complex trait genomics
Our goal is to test new computational methods for determining the genetic architecture of complex traits, including highly heritable conditions such as Type 1 Diabetes, Alzheimer's, and others. The techniques we plan to use have been the subject of intense recent activity in fields such as optimization, signal processing and machine learning, but have only begun to be applied in genomics. The research will produce improved predictive models which, based on individual genomics, identify individuals at high risk for certain diseases. It will also identify the many alleles associated with this risk. Early intervention with high-risk individuals may decrease incidence rates and reduce health care costs. Elaboration of the underlying genetic architecture is important basic science and may lead to improved treatments (e.g., drug development). We wish to obtain access to genomic data and phenotype data relevant to highly heritable disease conditions (e.g., Type 1 Diabetes) as well as complex traits such as height, BMI, and cognitive ability. Advanced computational algorithms will be used to study the genetic architecture of these traits. Analysis will be performed on high-performance computing clusters. We would like access to the full cohort (SNP genotypes) and several relevant phenotypes.
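The project description does not name a specific algorithm. As an illustration only, one standard technique associated with compressed sensing is L1-penalized regression (LASSO), which selects a sparse set of predictors from many candidates. The sketch below solves it by cyclic coordinate descent in plain Python; the function names, the objective scaling (1/(2n) times the squared error plus `lam` times the L1 norm), and the fixed number of sweeps are assumptions made for this example, not the investigators' implementation.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0 inside the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    n_sweeps: int = 100,
) -> List[float]:
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates.

    X is an n-by-p matrix given as rows (e.g. genotypes), y holds n trait values.
    """
    n = len(X)
    p = len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    # Mean squared value of each column, used to scale each update.
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes beta[j].
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, lam) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta
```

Larger values of `lam` set more coefficients exactly to zero, which is how a sparse set of associated alleles would be selected under this kind of model. Analyses at biobank scale would use optimized numerical libraries rather than pure Python loops.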
|Lead investigator:|Stephen Hsu|
|Lead institution:|Michigan State University|