Abstract
Polygenic risk scores (PRSs) are a promising approach to predicting an individual's risk of developing disease. The area under the receiver operating characteristic curve (AUC) of a PRS is often reported only for models that are adjusted for age and sex, which are known risk factors for the disease of interest and confound the association between the PRS and the disease. This makes comparison of PRSs between studies difficult because the genetic effects cannot be disentangled from the effects of age and sex (which yield a high AUC even without the PRS). In this study, we used data from the UK Biobank and applied the stacked clumping and thresholding method and a variation called the maximum clumping and thresholding method to develop PRSs to predict coronary artery disease, hypertension, atrial fibrillation, stroke and type 2 diabetes. We created case-control training datasets in which age and sex were controlled by design. We also excluded prevalent cases to prevent biased estimation of disease risks. The maximum clumping and thresholding PRSs required far fewer single-nucleotide polymorphisms to achieve almost the same discriminatory ability as the stacked clumping and thresholding PRSs. In the testing datasets, the AUCs for the maximum clumping and thresholding PRSs were 0.599 (95% confidence interval [CI]: 0.585, 0.613) for atrial fibrillation, 0.572 (95% CI: 0.560, 0.584) for coronary artery disease, 0.585 (95% CI: 0.564, 0.605) for type 2 diabetes, 0.559 (95% CI: 0.550, 0.569) for hypertension and 0.514 (95% CI: 0.494, 0.535) for stroke. By developing each PRS using a dataset in which age and sex are controlled by design, we obtained estimates of the discriminatory ability of the PRSs alone rather than estimates that include the effects of age and sex.
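For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC and an approximate 95% CI can be computed from PRS values in cases and controls. It is illustrative only: the function names `auc` and `auc_ci` are hypothetical, the Hanley-McNeil standard error is an assumed choice (the abstract does not state how its intervals were derived), and the scores in the example are made up rather than taken from the study.

```python
import math


def auc(case_scores, control_scores):
    """AUC as the proportion of case-control pairs in which the case has
    the higher score (ties count as half a pair)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_ci(case_scores, control_scores, z=1.96):
    """Approximate CI for the AUC using the Hanley-McNeil standard error.
    Assumption: this is one common approximation, not necessarily the
    method used in the study."""
    a = auc(case_scores, control_scores)
    n1, n2 = len(case_scores), len(control_scores)
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    var = (a * (1.0 - a)
           + (n1 - 1) * (q1 - a * a)
           + (n2 - 1) * (q2 - a * a)) / (n1 * n2)
    se = math.sqrt(max(var, 0.0))  # guard against tiny negative rounding
    return a - z * se, a + z * se


# Made-up PRS values for illustration only.
cases = [0.8, 1.1, 0.3, 1.5]
controls = [0.2, 0.9, -0.4, 0.5]
print(auc(cases, controls), auc_ci(cases, controls))
```

In a dataset where cases and controls are matched on age and sex, a model using only age and sex would be expected to have an AUC close to 0.5, so an AUC computed this way reflects the PRS alone.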
7 Authors
- Chi Kuen Wong
- Enes Makalic
- Gillian S. Dite
- Lawrence Whiting
- Nicholas M. Murphy
- John L. Hopper
- Richard Allman
1 Application
Application ID | Title |
47401 | Validation of polygenic risk scores, based upon pooled discovery studies, for predisposition to major prevalent diseases affecting both quality-of-life and lifespan and testing for causation. |