Abstract
Polygenic risk scores (PRS) have ushered in a new era in genetic epidemiology, offering insights into individual predispositions to a wide range of diseases. However, despite marked recent gains in predictive power, PRS-based models still need to overcome several hurdles before they can be broadly applied in the clinic. Chiefly, they need to achieve sufficient accuracy, easy interpretability and portability across diverse populations. Leveraging trans-ancestry genome-wide association study (GWAS) meta-analysis, we generated novel, diverse summary statistics for 30 medically related traits and benchmarked the performance of six existing PRS algorithms using UK Biobank. We built an ensemble model using logistic regression to combine the outputs of the top-performing algorithms and validated it on the diverse eMERGE and PAGE MEC cohorts. The ensemble surpassed current state-of-the-art PRS models, with minimal performance drops in external cohorts, indicating good calibration. To enhance predictive accuracy for clinical application, we incorporated easily accessible clinical characteristics such as age, gender, ancestry and risk factors, creating disease prediction models intended as prospective diagnostic tests with easily interpretable positive or negative outcomes. After adding clinical characteristics, 12 out of 30 models surpassed 80% AUC. Further, 25 traits exceeded a diagnostic odds ratio (DOR) of five, and 19 traits exceeded a DOR of 10 for all ancestry groups, indicating high predictive value. Our PRS model for coronary artery disease identified 55 to 80 times more true coronary events than rare pathogenic variant models, reinforcing its clinical potential. The polygenic component modulated the effect of high-risk rare variants, stressing the need to consider all genetic components in clinical settings.
These findings show that newly developed PRS-based disease prediction models have sufficient accuracy and portability to warrant consideration for clinical use.
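The diagnostic odds ratio (DOR) thresholds quoted above may be easier to interpret with a small worked example. The following is a minimal sketch, assuming a test with a binary positive or negative outcome summarised as a 2x2 confusion table. The function name and the example counts are illustrative and are not taken from the study; the 0.5 continuity correction for empty cells is a common convention and not necessarily the one the authors used.

```python
def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Return the diagnostic odds ratio for a 2x2 confusion table.

    DOR = (TP / FN) / (FP / TN) = (TP * TN) / (FP * FN),
    i.e. the odds of a positive test among cases divided by
    the odds of a positive test among non-cases.
    """
    cells = [float(tp), float(fp), float(fn), float(tn)]
    if min(cells) < 0:
        raise ValueError("cell counts must be non-negative")
    # Assumption: if any cell is empty, add 0.5 to every cell
    # (Haldane-Anscombe correction) so the ratio stays finite and defined.
    if 0.0 in cells:
        cells = [c + 0.5 for c in cells]
    tp_c, fp_c, fn_c, tn_c = cells
    return (tp_c * tn_c) / (fp_c * fn_c)


# Hypothetical counts: 100 cases (80 flagged), 1000 non-cases (100 flagged).
# Sensitivity is 0.80 and specificity is 0.90.
example_dor = diagnostic_odds_ratio(tp=80, fp=100, fn=20, tn=900)
print(example_dor)
```

In this hypothetical table the odds of a positive result are 4 to 1 among cases and 1 to 9 among non-cases, so the DOR is 36, which would clear both the DOR of five and the DOR of 10 thresholds mentioned above.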