Abstract
Polygenic risk scores (PRSs) show promise for translation into clinical settings to improve disease risk prediction. An important consideration in integrating PRSs into clinical practice is understanding how to identify the subpopulations of individuals who benefit most from PRSs for risk prediction. In this study, using the UK Biobank dataset, we trained logistic regression models to predict the 10-year incident risk of myocardial infarction, breast cancer, and schizophrenia using either clinical features alone or clinical features combined with PRSs. For each disease, we identified the top 10% subgroup with the greatest improvement in risk prediction accuracy attributable to PRSs in the multi-modal model. Using up to ~3.6k demographic, lifestyle, diagnostic, lab, and physical measurement features from the UK Biobank dataset of ~500k individuals, we characterized these subgroups by their clinical, lifestyle, and demographic characteristics. The incident cases in the top 10% subgroup for each disease represent distinct phenotypes that differ from other cases and are strongly correlated with genetic predisposition. Our findings provide insights into disease subtypes and may encourage future studies aimed at classifying these individuals to enhance the targeting of polygenic risk scoring in practice.
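To make the subgroup-selection step concrete, here is a minimal sketch under assumptions the abstract does not state. The per-individual "improvement" is assumed to be the reduction in log loss when the PRS is added to the clinical features. Both models are plain logistic regressions fit by batch gradient descent in NumPy, and the data are synthetic. The function names (`fit_logistic`, `top_prs_benefit_subgroup`) are hypothetical. For brevity the sketch scores individuals on the same data used for fitting; a real analysis would score held-out individuals. It is an illustration, not the study's pipeline.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Logistic regression via batch gradient descent on mean log loss (illustrative)."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def per_person_log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def top_prs_benefit_subgroup(X_clin, prs, y, frac=0.10):
    """Hypothetical helper: indices of the `frac` of individuals whose log loss
    drops the most when the PRS is added (assumed definition of PRS benefit)."""
    X_full = np.column_stack([X_clin, prs])
    loss_clin = per_person_log_loss(y, predict_proba(fit_logistic(X_clin, y), X_clin))
    loss_full = per_person_log_loss(y, predict_proba(fit_logistic(X_full, y), X_full))
    improvement = loss_clin - loss_full  # positive = PRS helped this person
    k = max(1, int(round(frac * len(y))))
    top_idx = np.argsort(improvement)[::-1][:k]  # largest improvements first
    return top_idx, improvement

# Usage on synthetic data (not UK Biobank): 3 clinical features plus one PRS.
rng = np.random.default_rng(0)
n = 2000
X_clin = rng.normal(size=(n, 3))
prs = rng.normal(size=n)
logit = -2.0 + X_clin @ np.array([0.5, -0.3, 0.2]) + 1.0 * prs
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
top_idx, improvement = top_prs_benefit_subgroup(X_clin, prs, y)
print(len(top_idx))  # 200
```

The selected indices could then be used to compare the subgroup's clinical, lifestyle, and demographic features against everyone else, which is the characterization step the abstract describes.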