Abstract
Background
Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the $R^2$ is a commonly used measure of prediction accuracy. While the $R^2$ is a well-defined goodness-of-fit measure for statistical linear models, several definitions exist for its application to test data, which complicates the interpretation and comparison of results.
Methods
Based on large-scale genotype data from the UK Biobank, we compare three definitions of the $R^2$ on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived from training data of European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries.
Results
Our analysis shows that the choice of $R^2$ definition can lead to considerably different results on test data, making comparisons of $R^2$ values from the literature problematic. The definition as the squared correlation between predicted and observed phenotypes addresses only discriminative performance and always yields values between 0 and 1. In contrast, definitions of the $R^2$ based on the mean squared prediction error (MSPE) relative to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values, indicating miscalibrated predictions for out-of-target populations.
We argue that the choice of the most appropriate definition depends on the aim of the PRS analysis: whether it serves primarily for risk stratification or also for individual phenotype prediction. Moreover, correlation-based and MSPE-based definitions of the $R^2$ can provide valuable complementary information.
Conclusions
Awareness of the different definitions of the $R^2$ on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. We recommend stating explicitly which definition was used when reporting $R^2$ values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
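To make the contrast between the correlation-based and MSPE-based definitions concrete, the following is a minimal Python sketch, not the implementation used in the study. It computes the squared Pearson correlation and an MSPE-based value of the form $R^2 = 1 - \mathrm{MSPE}_{\text{model}} / \mathrm{MSPE}_{\text{intercept-only}}$. The function names `r2_correlation` and `r2_mspe` are hypothetical. The assumption that the MSPE-based variants differ only in whether the intercept-only reference predicts the test-set mean or the training-set mean is illustrative and is not stated in the abstract. The toy data are invented: predictions that are perfectly correlated with the observed phenotypes but shifted by a constant offset.

```python
from math import fsum


def _mean(xs):
    """Arithmetic mean computed with fsum for numerical accuracy."""
    return fsum(xs) / len(xs)


def r2_correlation(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values.

    Always lies in [0, 1]; undefined if either vector is constant.
    """
    mo, mp = _mean(y_obs), _mean(y_pred)
    cov = fsum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    var_o = fsum((o - mo) ** 2 for o in y_obs)
    var_p = fsum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_o * var_p)


def mspe(y_obs, y_pred):
    """Mean squared prediction error of the model on the test data."""
    return fsum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs)


def r2_mspe(y_obs, y_pred, reference_mean):
    """MSPE-based R^2 relative to an intercept-only model.

    The intercept-only model predicts `reference_mean` for every individual
    (hypothetical parameter: e.g. the test-set mean or the training-set mean).
    Can be negative when predictions are miscalibrated.
    """
    mspe_ref = fsum((o - reference_mean) ** 2 for o in y_obs) / len(y_obs)
    return 1.0 - mspe(y_obs, y_pred) / mspe_ref


if __name__ == "__main__":
    # Invented toy test data: predictions offset by +3 from the observations.
    y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_pred = [4.0, 5.0, 6.0, 7.0, 8.0]

    print(r2_correlation(y_obs, y_pred))          # 1.0 (perfect discrimination)
    print(r2_mspe(y_obs, y_pred, _mean(y_obs)))   # -3.5 (reference: test mean)
    print(r2_mspe(y_obs, y_pred, 6.0))            # reference: hypothetical training mean
```

Because the squared correlation is unchanged when predictions are shifted or rescaled, it cannot detect such a systematic offset, whereas the MSPE-based value penalizes it and, with the test-set mean as reference, drops below zero.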