Abstract
The advent of large biobanks has substantially increased the accuracy of polygenic scores (PGS). However, most existing PGS were derived from European-ancestry data and often exhibit reduced predictive performance when applied to individuals of non-European ancestries. Transfer learning offers a promising strategy to address this limitation by leveraging information learned in one population to improve prediction in another. Here, we introduce GPTL, an R package that implements three transfer learning-based approaches for developing PGS: (1) gradient descent with early stopping, (2) a penalized regression model that shrinks variant-effect estimates toward prior values, and (3) a Bayesian method with a finite-mixture prior that enables the integration of multiple sources of prior information. Using both simulated data and real data from the UK Biobank and All of Us, we demonstrate that PGS generated with GPTL's transfer learning algorithms consistently outperform single-ancestry PGS and, in many settings, match or exceed the performance of multi-ancestry ensemble-based PGS. Our software can be used with either individual-level genotype-phenotype data or summary statistics from genome-wide association studies.
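To make approaches (1) and (2) concrete, here is a minimal Python sketch under stated assumptions. It is not GPTL's implementation or API (GPTL is an R package). The names `transfer_gd`, `penalized_loss` and `simulate`, the step size, the penalty weight and all simulated effects are hypothetical. The sketch assumes a linear model for a quantitative trait with standardized genotypes. It minimizes (1/2n)||y - Xb||^2 + (lam/2)||b - b_prior||^2 by gradient descent started at the prior effects. With `lam = 0` and the stopping iteration chosen on a validation set, it behaves like early-stopped gradient descent. With `lam > 0` run to convergence, it yields estimates shrunk toward the prior. The Bayesian finite-mixture method (3) is not illustrated.

```python
import random

def predict(X, b):
    """Linear predictor (the PGS) for each row of genotype matrix X."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]

def mse(X, y, b):
    """Mean squared prediction error."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, predict(X, b))) / len(y)

def penalized_loss(X, y, b, b_prior, lam):
    """(1/2n)||y - Xb||^2 + (lam/2)||b - b_prior||^2 (hypothetical objective)."""
    shrink = sum((bj - pj) ** 2 for bj, pj in zip(b, b_prior))
    return 0.5 * mse(X, y, b) + 0.5 * lam * shrink

def transfer_gd(X, y, b_prior, lam=0.0, lr=0.05, n_iter=500,
                X_val=None, y_val=None):
    """Gradient descent on penalized_loss, initialized at b_prior.

    If validation data are supplied, the iterate (including the starting
    point) with the lowest validation MSE is returned, i.e. early stopping;
    otherwise the last iterate is returned.
    """
    n, p = len(X), len(b_prior)
    b = list(b_prior)
    best_b = list(b)
    best_val = mse(X_val, y_val, b) if X_val is not None else None
    for _ in range(n_iter):
        resid = [yi - yh for yi, yh in zip(y, predict(X, b))]
        # Gradient: -(1/n) X'(y - Xb) + lam * (b - b_prior)
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n
                + lam * (b[j] - b_prior[j])
                for j in range(p)]
        b = [bj - lr * gj for bj, gj in zip(b, grad)]
        if X_val is not None:
            val = mse(X_val, y_val, b)
            if val < best_val:
                best_val, best_b = val, list(b)
    return best_b if X_val is not None else b

def simulate(n, b_true, noise_sd, rng):
    """Hypothetical data: standardized genotypes and a quantitative trait."""
    X = [[rng.gauss(0.0, 1.0) for _ in b_true] for _ in range(n)]
    y = [sum(x * bt for x, bt in zip(row, b_true)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y

rng = random.Random(1)
b_target = [0.5, -0.3, 0.0, 0.2]  # hypothetical effects in the target population
b_prior = [0.6, -0.2, 0.1, 0.2]   # hypothetical estimates from another ancestry
X_tr, y_tr = simulate(50, b_target, 1.0, rng)
X_va, y_va = simulate(50, b_target, 1.0, rng)
X_te, y_te = simulate(1000, b_target, 1.0, rng)

b_es = transfer_gd(X_tr, y_tr, b_prior, lam=0.0, X_val=X_va, y_val=y_va)
b_shrink = transfer_gd(X_tr, y_tr, b_prior, lam=1.0)
for name, b in [("prior", b_prior), ("early stop", b_es), ("shrink", b_shrink)]:
    print(f"{name:>10}: test MSE = {mse(X_te, y_te, b):.3f}")
```

Starting from `b_prior` is what makes both variants a form of transfer learning. With few target-population samples, stopping earlier or using a larger `lam` keeps the estimates closer to the prior, and the validation set decides how far to move away from it.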