Publication 4809

Title:	Evaluation of polygenic prediction methodology within a reference-standardized framework
Journal:	PLOS Genetics
Published:	4 May 2021
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/33945532/
DOI:	https://doi.org/10.1371/journal.pgen.1009021
Citations:	125 (59 in last 2 years) as of 8 Aug 2024

Abstract

The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.

13 Keywords

Computer Simulation
Datasets as Topic
Genome-Wide Association Study
Genotype
Humans
Models, Genetic
Multifactorial Inheritance
Polymorphism, Single Nucleotide
Precision Medicine
Reproducibility of Results
Twin Studies as Topic
Twins
United Kingdom

12 Authors

Oliver Pain
Kylie P. Glanville
Saskia P. Hagenaars
Saskia Selzam
Anna E. Fürtjes
Héléna A. Gaspar
Jonathan R. I. Coleman
Kaili Rimfeld
Gerome Breen
Robert Plomin
Lasse Folkersen
Cathryn M. Lewis

1 Application

Application ID	Title
18177	Multi-trait GWAS analyses in the UK Biobank
