Publication 12966

Title:	Fast and scalable ensemble learning method for versatile polygenic risk prediction
Journal:	Proceedings of the National Academy of Sciences of the United States of America
Published:	7 Aug 2024
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/39110727/
DOI:	https://doi.org/10.1073/pnas.2403210121
URL:	https://www.ncbi.nlm.nih.gov/pmc/articles/11331062

Abstract

Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, including computational burden, limited predictive accuracy, and poor adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, with 15 times faster computation and half the memory of the current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
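
The abstract names two ingredients: an L0L2 penalized regression fitted from summary statistics, and an ensemble across tuning parameters. The sketch below illustrates the general idea only. It is not the ALL-Sum R package or its API, and every detail is an assumption made for illustration: the objective (1/2) b'Rb - b'r + lam0 * ||b||_0 + lam2 * ||b||_2^2, where r holds marginal GWAS effects and R is an LD matrix; the plain cyclic coordinate descent solver; and the least-squares combination of candidate scores on a tuning set. All function and parameter names are hypothetical.

```python
import numpy as np


def l0l2_coordinate_descent(r, R, lam0, lam2, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for an assumed summary-statistic objective:
    0.5 * b'Rb - b'r + lam0 * ||b||_0 + lam2 * ||b||_2^2.
    r: marginal effect estimates (p,); R: LD matrix (p, p)."""
    r = np.asarray(r, dtype=float)
    R = np.asarray(R, dtype=float)
    beta = np.zeros(len(r))
    diag = np.diag(R)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Partial residual: remove the contribution of all other variants.
            z = r[j] - R[j] @ beta + diag[j] * beta[j]
            denom = diag[j] + 2.0 * lam2
            # Keep the ridge-shrunken value only if it lowers the objective
            # by more than the L0 cost of a nonzero coefficient.
            new = z / denom if z * z > 2.0 * lam0 * denom else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def fit_grid(r, R, lam0_grid, lam2_grid):
    """Fit one coefficient vector per (lam0, lam2) pair; returns shape (p, n_models)."""
    fits = [l0l2_coordinate_descent(r, R, l0, l2)
            for l0 in lam0_grid for l2 in lam2_grid]
    return np.column_stack(fits)


def ensemble_weights(scores, y):
    """Assumed ensemble step: least-squares weights (intercept first) that
    combine candidate scores (n_individuals, n_models) to predict y."""
    X = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

In this sketch, candidate scores would be obtained by multiplying a tuning-set genotype matrix by the output of `fit_grid`, and the weights from `ensemble_weights` would then be applied to scores in an independent test set.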

Application ID	Title
52008	Development of statistical methods to discover novel genetic associations, explain underlying biological mechanisms, and develop risk prediction models across varied complex diseases.
