| Title: | Efficient blockLASSO for polygenic scores with applications to All of Us and UK Biobank |
| Journal: | BMC Genomics |
| Published: | 27 Mar 2025 |
| Pubmed: | https://pubmed.ncbi.nlm.nih.gov/40148775/ |
| DOI: | https://doi.org/10.1186/s12864-025-11505-0 |
We develop a "block" LASSO (blockLASSO) approach for training polygenic scores (PGS) and demonstrate its use in All of Us (AoU) and the UK Biobank (UKB). blockLASSO exploits the approximate block-diagonal structure of linkage disequilibrium (LD) that arises from the chromosomal partition of the genome. The new implementation can be used for exploratory and methods research where repeated PGS training is necessary and expensive. For 11 different phenotypes, in two different biobanks, and across 5 different ancestry groups (African, American, East Asian, European, and South Asian), we demonstrate that blockLASSO is generally as effective for training PGS as a (global) LASSO. Previous work has shown that penalized regression methods produce PGS that are competitive with alternative approaches, and that some phenotypes are more polygenic than others. Using sparse algorithms, an accurate PGS can be trained for type 1 diabetes (T1D) using ∼100 single nucleotide variants (SNVs), but a PGS for body mass index (BMI) would need more than 10k SNVs. blockLASSO produces similar PGS for phenotypes while training with just a fraction of the variants per block. Within AoU (using only genetic information), the block PGS reaches an AUC of 0.63 ± 0.02 for T1D and a correlation of 0.21 ± 0.01 for BMI, whereas a global LASSO approach reaches an AUC of 0.65 ± 0.03 for T1D and a correlation of 0.19 ± 0.03 for BMI. This new block approach is more computationally efficient and scalable than naive global machine learning approaches, which makes it ideal for exploratory methods investigations based on penalized regression.
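The block idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that blocks are given as lists of column indices (for example, one list per chromosome), that the phenotype is centered, and that each block is fit independently against the phenotype with a plain coordinate-descent LASSO, after which the per-block coefficients are simply concatenated. The names `lasso_cd` and `block_lasso`, the penalty value, and the toy data are all hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # skip constant-zero columns
            # Correlation of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def block_lasso(X, y, blocks, lam):
    """Fit one LASSO per block of columns and concatenate the coefficients.

    Assumption: blocks are (approximately) uncorrelated with each other, so
    fitting them separately approximates a joint fit.
    """
    beta = [0.0] * len(X[0])
    for cols in blocks:
        X_block = [[row[j] for j in cols] for row in X]
        for j, b in zip(cols, lasso_cd(X_block, y, lam)):
            beta[j] = b
    return beta

# Toy data: 6 independent "variants" in two blocks, two of them causal.
random.seed(0)
n = 300
blocks = [[0, 1, 2], [3, 4, 5]]
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[4] + random.gauss(0, 0.5) for row in X]
mean_y = sum(y) / n
y = [v - mean_y for v in y]  # center the phenotype

beta = block_lasso(X, y, blocks, lam=0.1)
# The polygenic score of each individual is the weighted sum of its variants.
scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
```

Because each block only sees its own columns, the per-block fits are small and could be run in parallel, which is the kind of saving the abstract attributes to the block approach. How the published method tunes the penalty or combines blocks is not stated here, so those choices in the sketch should be read as placeholders.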
| Application ID | Title |
|---|---|
| 15326 | Compressed Sensing and high-dimensional statistical methods in complex trait genomics |