Publication 14684

Title:	Fast analysis of biobank-size data and meta-analysis using the BGLR R-package
Journal:	G3: Genes, Genomes, Genetics
Published:	9 Dec 2024
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/39657738/
DOI:	https://doi.org/10.1093/g3journal/jkae288
URL:	https://academic.oup.com/g3journal/advance-article-pdf/doi/10.1093/g3journal/jkae288/61000293/jkae288.pdf

Abstract

Analyzing human genomic data from biobanks and large-scale genetic evaluations often requires fitting models with a sample size exceeding the number of DNA markers used (n>p). For instance, developing polygenic scores for humans and genomic prediction for genetic evaluations of agricultural species may require fitting models involving a few thousand SNPs using data with hundreds of thousands of samples. In such cases, computations based on sufficient statistics are more efficient than those based on individual genotype-phenotype data. Additionally, software that admits sufficient statistics as inputs can be used to analyze data from multiple sources jointly without the need to share individual genotype-phenotype data. Therefore, we developed functionality within the BGLR R-package that generates posterior samples for Bayesian shrinkage and variable selection models from sufficient statistics. In this article, we present an overview of the new methods incorporated in the BGLR R-package, demonstrate the use of the new software through simple examples, provide several computational benchmarks, and present a real-data example using data from the UK-Biobank, All of Us, and the Hispanic Community Health Study/Study of Latinos cohort demonstrating how a joint analysis from multiple cohorts can be implemented without sharing individual genotype-phenotype data, and how a combined analysis can improve the prediction accuracy of polygenic scores for Hispanics, a group severely under-represented in genome-wide association studies data.
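
To make the sufficient-statistics idea concrete, the sketch below shows why cohorts can be pooled without exchanging individual records. For a linear model, the data enter only through X'X, X'y, y'y, and n, and these quantities add across cohorts. This is an illustration under stated assumptions, not the BGLR implementation: it is written in Python rather than R, it swaps the Bayesian samplers described in the abstract for a plain ridge-regression solve, and every function name and toy value is hypothetical. It also assumes both cohorts code the same SNPs in the same way.

```python
# Illustrative sketch only: ridge regression from sufficient statistics.
# Not BGLR's API or sampler; all names and data here are hypothetical.

def sufficient_stats(X, y):
    """Return (X'X, X'y, y'y, n) for one cohort; X is a list of rows."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    yty = sum(v * v for v in y)
    return XtX, Xty, yty, n

def combine(stats_a, stats_b):
    """Pool two cohorts by adding their statistics element-wise."""
    XtX_a, Xty_a, yty_a, n_a = stats_a
    XtX_b, Xty_b, yty_b, n_b = stats_b
    XtX = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(XtX_a, XtX_b)]
    Xty = [a + b for a, b in zip(Xty_a, Xty_b)]
    return XtX, Xty, yty_a + yty_b, n_a + n_b

def ridge_solve(XtX, Xty, lam):
    """Solve (X'X + lam*I) b = X'y by Gaussian elimination with pivoting."""
    p = len(Xty)
    # Augmented matrix [X'X + lam*I | X'y]
    A = [[XtX[j][k] + (lam if j == k else 0.0) for k in range(p)] + [Xty[j]]
         for j in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution
    b = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(A[j][k] * b[k] for k in range(j + 1, p))
        b[j] = (A[j][p] - tail) / A[j][j]
    return b

# Two toy cohorts (made-up numbers), each holding its own individual data.
X1, y1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.1]
X2, y2 = [[2.0, 1.0], [1.0, 2.0]], [4.0, 5.2]

# Each site shares only its statistics; the analyst adds them and fits.
pooled = combine(sufficient_stats(X1, y1), sufficient_stats(X2, y2))
b_pooled = ridge_solve(pooled[0], pooled[1], lam=1.0)

# Reference fit on the stacked individual-level data, for comparison.
full = sufficient_stats(X1 + X2, y1 + y2)
b_full = ridge_solve(full[0], full[1], lam=1.0)
```

In this sketch the pooled and stacked fits agree up to floating-point error, and the cost of the solve depends on p rather than n once the statistics are formed, which is the source of the efficiency gain when n>p.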
