Publication 11092

Title:	HaploBlocks: Efficient Detection of Positive Selection in Large Population Genomic Datasets
Journal:	Molecular Biology and Evolution
Published:	15 Feb 2023
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/36790822/
DOI:	https://doi.org/10.1093/molbev/msad027
URL:	https://academic.oup.com/mbe/advance-article-pdf/doi/10.1093/molbev/msad027/49200298/msad027.pdf
Citations:	4 (4 in last 2 years) as of 8 Aug 2024

Abstract

Genomic regions under positive selection harbor variation linked for example to adaptation. Most tools for detecting positively selected variants have computational resource requirements rendering them impractical on population genomic datasets with hundreds of thousands of individuals or more. We have developed and implemented an efficient haplotype-based approach able to scan large datasets and accurately detect positive selection. We achieve this by combining a pattern matching approach based on the positional Burrows-Wheeler transform with model-based inference which only requires the evaluation of closed-form expressions. We evaluate our approach with simulations, and find it to be both sensitive and specific. The computational resource requirements quantified using UK Biobank data indicate that our implementation is scalable to population genomic datasets with millions of individuals. Our approach may serve as an algorithmic blueprint for the era of "big data" genomics: a combinatorial core coupled with statistical inference in closed form.
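
For readers unfamiliar with the positional Burrows-Wheeler transform (PBWT) named in the abstract, the sketch below shows the textbook construction of its prefix and divergence arrays over a small binary haplotype panel. It is an illustration only and is not taken from the HaploBlocks implementation: the function names `pbwt_step` and `pbwt`, the toy panel `haps`, and the 0/1 allele encoding are all assumptions made for this example. Adjacent haplotypes in the final ordering share the longest matches ending at the last site, which is the property that makes haplotype pattern matching feasible in time linear in the size of the panel.

```python
from typing import List, Sequence, Tuple


def pbwt_step(
    column: Sequence[int], order: List[int], div: List[int], k: int
) -> Tuple[List[int], List[int]]:
    """Advance the prefix (order) and divergence (div) arrays past site k.

    order[i] is the haplotype at rank i when haplotypes are sorted by their
    reversed prefixes; div[i] is the first site of the longest match between
    the haplotypes at ranks i and i-1 that ends just before the current site.
    """
    zeros, ones = [], []          # haplotypes carrying allele 0 / allele 1 at site k
    div_zeros, div_ones = [], []  # matching divergence values for each bucket
    p = q = k + 1                 # k + 1 means "no match ending at site k"
    for hap, start in zip(order, div):
        p = max(p, start)
        q = max(q, start)
        if column[hap] == 0:
            zeros.append(hap)
            div_zeros.append(p)
            p = 0
        else:
            ones.append(hap)
            div_ones.append(q)
            q = 0
    return zeros + ones, div_zeros + div_ones


def pbwt(haplotypes: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Return the prefix and divergence arrays after the last site."""
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_haps else 0
    order = list(range(n_haps))
    div = [0] * n_haps
    for k in range(n_sites):
        column = [hap[k] for hap in haplotypes]
        order, div = pbwt_step(column, order, div, k)
    return order, div


# Hypothetical toy panel: 4 haplotypes, 4 biallelic sites.
haps = [
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
order, div = pbwt(haps)
print(order)  # [2, 0, 3, 1]
print(div)    # [4, 4, 0, 1]
```

In this toy panel, haplotypes 0 and 3 are identical, so they end up adjacent with a divergence value of 0 (a match spanning every site), while haplotype 1 matches haplotype 3 only from site 1 onward. A divergence value equal to the number of sites, as for the first two ranks, means the haplotype shares no match ending at the last site with its predecessor.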

5 Keywords

Genetics, Population
Genome
Genomics
Haplotypes
Metagenomics

10 Authors

Benedikt Kirsch-Gerweck
Leonard Bohnenkämper
Michel T Henrichs
Jarno N Alanko
Hideo Bannai
Bastien Cazaux
Pierre Peterlongo
Joachim Burger
Jens Stoye
Yoan Diekmann

1 Application

Application ID	Title
63023	Efficient whole-genome scans for positive selection in large population genomic datasets
