Publication 7092

Title:	Haplotype and population structure inference using neural networks in whole-genome sequencing data
Journal:	Genome Research
Published:	6 Jul 2022
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/35794006/
DOI:	https://doi.org/10.1101/gr.276813.122
URL:	https://www.ncbi.nlm.nih.gov/pmc/articles/9435741
Citations:	13 (11 in last 2 years) as of 8 Aug 2024

Abstract

Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.

7 Keywords

Genetics, Population
Genome, Human
Haplotypes
Humans
Neural Networks, Computer
Principal Component Analysis
Whole Genome Sequencing

2 Authors

Jonas Meisner
Anders Albrechtsen

1 Application

Application ID	Title
32683	Combined effect of the genetic and lifestyle determinants of metabolic syndrome on cardiometabolic risk
