Publication 7057

Title:	Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models
Journal:	PLOS ONE
Published:	31 Aug 2022
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/36044406/
DOI:	https://doi.org/10.1371/journal.pone.0273293
URL:	https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0273293&type=printable
Citations:	9 (9 in last 2 years) as of 8 Aug 2024

Abstract

Genotype-to-phenotype prediction is a central problem of human genetics. In recent years, it has become possible to construct complex predictive models for phenotypes, thanks to the availability of large genome data sets as well as efficient and scalable machine learning tools. In this paper, we make a threefold contribution to this problem. First, we ask if state-of-the-art nonlinear predictive models, such as boosted decision trees, can be more efficient for phenotype prediction than conventional linear models. We find that this is indeed the case if model features include a sufficiently rich set of covariates, but probably not otherwise. Second, we ask if the conventional selection of single nucleotide polymorphisms (SNPs) by genome-wide association studies (GWAS) can be replaced by a more efficient procedure that takes into account the information in previously selected SNPs. We propose such a procedure, based on sequential feature importance estimation with decision trees, and show that this approach indeed produces informative SNP sets that are much more compact than those selected with GWAS. Finally, we show that the highest prediction accuracy can ultimately be achieved by ensembling individual linear and nonlinear models. To the best of our knowledge, for some of the phenotypes that we consider (asthma, hypothyroidism), our results set a new state of the art.
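
The difference between ranking SNPs one at a time and selecting them sequentially can be illustrated with a small sketch. It is not the authors' procedure, which this record does not describe in detail. Everything here is an assumption made for illustration: the synthetic genotypes, the function names (class_stats, gain, sequential_select), the use of one fitted mean per genotype class as a stand-in for a small decision tree, and the marginal ranking as a stand-in for GWAS-style selection.

```python
import random
from collections import defaultdict

def class_stats(snp, resid):
    """Return {genotype: (mean residual, count)} for one SNP column."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, r in zip(snp, resid):
        sums[g] += r
        counts[g] += 1
    return {g: (sums[g] / counts[g], counts[g]) for g in sums}

def gain(snp, resid):
    """Squared-error reduction from fitting one mean per genotype class.

    This equals the between-class sum of squares, i.e. the importance a
    tiny tree would assign to this SNP if it split on every genotype value.
    """
    grand = sum(resid) / len(resid)
    return sum(n * (m - grand) ** 2 for m, n in class_stats(snp, resid).values())

def sequential_select(X, y, k):
    """Greedily pick k SNPs, re-scoring candidates on the current residuals."""
    mean_y = sum(y) / len(y)
    resid = [v - mean_y for v in y]
    chosen = []
    for _ in range(k):
        rest = [j for j in range(len(X)) if j not in chosen]
        best = max(rest, key=lambda j: gain(X[j], resid))
        chosen.append(best)
        # Remove what the chosen SNP explains, so redundant SNPs lose their score.
        means = {g: m for g, (m, _) in class_stats(X[best], resid).items()}
        resid = [r - means[g] for g, r in zip(X[best], resid)]
    return chosen

# Hypothetical synthetic data: 400 individuals, 6 SNPs coded 0/1/2.
random.seed(0)
n, p, maf = 400, 6, 0.3
X = [[(random.random() < maf) + (random.random() < maf) for _ in range(n)]
     for _ in range(p)]
X[1] = list(X[0])  # SNP 1 duplicates SNP 0, mimicking perfect linkage
y = [1.0 * X[0][i] + 0.6 * X[3][i] + random.gauss(0, 0.3) for i in range(n)]

# Marginal ranking scores every SNP against the phenotype independently.
mean_y = sum(y) / n
centered = [v - mean_y for v in y]
marginal = sorted(range(p), key=lambda j: gain(X[j], centered), reverse=True)

chosen = sequential_select(X, y, 2)
print("marginal top 2:", marginal[:2])
print("sequential top 2:", chosen)
```

In this toy setting the marginal ranking spends both of its top slots on the two identical SNPs, while the sequential pass picks one of them and then moves on to the SNP that carries independent signal. A real analysis would use a boosted-tree library and its feature importances rather than this hand-written scoring.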

7 Keywords

Genome-Wide Association Study
Genotype
Humans
Models, Genetic
Nonlinear Dynamics
Phenotype
Polymorphism, Single Nucleotide

5 Authors

Aleksandr Medvedev
Satyarth Mishra Sharma
Evgenii Tsatsorin
Elena Nabieva
Dmitry Yarotsky

1 Application

Application ID	Title
43661	Predicting heritable phenotypic traits and conditions from genomic data using cutting-edge deep learning methods