Publication 16040

Title:	Multi-domain rule-based phenotyping algorithms enable improved GWAS signal
Journal:	npj Digital Medicine
Published:	2 Aug 2025
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/40753274/
DOI:	https://doi.org/10.1038/s41746-025-01815-8
URL:	https://www.nature.com/articles/s41746-025-01815-8.pdf

Abstract

Biobanks are a rich source of data for genome-wide association studies (GWAS). They store clinical data from electronic health records, with data domains such as laboratory measurements, conditions, and self-reported diagnoses. Traditionally, biobank GWAS utilize case-control cohorts built exclusively from conditions. However, because reported conditions are primarily collected for billing purposes, they face data quality issues. Consequently, incorporating additional data domains in cohort construction can improve cohort accuracy and GWAS results. Here, we assess the impact of various rule-based phenotyping algorithms on GWAS outcomes, examining factors such as power, heritability, replicability, functional annotations, and polygenic risk score prediction accuracy across seven diseases in the UK Biobank. We find that high complexity phenotyping algorithms generally improve GWAS outcomes, including increased power, hits within coding and functional genomic regions, and co-localization with expression quantitative trait loci. Our findings suggest that biobank-scale GWAS can benefit from phenotyping algorithms that integrate multiple data domains.

3 Authors

Abigail Newbury
Ahmed Elhussein
Gamze Gürsoy

1 Application

Application ID	Title
100316	A deep learning based approach for phenome-wide association studies
