Application 78795

Title:	Regression methods for phenome-wide association analysis on large-scale biobank data
Lead Institution:	Peking University
Principal investigator:	Dr Wenjian Bi

About

With advances in genotyping technologies and electronic health records (EHRs), large biobanks have become valuable resources for identifying novel genetic associations and gene-environment interactions on a genome-wide and even a phenome-wide scale. To date, several phenome-wide association studies (PheWAS) have been performed on biobank data, providing comprehensive insights into many aspects of human genetics and biology. Although inspiring, PheWAS on large-scale biobank data encounters new challenges, including a heavy computational burden, unbalanced phenotypic distributions, and genetic relatedness among samples. For quantitative and binary traits, state-of-the-art strategies such as matrix projection, saddlepoint approximation, and mixed-model approaches have been used to overcome these challenges. However, for some complex phenotypes, such as those derived from MRI, suitable analysis approaches are still urgently needed. This application proposes to develop fast and accurate regression methods that can fully utilize phenotypes with complex structure. Since whole-genome sequencing can accurately identify and genotype rare variants, this application will also propose scalable and powerful methods to evaluate rare-variant associations. In addition, the evolving availability of new technologies will provide rich multi-omics data resources. This application will therefore also consider how to incorporate this additional information effectively to boost power and increase interpretability in phenome-wide studies. The project has an estimated duration of 3 years.
The developed approaches will be an important supplement to existing analysis approaches and will be applied to UK Biobank data to identify novel genetic variants associated with phenotypes and environmental factors. These findings can contribute to translational and clinical research, including constructing risk prediction models for complex diseases and phenotypes, identifying the causal effects of exposures and drugs, and identifying drug targets and repurposing opportunities.

6 Publications

Pub ID	Title	Author(s)	Year	Journal
15815	Efficient and accurate framework for genome-wide gene-environment interaction analysis in large-scale biobanks	Yuzhuo Ma (+3)	2025	Nature Communications
14381	Genome-wide interaction association analysis identifies interactive effects of childhood maltreatment and kynurenine pathway on depression	Yaoyao Sun (+17)	2025	Nature Communications
15365	Multitrait GWAS of non-suicidal self-injury and the polygenetic effects on child psychopathology and brain structures	Yaoyao Sun (+14)	2025	Cell Reports Medicine
15182	SPAGRM: effectively controlling for sample relatedness in large-scale genome-wide association studies of longitudinal traits	He Xu (+11)	2025	Nature Communications
7880	Scalable mixed model methods for set-based association studies on large-scale categorical data analysis and its application to exome-sequencing data in UK Biobank	Wenjian Bi (+5)	2023	American Journal of Human Genetics
9907	Shared Genetic Determinants of Schizophrenia and Autism Spectrum Disorder Implicate Opposite Risk Patterns: A Genome-Wide Analysis of Common Variants	Yu Chen (+3)	2024	Schizophrenia Bulletin
