About
Genetic epidemiology has entered the big data era, and researchers now have access to both DNA data and a large number of disease-related traits, including numerous molecular phenotypes and biomarkers. These extensive datasets hold great promise for deciphering the genetic causes of human traits and diseases. However, the analysis of such complex and extensive data faces major methodological challenges, and current approaches have shown limitations in their ability to capture DNA-disease associations. Here we aim to extend an innovative multivariate method we recently developed, which can detect genetic variants associated with common human diseases in settings where existing methods would fail.

The performance of the developed approaches will first be assessed using both simulated data and real data from the UK Biobank. The approaches will then be applied in the UK Biobank to a broad range of human phenotypes. We expect the project to last three years, one for each of the following specific aims: 1) to reduce the computation time of the approach in order to make it tractable in large datasets such as the UK Biobank; 2) to extend the approach to test gene-environment interaction effects on human phenotypes; and 3) to develop methods that leverage the results from 1) and 2) to assess shared and specific genetic and environmental effects across human phenotypes. For each specific aim, we will alternate between method development and real data analysis.

Large-scale genomic data have the potential to answer important biological questions and improve public health. Many of these questions, such as improving disease risk prediction or inferring causal relationships between molecular phenotypes, environmental exposures and diseases, rely on our ability to identify associations between variables. Our proposal will provide statistical and computational methods that maximize the use of large cohort data, such as the UK Biobank, and help fulfill these goals.