Abstract
Polygenic risk scores (PRS) are rapidly emerging as a tool for measuring disease risk by aggregating the effects of many genetic variants. Understanding how PRS interplay with environmental factors is critical for interpreting and applying PRS in a wide variety of settings. We develop an efficient method for simultaneously modeling gene-environment correlations and interactions using PRS in case-control studies. We use a logistic-normal regression framework to specify both the disease risk model and the PRS distribution in the underlying population, and we propose joint inference across the two models using the retrospective likelihood of the case-control data. Extensive simulation studies show that the method can flexibly trade off bias and efficiency when estimating the model parameters. The comparators are standard logistic regression and case-only analysis for gene-environment interactions, and control-only analysis for gene-environment correlations. Finally, using simulated case-control data sets drawn from the UK Biobank study, we show that our method can recover results from the full prospective cohort in detecting an interaction between long-term oral contraceptive use and PRS on the risk of breast cancer. The method is computationally efficient and is implemented in a user-friendly R package.
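The logistic-normal setup can be made concrete with a small simulation. The sketch below is written in Python rather than R. It is neither the authors' package nor their retrospective-likelihood estimator. It only illustrates why pooling information from cases and controls helps separate gene-environment correlation from interaction. All parameter values and the function names `simulate_case_control` and `regress_g_on_e` are hypothetical. The sketch assumes a binary exposure E, a PRS G with G | E ~ Normal(alpha0 + alpha1*E, sigma^2), a disease model logit P(D=1 | G, E) = b0 + bE*E + bG*G + bGE*G*E, a rare disease, and one control per case. Under the rare-disease approximation, regressing G on E among controls estimates the correlation parameter alpha1. The same regression among cases estimates alpha1 + sigma^2*bGE. The difference between the two slopes, divided by sigma^2, therefore gives a crude moment-based estimate of bGE.

```python
import numpy as np


def simulate_case_control(n_pop=1_000_000, alpha0=0.0, alpha1=0.2, sigma=1.0,
                          b0=-5.0, bE=0.3, bG=0.4, bGE=0.3, seed=1):
    """Draw a 1:1 case-control sample from a hypothetical logistic-normal population.

    PRS model:     G | E ~ Normal(alpha0 + alpha1 * E, sigma^2)
    Disease model: logit P(D=1 | G, E) = b0 + bE*E + bG*G + bGE*G*E
    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    E = rng.binomial(1, 0.3, size=n_pop).astype(float)  # binary exposure, 30% exposed
    G = alpha0 + alpha1 * E + sigma * rng.standard_normal(n_pop)  # PRS given E
    eta = b0 + bE * E + bG * G + bGE * G * E  # linear predictor of disease risk
    D = rng.random(n_pop) < 1.0 / (1.0 + np.exp(-eta))  # Bernoulli(expit(eta))
    cases = np.flatnonzero(D)
    # One control per case, sampled without replacement from the non-cases
    controls = rng.choice(np.flatnonzero(~D), size=cases.size, replace=False)
    return (G[cases], E[cases]), (G[controls], E[controls])


def regress_g_on_e(G, E):
    """Ordinary least squares of PRS on exposure; returns (slope, residual variance)."""
    X = np.column_stack([np.ones_like(E), E])
    coef, *_ = np.linalg.lstsq(X, G, rcond=None)
    resid = G - X @ coef
    return coef[1], resid.var(ddof=2)


(case_G, case_E), (ctrl_G, ctrl_E) = simulate_case_control()
case_slope, _ = regress_g_on_e(case_G, case_E)
ctrl_slope, ctrl_var = regress_g_on_e(ctrl_G, ctrl_E)

# Controls approximate the population, so ctrl_slope estimates alpha1.
# Cases (rare disease): the slope estimates alpha1 + sigma^2 * bGE.
bGE_hat = (case_slope - ctrl_slope) / ctrl_var

print(f"control-only slope (alpha1):  {ctrl_slope:.3f}")
print(f"case-only slope:              {case_slope:.3f}")
print(f"implied interaction (bGE):    {bGE_hat:.3f}")
```

With the default parameters, the three printed values should land near 0.2, 0.5 and 0.3, up to sampling noise. A case-only analysis that assumes alpha1 = 0 would attribute the entire case slope (about 0.5) to interaction. That overestimates bGE whenever G and E are correlated in the population, which is the kind of bias a joint model of the disease and PRS distributions can address.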