About
Most chronic diseases arise from interactions among genetic, environmental and lifestyle risk factors that accumulate over an individual's lifetime. Common chronic diseases often share risk factors, including lifestyle factors (physical activity, diet, smoking) and environmental exposures (such as pollution). The relative prognostic importance and optimal levels of these risk factors in individuals have rarely been studied.
Aim: This study aims to (1) identify the important predictors of common chronic diseases and clinical outcomes in a large epidemiological study and (2) compare the performance of machine learning approaches with that of traditional statistical models in predicting clinical events.
Scientific rationale: Individual clinical, demographic, lifestyle and environmental factors, together with genetic markers, will be used to develop prediction models.
The main outcomes of this study are the incidence of common chronic diseases, death from specific diseases, and death from all causes. The associations of genetic, environmental and lifestyle factors with the risk of the outcomes of interest will be tested using Cox proportional hazards regression.
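As an illustration of the Cox proportional hazards approach named above, the following minimal sketch evaluates the partial log-likelihood for a single binary covariate and estimates its coefficient by a simple search. The toy data, the function names and the ternary search are assumptions made for illustration only, not the study's analysis code; a real analysis would rely on an established survival package and many covariates.

```python
import math

def cox_partial_log_likelihood(beta, times, events, x):
    """Partial log-likelihood of a one-covariate Cox model (Breslow form for ties)."""
    total = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: everyone still under observation at this event time.
        risk_sum = sum(
            math.exp(beta * x[j])
            for j in range(len(times))
            if times[j] >= times[i]
        )
        total += beta * x[i] - math.log(risk_sum)
    return total

def fit_beta(times, events, x, lo=-5.0, hi=5.0, iterations=100):
    """Maximize the concave partial log-likelihood by ternary search on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if (cox_partial_log_likelihood(m1, times, events, x)
                < cox_partial_log_likelihood(m2, times, events, x)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical toy data: follow-up time in years, event indicator
# (1 = disease onset, 0 = censored) and a binary exposure such as smoking.
times = [2, 4, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
exposed = [1, 0, 1, 1, 0, 0]

beta_hat = fit_beta(times, events, exposed)
hazard_ratio = math.exp(beta_hat)
print(f"log hazard ratio: {beta_hat:.3f}, hazard ratio: {hazard_ratio:.3f}")
```

A hazard ratio above 1 in this sketch would indicate that the exposed group experiences the event at a higher rate; with only six subjects the estimate carries no inferential weight.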
We will construct multiple machine learning models to assess the strength of association between each risk factor and the outcomes. Predictor variables will be ranked by their relative contribution within each model. Cox proportional hazards regression and machine learning algorithms (random survival forest, gradient boosting machine) will also be used to develop prediction models. Model performance will be evaluated using Harrell's concordance index (C-index) for discrimination, the Brier score (BS) for overall prediction error, which reflects both discrimination and calibration, and calibration plots.
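The two summary metrics named above can be sketched in a few lines. The block below is a simplified illustration under stated assumptions: the data and predicted values are hypothetical, and the Brier score shown here simply skips subjects censored before the time horizon, whereas a full analysis would typically weight observations by the inverse probability of censoring.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs in which the earlier event has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event strictly before
            # subject j's observed time (event or censoring).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

def brier_score_at(horizon, times, events, predicted_event_prob):
    """Mean squared error of predicted event probabilities at a fixed time horizon.

    Simplification (an assumption of this sketch): subjects censored before
    the horizon are skipped instead of being handled by censoring weights.
    """
    squared_errors = []
    for t, e, p in zip(times, events, predicted_event_prob):
        if t <= horizon and e:
            outcome = 1.0  # event observed by the horizon
        elif t > horizon:
            outcome = 0.0  # still event-free at the horizon
        else:
            continue  # censored before the horizon: status unknown
        squared_errors.append((p - outcome) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Hypothetical toy data and model outputs.
times = [2, 4, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
risk_scores = [1, 0, 1, 1, 0, 0]
predicted_event_prob = [0.8, 0.4, 0.7, 0.8, 0.3, 0.2]

c_index = harrell_c_index(times, events, risk_scores)
brier = brier_score_at(9, times, events, predicted_event_prob)
print(f"C-index: {c_index:.3f}, Brier score at t=9: {brier:.3f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, while a lower Brier score indicates predicted probabilities closer to the observed outcomes.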
Project duration: The project will take approximately 3 years to complete.
Public health impact: Understanding the important risk factors for common chronic diseases and clinical outcomes will provide evidence to guide disease prevention and intervention. Risk prediction models based on individuals' characteristics and exposures can identify those at particular risk and support decision-making for individualized intervention.