Abstract
Background
Nonalcoholic fatty liver disease (NAFLD) is the most prevalent liver disease worldwide, and cardiovascular disease (CVD) is the leading cause of mortality among patients with NAFLD. The aim of our study was to develop a machine learning algorithm that integrates clinical, lifestyle, and genetic risk factors to identify CVD in patients with NAFLD.

Methods and Results
We created a cohort of patients with NAFLD from the UK Biobank, diagnosed by proton density fat fraction measured on magnetic resonance imaging. A total of 400 patients with NAFLD who had subclinical atherosclerosis or clinical CVD, defined by disease codes, constituted the cases, and 446 patients with NAFLD without CVD constituted the controls. We evaluated 7 different supervised machine learning approaches on clinical, lifestyle, and genetic variables for identifying CVD in patients with NAFLD. The most significant clinical and lifestyle variables identified by the predictive models were age (59 years [54-63 years]), hypertension (systolic blood pressure 145 mm Hg [134-156 mm Hg]; diastolic blood pressure 85 mm Hg [79-93 mm Hg]), waist circumference (98 cm [95-105 cm]), and sedentary lifestyle, defined as time spent watching TV >4 h/d. Among the genetic variables, single-nucleotide polymorphisms in the IL16 and ANKLE1 genes were the most significant. Our proposed ensemble-based integrative machine learning model achieved an area under the curve (AUC) of 0.849 for CVD prediction using random forest modeling.

Conclusions
We propose a machine learning algorithm that identifies CVD in patients with NAFLD by integrating significant clinical, lifestyle, and genetic risk factors. Patients with NAFLD at higher risk of CVD should be flagged for screening and aggressive treatment of their cardiometabolic risk factors to prevent cardiovascular morbidity and mortality.
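The area under the curve reported above summarizes how well a model's predicted risk scores separate cases (NAFLD with CVD) from controls (NAFLD without CVD). It equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The minimal sketch below computes that quantity by direct pairwise comparison. It is an illustration only and does not reproduce the study's pipeline: the function name `pairwise_auc` and the toy labels and scores are hypothetical, and the 7 models, the random forest, and the UK Biobank variables are not represented.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """Hypothetical helper: AUC as the fraction of (case, control) pairs in
    which the case gets the higher score; tied scores count as 0.5.
    labels: 1 = case (e.g. CVD present), 0 = control.
    scores: model-predicted risk for each subject, same order as labels."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            wins += 1.0          # case correctly ranked above control
        elif case_score == control_score:
            wins += 0.5          # tie: half credit
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data (not from the study): 2 cases, 2 controls.
    print(pairwise_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

In the toy example, 3 of the 4 case-control pairs are ranked correctly, giving 0.75. An AUC of 0.849 would correspondingly mean that a case outranks a control in about 85% of such pairs. The pairwise loop costs O(n_cases × n_controls), which is acceptable for cohorts of a few hundred subjects; rank-based formulas give the same value more efficiently for larger data sets.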