Abstract
Background: Inflammatory bowel disease (IBD) is a chronic, incurable gastrointestinal disease for which there is no gold standard for diagnosis. This study aimed to develop predictive models for diagnosing IBD, Crohn's disease (CD), and ulcerative colitis (UC) by combining two approaches: machine learning (ML) and traditional nomogram models.
Methods: Cohorts 1 and 2 comprised data from the UK Biobank (UKB) and the First Hospital of Jilin University, respectively. They consisted of the initial laboratory tests on admission for 1135 and 237 CD patients, 2192 and 326 UC patients, and 1798 and 298 non-IBD patients. Cohorts 1 and 2 were used to develop ML predictive models. The parameters of the ML models established in Cohorts 1 and 2 were then merged, and nomogram models were developed using logistic regression. Cohort 3, comprising the initial laboratory tests of 117 CD patients, 197 UC patients, and 241 non-IBD patients from a tertiary hospital in a different region of China, was used for external testing of the three nomogram models.
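The step of turning a logistic regression model into a nomogram can be illustrated with a minimal sketch. It assumes a standard nomogram construction in which each predictor's contribution (coefficient times value) is rescaled so that the predictor with the widest effect span receives 0-100 points, and the total points map back to a predicted probability. The predictor names (`marker_1` to `marker_3`), coefficients, and ranges are hypothetical placeholders, not the study's fitted model.

```python
import math

# Hypothetical fitted logistic regression (NOT the study's coefficients):
# intercept, per-predictor coefficients, and the observed range of each predictor.
INTERCEPT = -3.0
COEFS = {"marker_1": 0.04, "marker_2": -0.10, "marker_3": 0.80}
RANGES = {"marker_1": (0.0, 100.0), "marker_2": (20.0, 50.0), "marker_3": (0.0, 2.0)}


def _reference(name):
    """End of the range that contributes least to the linear predictor."""
    lo, hi = RANGES[name]
    return lo if COEFS[name] >= 0 else hi


def _effect_span(name):
    lo, hi = RANGES[name]
    return abs(COEFS[name]) * (hi - lo)


# The predictor with the widest effect span defines the 0-100 points scale.
MAX_SPAN = max(_effect_span(n) for n in COEFS)


def points(name, value):
    """Nomogram points for one predictor (0 at its least-contributing value)."""
    return 100.0 * COEFS[name] * (value - _reference(name)) / MAX_SPAN


def predict_probability(values):
    """Sum the points, convert back to the linear predictor, apply the logistic link."""
    total = sum(points(n, values[n]) for n in COEFS)
    baseline = INTERCEPT + sum(COEFS[n] * _reference(n) for n in COEFS)
    linear_predictor = baseline + total * MAX_SPAN / 100.0
    return total, 1.0 / (1.0 + math.exp(-linear_predictor))
```

For example, `predict_probability({"marker_1": 30.0, "marker_2": 35.0, "marker_3": 1.2})` returns about 91.5 total points together with the same probability that the logistic model gives when applied directly to those values; the points scale is only a graphical re-expression of the regression.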
Results: In Cohort 1, the ML-IBD-1, ML-CD-1, and ML-UC-1 models, developed using the LightGBM algorithm, showed good discrimination (ML-IBD-1: AUC = 0.788; ML-CD-1: AUC = 0.772; ML-UC-1: AUC = 0.841). In Cohort 2, the ML-IBD-2, ML-CD-2, and ML-UC-2 models, developed using the XGBoost and logistic regression algorithms, also showed good discrimination (ML-IBD-2: AUC = 0.894; ML-CD-2: AUC = 0.932; ML-UC-2: AUC = 0.778). The nomogram models showed good diagnostic performance (nomogram-IBD: AUC = 0.778, 95% CI 0.688-0.868; nomogram-CD: AUC = 0.744, 95% CI 0.710-0.778; nomogram-UC: AUC = 0.702, 95% CI 0.591-0.814). The predictive ability of the three nomogram models was validated in Cohort 3 (nomogram-IBD: AUC = 0.758, 95% CI 0.683-0.832; nomogram-CD: AUC = 0.791, 95% CI 0.717-0.865; nomogram-UC: AUC = 0.817, 95% CI 0.702-0.932).
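The abstract reports AUCs with 95% confidence intervals but does not state how the intervals were computed. As a hedged illustration only, the sketch below computes AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, and a percentile-bootstrap CI that resamples cases and controls separately. DeLong's method is a common alternative and may be what the authors used; the function names and example scores are hypothetical.

```python
import random


def auc(scores_pos, scores_neg):
    """Mann-Whitney AUC: P(random positive scores above random negative), ties = 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a stratified percentile-bootstrap (1 - alpha) CI.

    Positives and negatives are resampled separately so that every
    bootstrap replicate contains both classes.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = rng.choices(scores_pos, k=len(scores_pos))
        neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(scores_pos, scores_neg), lower, upper
```

For example, `bootstrap_auc_ci([0.9, 0.7, 0.6], [0.3, 0.65, 0.2])` gives a point AUC of 8/9 with a wide interval, because three cases per class carry little information; this pairwise implementation is quadratic in sample size and is meant for illustration rather than cohort-scale data.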
Conclusion: Using three cohorts, this study developed ML and nomogram risk prediction models for IBD, CD, and UC based on routine laboratory data, and these models showed good diagnostic capability.