Publication 10997

Title:	OWL: an optimized and independently validated machine learning prediction model for lung cancer screening based on the UK Biobank, PLCO, and NLST populations
Journal:	EBioMedicine
Published:	24 Jan 2023
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/36701900/
DOI:	https://doi.org/10.1016/j.ebiom.2023.104443
URL:	http://www.thelancet.com/article/S2352396423000087/pdf
Citations:	6 (6 in last 2 years) as of 8 Aug 2024

Abstract

BACKGROUND: A reliable risk prediction model is critically important for identifying individuals at high risk of developing lung cancer as candidates for low-dose chest computed tomography (LDCT) screening. Leveraging a cutting-edge machine learning technique that accommodates a wide range of questionnaire-based predictors, we sought to optimize and validate a lung cancer prediction model.

METHODS: We developed an Optimized early Warning model for Lung cancer risk (OWL) using the XGBoost algorithm with 323,344 participants from England in the UK Biobank (UKB; training set). We independently validated it with 93,227 participants from Scotland and Wales in the UKB (validation set 1); 70,605 and 66,231 participants in the control and intervention subpopulations, respectively, of the Prostate, Lung, Colorectal, and Ovarian cancer screening trial (PLCO) (validation sets 2 & 3); and 23,138 and 18,669 participants in the control and intervention subpopulations, respectively, of the United States National Lung Screening Trial (NLST) (validation sets 4 & 5). We compared OWL with three competing prediction models, i.e., PLCO modified 2012 (PLCO_m2012), PLCO modified 2014 (PLCO_all2014), and the Liverpool Lung cancer Project risk model version 3 (LLPv3), and assessed discrimination by the area under the receiver operating characteristic curve (AUC) at the designed time point. We further evaluated calibration using the relative improvement in the ratio of expected to observed lung cancer cases (RI_EO), and illustrated clinical utility by decision curve analysis.
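The discrimination and calibration measures named in the methods can be illustrated with a minimal Python sketch. It computes the AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (ties counted as one half), and the expected-to-observed (E/O) ratio as the sum of predicted risks divided by the number of observed cases. The abstract does not define how RI_EO is computed, so `relative_improvement_eo` uses an assumed, hypothetical definition (the fractional reduction in |E/O - 1| relative to a reference model). The toy risks and outcomes are invented for illustration only.

```python
from typing import Sequence


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability P(risk_case > risk_noncase); ties count 0.5."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


def eo_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Expected cases (sum of predicted risks) divided by observed cases."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed cases")
    return sum(risks) / observed


def relative_improvement_eo(eo_new: float, eo_ref: float) -> float:
    """HYPOTHETICAL RI_EO definition (not taken from the paper): fractional
    reduction in |E/O - 1| of the new model relative to a reference model."""
    ref_dev = abs(eo_ref - 1.0)
    if ref_dev == 0:
        raise ValueError("reference model is already perfectly calibrated")
    return (ref_dev - abs(eo_new - 1.0)) / ref_dev


# Invented toy data: predicted risks and observed lung cancer status (1 = case).
risks = [0.02, 0.10, 0.05, 0.30, 0.01, 0.20]
outcomes = [0, 1, 0, 1, 0, 0]
print(auc(risks, outcomes))
print(eo_ratio(risks, outcomes))
print(relative_improvement_eo(0.9, 1.5))
```

An E/O ratio above 1 means the model over-predicts the number of cases and below 1 means it under-predicts; the pairwise AUC loop is quadratic and is only meant to make the definition explicit, not to scale to cohorts of hundreds of thousands.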

FINDINGS: In the general population of validation set 1, OWL (AUC = 0.855, 95% CI: 0.829-0.880) showed better discrimination than PLCO_all2014 (AUC = 0.821, 95% CI: 0.794-0.848) (p < 0.001); in validation sets 2 & 3, the AUC of OWL was comparable to that of PLCO_all2014 (AUC_PLCOall2014 - AUC_OWL < 1%). Among ever-smokers in validation set 1, OWL outperformed PLCO_m2012 and PLCO_all2014 (AUC_OWL = 0.842, 95% CI: 0.814-0.871; AUC_PLCOm2012 = 0.792, 95% CI: 0.760-0.823; AUC_PLCOall2014 = 0.791, 95% CI: 0.760-0.822; all p < 0.001). Among ever-smokers in validation sets 2 to 5, OWL remained comparable to PLCO_m2012 and PLCO_all2014 in discrimination (AUC difference from -0.014 to 0.008). In all validation sets, OWL outperformed LLPv3 among both the general population and ever-smokers. Notably, in most cases OWL showed significantly better calibration than PLCO_m2012 and PLCO_all2014 (RI_EO from 43.1% to 92.3%, all p < 0.001) and LLPv3 (RI_EO from 41.4% to 98.7%, all p < 0.001). For clinical utility, OWL showed a significant improvement in average net benefit (NB) over PLCO_all2014 in validation set 1 (NB improvement: 32, p < 0.001); among ever-smokers in validation set 1, OWL (average NB = 289) retained a significant improvement over PLCO_m2012 (average NB = 213) (p < 0.001). OWL had NBs equivalent to those of PLCO_m2012 and PLCO_all2014 in the PLCO and NLST populations, while outperforming LLPv3 in all three populations.
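Decision curve analysis compares models by their net benefit at a risk threshold p_t: NB = TP/n - (FP/n) * p_t / (1 - p_t), where everyone whose predicted risk is at or above p_t is flagged for screening. The minimal Python sketch below computes that quantity and, as an assumed summary, its unweighted mean over a user-supplied grid of thresholds. The abstract does not state the threshold range or the scaling behind figures such as "average NB = 289", so the grid, the averaging rule, and the unscaled units here are hypothetical, and the toy data are invented.

```python
from typing import Sequence


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit at threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risks)
    # Flag for screening everyone at or above the threshold.
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def average_net_benefit(risks: Sequence[float], outcomes: Sequence[int],
                        thresholds: Sequence[float]) -> float:
    """HYPOTHETICAL summary: unweighted mean of NB over a chosen threshold grid."""
    values = [net_benefit(risks, outcomes, t) for t in thresholds]
    return sum(values) / len(values)


# Invented toy data: predicted risks and observed lung cancer status (1 = case).
risks = [0.02, 0.10, 0.05, 0.30, 0.01, 0.20]
outcomes = [0, 1, 0, 1, 0, 0]
print(net_benefit(risks, outcomes, 0.08))
print(average_net_benefit(risks, outcomes, [0.04, 0.08, 0.15]))
```

The weight p_t / (1 - p_t) expresses how many false positives one true positive is worth at that threshold, which is why two models with similar AUCs can still differ in net benefit.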

INTERPRETATION: OWL, with a high degree of predictive accuracy and robustness, is a general framework with scientific justification and clinical utility that can aid in screening individuals at high risk of lung cancer.

FUNDING: National Natural Science Foundation of China, the US NIH.

Application ID	Title
57471	Phenome-Wide Association Studies of the immune-related genetic variants
