Abstract
BACKGROUND: Coronary artery disease (CAD) is one of the leading causes of mortality worldwide. Risk stratification for early detection is essential for the primary prevention of CAD. QRISK3 is known to overestimate future CAD risk in some populations, leading to unnecessary preventive treatment that reduces cost-effectiveness and safety. Combining machine learning with a metaheuristic optimisation approach based on the Particle Swarm Optimization algorithm may outperform QRISK3 in predicting CAD, because the algorithm can select the best-performing subset of features related to clinical outcomes.
METHODS: This study used the UK Biobank dataset, comprising 348 015 participants aged 24-84 years with no prior diagnosis of CAD. The performance of QRISK3 and of the machine learning models was evaluated separately using receiver operating characteristic analysis. Five machine learning models were assessed: Logistic Regression, Decision Tree, Random Forest, Naïve Bayes and Gradient Boosting. For the machine learning models, the dataset was split into training and test sets at a ratio of 4:1. Each model was combined with a Particle Swarm Optimization algorithm to improve its classification accuracy.
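As a rough illustration of the pipeline described in METHODS, the sketch below runs a binary Particle Swarm Optimization search over feature subsets and scores each subset by AUC. It is a minimal sketch under stated assumptions, not the study's implementation: the data are synthetic, a simple class-mean-difference linear scorer stands in for Gradient Boosting, the swarm settings (inertia 0.7, acceleration 1.5, 12 particles, 20 iterations) are illustrative, and the inner validation split used for the fitness function is an assumption. All function names are hypothetical.

```python
import math
import random

def auc(scores, labels):
    """ROC AUC as the share of (case, non-case) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def masked_auc(fit_set, eval_set, mask):
    """Fit the stand-in scorer on fit_set using only masked-in features; return AUC on eval_set."""
    if not any(mask):
        return 0.5  # no features selected: treat as chance-level
    X, y = fit_set
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    weights = [0.0] * len(mask)
    for j, keep in enumerate(mask):
        if keep:  # weight = difference between class means (stand-in for a real model)
            mean_pos = sum(x[j] for x, t in zip(X, y) if t == 1) / n_pos
            mean_neg = sum(x[j] for x, t in zip(X, y) if t == 0) / n_neg
            weights[j] = mean_pos - mean_neg
    X_eval, y_eval = eval_set
    scores = [sum(w * v for w, v in zip(weights, x)) for x in X_eval]
    return auc(scores, y_eval)

def binary_pso(fitness, n_features, n_particles=12, n_iters=20, seed=0):
    """Binary PSO: each particle is a 0/1 feature mask; velocities set bit probabilities."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp to keep the sigmoid unsaturated
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

def make_data(n, n_features, n_informative, rng):
    """Synthetic cohort: the first n_informative features shift upward for cases."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        X.append([rng.gauss(0.8 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

rng = random.Random(42)
X, y = make_data(500, 8, 3, rng)

cut = int(0.8 * len(X))  # 4:1 train/test split, as in METHODS
train_X, train_y, test_X, test_y = X[:cut], y[:cut], X[cut:], y[cut:]

inner = int(0.75 * cut)  # assumed inner split so the test set never guides the search
fit_part = (train_X[:inner], train_y[:inner])
val_part = (train_X[inner:], train_y[inner:])

best_mask, val_auc = binary_pso(lambda m: masked_auc(fit_part, val_part, m), n_features=8)
test_auc = masked_auc((train_X, train_y), (test_X, test_y), best_mask)
print("selected features:", best_mask)
print("validation AUC:", round(val_auc, 4), "test AUC:", round(test_auc, 4))
```

Scoring candidate subsets on an inner validation split, rather than on the held-out test set, keeps the final test AUC an unbiased estimate. In practice the stand-in scorer would be replaced by whichever classifier is being tuned.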
RESULTS: Of the 348 015 participants, 23 136 (6.65%) were diagnosed with CAD within 10 years of their first visit, while 324 879 (93.35%) did not develop CAD. The area under the curve (AUC) of the QRISK3 prediction was 0.6113, whereas the Gradient Boosting model using Particle Swarm Optimization achieved a higher AUC of 0.7258.
CONCLUSIONS: This study shows that hybrid machine learning models optimised with the Particle Swarm Optimization algorithm can predict CAD better than QRISK3. Applying such models can help identify patients at high risk of CAD, allowing for more personalised preventive strategies and supporting policymakers in implementing lifestyle change recommendations.