Open Access Original Research
A Machine Learning Framework for Diagnosing and Predicting the Severity of Coronary Artery Disease
1 Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
2 College of Information Science and Technology, Shihezi University, 832003 Shihezi, Xinjiang, China
*Correspondence: maxiangxj@yeah.net (Xiang Ma); djg_inf@shzu.edu.cn (Jian Guo Dai)
These authors contributed equally.
Rev. Cardiovasc. Med. 2023, 24(6), 168; https://doi.org/10.31083/j.rcm2406168
Submitted: 31 January 2023 | Revised: 2 March 2023 | Accepted: 6 March 2023 | Published: 8 June 2023
Copyright: © 2023 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.
Abstract

Background: Although machine learning (ML)-based prediction of coronary artery disease (CAD) has gained increasing attention, assessment of the severity of suspected CAD in symptomatic patients remains challenging. Methods: The training set for this study consisted of 284 retrospective participants, while the test set included 116 prospectively enrolled participants, from whom we collected 53 baseline variables and coronary angiography results. The data were pre-processed with outlier handling and one-hot encoding. In the first stage, we constructed a binary (dichotomous) ML classification model that used baseline information to predict the presence of CAD. In the second stage, baseline information was used to construct ML regression models for predicting the severity of CAD. The non-CAD population was included in these models, and two scores, the SYNTAX score and the GENSINI score, were used as output variables. Finally, statistical analysis and SHAP (SHapley Additive exPlanations) plot visualization were employed to explore the relationship between baseline information and CAD. Results: The study included 269 CAD patients and 131 healthy controls. The eXtreme Gradient Boosting (XGBoost) model exhibited the best performance among the models tested for predicting CAD, with an area under the receiver operating characteristic curve of 0.728 (95% CI 0.623–0.824). The main correlates were left ventricular ejection fraction, homocysteine, and hemoglobin (p < 0.001). The XGBoost model also performed best for predicting the SYNTAX score, with the main correlates being brain natriuretic peptide (BNP), left ventricular ejection fraction, and glycated hemoglobin (p < 0.001). The main relevant features in the model predicting the GENSINI score were BNP, high-density lipoprotein, and homocysteine (p < 0.001). Conclusions: This data-driven approach provides a foundation for the risk stratification and severity assessment of CAD. Clinical Trial Registration: The study was registered in the www.clinicaltrials.gov protocol registration system (number NCT05018715).
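
As an illustration of how a figure such as "area under the receiver operating characteristic curve of 0.728 (95% CI 0.623–0.824)" can be obtained from test-set predictions, the following is a minimal sketch using only the Python standard library. The abstract does not state how the interval was computed; the percentile bootstrap used here is an assumption, and the function names (`roc_auc`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical, not the study's data or code.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.
    Assumption: the source does not say which CI method was used."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        # Resample cases with replacement, keeping label/score pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has only one class
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up labels (1 = CAD, 0 = non-CAD) and predicted probabilities.
    toy_labels = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    toy_scores = [0.2, 0.4, 0.35, 0.8, 0.5, 0.7, 0.45, 0.1, 0.9, 0.6]
    auc = roc_auc(toy_labels, toy_scores)
    low, high = bootstrap_auc_ci(toy_labels, toy_scores)
    print(f"AUC = {auc:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With a test set of only 116 participants, resampling-based intervals of this kind tend to be wide, which is consistent with the width of the interval reported in the abstract.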

Keywords
machine learning
coronary artery disease
SYNTAX score
GENSINI score
Funding
2022B03022/Key Research and Development Task of Xinjiang Uygur Autonomous Region Research