A Machine Learning Model Based on Genetic and Traditional Cardiovascular Risk Factors to Predict Premature Coronary Artery Disease

¹ Guangzhou Institute of Cardiovascular Disease, Guangdong Key Laboratory of Vascular Diseases, State Key Laboratory of Respiratory Disease, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China

² Department of Laboratory Medicine, Panyu Hospital of Chinese Medicine, Guangzhou University of Chinese Medicine, 511400 Guangzhou, Guangdong, China

³ Department of Emergency, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China

⁴ General Practice, Guangzhou Medical University, 510182 Guangzhou, Guangdong, China

*Correspondence: 2008690805@gzhmu.edu.cn (Chao-Wei Tian); liushiming@gzhmu.edu.cn (Shi-Ming Liu)
Academic Editors: Wei Lan, Qingfeng Chen and Khondaker Miraz Rahman

Front. Biosci. (Landmark Ed) 2022, 27(7), 211; https://doi.org/10.31083/j.fbl2707211

Submitted: 27 April 2022 | Revised: 16 June 2022 | Accepted: 24 June 2022 | Published: 4 July 2022

This is an open access article under the CC BY 4.0 license.

Abstract

Background: Premature coronary artery disease (PCAD) has a poor prognosis and high rates of mortality and disability. Accurate prediction of PCAD risk is important for prevention and early diagnosis. Machine learning (ML) has proven to be a reliable method for disease diagnosis and for building risk prediction models based on complex factors. The aim of the present study was to develop an accurate PCAD risk prediction model that allows early intervention.

Methods: We retrospectively analyzed single nucleotide polymorphisms (SNPs) and traditional cardiovascular risk factors (TCRFs) in 131 PCAD patients and 187 controls. The data were used to construct classifiers for predicting PCAD risk with three ML algorithms in scikit-learn: LogisticRegression (LRC), RandomForestClassifier (RFC) and GradientBoostingClassifier (GBC). Three quarters of the participants were randomly assigned to a training dataset and the remainder to a test dataset. Classifier performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity and the concordance index. Nomograms were constructed with R packages.

Results: Three optimized feature combinations (FCs) were identified: RS-DT-FC1 (rs2259816, rs1378577, rs10757274, rs4961, smoking, hyperlipidemia, glucose, triglycerides), RS-DT-FC2 (rs1378577, rs10757274, smoking, diabetes, hyperlipidemia, glucose, triglycerides) and RS-DT-FC3 (rs1169313, rs5082, rs9340799, rs10757274, rs1152002, smoking, hyperlipidemia, high-density lipoprotein cholesterol). Classifiers built with these FCs achieved an AUC > 0.90 and a sensitivity > 0.90. The nomograms built with RS-DT-FC1, RS-DT-FC2 and RS-DT-FC3 had a concordance index of 0.94, 0.94 and 0.90, respectively, when validated with the test dataset, and 0.79, 0.82 and 0.79 when validated with the training dataset. Manual prediction of the test data with the three nomograms resulted in an AUC of 0.89, 0.92 and 0.83, respectively, and a sensitivity of 0.92, 0.96 and 0.86, respectively.

Conclusions: The selection of suitable features determines the performance of ML models. RS-DT-FC2 may be a suitable FC for building a high-performance PCAD prediction model with good sensitivity and accuracy. The nomograms allow practical scoring and interpretation of each predictor and may help clinicians determine the risk of PCAD.
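The evaluation steps named in the Methods (a random three-quarters/one-quarter split, then AUC and sensitivity on the held-out data) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the fixed seed, the 0.5 decision threshold and the toy scores are assumptions made for illustration, and it uses only the Python standard library. In scikit-learn, `train_test_split`, `roc_auc_score` and `recall_score` cover the same steps.

```python
import random


def split_three_quarters(records, seed=0):
    """Randomly assign 3/4 of the records to training and the rest to test.

    The seed is an assumption; the abstract does not state how the
    random grouping was seeded.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) receives
    a higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity(labels, scores, threshold=0.5):
    """True positive rate: fraction of cases scored at or above the threshold.

    The 0.5 threshold is an illustrative assumption.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not case_scores:
        raise ValueError("sensitivity needs at least one case")
    true_positives = sum(1 for s in case_scores if s >= threshold)
    return true_positives / len(case_scores)


# Cohort size from the abstract: 131 PCAD patients (1) and 187 controls (0).
cohort = [1] * 131 + [0] * 187
train, test = split_three_quarters(cohort)
print(len(train), len(test))

# Toy predicted risks (hypothetical values, not study data).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc(toy_labels, toy_scores), sensitivity(toy_labels, toy_scores))
```

The pairwise form of the AUC is quadratic in the number of participants, which is acceptable for a cohort of this size; a rank-based computation would be preferable for large datasets.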

Keywords

premature coronary artery disease

machine learning

single nucleotide polymorphisms

traditional cardiovascular risk factors

nomogram

rs10757274



Front. Biosci. (Landmark Ed) Print ISSN 2768-6701 Electronic ISSN 2768-6698