Academic Editors: Wei Lan, Qingfeng Chen and Khondaker Miraz Rahman
Background: Premature coronary artery disease (PCAD) has a poor
prognosis and high rates of mortality and disability. Accurate prediction of the
risk of PCAD is very important for the prevention and early diagnosis of this
disease. Machine learning (ML) has proven to be a reliable method for disease
diagnosis and for building risk prediction models based on complex factors. The
aim of the present study was to develop an accurate prediction model of PCAD risk
that allows early intervention. Methods: We performed a retrospective
analysis of single nucleotide polymorphisms (SNPs) and traditional cardiovascular
risk factors (TCRFs) for 131 PCAD patients and 187 controls. The data were used to
construct classifiers for the prediction of PCAD risk with the ML
algorithms LogisticRegression (LRC), RandomForestClassifier (RFC) and
GradientBoostingClassifier (GBC) in scikit-learn. Three quarters of the
participants were randomly assigned to a training dataset and the rest to a
test dataset. The performance of the classifiers was evaluated using the area under the
receiver operating characteristic curve (AUC), sensitivity and concordance index.
R packages were used to construct nomograms. Results: Three optimized
feature combinations (FCs) were identified: RS-DT-FC1 (rs2259816, rs1378577,
rs10757274, rs4961, smoking, hyperlipidemia, glucose, triglycerides), RS-DT-FC2
(rs1378577, rs10757274, smoking, diabetes, hyperlipidemia, glucose,
triglycerides) and RS-DT-FC3 (rs1169313, rs5082, rs9340799, rs10757274,
rs1152002, smoking, hyperlipidemia, high-density lipoprotein cholesterol). These
were able to build the classifiers with an AUC