Developing a Machine Learning Model to Predict 180-day Readmission for Elderly Patients with Angina

Background: Readmission of elderly angina patients has become a serious problem, with a dearth of available prediction tools for readmission assessment. The objective of this study was to develop a machine learning (ML) model that can predict 180-day all-cause readmission for elderly angina patients. Methods: The clinical data for elderly angina patients was retrospectively collected. Five ML algorithms were used to develop prediction models. Area under the receiver operating characteristic curve (AUROC), area under the precision recall curve (AUPRC), and the Brier score were applied to assess predictive performance. Analysis by Shapley additive explanations (SHAP) was performed to evaluate the contribution of each variable. Results: A total of 1502 elderly angina patients (45.74% female) were enrolled in the study. The extreme gradient boosting (XGB) model showed good predictive performance for 180-day readmission (AUROC = 0.89; AUPRC = 0.91; Brier score = 0.21). SHAP analysis revealed that the number of medications, hematocrit, and chronic obstructive pulmonary disease were important variables associated with 180-day readmission. Conclusions: An ML model can accurately identify elderly angina patients with a high risk of 180-day readmission. The model used to identify individual risk factors can also serve to remind clinicians of appropriate interventions that may help to prevent the readmission of patients.


Introduction
Angina is defined as a substernal chest pain, pressure, or discomfort [1], and is categorized as stable or unstable [2]. It is a common symptom of ischemic heart disease (IHD) and acute coronary syndrome (ACS), and a major cause of morbidity and mortality worldwide [2,3]. The prevalence of angina increases with age in both females and males. Approximately 30,000 to 40,000 people per million are diagnosed with chronic stable angina in Western countries [2]. Approximately 4.1 million patients with coronary artery disease die each year in Europe, with 82% of these aged >65 years [4]. A study of 12,277 participants in China reported that 975 experienced angina symptoms [5].
Many angina patients are readmitted after discharge due to poor disease control. Angina after myocardial infarction has been associated with an increased risk of readmission [6]. An observational cohort study found that 11.7% of ACS patients were readmitted within 30 days [7], with readmissions occurring primarily in elderly individuals [8]. Several studies have reported that the economic burden due to angina is increasing [9][10][11]. Frequent readmission may also be a difficult experience for elderly patients because of impaired mobility [12] and the increased financial burden [13,14]. Consequently, preventing the readmission of these patients is of major importance.
With the recent development of artificial intelligence, machine learning (ML) is increasingly being applied in the medical field [15,16]. Indeed, several studies have reported models that predict angina-related risk [17,18]. However, to the best of our knowledge, there is still no tool available to assess elderly angina patients for readmission. The aim of the present study was therefore to develop an ML model that predicts 180-day readmission of elderly angina patients, and to identify the important factors for readmission.

Study Population
This retrospective study was conducted at Sichuan Provincial People's Hospital and included elderly patients who received inpatient treatment from July 2018 to June 2020. The inclusion criteria were: (1) age ≥60 years; (2) diagnosed with angina [19]; (3) follow-up time ≥180 days. The exclusion criteria were: (1) length of stay (LOS) <2 days or >60 days; (2) transfer to another hospital; (3) died in hospital. The age standard was determined according to the Chinese Standard for the elderly [20].
As this was a retrospective study, the requirement for informed consent was waived. The primary outcome was 180-day all-cause readmission. The personal information of patients was anonymized during the data collection process. This research was approved by the Ethics Committee of Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital (approval number: 2023-85).

Data Selection and Preprocessing
Patient information including medical records, medication information, complications, and laboratory results was collected from the hospital records system. In the case of multiple laboratory results, the most recent test results before discharge were selected. For cases with multiple readmissions, the first readmission record was used.
Data preprocessing proceeded as follows. First, variables with >90% missing data were excluded. Second, extreme laboratory values, defined as values >3 interquartile ranges (IQR) from the end of the box, were deleted. Random forest (RF) was used to impute missing values, and Z-score standardization was applied to continuous variables. The number of medications (NOM) taken by each patient at the time of discharge was counted, and the age-adjusted Charlson comorbidity index (ACCI) was used to analyze complications.
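The outlier rule and the standardization step can be illustrated with a minimal sketch. It assumes that "the end of the box" means the first and third quartiles, and that a deleted extreme value is treated as missing (represented here by `None`) so that it can be imputed later; the function names and the sample hematocrit values are hypothetical, and the RF imputation itself is not shown.

```python
import statistics

def iqr_fences(values, k=3.0):
    """Return (low, high) fences lying k IQRs beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_extremes(values, k=3.0):
    """Replace values outside the fences with None (assumed to mean 'missing')."""
    low, high = iqr_fences(values, k)
    return [v if low <= v <= high else None for v in values]

def z_score(values):
    """Standardize the observed values to mean 0 and SD 1, leaving None untouched."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    return [None if v is None else (v - mean) / sd for v in values]

# Hypothetical hematocrit readings (%); the last one is an implausible entry.
hematocrit = [38, 40, 41, 42, 43, 44, 45, 120]
cleaned = drop_extremes(hematocrit)
scaled = z_score(cleaned)
```

In this sketch the fences are computed from all recorded values before any deletion, whereas the mean and SD used for scaling are computed after the extreme value has been removed.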

Development of the Model
Five ML algorithms were applied to develop prediction models: logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGB). The data set was divided into a training set and a test set at a ratio of 7:3. The training set was applied for developing prediction models and the test set for validating model performance. Additionally, borderline synthetic minority oversampling technique (SMOTE) was used to balance readmission and non-readmission patients, and a 5-fold cross-validation method was used to evaluate the performance of the model on the training set.
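The 7:3 split and the 5-fold cross-validation can be sketched in plain Python as follows. This is an illustration only: the text does not state whether the split was stratified by outcome, so stratification here is an assumption, as are the function names, the seed, and the toy labels. Borderline SMOTE is not reproduced.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test, preserving the class ratio (assumed)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def k_folds(indices, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, validation

# Toy outcome vector: 1 = readmitted within 180 days, 0 = not readmitted.
labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(labels)
folds = list(k_folds(train_idx, k=5))
```

Each training-set sample appears in exactly one validation fold, and the test set is never touched during cross-validation.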

Model Evaluation
Area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC) were the primary indicators used to evaluate the performance of each model. The accuracy, precision, recall, and F1-value of each model were also evaluated. Furthermore, the Brier score and calibration curve were used to evaluate the calibration of models. A model was considered to have good calibration when the Brier score was ≤0.25.
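For reference, two of these metrics can be computed directly from their definitions. The sketch below uses the standard formulas (Brier score as the mean squared difference between predicted probability and observed outcome; AUROC as the probability that a randomly chosen positive case is ranked above a randomly chosen negative case, with ties counted as one half). The function names and the four-patient example are illustrative and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auroc(y_true, y_prob):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Four hypothetical patients: observed outcome and predicted readmission risk.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]
example_brier = brier_score(y_true, y_prob)
example_auroc = auroc(y_true, y_prob)

# A constant prediction of 0.5 carries no information about the outcome.
uninformative_brier = brier_score(y_true, [0.5] * len(y_true))
```

A constant prediction of 0.5 always yields a Brier score of exactly 0.25, which is consistent with using ≤0.25 as the threshold for acceptable calibration.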

Model Interpretation
Because the predictions of ML algorithms are often difficult to interpret (the "black-box" problem), analysis by Shapley additive explanations (SHAP) is a useful tool for determining the contribution of each variable to the outputs of ML models. SHAP was therefore used in this study to quantify the contribution of each variable to the best model.
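The idea behind Shapley-based attribution can be shown with a brute-force calculation on a toy model. This is not the procedure used in the study: practical SHAP implementations rely on far more efficient estimators, and the single reference point (`baseline`), the two-variable risk function, and all names below are hypothetical. The sketch averages each variable's marginal effect over every order in which variables can be switched from the baseline value to the patient's value.

```python
from itertools import permutations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]              # reveal variable i
            updated = predict(current)
            phi[i] += updated - previous   # marginal contribution of variable i
            previous = updated
    return [p / factorial(n) for p in phi]

def toy_risk(v):
    """Hypothetical risk score with an interaction; v = [medication count, COPD flag]."""
    nom, copd = v
    return 0.1 + 0.05 * nom + 0.2 * copd + 0.1 * nom * copd

patient = [4, 1]     # hypothetical patient: 4 medications, COPD present
baseline = [0, 0]    # hypothetical reference patient
phi = shapley_values(toy_risk, patient, baseline)
```

The contributions always sum to the difference between the patient's prediction and the baseline prediction, which is the property that allows a per-patient plot to decompose a single prediction into positive and negative parts.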

Statistical Analysis
All statistical analyses were performed using SPSS software version 25 (IBM SPSS Statistics, IBM Corporation, Armonk, NY, USA). The development of models was achieved with sklearn packages in Python version 3.7.0 (https://www.python.org). Counts and percentages were used for the expression of categorical variables, and these were analyzed by Chi-square test. The median (IQR) or mean ± standard deviation (SD) was used to express continuous variables. These were analyzed with the Mann-Whitney test or t-test. A two-sided p value of <0.05 was considered statistically significant.

Data Processing and Variable Screening
Data from a total of 178 variables was collected, but 85 variables with >90% missing data were excluded. Following consultation with clinical experts, a subset of 49 variables was selected for Lasso analysis. A final set of 14 variables was chosen for the development of ML models. The importance of variables selected by Lasso is shown in Fig. 1.

Model Development and Evaluation
LR, KNN, SVM, GBDT, and XGB were combined with the 14 variables selected by Lasso to develop 5 models for the prediction of 180-day readmission in elderly patients with angina. The AUROC values for the 5 ML models on the training set are shown in Fig. 2, with the XGB model showing the best performance. On the test set, XGB gave the best performance for AUROC (0.89), accuracy (0.79), and precision (0.96). The AUROC for the 5 ML models on the test set ranged from 0.61 (KNN) to 0.89 (XGB), while the accuracy ranged from 0.55 (SVM) to 0.79 (XGB), and the precision from 0.59 (KNN) to 0.96 (XGB). In addition, the Brier score ranged from 0.21 (XGB) to 0.45 (SVM) (Table 2). The AUROC and AUPRC for the 5 ML models on the test set are shown in Fig. 3A,B, respectively. We also evaluated the calibration ability of the XGB model, since this model had the best predictive performance on the test set. The XGB model was found to have good calibration ability (Fig. 4).

Model Interpretation
SHAP values were calculated to determine the contribution of each variable to the prediction results. As shown in Fig. 5A, the 6 most important contributors to the predictions were NOM, hematocrit, chronic obstructive pulmonary disease (COPD), brain natriuretic peptide (BNP), age, and cholinesterase (CHE).
The contribution of each variable to the predicted outcome of individual patients was also determined using illustrative examples (Fig. 5B). In the case shown, a positive effect on the prediction result was provided by cholesterol, low density lipoprotein cholesterol (LDLC), and age, whereas a negative effect was provided by BNP, CHE, eosinophil, LOS, NOM, and COPD.

Discussion
Five ML algorithms combined with 14 variables were used in this study to develop prediction models for the risk of 180-day readmission in elderly angina patients. The XGB model was found to have the best predictive performance, as shown by the highest AUROC and AUPRC values. Furthermore, the XGB model exhibited good calibration performance, as demonstrated by the Brier score and calibration curve.
Although several models for predicting angina-related risk have been reported [21][22][23], there are no readily available tools for predicting all-cause readmission of individual, elderly angina patients. Several studies have evaluated readmission following cardiovascular disease (CVD) [24,25]. Okere et al. [24] developed a decision tree model to predict 30-day hospital readmission of IHD patients, with their model showing good predictive performance. Another study used 5 ML algorithms to predict 30-day all-cause readmission in a cohort of 1962 patients with CVD. This decision tree model showed a high F1-value (64%), precision (57%), and recall (71%) [25].
Compared to the aforementioned models, the ML model developed in the present study was more accurate and convenient. Moreover, the two previous models lacked external validation and calibration evaluation, thus making them unsuitable for assessment of readmission in elderly angina patients. Although the AUROC in our model was slightly lower than that of Okere et al. [24] (0.89 vs 0.95), it was developed specifically for elderly angina patients, and is therefore more appropriate for readmission assessment in this population. Additionally, the Brier score combines model discrimination and calibration, and represents the mean square error between the predicted and observed results. When 2 models are compared, a smaller Brier score indicates better model performance. In the current study, the Brier score for the XGB model was lower than that of the other ML models. Hence, the XGB model exhibited good calibration ability for predicting 180-day all-cause readmission in elderly angina patients.
Many clinical variables have been associated with readmission in the existing literature [7,26-33]. To increase the accuracy of prediction models, it is important to identify the significant influencing factors from amongst the many complex factors in elderly angina patients. After considering the potential influence of common clinical variables on the prediction results, 14 variables were selected from 178 clinical features in order to develop a practical model. Elderly patients often experience polypharmacy due to multiple comorbidities, which may increase the risk of adverse drug events [34]. The present study confirmed CHD and NOM as predictors in the models. Consistent with our findings, previous studies have reported that age, BNP, and LOS affect hospital readmission for cardiovascular patients [29,30,33]. In our study, age, BNP, and LOS were associated with an increased risk of 180-day readmission. Furthermore, we also identified other important factors for the readmission of elderly angina patients, including hematocrit, LDLC, and cholesterol. Interestingly, the relationship between readmission and some of the variables revealed by SHAP analysis in our study is not immediately consistent with clinical intuition. However, these variables may still reflect the extent of patient illness, thus helping to predict outcome. Another advantage of this study is that SHAP was used to determine individual risk, which may help in providing optimal patient care and appropriate intervention.
Nevertheless, there are several limitations to this study. Firstly, the single-center study design means that analysis of data from other medical institutions is needed to test the predictive performance of our model. Secondly, this study collected only common clinical data. Further research is therefore required to determine the influence of other factors that could positively impact the readmission of patients with cardiac diseases, such as continuity of care, self-care, and perceived control [35,36]. Thirdly, although complications in elderly patients were incorporated into this research, relevant information such as disease severity and the duration of complications was not included. Finally, a prospective study is required to verify the clinical utility of our prediction model.

Conclusions
This study developed models to predict 180-day all-cause readmission in elderly angina patients by combining 5 ML algorithms with common clinical variables. The XGB model demonstrated superior predictive performance compared to the other models. This ML-based tool could have future clinical applications for the identification of 180-day readmissions in elderly angina patients, as well as for improving their quality of care.

Fig. 3. Receiver operating characteristic (ROC) curves and precision recall curves of the 5 machine learning models on the test set. (A) ROC curves of 180-day readmission. (B) Precision recall curves of 180-day readmission. Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision recall curve; LR, logistic regression; KNN, k-nearest neighbor; SVM, support vector machine; GBDT, gradient boosting decision tree; XGB, extreme gradient boosting.

Fig. 5. Contributions of input variables to readmission predictions. (A) The SHAP summary plot of the 10 most important variables of the XGB model for 180-day readmission. (B) Contribution of every variable to the predicted outcome of one sample. Red represents positive contribution, and blue represents negative contribution. Abbreviations: NOM, number of medications; COPD, chronic obstructive pulmonary disease; BNP, brain natriuretic peptide; CHE, cholinesterase; LDLC, low density lipoprotein cholesterol; LOS, length of stay; SHAP, Shapley additive explanations.