A New Risk Score for Predicting Postoperative Mortality in Suspected Heart Failure Patients Undergoing Valvular Surgery

Background: Heart failure (HF) is one of the most important indicators of the severity of valvular heart disease (VHD). VHD with HF is frequently associated with a higher surgical risk. Our study sought to develop a risk score model to predict postoperative mortality in patients with suspected HF after valvular surgery. Methods: Between January 2016 and December 2018, all consecutive adult patients with suspected HF who underwent valvular surgery in the Chinese Cardiac Surgery Registry (CCSR) database were included. In total, 14,645 patients (55.39 ± 11.6 years, 43.5% female) were identified for analysis. Patients who had surgery between January 2016 and May 2018 (n = 11,292) formed the training group for model derivation, and patients who underwent surgery between June 2018 and December 2018 (n = 3353) formed the testing group for validation. In the training group, we constructed and internally validated a scoring system to predict postoperative mortality using multivariable logistic regression and bootstrapping (1000 resamples). We then validated the scoring model in the testing group. The Brier score and calibration curves based on bootstrapping with 1000 resamples were used to evaluate calibration. The area under the receiver operating characteristic curve (AUROC) was used to evaluate discrimination. The results were also compared to EuroSCORE II. Results: The final score ranged from 0 to 19 points and involved 9 predictors: age ≥60 years; New York Heart Association (NYHA) class IV; left ventricular ejection fraction (LVEF) <35%; estimated glomerular filtration rate (eGFR) <50 mL/min/1.73 m²; preoperative dialysis; left main artery stenosis; non-elective surgery; cardiopulmonary bypass (CPB) time >200 minutes; and perioperative transfusion. In the training group, observed and predicted postoperative mortality rates increased from 0% to 45.5% and from 0.8% to 50.3%, respectively, as the score increased from 0 to ≥10 points. The scoring model's Brier scores in the training and testing groups were 0.0279 and 0.0318, respectively. The area under the curve (AUC) of the scoring model was 0.776 in both the training and testing groups, which was significantly higher than that of EuroSCORE II in both the training (AUC = 0.721, DeLong test, p < 0.001) and testing (AUC = 0.669, DeLong test, p < 0.001) groups. Conclusions: The new risk score is an effective and concise tool that could accurately predict postoperative mortality rates in suspected HF patients after valve surgery.


Background
Heart failure (HF) is a life-threatening condition associated with significant morbidity, poor functional capacity, and decreased quality of life [1]. In 2017, more than 64 million people worldwide were affected by HF [2], and the number is likely to rise. Savarese et al. [3] reported in their survey that annual health care costs per HF patient amount to up to €25,000 in the Western world, resulting in a substantial economic burden. The prevalence of valvular heart disease (VHD) among the elderly, as well as VHD-related HF, is rising as the population ages [4]. VHD surgery is one of the most common types of cardiac surgery. According to a multicenter study conducted in China [5], the overall mortality rate for VHD surgery was around 2%. However, it is much higher in patients with HF, and can be greater than 3%. Nearly all risk prediction models for cardiac surgery include HF as an independent predictor. There is a growing demand for risk assessment in these surgical patients; however, current risk scores do not provide a reliable estimate of operative mortality in an individual HF patient [6,7].
In order to better assess the risk of surgery for these patients, the aim of the present study is to establish a simplified scoring risk model based on the Chinese Cardiac Surgery Registry (CCSR) database to accurately predict the postoperative mortality of suspected HF patients undergoing VHD surgery.

Data Source
The CCSR is a multicenter registry overseen by a council of cardiac surgeons and researchers from the National Center of Cardiovascular Diseases. The database contains information about cardiac surgery from 94 institutions. Each participating institution performed more than 100 cardiac surgeries per year and was requested to record cases using the same case report form (CRF). These sites are advanced cardiac centers and share many features that are common among large cardiac centers in China. Based on the yearly surveys of the Chinese Society of Extracorporeal Circulation, we estimate that our database contains roughly 30% to 40% of all valvular procedures and represents surgical outcomes from large cardiac hospitals [8]. Every six months, two researchers audited a random sample of 5-10% of the reported cases. Where data were missing, the relevant participating units were required to resolve the problems in order to ensure the integrity of the data.

Patients
Between January 1, 2016, and December 31, 2018, we identified 39,470 patients in the CCSR database who had undergone valvular surgery. We excluded 1302 individuals with a primary diagnosis of acute aortic dissection, whose hemodynamic characteristics are markedly different from those of VHD. We further removed 4746 patients under the age of 16 and 18,777 patients who had no HF-related symptoms or signs and were classified as New York Heart Association (NYHA) class I. Finally, we identified a total of 14,645 cases (NYHA II or higher) for analysis. Patients who had surgery between January 2016 and May 2018 (a total of 11,292) were allocated to the training group for model derivation. Patients who underwent surgery between June 2018 and December 2018 (a total of 3353) were included as a testing group to validate the model. The patient enrollment flow chart is shown in Fig. 1.
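The enrollment arithmetic described above can be checked with a short sketch. The counts are taken directly from the text; the variable names are illustrative only and do not correspond to fields in the CCSR database.

```python
# Cohort flow as described in the text; variable names are illustrative.
total_valvular = 39_470        # valvular surgery patients, 2016 to 2018
excluded_dissection = 1_302    # primary diagnosis of acute aortic dissection
excluded_under_16 = 4_746      # patients under the age of 16
excluded_nyha_i = 18_777       # no HF-related symptoms or signs (NYHA I)

analysis_cohort = (total_valvular - excluded_dissection
                   - excluded_under_16 - excluded_nyha_i)

training_n = 11_292            # surgery between January 2016 and May 2018
testing_n = 3_353              # surgery between June 2018 and December 2018

print(analysis_cohort)                             # 14645
print(training_n + testing_n == analysis_cohort)   # True
```

Because the split is by surgery date rather than at random, the testing group acts as a temporal validation set.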

Definitions
We defined patients with suspected HF as those classified as NYHA class II or higher due to valvular disease.
According to the latest European Society of Cardiology (ESC) guidelines [1], suspected HF is defined as a clinical syndrome consisting of typical symptoms (e.g., breathlessness, ankle swelling, and fatigue) that may be accompanied by signs (e.g., elevated jugular venous pressure, pulmonary crackles, and peripheral oedema). It is due to structural or functional abnormalities of the heart that result in elevated intracardiac pressures or inadequate cardiac output at rest or during exercise.
Postoperative mortality was defined as death occurring between the surgery and hospital discharge or within 30 days after surgery. Definitions of the variables in Table 1 are shown in Supplementary Table 1.
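The outcome definition above can be written as a small predicate. This is a minimal sketch, not the registry's actual coding: the function name and arguments are hypothetical, and treating a missing discharge date as "still in hospital" is an assumption made only for illustration.

```python
from datetime import date
from typing import Optional


def is_postoperative_death(surgery: date,
                           death: Optional[date],
                           discharge: Optional[date]) -> bool:
    """Death before hospital discharge OR within 30 days after surgery."""
    if death is None or death < surgery:
        return False
    # Assumption: a missing discharge date means the patient never left hospital.
    in_hospital = discharge is None or death <= discharge
    within_30_days = (death - surgery).days <= 30
    return in_hospital or within_30_days


op = date(2018, 1, 1)
print(is_postoperative_death(op, date(2018, 2, 15), date(2018, 2, 20)))  # True
print(is_postoperative_death(op, date(2018, 1, 20), date(2018, 1, 10)))  # True
print(is_postoperative_death(op, date(2018, 3, 1), date(2018, 1, 10)))   # False
```

The first case counts as an in-hospital death beyond 30 days, the second as a post-discharge death within 30 days, and the third meets neither criterion.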

Statistical Analysis
We followed the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) statement for reporting the derivation and testing of the prediction model [9]. Categorical variables were presented as frequencies (percentages) and were compared with chi-squared tests. Continuous variables were presented as mean ± standard deviation (SD), and were compared with the t test or the Wilcoxon rank sum test as appropriate. A p-value < 0.05 was considered statistically significant. In the training group, all possible risk factors were screened by univariate analyses, and variables with a p-value < 0.05 in univariate screening were entered into multivariate analysis, using a stepwise "both direction" procedure based on the Akaike information criterion (AIC), sequentially removing items until the lowest AIC was obtained. Continuous variables were dichotomized before entering regression analyses by means of restricted cubic spline curves [10] (Supplementary Figs. 1-4) and/or clinically relevant thresholds. Regression coefficients of the final model were then used as weights to compute a simplified scoring system, by multiplying the coefficients and rounding them to the closest integer, following the approach from Cole to determine the optimal multiplier [11]. The area under the receiver operating characteristic curve (AUROC) was used to evaluate model discrimination, and calibration curves were plotted to assess the concordance between observed and predicted probabilities. Validation used 1000 bootstrap resamples. In addition, the risk model was compared to EuroSCORE II in terms of the Brier score for calibration and the AUROC for discrimination. The DeLong test was used to compare AUC values (with 95% confidence intervals, 95% CI). R software version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria) was used for statistical analysis. GraphPad Prism version 8.0 (GraphPad Software, San Diego, CA, USA) was used to draw figures.
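The step from regression coefficients to integer points can be illustrated with a minimal sketch. The coefficients and the multiplier below are hypothetical placeholders, not the values in Table 2, and the sketch uses a fixed multiplier rather than reproducing Cole's procedure for choosing it.

```python
import math

# Hypothetical log-odds coefficients; NOT the values reported in Table 2.
coefficients = {
    "predictor_a": 0.45,
    "predictor_b": 0.92,
    "predictor_c": 1.38,
}
multiplier = 2.0  # hypothetical; the study selects it with Cole's approach [11]


def to_points(coefs, k):
    """Multiply each coefficient by k and round to the nearest integer.

    floor(x + 0.5) is used instead of round() to avoid banker's rounding.
    """
    return {name: int(math.floor(beta * k + 0.5)) for name, beta in coefs.items()}


def total_score(present, points):
    """Sum the points of the predictors that are present in a patient."""
    return sum(points[name] for name in present)


points = to_points(coefficients, multiplier)
print(points)   # {'predictor_a': 1, 'predictor_b': 2, 'predictor_c': 3}
print(total_score({"predictor_a", "predictor_c"}, points))   # 4
```

A larger multiplier preserves more of the relative spacing between coefficients at the cost of a wider and less convenient score range.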

Study Population
Fig. 1 illustrates the patient enrollment flow chart. Table 1 compares the demographics and other pre- or intraoperative risk variables of the training group (n = 11,292) with those of the testing group (n = 3353). Supplementary Table 1 contains the definitions of the variables in Table 1. The mean age was 55.25 years in the training group and 55.87 years in the testing group (p = 0.007). The training group had 4898 (43.4%) female patients, while the testing group had 1471 (43.9%) female patients (p = 0.619). In the whole cohort, a total of 7834 patients (53.5%) and 7957 patients (54.3%) received aortic valve replacement (AVR) and mitral valve replacement (MVR) procedures, respectively. Furthermore, only 1612 patients (11%) had a mitral valve (MV) repair operation. Patients with concomitant moderate to severe tricuspid insufficiency underwent tricuspid repair during the same operation. The median EuroSCORE II value was 0.018 in the training group and 0.019 in the testing group (p = 0.043). The postoperative mortality rate was 3% in the training group and 3.3% in the testing group (p = 0.324).

Univariate and Multivariate Analyses
Factors associated with postoperative mortality after univariate screening are presented in Table 1. To construct a simplified scoring system, continuous variables were dichotomized before analyses and were defined as follows: age ≥60 years, eGFR <50 mL/min/1.73 m², CPB time >200 minutes and LVEF <35%.
Table 2 shows the results of the multivariate analysis. The independent variables selected to construct the final model were: age ≥60 years; NYHA IV; left ventricular ejection fraction (LVEF) <35%; estimated glomerular filtration rate (eGFR) <50 mL/min/1.73 m²; preoperative dialysis; left main artery stenosis; non-elective surgery; cardiopulmonary bypass (CPB) time >200 minutes; and perioperative transfusion. The points attributed to each predictor in the final simplified scoring model, according to its odds ratio, are also presented in Table 2.
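The four cut-offs used to dichotomize the continuous variables can be expressed as a short sketch. The thresholds are those stated above; the function and key names are hypothetical and chosen only for illustration.

```python
def dichotomize(age_years, egfr, cpb_minutes, lvef_percent):
    """Apply the cut-offs stated in the text to four continuous variables."""
    return {
        "age_ge_60": age_years >= 60,       # age >= 60 years
        "egfr_lt_50": egfr < 50,            # eGFR < 50 mL/min/1.73 m^2
        "cpb_gt_200": cpb_minutes > 200,    # CPB time > 200 minutes
        "lvef_lt_35": lvef_percent < 35,    # LVEF < 35%
    }


# Boundary values: only the age criterion is inclusive.
print(dichotomize(age_years=60, egfr=50, cpb_minutes=200, lvef_percent=35))
# {'age_ge_60': True, 'egfr_lt_50': False, 'cpb_gt_200': False, 'lvef_lt_35': False}
```

Note that the age criterion includes the boundary value, whereas the other three are strict inequalities.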

Model Validation
Observed and predicted postoperative mortality rates according to the score from the simplified scoring model ranged from 0% to 45.5% and from 0.8% to 50.3%, respectively, for scores from 0 to 10 or more, with mortality rates increasing exponentially as the score increased (Table 3 and Fig. 2).

Fig. 2. Predicted vs. observed mortality rates and numbers of patients according to the risk score value (in training group).
Fig. 3 shows the calibration plots of the simplified scoring model; calibration was satisfactory in both the training and testing groups.
In the training group, the AUC of our simplified scoring model was 0.776, which was significantly higher than that of EuroSCORE II, with an AUC of 0.721 (DeLong test, p < 0.001). Meanwhile, the Brier score of our model was 0.0274, lower than that of EuroSCORE II (0.0279). The comparison of ROC curves is shown in Fig. 4A.
In the testing group, the AUC of our simplified scoring model was 0.776, which was markedly higher than that of EuroSCORE II, with an AUC of 0.669 (DeLong test, p < 0.001). Meanwhile, the Brier score of our model was 0.0308, also lower than that of EuroSCORE II (0.0318). The comparison of ROC curves is shown in Fig. 4B. Tables 4 and 5 show the AUCs and Brier scores of the two models.
Interestingly, we found that the difference in performance between our simplified scoring model and EuroSCORE II might increase with the degree of HF presented by the patients. Fig. 5 compares the ROC curves of the two scores in subgroups defined by NYHA classification. The calibration of the risk score model was good, as shown in Fig. 3, with satisfactory agreement between observed and predicted probability of mortality for probabilities up to 40%, and a slight underestimation by the model for probabilities ranging from 20% to 40%.
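The two performance measures reported in this section can be computed as in the following minimal sketch. The outcomes and predicted probabilities are a small synthetic example, not study data, and the AUROC is computed with the pairwise (Mann-Whitney) definition rather than with the R routines used in the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_score):
    """Probability that a random death is scored higher than a random survivor.

    Ties between a death and a survivor count as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic example: 1 = death, 0 = survival.
outcomes = [0, 0, 0, 1, 0, 1]
predicted = [0.01, 0.02, 0.05, 0.10, 0.20, 0.40]

print(auroc(outcomes, predicted))   # 0.875
print(round(brier_score(outcomes, predicted), 4))
```

A lower Brier score indicates better overall accuracy of the predicted probabilities, while a higher AUROC indicates better separation of deaths from survivors.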

Discussion
This risk score is an effective and simple tool for mortality prediction after valvular surgery in patients with HF. Unlike EuroSCORE II, which has 18 predictors, this model includes only 9 predictors, all of which are easily accessible in clinical practice. The model is convenient for clinical use and could be a reliable bedside tool.
HF is a major health-care issue associated with high resource use and health-care costs [3]. HF is also the leading cause of hospitalization in people over the age of 65 [12]. VHD is among the most common primary causes of HF, and many VHD patients require surgery. As an independent risk factor for cardiac surgery, HF has long been a focus of perioperative clinical evaluation.
Furthermore, prediction models are region-specific, because people in different regions of the world have distinct disease features [13], and therapeutic concepts and techniques vary by region [6,14]. Currently, most of the widely used clinical prediction models (such as EuroSCORE II and the Society of Thoracic Surgeons (STS) score) are based on Western populations. These Western models may not be ideal for Asian populations, including the Chinese population. According to our findings, EuroSCORE II underperformed in the Chinese suspected HF population. Our subgroup analysis (Fig. 5) indicates that, in terms of discrimination, our model was significantly better than EuroSCORE II among NYHA III or IV patients. As a result, developing a prediction model for Chinese suspected HF patients who require VHD surgery is important in clinical practice. Wessler et al. [6] published a study showing that many VHD prediction models performed poorly in validation. They suggested that one probable explanation is an insufficient sample size or poor representativeness of the sample population used for model derivation. One advantage of our study is that the sample population drawn from the CCSR data is representative. The CCSR is the largest Chinese multicenter cardiac surgery database, analogous to the STS database in North America, and includes almost all high-quality cardiac hospitals in China. As a result, this risk model's validation performance was satisfactory, and considerably better than that of EuroSCORE II. However, the score's clinical significance in other populations remains to be determined through testing and real-world practice.
Preoperative renal function indicators (eGFR and prior dialysis) and cardiac function indicators (NYHA IV and LVEF <35%) account for a large proportion of the predictors in this model. The renal function indicators have the highest weights, implying that renal function has a significant impact on the prognosis of HF patients. Metra et al. [15] suggested that worse renal function might result in poorer clinical outcomes in HF patients. A meta-analysis [16] showed that inadequate renal function was present in 23% of HF patients and was associated with a twofold increased risk of all-cause death, with a greater magnitude of association when LVEF was higher. Preoperative chronic kidney disease (CKD) is a predictor of poor outcomes in cardiac surgical patients, including those with HF, and nearly all cardiac surgery prediction models include preoperative renal function as a predictor. In contrast to coronary artery bypass grafting (CABG) surgery, there are many different valvular surgical techniques. The surgical method, however, was not an independent risk factor for postoperative mortality in this study. This suggests that, as valvular surgical techniques have advanced in recent years, the effect of differences between surgical methods on prognosis is decreasing, highlighting the importance of the patient's underlying medical conditions in determining prognosis.
The predictive performance of many existing VHD surgical risk models is poor [7,17]. Possible reasons include the diversity of VHD surgical methods and the relatively small sample sizes used for model derivation [18]. Some investigators have proposed that, in addition to traditional factors, risk models should include more predictors to increase their effectiveness [7]. Given the large number of intraoperative uncertainties in VHD surgery, we added certain essential intraoperative predictors to the model, in addition to fundamental preoperative variables, to improve the model's predictive capability. CPB time, for example, was chosen as a predictor in this study because it reflects both surgical complexity and surgeon proficiency. The longer the CPB time, the more complex the surgery and/or the less skilled the surgeon.
The primary goal of our risk score model is to offer patients and health care practitioners more accurate information about the risk of VHD surgery and to aid in decision-making. The simplified score is easy to use, as it is based on nine predictors that are routinely assessed in VHD patients. When VHD surgery is being considered, it helps to stratify the risk of mortality.

Limitations
There is still a gap in nationwide representativeness between the CCSR and the STS database. The CCSR includes only data from high-quality cardiac centers in China; hospitals with lower operation volumes are not included. A definitive diagnosis of HF requires objective laboratory and ultrasound indicators. Unfortunately, many of these indicators are missing in the current database. Although this model can be used for preoperative evaluation, it is not a complete preoperative evaluation model because it includes intraoperative predictors. This model also needs external validation in real-world practice to evaluate its clinical applicability. In addition, although all patients had HF at admission, some of them had improved cardiac function by the time of surgery after preoperative medical treatment, and this updated information might not have been collected in time. As a result, the data in the CCSR might not reflect the latest status of every patient before surgery, which is one of the major limitations of this study.

Conclusions
The new risk score is an effective and concise tool that could accurately predict postoperative mortality rates in suspected HF patients after valve surgery.

Fig. 3. Calibration curves of the risk score model: predicted vs. actually observed probability of mortality (A: training group; B: testing group).

Fig. 4. Receiver operating characteristic curves from final logistic model: simplified risk score model and EuroSCORE II (A: training group; B: testing group).

Table 5. Brier scores of final multivariate logistic model, simplified risk score model and EuroSCORE II.
The definitive diagnosis of HF, on the other hand, is challenging, especially for HF with preserved LVEF, and requires a combination of clinical symptoms and signs, as well as a variety of objective laboratory and ultrasound indicators. Many patients cannot receive a precise diagnosis of HF prior to surgery because preoperative examination of VHD patients differs widely among cardiac institutes in China. The target population of this study was therefore identified as suspected HF, which can be quickly diagnosed based on symptoms, signs, and valvular abnormalities, thereby enhancing the clinical applicability of this model. Previous prediction models may no longer be able to reliably estimate current surgical risk due to improvements in surgical techniques and perioperative treatment. Prediction models are time-sensitive: an excellent prediction model must be continuously updated. For instance, EuroSCORE II was released in 2012 and has almost fully replaced EuroSCORE I, which was first published in 1999. Therefore, in our study, we developed a prediction model based on the most recent clinical data that could objectively reflect current VHD features and surgical outcomes.