Machine Learning Model for Predicting Risk of In-Hospital Mortality after Surgery in Congenital Heart Disease Patients

Background: A machine learning model was developed to estimate the in-hospital mortality risk after congenital heart disease (CHD) surgery in pediatric patients. Methods: Patients with CHD who underwent surgery were included in the study. An Extreme Gradient Boosting (XGBoost) model was constructed based on surgical risk stratification and preoperative variables to predict the risk of in-hospital mortality. We compared the predictive value of the XGBoost model with that of the Risk Adjustment in Congenital Heart Surgery-1 (RACHS-1) and Society of Thoracic Surgeons-European Association for Cardiothoracic Surgery (STS-EACTS) categories. Results: A total of 24,685 patients underwent CHD surgery and 595 (2.4%) died in hospital. The areas under the curve (AUCs) of the STS-EACTS and RACHS-1 risk stratification scores were 0.748 [95% confidence interval (CI): 0.707–0.789, p < 0.001] and 0.677 (95% CI: 0.627–0.728, p < 0.001), respectively. Our XGBoost model yielded the best AUC (0.887, 95% CI: 0.866–0.907, p < 0.001), with a sensitivity and specificity of 0.785 and 0.824, respectively. The top 10 variables contributing most to the predictive performance of the machine learning model were pulse oxygen saturation categories, risk categories, age, preoperative mechanical ventilation, atrial shunt, pulmonary insufficiency, ventricular shunt, left atrial dimension, a history of cardiac surgery, and number of defects. Conclusions: The XGBoost model was more accurate than RACHS-1 and STS-EACTS in predicting in-hospital mortality after CHD surgery in China.


Introduction
Congenital heart disease (CHD) is the most common congenital malformation. The prevalence of CHD at birth is about 75-90/10,000 for live births and total pregnancies, with CHD occurring in approximately 1% of live births and 10% of aborted fetuses [1,2]. In addition, CHD is the leading cause of mortality in children with birth defects [3] and affects 0.7% of children born in China [4]. The risk of mortality in Chinese children with CHD has been increasing [5].
Surgery has been the cornerstone in the treatment of patients with CHD [6]. Without intervention, patients with CHD experience significant mortality. In developed countries, surgery has greatly improved the outcome of patients with CHD and significantly reduced the mortality rate [7]. However, approximately 20% of children who undergo surgery for pediatric CHD are readmitted within 30 days, and 4.2% of patients who undergo surgery for CHD die [8,9]. Early mortality after cardiac surgery in the neonatal period is approximately 10% [1].
Risk of death in CHD patients is associated with the complexity of surgical procedures [10]. Accurate prediction of in-hospital death is important to facilitate clinical decision-making regarding the performance of certain procedures and to improve patient outcomes [11]. Several major risk stratification categories are currently available for the prediction of mortality and morbidity in children undergoing surgery for CHD: Risk Adjustment in Congenital Heart Surgery-1 (RACHS-1) [12], Aristotle Basic Complexity and Aristotle Comprehensive Complexity [13], and the Society of Thoracic Surgeons-European Association for Cardiothoracic Surgery (STS-EACTS) Congenital Heart Surgery (STAT) Mortality Categories [14]. These risk adjustment categories have been developed based on projections of risk or complexity and rely heavily on expert experience and consensus [15]. These traditional tools focus on surgical procedure categories and do not include sufficient individual patient risk factors. Therefore, they may have lower predictive accuracy for individual patients. Prognosis should be determined by combined analysis of multiple features. Thus, it is of great clinical significance to build a prediction model that includes multiple important clinical features.
Some studies have shown that machine learning-assisted tools perform better than standard scoring systems [16,17]. Machine learning has the advantage of flexibility and scalability compared to traditional biostatistical methods [18]. It is well suited for complex multidimensional data and may uncover interactions that are hard to identify and illustrate through classic statistical analysis [19]. Extreme Gradient Boosting (XGBoost) is a machine learning algorithm. It is an implementation of gradient boosting that originally started as a research project by Tianqi Chen as part of the Distributed (Deep) Machine Learning Community (DMLC) group at the University of Washington [20]. Widely used as a fast open-source boosting tree toolkit, XGBoost includes many optimizations that improve model training speed and accuracy. Kilic evaluated the predictive performance of an XGBoost model for risk of death after cardiac surgery and found that the XGBoost model was superior in predictive performance to the Society of Thoracic Surgeons Predicted Risk of Mortality (STS-PROM) score [21]. Zeng et al. [22] showed that an XGBoost model predicted postoperative complications after paediatric cardiac surgery better than traditional risk adjustment models. However, research on the application of machine learning models for the prediction of mortality risk in children with CHD is lacking, especially in China.
The aim of the study was to establish and validate an XGBoost model for predicting the in-hospital mortality risk in pediatric CHD surgery, and to compare the predictive value of the XGBoost model with that of the RACHS-1 and STS-EACTS categories.

Study Design and Population
Patients aged 0-18 years who were diagnosed with CHD and underwent CHD surgery at Shanghai Children's Medical Center, School of Medicine, Shanghai Jiaotong University between January 1, 2006 and December 31, 2017 were included. For patients with multiple surgical records within a month, only the information of the last surgical record was extracted, and the previous surgical records were regarded as "operation history". The exclusion criteria were general thoracic surgery (not involving cardiac surgery), incomplete or missing in-hospital survival records, and surgical procedures that were performed in fewer than 3 patients. Our study was approved by the Ethical Committee of Shanghai Children's Medical Center, School of Medicine, Shanghai Jiaotong University. As our study only involved a retrospective review of previous clinical data, the requirement for informed consent was waived.

Data Source and Extraction
The database was constructed by merging information from multiple data sources, including the laboratory information management system, hospital information system, intensive care unit database, clinical data repository, and the surgical record database of the cardiac surgery department in Shanghai Children's Medical Center. We built a feature engineering pipeline to load and transform clinical data during and before CHD surgery for each individual. The collected data were divided into five categories as follows: (i) demographic data, such as sex, body mass index, and age; (ii) preoperative clinical factors, including diagnosis, number of defects, pulse oxygen saturation, a history of cardiac surgery (any prior cardiac surgeries), non-cardiac malformations, and other risk factors; (iii) complexity of the CHD surgery according to RACHS-1 and STS-EACTS morbidity categories; (iv) cardiac Doppler ultrasound data; and (v) preoperative laboratory test results, including routine blood test findings, liver function test results, and coagulation index. Variables with more than 30% missing values were excluded.
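The missing-value filter described above can be illustrated with a minimal sketch. It is not the study's feature engineering pipeline: the function name `drop_sparse_columns`, the representation of each patient as a dictionary with `None` marking a missing value, and the variable names in the usage comment are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Hypothetical record type: one dictionary per patient, None marks a missing value.
Record = Dict[str, Optional[float]]


def drop_sparse_columns(records: List[Record], max_missing: float = 0.30) -> List[Record]:
    """Drop variables whose fraction of missing values exceeds max_missing."""
    if not records:
        return []
    columns = sorted({name for row in records for name in row})
    n = len(records)
    keep = []
    for name in columns:
        # A variable absent from a record is counted as missing as well.
        missing = sum(1 for row in records if row.get(name) is None)
        if missing / n <= max_missing:
            keep.append(name)
    return [{name: row.get(name) for name in keep} for row in records]


# Usage with made-up variable names:
# cleaned = drop_sparse_columns([{"age": 316.0, "spo2": None}, {"age": 40.0, "spo2": 97.0}])
```

A variable missing in exactly 30% of records is retained here, matching the "more than 30%" wording; whether the original pipeline treated the boundary the same way is not stated.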

In-Hospital Mortality and Estimation of Mortality Rates
The study endpoint was in-hospital mortality, defined as death due to any cause during hospitalization after surgery. The cause of death was defined as the disease, situation, or occurrence that causes a series of events ultimately resulting in death [23]. The causes of death in this study included cardiac, peri-operative, vascular, and non-cardiovascular causes. Cardiac deaths included sudden death, documented ventricular arrhythmias, heart failure, infective endocarditis, and myocardial infarction [23,24]. Vascular deaths included haemorrhage, stroke, rupture of aneurysm, pulmonary embolism, and dissection [23,24]. Non-cardiovascular deaths included malignancy, pneumonia, sepsis (excluding endocarditis), other infections, peritonitis, hip fracture, renal failure, suicide, and unknown [23,24].
Mortality risk stratification was performed by classifying the procedures into clusters based on estimated mortality, following the statistical method proposed by a previous study [14]. First, we used a Bayesian random effects model to calculate the posterior probability distribution of the mortality rates of all procedures. Second, a homogeneity criterion was used to evaluate a partition scheme by measuring the within-category homogeneity of the mortality rates. The optimal partition that maximizes the homogeneity criterion can be found using a dynamic programming algorithm. Finally, we successively performed the abovementioned calculations for 2-20 categories to determine the number of categories. The optimal number of categories was determined using the Bayesian information criterion, a trade-off between homogeneity and partition complexity. All procedures were finally categorized into five relatively homogeneous categories. Following the pseudo-code algorithm description (see Appendix of the previous study) [14], we implemented custom code for the stratification computation pipeline in Python (version 3.7.6, Python Software Foundation, Wilmington, DE, USA).
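The dynamic programming step can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of [14] or the authors' code: the homogeneity criterion of [14] is not reproduced in this text, so the within-category sum of squared deviations of point estimates is swapped in as a stand-in, and the Bayesian random effects model and the Bayesian information criterion step are omitted. The function name `optimal_partition` and the rates in the usage comment are hypothetical.

```python
from typing import List, Tuple


def optimal_partition(rates: List[float], k: int) -> Tuple[float, List[List[float]]]:
    """Split sorted mortality rates into k contiguous categories.

    Minimizes the within-category sum of squared deviations (SSE), used here
    as an assumed stand-in for the homogeneity criterion.
    """
    xs = sorted(rates)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of procedures")

    # Prefix sums give the SSE of any contiguous block in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        """SSE of xs[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[g][j]: best cost of splitting xs[:j] into g categories.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + sse(i, j)
                if c < cost[g][j]:
                    cost[g][j] = c
                    cut[g][j] = i

    # Walk the stored cut points backwards to recover the categories.
    groups: List[List[float]] = []
    j = n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(xs[i:j])
        j = i
    groups.reverse()
    return cost[k][n], groups


# Usage with made-up per-procedure mortality rates:
# total_sse, categories = optimal_partition([0.01, 0.02, 0.015, 0.30, 0.32, 0.70], k=3)
```

Because the rates are sorted first, every category is a contiguous run of procedures ordered by estimated mortality, which is what makes an exact dynamic programming solution possible. Repeating the call for each candidate number of categories and scoring the results with an information criterion would correspond to the final selection step described above.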

Construction of an In-Hospital Death Predictive Model Using a Machine Learning Algorithm
We used the XGBoost algorithm to build an in-hospital mortality predictive model for children with CHD. The dataset was divided into a training set and a testing set at a 7.5:2.5 ratio. The training dataset was used for feature selection and model training, while the testing dataset was used for validation after model training. The importance of each feature was assessed using the recursive feature elimination (RFE) algorithm, and all features were sorted based on their level of importance. The RFE algorithm was used to recursively remove features and build a model on the remaining features. Among all candidate combinations of features, the model with the highest AUC was identified, and the features it included were selected to build the XGBoost model. Furthermore, grid search was used to tune the hyperparameters of the model to reduce overfitting and improve model accuracy. The stability of the model was tested using the bootstrap algorithm with random resampling of the samples, and the 95% confidence interval (95% CI) was reported. Finally, we assessed the predictive power of the model using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Fig. 1 presents the whole process described above. The XGBoost was developed in Python language (version 3.
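The bootstrap estimate of the 95% CI for the AUC can be sketched in plain Python. The text does not specify which bootstrap interval was used, so the percentile method, the number of resamples, and the function names `auc` and `bootstrap_auc_ci` are assumptions; the labels and scores in the usage comment are made up.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (assumed variant, see lead-in)."""
    auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Usage with made-up predictions:
# lower, upper = bootstrap_auc_ci([0, 0, 1, 0, 1, 1], [0.1, 0.2, 0.3, 0.4, 0.8, 0.9])
```

The pairwise AUC computation is quadratic in the sample size and is meant only to make the definition explicit; a rank-based implementation would be preferable for a cohort of this size.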

Statistical Analysis
Continuous variables are described as the median (range); all were non-normally distributed. Categorical variables are described using frequency (%). To assess the distributive balance between the training and validation sets, comparisons between groups were performed using the Mann-Whitney U test, Fisher's exact test, and the chi-square test, as appropriate. The area under the receiver operating characteristic (ROC) curve (AUC) with 95% confidence interval (CI) was calculated to evaluate the predictive power. In addition, the optimal threshold was chosen by maximizing the Youden index. The sensitivity and specificity of the predictive model were obtained based on this threshold. All statistical tests were two sided, and a p-value < 0.05 was considered statistically significant. All analyses were performed using SAS software (SAS Institute Inc., Cary, NC, USA), version 4.2.0.
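Threshold selection by the Youden index, J = sensitivity + specificity - 1, can be sketched as below. The function name `youden_threshold`, the convention that a score at or above the threshold counts as a predicted death, and the data in the usage comment are assumptions for illustration, not details taken from the study.

```python
from typing import Optional, Sequence, Tuple


def youden_threshold(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing the Youden index."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")

    best_j = -1.0
    best: Optional[Tuple[float, float, float]] = None
    # Every distinct predicted score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if j > best_j:  # on ties, the lowest threshold is kept
            best_j = j
            best = (t, sens, spec)
    assert best is not None
    return best


# Usage with made-up predictions:
# threshold, sens, spec = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The sensitivity and specificity reported for a model depend on this cut-off, so they should be read together with the threshold that produced them.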

Characteristics and Stratification of Surgical Procedure
A total of 24,685 patients who underwent surgery for CHD were included (Table 1). The median age of the patients was 316 (1-6568) days, and 14,215 (57.59%) patients were male. A total of 591 (2.4%) in-hospital deaths occurred. Other patient characteristics are summarized in Table 1.
A comparative analysis of in-hospital mortality and risk categories for each procedure is presented in Table 2. The most common procedures included ventricular septal defect (VSD) membranous repair, tetralogy repair, and VSD subarterial repair. The RACHS-1 categories consist of six groups labeled 1-6, and the STS-EACTS categories consist of five groups labeled 1-5; a higher number indicates a higher mortality risk. The risk of mortality associated with each procedure was calculated. The in-hospital mortality rate for each procedure ranged from 0 to 75%, and no deaths were recorded for 31 procedures. Mortality rates and risk stratification for specific procedures were also estimated using a Bayesian random effects model.

Importance of the Top 10 Variables in the Prediction of the XGBoost Model
The top 10 variables that contribute the most to the predictive power of the model are listed in Table 4. The higher the weight coefficient of a feature, the more significant the role it plays in the model for outcome classification. The weight coefficients for pulse oxygen saturation categories, risk categories, age, preoperative mechanical ventilation, atrial shunt, pulmonary insufficiency, ventricular shunt, left atrial dimension, a history of cardiac surgery, and number of defects were 0.10638574, 0.07759346, 0.07303152, 0.07014898, 0.065226465, 0.05785214, 0.05760804, 0.052233107, 0.051096234, and 0.0437589, respectively (Table 4). The excluded variables are listed in Supplementary Table 1.

Discussion
Prediction of in-hospital mortality risk is clinically important for directing postoperative patient management. In our study, we compared the XGBoost model with traditional tools for predicting mortality in pediatric CHD surgery. To our knowledge, this is the first study to compare machine learning algorithms with the RACHS-1 and STS-EACTS categories for the prediction of in-hospital mortality risk in pediatric CHD surgery. We found that, in Chinese children with CHD, the XGBoost model was more accurate in predicting in-hospital mortality after CHD surgery than the RACHS-1 and STS-EACTS categories.
In our study, the in-hospital mortality rate after CHD surgery was 2.4%, which is consistent with that previously reported in China [5] and in western countries [25-27], but much lower than that reported in developing countries [28,29]. Cardiac surgeons use traditional tools such as the RACHS-1 and STS-EACTS to report patient outcomes. The RACHS-1 categories were constructed based on a combination of the opinions of 11 experts and empirical data to predict in-hospital mortality [12,30]. The RACHS-1 categories classify procedures into six levels of mortality risk based on a few clearly defined criteria. In previous studies, the risk of mortality from CHD surgery across RACHS-1 strata ranged from 0.26% to 62% [31-33], which is consistent with the findings of our study. An objective, empirically based tool named STS-EACTS, developed without the input of an expert panel, is available for analyzing in-hospital mortality associated with CHD surgery [14,34]. However, the RACHS-1 and STS-EACTS categories only consider procedural characteristics and ignore individual patient characteristics. Both categories lack precision when estimating the risk for individual patients. In addition, some procedures (19%) in our cohort could not be classified using the RACHS-1 or STS-EACTS categories. Because of the enormous differences in patient-related variables and CHD surgery, it is difficult to establish a mortality risk prediction model.
Hence, one of the greatest challenges in medicine is how to deal with individuals who have the same disease but different manifestations. Machine learning algorithms may provide a solution to this problem. Furthermore, a previous study demonstrated that machine learning methods, especially gradient boosting models, show promise in outperforming existing clinical risk scores [35]. Machine learning models have been explored as a tool to predict mortality, morbidity, and complications in patients with CHD [22,36,37]. In this study, an XGBoost model was constructed based on surgical risk stratification and patient preoperative variables. According to previous studies, the AUC of the STS-EACTS and RACHS-1 categories for predicting the mortality and complications of CHD surgery in children was 0.68-0.79 [14,22,28,38,39]. In our study, the AUC of the XGBoost model was 0.887, which was better than that of the STS-EACTS (AUC = 0.748) and RACHS-1 (AUC = 0.677) categories. The results showed that the XGBoost model was able to predict in-hospital mortality with improved predictive power compared to the STS-EACTS and RACHS-1 models.
In addition to surgical risk stratification, patient-related variables also had a significant impact on the performance of the postoperative mortality risk prediction model. The XGBoost model incorporates demographic characteristics, preoperative echocardiography characteristics, and laboratory examination results into the final predictive model. We analyzed the importance of these risk factors in the XGBoost model. Our results showed that pulse oxygen saturation categories had the greatest impact on the predictive performance of the model. Previous studies have shown that oxygen saturation correlates with mortality in children undergoing CHD surgery [40,41]. A decrease in oxygen saturation at 24 hours after the operation may increase the mortality rate of newborns with cyanotic cardiopathies [42].
For patients with CHD who have received cardiac intervention, preoperative mechanical ventilation and RACHS categories may affect their mortality during hospitalization [43]. Mechanical ventilation has been shown to be a strong predictor of in-hospital mortality in children undergoing noncardiac surgery [44]. In addition, one study found that newborns with severe CHD may have increased mortality if they need unplanned cardiac re-intervention and that, according to multifactorial analysis, mechanical ventilation before cardiac intervention and a higher RACHS-1 category are independent risk factors for unplanned cardiac re-intervention [45]. In this study, we also found that risk categories and preoperative mechanical ventilation were among the top factors influencing the predictive performance of the XGBoost model. This reminds clinicians to pay more attention to the prognosis of patients whose surgery falls in a higher risk category or who require preoperative mechanical ventilation.
Age, as an important factor in the mortality rate of children with CHD, has also been mentioned in several studies. A study in Taiwan Province of China found that the majority (i.e., more than 90%) of CHD deaths occur within the first 5 years of life (mainly in infancy) [46]. Another study showed that the mortality rate of CHD patients trends downward with increasing age [47]. Our study also found that age is an influencing factor of in-hospital mortality in children with CHD and made a great contribution to the model. Clinicians should pay more attention to younger CHD patients in practice.
In addition to the above factors, the results of the machine learning model in this study show that atrial shunt and ventricular shunt may also be factors in in-hospital mortality. The reason may be that these factors affect the occurrence of postoperative complications. Low cardiac output syndrome (LCOS) is a common life-threatening postoperative complication of heart disease that may contribute to postoperative morbidity and mortality [48-51]. Atrial shunt and ventricular-level shunt were both independent risk predictors of LCOS [52]. Bangrong Song et al. [51] suggested that, to improve prognosis, more attention should be paid to CHD patients aged ≤4 years or with preoperative oxygen saturation ≤93%, cardiopulmonary bypass (CPB) duration ≥60 minutes, two-way ventricular shunt, or postoperative residual shunt. In addition, one study found that left atrial dimension is associated with long-term adverse outcomes (hospitalization due to heart failure, all-cause mortality, new-onset atrial fibrillation, and/or embolic stroke during follow-up) of rheumatic heart disease [53].
In this study, we found that the number of defects influences in-hospital mortality in patients with CHD. The reason may be that a patient with multiple types of CHD at the same time has a much worse prognosis than a patient with a single type of CHD. Presently, the most common types of CHD include ventricular septal defect (VSD), patent ductus arteriosus (PDA), secundum atrial septal defect (ASDII), pulmonary stenosis (PS), and tetralogy of Fallot (TOF), and the incidence of each type differs [54]. Several studies have further divided the CHD population into simple and severe CHD groups [54,55] and found that individuals with simple CHD (e.g., VSD or ASD) have higher survival rates, almost similar to those of the normal population [47]. However, the prognoses of patients with severe CHD vary widely [47].
Clinicians should focus on the top variables in the model and improve patient outcomes by addressing those that can be managed. For example, provided that the treatment effect is ensured, procedures in low-risk categories should be selected to improve the postoperative survival rate of patients. Patients with CHD who have abnormalities in the top indicators above should have their postoperative recovery monitored more closely, which is the main clinical value of the proposed model.
This study has some limitations. First, this study is a single-center retrospective study. However, to our knowledge, this is the first study to use a machine learning algorithm to predict mortality in pediatric CHD surgery in a large sample. In addition, our center is one of the best-known treatment centers for children with congenital heart disease in China, with patients from most provinces in China. Second, the in-hospital mortality rates recorded in this study may not include all operation-related deaths, because data from patients after discharge were not included. This needs to be addressed with a more complete data source. Third, there was an imbalance between the numbers of patients who died and those who survived in this study, and we did not perform 1:N case-control matching at the time of patient enrollment. During model building, we tried to address the class imbalance in the training set by tuning 'scale_pos_weight', a parameter that adjusts the balance of positive and negative weights in the XGBoost package. However, this did not improve the predictive performance of the model.

Conclusions
In conclusion, our single-center study of 24,685 patients demonstrated that, using a combination of procedure complexity categories and preoperative patient-level factors, the XGBoost model had higher accuracy in in-hospital mortality prediction than both the RACHS-1 and STS-EACTS categories. In clinical practice, machine learning models can be established based on the surgical database for risk prediction to improve cardiac surgical care.

Fig. 2. Comparison of the prediction values of the XGBoost model, STS-EACTS categories, and RACHS-1 categories in the testing set.