1 Department of Obstetrics, The Affiliated Hospital of Qingdao University, 266003 Qingdao, Shandong, China
2 Department of Gynecology, The Affiliated Hospital of Qingdao University, 266003 Qingdao, Shandong, China
Abstract
The application of artificial intelligence (AI) in medicine has advanced significantly, particularly in obstetrics, where it plays an increasingly prominent role in predicting modes of delivery and assessing maternal risks. AI-assisted prediction of delivery modes, a cutting-edge field at the intersection of medicine and computer science, aims to support clinicians in making more accurate and safer delivery decisions by utilizing advanced AI technologies and big data analytics. With increasing individual variability among pregnant women, traditional clinical experience is often insufficient to meet the requirements of personalized medicine; therefore, establishing a scientific prediction model is particularly crucial. This systematic review aims to evaluate the current state of research on AI-assisted prediction of delivery modes, compare AI predictions with traditional statistical methods, and propose future research directions.
A comprehensive literature search was conducted in the PubMed, Web of Science, and ScienceDirect databases, encompassing publications up to November 2024.
Analysis of existing studies demonstrates that AI models outperform conventional statistical methods in predicting delivery modes, highlighting their potential as valuable tools in obstetric diagnosis and clinical decision-making. However, several critical limitations persist in current research, including: (a) the absence of real-time decision support during dynamic labor progression; (b) insufficient multi-center collaboration and a lack of external validation frameworks; and (c) inadequate standardization of clinical parameters (e.g. inconsistent definitions of cervical dilation thresholds and fetal descent metrics). These methodological gaps limit the clinical applicability and generalizability of AI-driven predictive systems across diverse obstetric populations and care settings.
Future research should prioritize data standardization and sharing, enhance the generalizability of prediction models, address ethical considerations, and ensure the fairness and transparency of AI algorithms to improve clinical trust and applicability.
The study has been registered on https://www.crd.york.ac.uk/prospero/ (registration number: CRD420251068005).
Keywords
- artificial intelligence
- mode of delivery prediction
- cesarean section
- natural childbirth
- obstetrics
- machine learning
Artificial intelligence (AI) is a broad, all-encompassing term whose subfields include machine learning (ML), deep learning, and natural language processing (NLP) [1]. The widespread use of AI has transformed many areas of life, including business and trade, social and electronic media, education, manufacturing, and medicine [2]. In medicine in particular, AI is increasingly used to predict outcomes, guide diagnosis and treatment, and provide personalized medical services for patients. As AI continues to develop, its standing in the medical field is also rising. AI typically assists medical decision-making by extracting features from large and complex data sets. Current research indicates that AI has provided assistance in multiple medical fields, such as disease prediction, decision-making based on extracted medical features, and patient management [2]. Additionally, with the continuous improvement of electronic health records and the growth of available data, AI is increasingly being used to build predictive models that assist in medical decision-making and clinical consultations [3]. These predictive models are a foundation of biomedical research and an indispensable part of the clinical decision-making process [4].
As the application of AI in the medical field deepens, especially in obstetrics, its contribution to predicting delivery methods and assessing maternal risks is becoming increasingly prominent [5]. For example, in the prediction of preterm birth, Chen H-Y et al. [6] employed neural networks and decision tree algorithms to identify factors associated with preterm delivery. Rawashdeh H et al. [7] utilized random forest (RF), decision trees (DT), K-nearest neighbors (KNN), and neural networks (NN) to assess the risk of preterm birth. In the context of shoulder dystocia, Tsur A et al. [8] developed and externally validated a machine learning model that integrates maternal risk factors with fetal biometric parameters to forecast shoulder dystocia. Furthermore, AI has been increasingly applied in predicting the mode of delivery. Currently, the main delivery methods are vaginal delivery and cesarean section. When complications occur during vaginal delivery, vacuum extraction or obstetric forceps can be used [9]. Helping pregnant women choose the most appropriate delivery method is a fundamental skill for obstetricians. The choice of delivery method relies mainly on the experience of obstetricians and the results of auxiliary examinations; it therefore demands considerable clinical experience, is subject to subjectivity and uncertainty, and lacks objective data support. In addition, the subjective feelings of pregnant women also affect the choice of delivery method: factors such as unbearable pain and fear tend to increase the rate of cesarean section. Therefore, for obstetricians, accurately predicting the delivery method remains a challenge [10]. Accurate prediction of vaginal delivery and cesarean section can reduce unnecessary medical intervention, optimize maternal and neonatal outcomes, improve delivery prognosis, and lower medical costs.
Currently, the prediction of delivery methods relies mainly on clinical judgment and traditional statistical methods, both of which have limitations. Traditional statistical methods can include only a limited number of variables and may not fully capture the complex interactions among risk factors, making them susceptible to subjective biases. Limited data also restricts the ability to conduct a comprehensive assessment, and these methods are constrained both in the amount of data they can handle in a single analysis and in their ability to analyze complex data sets. In contrast, AI can analyze complex data sets and identify complex patterns, providing a more powerful and accurate tool for predicting delivery methods. Recent studies have shown that ML algorithms and other AI technologies can effectively predict delivery methods by analyzing various factors, including maternal age, body mass index, fetal position, and previous pregnancy and childbirth history [11]. AI-assisted prediction of obstetric delivery methods, a frontier field at the intersection of medicine and computer science, aims to help clinicians make more accurate and safer delivery decisions by applying advanced AI technologies and big data analysis methods [1]. With increasing individual differences among pregnant women, traditional clinical experience alone struggles to meet the needs of personalized medical care, which makes the establishment of scientific predictive models particularly important. With the rapid advancement and extensive application of AI, AI-assisted prediction of delivery methods has emerged as an important tool for supporting the judgment and professional growth of obstetricians.
The existing research mainly concentrates on predicting delivery methods, vaginal delivery, cesarean section, and vaginal delivery after cesarean section.
This review aims to analyze the current research status of AI applications in assisting the prediction of delivery methods, identify the shortcomings in current research, and propose future research directions to improve the application of AI in assisting the prediction of delivery methods. Ultimately, this will help improve the health outcomes of pregnant women and fetuses and enhance the health levels of mothers and infants.
A comprehensive search strategy was employed to identify relevant studies on the application of AI in predicting the mode of delivery. The databases searched included PubMed, Web of Science, and ScienceDirect. The search terms used were combinations of artificial intelligence, machine learning, mode of delivery, cesarean section, vaginal delivery, and obstetrics. Boolean operators (AND, OR) were utilized to refine the search. The search was limited to articles published from inception to November 2024 to ensure the inclusion of the most recent and relevant studies. In addition, reference lists of identified articles were further reviewed for inclusion. This study was previously registered with the International Prospective Register of Systematic Reviews (PROSPERO) (CRD420251068005) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
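The way the listed terms might be combined with Boolean operators can be sketched as follows. This is an illustration only: the exact query strings and the grouping of terms are not reported in the text, so the three groups below and the helper names `or_group` and `build_query` are assumptions.

```python
# Hypothetical grouping of the search terms named in the text.
# The authors' actual query strings are not given.
AI_TERMS = ["artificial intelligence", "machine learning"]
DELIVERY_TERMS = ["mode of delivery", "cesarean section", "vaginal delivery"]
CONTEXT_TERMS = ["obstetrics"]


def or_group(terms):
    """Quote each term and join the group with OR, wrapped in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Combine several OR-groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(AI_TERMS, DELIVERY_TERMS, CONTEXT_TERMS)
print(query)
```

Grouping synonyms with OR and joining concept groups with AND is a common pattern for database searches; each database may require its own field tags, which are omitted here.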
To ensure that only suitable articles were selected for this study, eligibility criteria were applied. Of the 292 studies identified, only 13 were included in the systematic review. A study was eligible for review if it met all of the following criteria: (a) investigations utilizing AI-based techniques for predicting mode of delivery; (b) full-text research articles; and (c) publications dated between 2000 and 2020. Exclusion criteria comprised: (a) abstract-only studies; (b) duplicate publications; and (c) non-English language articles. An article that clearly met one or more of the exclusion criteria was excluded from further review. The search and selection of the final articles are illustrated in Fig. 1. Papers were selected mainly on the basis of the abstract and introduction. A total of 292 research works were identified as primary materials during the preliminary search. Fifty-eight articles were chosen after duplicates and non-English articles were removed. After evaluation of the articles’ titles and abstracts, the first level of screening yielded 121 articles, excluding 113 articles. A second level of screening, based on reading the abstract, introduction, and methodology, yielded the 13 articles selected for the final review analysis. The study characteristics (e.g., AI/ML models used, sample size, outcome measures, performance metrics) and quality assessment results of the articles included in this systematic review are summarized in Table 1 (Ref. [2, 3, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]).
Fig. 1. Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) flow diagram for the selection of articles.
| Authors | Reference | Year | AI/ML models used | Sample size | Outcome measures | Performance metrics | NOS score |
|---|---|---|---|---|---|---|---|
| De Ramón Fernández A et al. | [11] | 2022 | SVM, MLP, RF | 25,038 records | Cesarean section, euthocic vaginal delivery, instrumental vaginal delivery | SVM: accuracy | 9 |
| Ullah Z et al. | [2] | 2021 | DT, RF, AdaBoostM1, Bagging, and k-NN | 80 records | Predict the mode of delivery (cesarean section, vaginal delivery) | k-NN: accuracy 84.38%; Bagging: accuracy 83.75%; RF: accuracy 83.13%; DT: accuracy 81.25%; AdaBoostM1: accuracy 80.63% | 5 |
| Kuanar A et al. | [12] | 2024 | DNN | 101 records | Predict the mode of delivery (cesarean section, vaginal delivery) | Train set: AUC 0.99, KS score 0.98; The prediction error rates: cesarean section 0.02, vaginal delivery 0.00 | 5 |
| Ferreira I et al. | [13] | 2025 | LR, MLP, RF, SVM, XGBoost, and AdaBoost classifiers | 2434 records | Predict vaginal delivery after labor induction | LR: AUC 0.794, sensitivity 0.766, specificity 0.910; RF: AUC 0.777, sensitivity 0.756, specificity 0.904; SVM: AUC 0.774, sensitivity 0.747, specificity 0.954; AdaBoost: AUC 0.767, sensitivity 0.753, specificity 0.890; XGBoost: AUC 0.754, sensitivity 0.738, specificity 0.863; MLP: AUC 0.744, sensitivity 0.737, specificity 0.855 | 8 |
| Wong MS et al. | [14] | 2024 | Automated ML (Partometer) | 37,932 records | Predict vaginal delivery | Partometer accuracy: 87.1%, AUC: 0.82 | 6 |
| Guedalia J et al. | [15] | 2021 | Gradient boosting (CatBoost) | 94,480 records | Predict cesarean section | Admission data only: AUC of 0.817; Real-time cervical examination data: Initial AUC of 0.819, increasing to 0.917; Real-time FHR data: Initial AUC of 0.824, increasing to 0.928; All-inclusive real-time data: Initial AUC of 0.833, increasing to 0.932 | 9 |
| Fergus P et al. | [16] | 2017 | Deep learning classifiers, Fisher’s linear discriminant analysis classifiers, and random forest classifiers | 506 controls and 46 cases | Predict cesarean section | Deep learning classification: sensitivity = 94%, specificity = 91%, Area under the curve = 99%, F-score = 100%, and mean square error = 1% | 5 |
| Nagayasu Y et al. | [17] | 2022 | Continuous recursive rule extraction (Re-RX) algorithm with J48graft | 1513 singleton deliveries | Predict an emergency cesarean section | Average accuracy: 81.90%; AUC: 71.46% | 5 |
| Meyer R et al. | [18] | 2023 | XGBoost, DRF, GBM, XRT | 73,667 records | Predict unplanned cesarean delivery | Training data set: AUC 0.874; Validation data set: AUC 0.839; Test data set: AUC 0.84 (XGBoost) | 8 |
| Islam MS et al. | [19] | 2022 | GNB, LDA, KNN, GBC, LR | 15,409 records | Predict cesarean section | HGSORF: accuracy 98.34%; GBC: accuracy 93.20%; GNB: accuracy 87.36%; KNN: accuracy 88.32%; LDA: accuracy 91.90%; LR: accuracy 92.24% | 5 |
| Lindblad Wollmann C et al. | [3] | 2021 | Conditional inference tree, Conditional RF, Lasso binary regression | 3116 records | Predict vaginal birth after previous cesarean | AUC ranged from 0.61 to 0.69, with sensitivity (probability of correctly identifying a VBAC for second delivery) above 91% and specificity (probability of correctly identifying a repeat CD for second delivery) below 22% for all models | 9 |
| Meyer R et al. | [20] | 2022 | RF, GLM, XGBoost | 989 records | Predict successful VBAC or failed TOLAC | RF: AUC-PR 0.351, XGBoost: AUC-PR 0.350, GLM: AUC-PR 0.336; MFMU: AUC-PR 0.325 | 7 |
| Lipschuetz M et al. | [21] | 2020 | Gradient boosting | 9888 records | Predict VBAC | First-trimester model: AUC of 0.745; Pre-labor model: AUC of 0.793; stratification into risk groups with VBAC success rates of 97.3% (low), 90.9% (medium), and 73.3% (high) | 5 |
AI, artificial intelligence; ML, Machine Learning; NOS, Newcastle-Ottawa Scale; SVM, support vector machine; RF, random forest; DT, decision trees; k-NN, k-nearest neighbor; GNB, Gaussian Naive Bayes; LR, logistic regression; FHR, fetal heart rate; AUC, Area Under the ROC Curve; AUROC, area under the receiver-operating-characteristic curve; AdaBoostM1, Adaptive Boosting version M1; TOLAC, trial of labor after cesarean delivery; MFMU, Maternal-Fetal Medicine Units; XGBoost, eXtreme Gradient Boosting; DNN, Deep Neural Networks; LDA, Linear Discriminant Analysis; GBC, Gradient Boosting Classifier; GLM, Generalized Linear Model; CD, Cesarean Delivery; VBAC, Vaginal Birth After Cesarean; PR, Precision-Recall; KS, Kolmogorov-Smirnov statistic; HGSORF, Henry Gas Solubility Optimization-based Random Forest; MLP, Multilayer Perceptron.
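Two of the discrimination metrics that recur in Table 1, AUC and the KS statistic, can be computed from predicted scores as in the minimal sketch below. The labels and scores are made-up toy values, not data from any reviewed study, and the function names are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Largest gap between the cumulative score distributions
    of the positive and negative classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy example: 1 = cesarean section, 0 = vaginal delivery (made-up values).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.2, 0.7, 0.5]

print(roc_auc(labels, scores))       # 0.9375
print(ks_statistic(labels, scores))  # 0.75
```

Both metrics depend only on how the scores rank the two classes, which is why values close to 1 on a small training set say little about performance on new patients.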
The Newcastle-Ottawa Scale (NOS) was applied to evaluate study quality (Table 1), with scores
A systematic analysis of three pivotal studies [2, 11, 12] reveals distinct methodological approaches and performance outcomes in AI-driven delivery mode prediction (Table 2, Ref. [2, 11, 12]). Data were categorized into three dimensions: algorithmic architecture, dataset characteristics, and validation rigor.
| Metric | De Ramón Fernández A et al. [11] | Ullah Z et al. [2] | Kuanar A et al. [12] |
|---|---|---|---|
| Sample size | 25,038 (retrospective) | 80 (SMOTE-augmented) | 101 (single-center) |
| Top algorithm | Random Forest (91% Acc) | k-NN (84.38% Acc) | DNN (AUC 0.99) |
| Strengths | Large-scale validation | SMOTE efficacy proven | High theoretical AUC |
| Limitations | Static features only | High risk of overfitting | Minimal external validity |
Acc, accuracy; SMOTE, synthetic minority oversampling technique.
(a) Traditional ML Models: De Ramón Fernández A et al. [11] employed Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Random Forest (RF) on a large retrospective cohort (n = 25,038), achieving
Ullah Z et al. [2] compared five ML algorithms (DT, RF, AdaBoostM1, Bagging, k-NN) on a small enriched dataset (synthetic minority oversampling technique (SMOTE)-augmented, n = 80). k-NN achieved the highest accuracy (84.38%), while AdaBoostM1 performed worst (80.63%). Data augmentation improved model performance by 3–5%, highlighting its utility in addressing class imbalance.
(b) Deep Learning (DL) Innovations: Kuanar A et al. [12] pioneered Deep Neural Networks (DNN) adoption (n = 101), reporting exceptional training metrics (Area Under the ROC Curve (AUC) = 0.99, Kolmogorov-Smirnov statistic (KS) = 0.98) but limited external validity due to minimal sample size. Prediction error rates for cesarean and vaginal delivery were 0.02 and 0.00, respectively, though these results may reflect overfitting.
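The SMOTE augmentation reported for Ullah Z et al. [2] can be illustrated with a simplified sketch of the core idea: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. This is not the authors' implementation; the feature vectors, the function name `smote_sketch`, and the parameter values are assumptions for illustration.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Made-up minority-class feature vectors: (maternal age in years, BMI in kg/m^2).
minority = [(24.0, 21.5), (31.0, 27.0), (28.0, 24.0), (35.0, 30.5)]
new_points = smote_sketch(minority, n_new=6)
for point in new_points:
    print(point)
```

Because every synthetic point is a blend of existing ones, oversampling a very small cohort adds no new clinical information, which is consistent with the overfitting risk noted for small datasets.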
(a) Scale Disparity: Large-scale studies [11] (n = 25,038) demonstrated stable performance (AUC 0.87–0.91), whereas small cohorts [2, 12] (n = 80–101) exhibited inflated metrics (AUC up to 0.99), likely due to limited generalizability.
(b) Feature Engineering: Maternal age, Body Mass Index (BMI), and parity (Static Parameters) dominated input features across studies [2, 11]. None incorporated real-time intrapartum progression metrics (e.g., cervical dilation rate), constraining clinical utility [11].
(a) Internal Validation: All studies used random splits, with only De Ramón Fernández A et al. [11] achieving NOS = 9 through rigorous sensitivity analysis.
(b) Temporal/External Gaps: No study implemented prospective temporal validation, and inter-hospital transportability tests were absent.
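The difference between the random splits used in these studies and the temporal validation that none of them implemented can be sketched as follows. The records, the `year` field, and the cutoff are hypothetical.

```python
import random


def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and hold out a random fraction for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def temporal_split(records, cutoff_year):
    """Train on deliveries before cutoff_year, test on later ones."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


# Hypothetical delivery records spread evenly over 2015-2020.
records = [{"id": i, "year": 2015 + i % 6} for i in range(24)]

train_r, test_r = random_split(records)
train_t, test_t = temporal_split(records, cutoff_year=2019)

print(len(train_r), len(test_r))  # 18 6
print(len(train_t), len(test_t))  # 16 8
```

With a random split, training and test sets draw from the same years, so drift in clinical practice over time goes undetected; the temporal split tests the model only on deliveries that occurred after all training data.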
A synthesis of two seminal studies [13, 14] demonstrates advancements and limitations in AI-driven vaginal delivery prediction, categorized into algorithmic innovation, dynamic data integration, and clinical validation (Table 3, Ref. [13, 14]).
| Metric | Ferreira I et al. [13] | Wong MS et al. [14] |
|---|---|---|
| Sample size | 2434 (retrospective) | 37,932 (real-time intrapartum) |
| Top algorithm | Logistic Regression (AUC 0.79) | AutoML (AUC 0.82) |
| Key predictors | Bishop score, maternal height | Cervical dilation rate |
| Strengths | High interpretability | Dynamic data integration |
| Limitations | Static features only | Limited model transparency |
AutoML, automated machine learning.
(a) Traditional ML Models: Ferreira I et al. [13] developed a multivariable logistic regression (LR) model using retrospective data from singleton term pregnancies (n = 2434). The LR model achieved an area under the receiver-operating-characteristic curve (AUROC) of 0.794, with Bishop score, maternal height, and inter-delivery interval identified as top predictors through SHAP analysis. Despite moderate discrimination, the model prioritized clinical interpretability over complex architectures. This study excluded real-time intrapartum parameters, relying solely on static admission features, which constrained its utility in dynamic labor management.
(b) Automated Machine Learning (AutoML) Innovations: Wong MS et al. [14] pioneered AutoML (Partometer) using real-time intrapartum data (n = 37,932). The model achieved 87.1% accuracy and AUC = 0.82 in predicting vaginal delivery within 4 hours of admission. Key dynamic predictors included cervical dilation rate and fetal head descent, underscoring the value of temporal feature integration. While AutoML streamlined model development, the “black-box” nature of feature selection reduced clinician interpretability, a critical barrier to adoption.
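A dynamic predictor such as cervical dilation rate can be derived from timestamped examinations, as in the sketch below. How the Partometer computes its features is not described in the text, so the input format and the function name are assumptions, and the examination values are made up.

```python
def dilation_rates(exams):
    """Return the dilation rate (cm/h) between consecutive examinations.

    exams: list of (hours_since_admission, dilation_cm), sorted by time.
    """
    rates = []
    for (t0, d0), (t1, d1) in zip(exams, exams[1:]):
        if t1 <= t0:
            raise ValueError("exam times must be strictly increasing")
        rates.append((d1 - d0) / (t1 - t0))
    return rates


# Made-up series of cervical examinations for one labor.
exams = [(0.0, 3.0), (2.0, 4.0), (4.0, 6.0), (5.0, 8.0)]
rates = dilation_rates(exams)
print(rates)      # [0.5, 1.0, 2.0]
print(rates[-1])  # most recent rate, usable as a time-varying feature
```

Unlike static admission features, this value changes with every new examination, so a model that uses it has to be re-evaluated as labor progresses.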
(a) Scale and Diversity: Large-Scale Dynamic Data [14]: The inclusion of 37,932 records with real-time monitoring metrics provided robust statistical power, though the cohort was limited to primiparous women in high-resource settings.
Static Data Limitations [13]: Despite a moderate sample size (n = 2434), reliance on retrospective, single-center data impeded generalizability to diverse obstetric populations.
(a) Internal Validation: Both studies employed cross-validation [13, 14], with Wong MS et al. [14] achieving superior discrimination (AUC = 0.82) due to dynamic feature inclusion.
(b) Temporal and External Gaps: No External Validation: Neither study tested models across institutions or regions, risking overfitting to local practice patterns.
Real-World Implementation: While Wong MS et al. [14] demonstrated the feasibility of real-time prediction, the absence of clinician-AI interaction protocols limited practical utility.
A synthesis of five pivotal studies [15, 16, 17, 18, 19] reveals diverse algorithmic strategies and challenges in AI-driven cesarean section (CS) prediction, categorized into model architecture, dataset dynamics, and validation rigor (Table 4, Ref. [15, 16, 17, 18, 19]).
| Metric | Guedalia J et al. [15] | Fergus P et al. [16] | Nagayasu Y et al. [17] | Meyer R et al. [18] | Islam MS et al. [19] |
|---|---|---|---|---|---|
| Sample size | 989 (inter-hospital) | 552 (FHR signals) | 1513 (single-center) | 73,667 (multi-center) | 15,409 (balanced) |
| Top algorithm | RF (AUC-PR 0.351) | Deep Learning (AUC 0.99) | Re-RX (Acc 81.9%) | XGBoost (AUC 0.84) | HGSORF (Acc 98.34%) |
| Key predictors | Fetal head position | FHR variability | Bishop score, parity | Maternal BMI | Placental markers |
| Strengths | Transportability focus | High sensitivity | Rule-based clarity | Large-scale validation | XAI interpretability |
| Limitations | Performance variability | Small sample size | Low AUC | Static features | Limited external tests |
XAI, explainable AI; Re-RX, recursive-rule eXtraction.
(a) Traditional ML Models: Meyer R et al. [18] employed XGBoost on a large cohort (n = 73,667), achieving AUC = 0.84 for unplanned CS prediction. Feature importance analysis identified maternal BMI and labor progression rate as top predictors, aligning with clinical intuition.
Islam MS et al. [19] proposed the Henry Gas Solubility Optimization-based Random Forest (HGSORF) algorithm (optimized RF), reporting 98.34% accuracy on a balanced dataset (n = 15,409). Explainable AI (XAI) analysis revealed placental insufficiency and uterine contractility patterns as critical drivers, enhancing model interpretability.
(b) Deep Learning (DL) and Hybrid Models: Fergus P et al. [16] applied deep learning classifiers to fetal heart rate (FHR) signals (n = 552), achieving 94% sensitivity and 91% specificity. AUC = 0.99 underscored DL’s potential but raised concerns about overfitting in small samples.
Nagayasu Y et al. [17] utilized the recursive-rule eXtraction (Re-RX) method (n = 1513), yielding 81.9% accuracy and AUC = 0.71. While interpretable, the model’s lower discrimination highlighted trade-offs between simplicity and predictive power.
(c) Cross-Institutional Validation: Guedalia J et al. [15] tested model transportability between hospitals, finding performance drops (
(a) Scale and Diversity: Large-Scale Cohorts [18, 19]: Studies with
(b) Dynamic Data Integration: Only Fergus P et al. [16] incorporated real-time FHR signals, while others relied on static admission parameters (e.g., parity, BMI) [17, 18, 19].
(a) Internal Validation: All studies used cross-validation [15, 16, 17, 18, 19], with Islam MS et al. [19] achieving the highest accuracy (98.34%) through Adaptive Synthetic Sampling Approach (ADASYN)-balanced data.
(b) External and Temporal Gaps
Limited Generalizability: Only Guedalia J et al. [15] addressed inter-hospital variability, revealing institutional bias as a critical barrier.
Real-Time Application: Despite high accuracy, no study has implemented prospective real-time prediction in clinical workflows.
A synthesis of three pivotal studies [3, 20, 21] highlights advancements and persistent challenges in AI-driven VBAC prediction, categorized into algorithmic strategies, dataset robustness, and clinical validation (Table 5, Ref. [3, 20, 21]).
| Metric | Lindblad Wollmann C et al. [3] | Meyer R et al. [20] | Lipschuetz M et al. [21] |
|---|---|---|---|
| Sample size | 3116 (population-based) | 989 (single-center) | 9888 (multi-stage) |
| Top algorithm | Conditional RF (AUC 0.69) | XGBoost (AUC-PR 0.351) | Gradient Boosting (AUC 0.79) |
| Key predictors | Prior vaginal delivery | Maternal age, BMI | Gestational age, parity |
| Strengths | Population diversity | Model simplicity | Dynamic risk stratification |
| Limitations | Low specificity | Limited feature depth | Retrospective data |
(a) Traditional ML Models: Lindblad Wollmann C et al. [3] compared ML models (conditional RF, lasso regression) with existing clinical scores (Swedish cohort, n = 3116). All models achieved AUROC 0.61–0.69, with sensitivity
Meyer R et al. [20] implemented RF and XGBoost (n = 989), reporting AUC-PR 0.351 (RF) vs. 0.325 (MFMU model). The XGBoost model required only 8 variables, emphasizing parsimony but sacrificing nuanced risk stratification.
(b) Dynamic Risk Stratification: Lipschuetz M et al. [21] developed gradient boosting models using first-trimester and pre-labor data (n = 9888). The pre-labor model achieved AUC = 0.793, significantly outperforming the first-trimester model (AUC = 0.745). Risk stratification categorized 42.4% of women as low-risk (VBAC success 97.3%), demonstrating clinical utility for personalized counseling.
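Risk stratification of the kind described above can be sketched as mapping each predicted probability to a group and then reporting the observed success rate per group. The cut-offs (0.10 and 0.30) and the cases are hypothetical; the thresholds used by Lipschuetz M et al. [21] are not given in the text.

```python
from collections import defaultdict


def risk_group(p_fail, low_cut=0.10, high_cut=0.30):
    """Assign a risk group from the predicted probability of failed VBAC.
    The cut-offs are illustrative assumptions."""
    if p_fail < low_cut:
        return "low"
    if p_fail < high_cut:
        return "medium"
    return "high"


def success_rate_by_group(cases):
    """cases: iterable of (predicted failure probability, VBAC succeeded)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [successes, total]
    for p_fail, succeeded in cases:
        group = risk_group(p_fail)
        counts[group][0] += int(succeeded)
        counts[group][1] += 1
    return {g: s / n for g, (s, n) in counts.items()}


# Made-up predictions and outcomes.
cases = [
    (0.03, True), (0.08, True), (0.05, True),
    (0.15, True), (0.22, False), (0.18, True),
    (0.45, False), (0.60, True),
]
rates = success_rate_by_group(cases)
print(rates)
```

Reporting observed success rates per group, rather than a single AUC, gives clinicians a figure that can be used directly in counseling.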
(a) Scale and Diversity: Large National Cohorts [3]: Population-based data (n = 3116) enhanced generalizability but lacked granular intrapartum metrics (e.g., cervical dilation trends).
Real-Time Data Gaps: All studies relied on retrospective, static parameters (e.g., prior vaginal delivery, maternal BMI), neglecting dynamic labor progression [20, 21].
Geographic Bias: Studies focused on high-income populations (Sweden [3], Israel [21]), limiting applicability to low-resource settings.
(a) Internal Validation: Lindblad Wollmann C et al. [3] used cross-validation but reported low specificity (22%), reducing clinical confidence in avoiding unnecessary cesareans.
Lipschuetz M et al. [21] demonstrated temporal validity with pre-labor data integration, aligning closer to clinical workflows.
(b) External and Practical Gaps: No Inter-Hospital Testing: Despite the multi-algorithm comparison by Meyer R et al. [20], no study validated models across institutions, risking overfitting to local practice patterns.
Patient-Clinician Discordance: Meyer R et al. [20] noted 28% patient refusal of VBAC attempts due to anxiety, a psychological factor absent in AI frameworks.
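Two metrics discussed for VBAC prediction, the sensitivity/specificity pair reported for [3] and the AUC-PR reported for [20], can be made concrete with the sketch below. The confusion-matrix counts and the scores are made up. Average precision is used here as one common estimator of AUC-PR; which estimator the original study used is not stated in the text.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive case,
    with cases ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / sum(labels)


# Made-up counts: nearly every VBAC is identified, but most repeat
# cesareans are also predicted to be VBAC.
sens, spec = sensitivity_specificity(tp=92, fn=8, tn=20, fp=80)
print(sens, spec)  # 0.92 0.2

# Made-up imbalanced outcome: 2 positives among 10 cases.
labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(ap, prevalence)  # 0.45 0.2
```

A classifier without skill has an expected AUC-PR close to the prevalence of the positive class, so AUC-PR values should be interpreted against that baseline rather than against 0.5.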
AI is a transformative technology that aims to simulate, extend, and augment human intelligence through the development of advanced algorithms and data analysis techniques. It can concurrently handle and analyze a vast amount of clinical data. In facilitating the prediction of delivery methods, it can fully exploit the advantages of big data to enhance the accuracy and reliability of predictions. Moreover, AI enables individualized clinical decision-making. In the current era marked by the progressive improvement of electronic health records, AI can leverage its strengths to comprehensively analyze historical big data. Based on the specific circumstances of each pregnant woman and the current pregnancy examination data, it can offer more personalized guidance. Simultaneously, it can provide more objective support for the clinical diagnosis and treatment work of obstetricians, optimize delivery outcomes, and continuously elevate the health status of mothers and infants.
4.1.2.1 Data Quality and Security
Delivery is a dynamic process, and unpredictable variables may arise during labor and affect the final outcome. During labor, the selection of delivery mode and maternal-fetal outcomes are influenced by a constellation of factors, including objective maternal-fetal parameters, environmental variables, and maternal subjective perceptions. Investigations must comprehensively account for the influence of these multifaceted variables on predictive outcomes. However, inherent methodological limitations inevitably arise in such studies. Current research has predominantly focused on static data parameters, while neglecting the monitoring of dynamic physiological indicators such as fetal heart rate variability and cervical dilation progression. Future studies incorporating these time-varying parameters could significantly enhance the predictive accuracy of delivery mode outcomes.
In addition, the NOS quality assessment reveals that while single-center studies may control for confounding factors influencing delivery mode prediction during labor, multicenter study designs enhance methodological rigor. However, current healthcare data quality exhibits significant heterogeneity across medical institutions, with multimodal data (e.g., electronic health records, imaging, and monitoring signals) suffering from inconsistent acquisition standards and insufficient structuralization, thereby constraining the generalizability of AI-based predictive models. Future research should prioritize establishing a tripartite framework to address these limitations: (1) Standardized Data Acquisition Protocol Development: Implement lifecycle-wide standardized protocols aligned with Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR) specifications to unify perinatal data element definitions (e.g., gestational age measurement rules, delivery mode coding systems). Leverage natural language processing (NLP) techniques to extract structured insights from unstructured labor progression narratives. Concurrently deploy intelligent validation engines for real-time monitoring of data completeness and logical consistency. (2) Privacy-Enhanced Data Sharing Mechanisms: Enable cross-institutional collaborative modeling through federated learning frameworks. Integrate homomorphic encryption and secure multi-party computation (SMPC) to ensure “data usability without visibility” of raw datasets. Implement dynamic de-identification for high-risk pregnancy data. Establish data consortia with governance rules for contribution assessment and ethical oversight. (3) Source-Level Data Quality Control: Directly interface Internet of things (IoT)-enabled devices (e.g., fetal monitors, ultrasound systems) to automate time-series data acquisition and real-time calibration. Systematically reduce manual entry errors through sensor-to-database pipelines. 
Collectively, these interventions would substantially enhance data utility and lay critical foundations for developing generalizable predictive models.
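The "intelligent validation engine" for data completeness and logical consistency could start from simple rule checks such as those sketched below. The field names, the allowed delivery-mode codes, and the plausibility range for gestational age are assumptions made for illustration; they are not taken from HL7 FHIR or from any reviewed study.

```python
# Hypothetical schema for a perinatal record.
REQUIRED_FIELDS = ("gestational_age_weeks", "cervical_dilation_cm", "delivery_mode")
DELIVERY_MODES = {"vaginal", "instrumental_vaginal", "cesarean"}


def validate_record(record):
    """Return a list of completeness and consistency problems."""
    problems = []
    # Completeness: every required field must be present and not None.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing: {field}")
    # Logical consistency: values must fall in plausible ranges.
    ga = record.get("gestational_age_weeks")
    if ga is not None and not 20 <= ga <= 45:  # assumed plausibility range
        problems.append("gestational_age_weeks outside 20-45")
    dilation = record.get("cervical_dilation_cm")
    if dilation is not None and not 0 <= dilation <= 10:
        problems.append("cervical_dilation_cm outside 0-10")
    mode = record.get("delivery_mode")
    if mode is not None and mode not in DELIVERY_MODES:
        problems.append(f"unknown delivery_mode: {mode}")
    return problems


good = {"gestational_age_weeks": 39, "cervical_dilation_cm": 4,
        "delivery_mode": "vaginal"}
bad = {"gestational_age_weeks": 52, "cervical_dilation_cm": None,
       "delivery_mode": "c-section"}

print(validate_record(good))  # []
print(validate_record(bad))
```

Running such checks at the point of data entry, rather than at analysis time, is what allows errors to be corrected while the clinical context is still available.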
4.1.2.2 Generalization Ability of Prediction Models
The global obstetric field currently faces significant regional disparities in healthcare resource allocation: variations in medical standards, professional competencies of healthcare providers, and region-specific care delivery models persist across nations and institutions. Maternal and neonatal healthcare outcomes in high-income regions markedly surpass those in low-resource settings, reflecting both technological gradients and strong correlations with regional economic development. Existing perinatal health prediction models predominantly derive from single-center datasets, exhibiting critical limitations in generalizability and cross-institutional interoperability that constrain clinical translation efficacy.
To address these challenges, we propose a “multi-center collaboration
This integrated approach effectively mitigates single-center study biases, enhances model adaptability across heterogeneous healthcare environments, and provides scalable technical infrastructure for global maternal-neonatal health optimization.
4.1.2.3 Ethical and Social Implications
In the clinical implementation of AI-assisted delivery mode prediction, robust data sharing mechanisms and standardization frameworks serve as foundational prerequisites for algorithmic optimization and model development. Current obstetric data systems are plagued by core challenges including multi-source heterogeneity, inconsistent standards, and cross-institutional sharing barriers. Without standardized data governance, algorithmic fairness and model generalizability remain fundamentally compromised. We propose implementing a comprehensive data stewardship framework through the following steps. (1) Standardized Perinatal Data Protocols: Develop unified data collection guidelines specifying critical delivery-related metrics (e.g., pelvimetry parameters, labor progression staging), aligned with international standards such as HL7 FHIR for structured interoperability. Establish definitive coding schemas for obstetric indicators through multidisciplinary consensus. (2) Secure Cross-Institutional Collaboration Infrastructure: Deploy federated learning architectures for distributed model training without raw data transfer. Integrate homomorphic encryption and dynamic de-identification techniques to preserve patient confidentiality. Implement quantifiable data contribution assessment mechanisms to incentivize multi-center participation.
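The principle of distributed model training without raw data transfer can be illustrated with a minimal sketch of federated averaging applied to a logistic regression model. This is an assumption-laden toy example, not the method of any reviewed study: the function names, the learning rate, and the number of rounds are hypothetical, and homomorphic encryption and de-identification are deliberately omitted. Each site trains on its own records and shares only model weights, which a coordinator combines in proportion to each site's sample count.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# Each sample is (features, label); label 1 might denote, e.g., cesarean delivery.
Dataset = Sequence[Tuple[Vector, int]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights: Vector, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Vector:
    """Run gradient descent on one site's data, starting from the global weights."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights: Sequence[Vector],
                      site_sizes: Sequence[int]) -> Vector:
    """Combine site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_federated(sites: Sequence[Dataset], dim: int, rounds: int = 20) -> Vector:
    """Coordinator loop: only weights leave each site, never raw records."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w
```

Weighting by sample count means larger centers influence the shared model more, which is also one simple, quantifiable basis on which a data contribution assessment could be built.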
Building upon this technical foundation, addressing ethical and societal implications requires prioritized attention. (1) Algorithmic Transparency: Ensure interpretability of feature weights in delivery prediction models through XAI frameworks. (2) Medicolegal Accountability: Formalize legal liability delineation protocols for human-AI decision conflicts. (3) Patient-Centric Governance: Establish dynamic consent management systems for continuous data usage authorization.
Concurrent interdisciplinary collaboration among medical ethics boards, AI developers, and clinical practitioners is imperative to formulate obstetric-specific AI governance guidelines. This tripartite synergy enhances clinician-patient acceptance of predictive systems while balancing technological innovation with ethical imperatives, ultimately fostering responsible integration of AI in maternal care.
A review of the existing research makes evident that AI-assisted prediction of delivery modes, a cutting-edge domain at the intersection of medicine and computer science, still demands closer collaboration between obstetricians and computer scientists. Subsequent research should focus on further optimizing algorithms and prediction models. On this foundation, data sharing and standardization among medical centers should be strengthened. Multi-center studies conducted in collaboration with the global community should continue, with the ultimate aim of facilitating the widespread application of AI technology in clinical decision-making and achieving individualized, precise medical care. Alongside this in-depth research, the ethical and social implications of AI technology must also be explored more fully.
We anticipate that, through the sustained efforts of obstetricians and computer scientists, a comprehensive AI-assisted prediction model for delivery modes can be established. Such a model would give clinicians more accurate predictions and decision support, enable personalized clinical decision-making with real-time monitoring and early warning, and provide more comprehensive and effective safeguards for the health of mothers and infants.
This review highlights the significant potential of AI in predicting delivery modes, demonstrating its superiority over traditional statistical methods in terms of accuracy and reliability. However, several challenges remain, including data standardization, model generalizability, and ethical concerns. Future research should prioritize multi-center collaborations to enhance the generalizability of AI models, develop standardized protocols for data collection and sharing, and address the ethical implications of AI in obstetrics. By addressing these challenges, AI can be effectively integrated into clinical practice, ultimately improving maternal and neonatal outcomes.
All relevant data are within the manuscript and its supporting information files.
JZ and YZ designed the research study. YZ was responsible for manuscript writing. The table was prepared by JL and FMZ, the figures were created by ZL and EHG, and the data from the studies were compiled by XMH and YNX. YW and MZS were involved in drafting the manuscript and provided help and advice on the reference search. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.
Not applicable.
Not applicable.
This research was funded by the Qingdao Outstanding Health Professional Development Fund. This research was also funded by the Clinical Medicine +X Scientific Research Project of the Affiliated Hospital of Qingdao University, grant number QDFY+X2024111.
The authors declare no conflict of interest.
Supplementary material associated with this article can be found, in the online version, at https://doi.org/10.31083/CEOG37807.
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

