The Accuracy of Third-Trimester Ultrasound in Predicting Large for Gestational Age or Macrosomic Fetuses in Diabetic and Non-Diabetic Pregnant Women: A Systematic Review and Meta-Analysis

Background: The accuracy of third-trimester ultrasound in detecting large for gestational age and macrosomic fetuses in diabetic and non-diabetic pregnant women is unclear in the literature. The aim of this study was to examine the accuracy of the 4-parameter Hadlock formula for the prediction of large fetuses in these two populations.
Methods: A systematic review and meta-analysis were performed; only studies evaluating the accuracy of third-trimester ultrasound using the 4-parameter Hadlock formula were included. Data were extracted, and the meta-analysis was performed using STATA software and Meta-disk 2.0 to obtain the pooled sensitivity and specificity. The risk of bias was assessed using the QUADAS-2 tool.
Results: Nine articles were included in the final analysis, comprising 24,693,702 pregnancies screened and 2336 fetuses confirmed to be large. The included articles were judged to be at high risk of bias in more than half of the cases and at doubtful risk in the remaining cases. A direct comparison between diabetic and non-diabetic populations was not possible because the studies considered either mixed populations (diabetic and non-diabetic) or only healthy pregnancies; the comparison was therefore made between these two groups. The pooled sensitivity was 0.54 (95% confidence interval (CI): 0.40–0.68), and the pooled specificity was 0.94 (95% CI: 0.90–0.97). The heterogeneity estimated by the bivariate I² was 0.92, and the area under the summary Receiver Operating Characteristics curve was 0.19. The subgroup analysis revealed a higher level of heterogeneity for the mixed group (I² = 0.92) and a lower one for the healthy group (I² = 0.67). The relative sensitivity between the mixed population and the healthy one was 0.85 (95% CI: 0.49–1.45; p = 0.57), and the relative specificity was 0.98 (95% CI: 0.91–1.04; p = 0.54); the difference between the healthy and mixed groups was not significant (p = 0.11).
Conclusions: Despite the high heterogeneity of the data, the overall accuracy of ultrasound is similar in mixed and healthy populations and is overall moderate in predicting large fetuses.


Introduction
A large for gestational age (LGA) fetus is defined by a prenatal abdominal circumference (AC) and/or estimated fetal weight (EFW) ≥90th percentile [1]. As a result of the increased incidence of maternal obesity, and thus also of diabetes [2], the risk of a fetus being LGA or being born macrosomic, defined as a neonatal weight ≥4000 grams, is considerable. Because of the possible perinatal and maternal complications associated with a large fetus, such as shoulder dystocia and third- and fourth-degree perineal lacerations, the prenatal identification of an LGA fetus may reduce these risks. However, the ultrasound assessment of estimated fetal weight has shown a poor prediction rate for LGA and macrosomia, and the likelihood of error increases with both the estimated fetal weight and gestational age [3,4]. Formulas used for calculating the EFW tend to underestimate or overestimate fetal size by 10–15%, making the prenatal estimation of birth-related risks ineffective or inappropriate [5–7]. This error is secondary to different variables, such as the error related to every measured parameter and the large intra- and interobserver variability. Furthermore, it appears that most formulas are accurate mainly for weights up to 3500 grams, while tending, in the opinion of some authors [6,8,9], to underestimate large fetuses. For other authors, on the other hand, the overestimation of weight seems to increase with the EFW [10]. Among all available formulas to calculate fetal weight, Hadlock's 4-parameter formula (including biparietal diameter (BPD), head circumference (HC), AC, and femur length (FL)) is the most widely used, and it seems to provide the best predictions of birth weights over 3500 grams [8].
Pregnancies complicated by gestational diabetes mellitus (GDM) have a 2 to 4 times higher risk of LGA fetuses than pregnancies in non-diabetic women [11], along with a higher risk of perinatal morbidities related to neonatal macrosomia [12]. It is debated whether maternal diabetes further reduces the accuracy of ultrasound in estimating fetal weight [13]. In fact, it has been shown that the percentage difference in EFW may be as low as 0.2% in non-diabetic women when fetal biometry is performed within one week before delivery, while it rises to 7.9% in diabetic women [13]. However, the accuracy of ultrasonography in these two groups is not well described in the literature. Therefore, the aim of this review is to define the accuracy of the 4-parameter Hadlock formula in predicting the EFW of LGA fetuses in diabetic and non-diabetic pregnant women.

Materials and Methods
A meta-analysis of the accuracy of third-trimester ultrasound in estimating the actual birth weight of suspected LGA and macrosomic fetuses was conducted. The study was registered with the International Prospective Register of Systematic Reviews (PROSPERO) database (CRD42023407146) [14]. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed in reporting the results [15].

Search Strategy
A search of the English-language literature was performed from inception until July 2022 in PubMed (Medline). A combination of key terms was used, which, together with the search strategy, is given in the Appendix. Original articles and studies reporting the accuracy of the EFW Hadlock 4 formula [16] in detecting LGA and macrosomic fetuses were considered for inclusion, while literature reviews, meta-analyses, and case reports were not considered eligible. The diabetic patients could have had pregestational type I or II diabetes or GDM. According to the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) and the American College of Obstetricians and Gynecologists (ACOG) [1,5], an LGA fetus was defined by an EFW or AC above the 90th percentile for gestational age, while a macrosomic fetus was defined as a fetus with an EFW above 4000 grams.
Data extracted or derived from the available data of each study included the type of population undergoing ultrasound, the total number of patients scanned, the sensitivity, the specificity, and the total number of true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN) results. A meta-analysis was conducted to present sensitivity and specificity estimates along with 95% confidence intervals (CIs).

Quality Assessment of Included Studies
The quality assessment of each included study was performed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria [17]. After applying the two separate quality criteria, the Robvis tool web app (Version of 2022, University of Bristol, Bristol, United Kingdom) [18] was used to visualize the risk-of-bias assessments by creating traffic-light plots and weighted bar plots.

Statistical Analysis
The 2 × 2 counts of TP, FP, TN, and FN were derived from the number of patients studied and the reported sensitivity and specificity values. The meta-analysis (hierarchical and bivariate models) was performed using the Metandi and Metadata commands in STATA software (Stata 17, StataCorp LLC., College Station, TX, USA) [19–21] and Meta-disk 2.0 (Ramón y Cajal Research Institute, Madrid, Spain) [22], calculating: the pooled accuracy estimates (sensitivity, specificity, diagnostic odds ratio (DOR), positive likelihood ratio (LR+), negative likelihood ratio (LR−)) and false positive rate (FPR), with their corresponding 95% CIs; the model parameter estimates (logit sensitivity, logit specificity, logit variances, and correlation); and the heterogeneity statistics, including the bivariate I², the median odds ratio (MOR), and the area of the 95% prediction ellipse.
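The derivation of the 2 × 2 table described above can be illustrated with a minimal Python sketch. It is not the procedure used in STATA or Meta-disk 2.0; it only shows the arithmetic under stated assumptions. The function names, the `n_large` argument (the number of fetuses confirmed to be large, which is needed in addition to the total number of patients), and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # large fetuses correctly flagged by ultrasound
    fp: int  # normal-sized fetuses wrongly flagged
    fn: int  # large fetuses missed by ultrasound
    tn: int  # normal-sized fetuses correctly not flagged


def reconstruct_2x2(n_total: int, n_large: int,
                    sensitivity: float, specificity: float) -> TwoByTwo:
    """Rebuild 2 x 2 counts from reported summary figures.

    Assumption: n_large (fetuses confirmed large at birth) is known.
    Counts are rounded to whole patients, so small rounding errors are possible.
    """
    n_not_large = n_total - n_large
    tp = round(sensitivity * n_large)
    tn = round(specificity * n_not_large)
    return TwoByTwo(tp=tp, fp=n_not_large - tn, fn=n_large - tp, tn=tn)


def accuracy_measures(t: TwoByTwo) -> dict:
    """Single-study accuracy measures from a 2 x 2 table.

    Tables with a zero cell would need a continuity correction first;
    that case is deliberately not handled in this sketch.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    fpr = 1 - spec
    lr_pos = sens / fpr
    lr_neg = (1 - sens) / spec
    return {
        "sensitivity": sens,
        "specificity": spec,
        "fpr": fpr,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,
    }


if __name__ == "__main__":
    # Hypothetical study: 1000 pregnancies, 100 large fetuses,
    # with sensitivity 0.54 and specificity 0.94.
    table = reconstruct_2x2(1000, 100, 0.54, 0.94)
    print(table)
    for name, value in accuracy_measures(table).items():
        print(f"{name}: {value:.3f}")
```

For this hypothetical study the reconstructed table is TP = 54, FP = 54, FN = 46, TN = 846, giving LR+ of about 9.0, LR− of about 0.49, and a DOR of about 18.4. These are single-table values; the pooled estimates from a bivariate model account for between-study variation and need not coincide with them.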
For the aim of the study, a subgroup analysis between healthy and mixed populations was performed. A comparative analysis was run using a random effects model with one categorical covariate from Meta-disk 2.0. Summary receiver operating characteristic (ROC) curve and forest plots were also reported.

Search Results
A total of 1855 studies were identified through the search of the literature. Of these, 1792 titles and abstracts were screened, resulting in 63 proceeding to the full-text screen. Of these, 20 articles were excluded because of the formula used (non-Hadlock, Hadlock 1, Hadlock 2, or Hadlock 3 formula). Further, eight publications were excluded because of an incorrect study design or outcome, and another three manuscripts because of insufficient data reported. Finally, 23 articles were excluded because TP, FP, TN, and FN data were neither directly reported nor derivable owing to missing data. Thus, nine articles were included in the present meta-analysis [3,7,23–29], of which three represented a healthy population [7,25,27], one a diabetic population [26], and five a mixed population (healthy and diabetic) [3,23,24,28,29].
The selection process of included articles is presented in Fig. 1, while the PRISMA checklist is given in the Supplementary Materials.

Risk of Bias of Included Studies
The risk of bias of included studies was represented with traffic-light plots and weighted bar plots according to the QUADAS-2 criteria (Fig. 2, Ref. [3,7,23–29]; Fig. 3). The "overall" risk of bias was rated as high in more than half of the publications included in the review. In the remaining cases, the overall risk of bias was rated as doubtful. The domain with the highest risk of bias was the "index test", because of the interval between the ultrasound scan and delivery. Conversely, reference standard bias and flow and timing bias were found to be low risk for all publications.

Description of Included Studies
For the purpose of the present study, the studies were divided into two major groups: group 1, defined as the "healthy" non-diabetic population [7,25,27]; and group 2, defined as the "mixed" population of healthy and diabetic patients [3,23,24,26,28,29].

Subgroup Analysis and Meta-Regression Analysis
The subgroup analysis showed a higher level of heterogeneity for the mixed group (I² = 0.92) and a lower one for the healthy group (I² = 0.67). The correlation parameter was negative in both groups (−1.00 and −0.89, respectively). The sensitivity for the healthy population was 0.60 (95% CI: 0.35–0.80), while the specificity was 0.95 (95% CI: 0.88–0.98). For the mixed population, the sensitivity was 0.51 (95% CI: 0.33–0.68), while the specificity was 0.93 (95% CI: 0.87–0.96). The ROC curves for the two populations are shown in Fig. 6.
The meta-regression analysis for the subgroups (healthy and mixed populations) showed that the sensitivity and specificity parameters did not differ between the mixed population and the healthy one (p = 0.11). In fact, the relative sensitivity between the mixed population and the healthy one was 0.85 (95% CI: 0.49–1.45; p = 0.57), and the relative specificity was 0.98 (95% CI: 0.91–1.04; p = 0.54).
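As a consistency check, the relative sensitivity and specificity can be compared with the simple ratios of the subgroup point estimates reported above. The following Python sketch assumes that "relative" means the mixed-population estimate divided by the healthy-population estimate; the function name is hypothetical, and the model-based confidence intervals and p-values cannot be reproduced this way.

```python
def relative_estimate(mixed: float, healthy: float) -> float:
    """Ratio of the mixed-population estimate to the healthy-population estimate."""
    return mixed / healthy


# Subgroup point estimates as reported in the text (rounded to two decimals).
SENSITIVITY = {"healthy": 0.60, "mixed": 0.51}
SPECIFICITY = {"healthy": 0.95, "mixed": 0.93}

rel_sens = relative_estimate(SENSITIVITY["mixed"], SENSITIVITY["healthy"])
rel_spec = relative_estimate(SPECIFICITY["mixed"], SPECIFICITY["healthy"])

print(f"relative sensitivity: {rel_sens:.2f}")  # 0.85
print(f"relative specificity: {rel_spec:.2f}")  # 0.98
```

Both ratios agree with the reported values to two decimals, which supports reading the relative measures as mixed versus healthy.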

Discussion
The results of this study indicate that the overall sensitivity of ultrasound in predicting a large fetus is about 54%, with a specificity of 94%. These figures do not change much in the sub-analysis of the two population groups. In fact, the sensitivity for the non-diabetic population is 60%, while for the mixed population of diabetic and healthy women it is 51%. The main problem, however, is the high statistical heterogeneity found among the studies, together with the statistical methodology used, which does not allow definitive conclusions to be reached on the subject.
Several aspects make approaching this topic complex. The first is the choice of growth curves. In planning the study, we had to decide which growth curve to consider in the systematic review, and we chose to restrict our assessment to Hadlock-4 only. In fact, in a study that evaluated the performance of 36 growth curves on a total population of 350 newborns weighing more than 4000 grams, the Hadlock-4 formula identified 74% of fetuses weighing ≥4000 grams with a systematic error not significantly different from zero [31]. However, a false positive rate of 31% was reported. This leads to the second aspect to consider when addressing weight estimation in large fetuses: the decrease in ultrasound accuracy observed for examinations performed in the last weeks of pregnancy and for an EFW ≥4000 g. In 2017, the World Health Organization (WHO) published reference ranges for fetal growth charts based on the prospective assessment of 1387 women from 10 different countries [32]. They observed that the growth curve tends to widen towards the end of pregnancy, indicating greater variability in the estimation of fetal weight. Moreover, this variability seemed to be greater for higher percentiles. In other words, while small fetuses tend to be more uniformly small, large fetuses show greater variability, which makes it difficult to use a standardized cut-off and to make recommendations based on it. The reasons why this variability is greater for larger fetuses, especially near term, have not been elucidated. Several factors have been implicated, such as the technical difficulty of measuring a large fetus at term, with the consequent difficulty of obtaining a proper imaging plane for the measurement, or the presence of maternal obesity and diabetes, the former of which could reduce image quality and the latter of which could lead to a different fetal body composition [33,34].
These considerations lead to the third aspect to consider when estimating the weight of large fetuses: is ultrasound accuracy different in large fetuses of non-diabetic mothers compared to diabetic and/or obese mothers?
The results of our review show that the accuracy of ultrasound in predicting the birth weight of a large fetus in a population of healthy mothers is nearly superimposable on that in a mixed population of healthy and diabetic mothers. This result is in line with previous studies that did not find an association between maternal diabetes and poorer accuracy of ultrasound, while showing a negative, although not strong, correlation between obesity and test performance [33,34]. Studies that aimed to improve the accuracy of ultrasound for the diagnosis of LGA or macrosomia by introducing maternal features have failed. Body mass index, fetal sex, and multiparity have no significant influence on measurement error [6] or accuracy. Adding clinical and demographic variables to the ultrasound assessment, including maternal weight and body mass index, does not improve the prediction of macrosomia [35]. In fact, if the ultrasound is performed by an experienced sonographer, the impact of maternal body mass index is small [36]. This is reasonable considering that ultrasound is an operator-dependent examination, and it suggests that the estimation of fetal weight should be performed by an experienced operator if LGA is suspected. Although this is, to our knowledge, the first study that attempts to evaluate the difference in the detection rate of the ultrasound estimation of fetal weight in the diabetic and non-diabetic populations, there are some limitations.
The first is the small number of included studies, which were characterized by high statistical variability. In many studies, the population was a mix of diabetic and non-diabetic women, so the data had to be merged, which could have affected the results. Moreover, the majority of studies reported on both LGA and macrosomic fetuses, which frequently overlap but may be two different entities. In addition, the timing of the ultrasound varied from 7 to more than 30 days before delivery, contributing to the variability in detection rate.

Conclusions
The estimation of fetal weight in diabetic women is of paramount importance, as it may help in identifying the optimal time of delivery of LGA fetuses in an attempt to prevent possible complications related to the birth of a macrosomic fetus. This review confirms that the accuracy of ultrasound in predicting large fetuses at birth is only moderate but that its performance is similar in mixed and non-diabetic populations. However, the high heterogeneity between studies impedes the drawing of definitive conclusions. Further studies are needed to establish the exact accuracy of ultrasound estimation of fetal weight.

Availability of Data and Materials
All data generated or analyzed during this study are included in this published article.

Author Contributions
IF designed the research study. SB performed the research. VC and SN analyzed the data. SB and IF wrote the manuscript. MG and RR provided help and advice on the study design and final draft. All authors contributed to editorial changes in the manuscript. All authors have read and approved the final manuscript.

Ethics Approval and Consent to Participate
Not applicable.