Antral follicle count (AFC) and anti-Müllerian hormone (AMH) are considered the best markers of ovarian reserve and ovarian response to stimulation. It is not clear whether they complement each other or act interchangeably in predicting ovarian response and individualizing gonadotropin dosage. Objective: To compare the predictive value of AFC and serum AMH for ovarian response and pregnancy outcome in intracytoplasmic sperm injection (ICSI) patients stimulated by an antagonist protocol, and to determine whether measuring both markers adds to the power of predicting a response. Materials and Methods: A prospective diagnostic test study of infertile women. Setting: A private in vitro fertilization unit, Agial Hospital, Alexandria, Egypt. Women aged 20-39 years (n=700) undergoing their first ICSI cycle were included in this study. AFC and AMH measurements were taken. All patients were stimulated with a fixed antagonist protocol with a starting dose of 200 IU of recombinant follicle stimulating hormone (rFSH). Main outcome measures included the number of oocytes retrieved and the clinical pregnancy rate. Results: Age, AMH, AFC, and a score combining both AMH and AFC (AMHxAFC) were statistically significant discriminators of the occurrence of an excessive response. The cutoff values for AMH, AFC, and AMHxAFC were > 3.75 ng/mL, > 23, and > 64.8. While AMH and AFC were equally effective in predicting an excessive response, the combined score AMHxAFC was significantly better than AFC or AMH alone. AMH, AFC, and AMHxAFC were significantly better predictors of an excessive response than age. Age, AFC, AMH, and AMHxAFC were statistically significant discriminators of the occurrence of a poor response; however, AFC was the best predictor of a poor response, with a cutoff < 12. Age, AFC, and AMH were statistically significant discriminators of the occurrence of pregnancy, yet their predictive power was low.
Conclusion: Measuring both AMH and AFC adds to their predictive power for a high or an excessive response. AFC alone is an excellent predictor of a poor response and is significantly better than age and AMH. Age, AMH, and AFC have poor predictive power for pregnancy.
Anti-Müllerian hormone (AMH) and antral follicle count (AFC) have been reported to be reliable markers of ovarian reserve [1-6]. They are useful for the determination of the starting dose of follicle stimulating hormone (FSH) in controlled ovarian stimulation (COS) cycles for patients undergoing assisted reproduction (ART) [7,8].
An accurate starting dose estimation is important to avoid both an excessive and a poor response to ovarian stimulation. An excessive response results in pelvic pain and discomfort during stimulation, painful oocyte retrieval, and increased risk of cycle cancellation; most importantly, it carries the risk of ovarian hyperstimulation syndrome (OHSS), one of the most dreadful outcomes of COS. Moreover, the pregnancy rate and live birth rate in the fresh cycle might be compromised [10-12]. On the other hand, AMH and AFC can predict a poor response to COS and the subsequent increased risk of cycle cancellation and poor outcome accompanying it.
Adjusting the starting FSH dose for expected poor responders may minimize cycle cancellation and reduce the cost of treatment. It has been shown that 40% of patients drop out after their first ART cycle, due not only to the physical and psychological burden of treatment but also to the poor response. AMH and AFC have been shown to have comparable predictive powers for both a poor and an excessive response. AMH has not shown good predictive power for the occurrence of pregnancy, indicating that it is a quantitative rather than a qualitative marker of ovarian reserve. The ability of AFC to predict pregnancy outcome and live birth is not clear [8,14-18].
The authors studied the value of serum AMH and AFC in predicting both the ovarian response (poor or excessive) and the treatment outcome in patients undergoing intracytoplasmic sperm injection (ICSI) using a fixed-dose antagonist protocol. They also explored whether measuring AFC and AMH together gives better predictive power than either alone.
The study was approved by the ethics committee of Alexandria University. All patients signed a written consent before being included in the study. Consecutive infertile patients aged between 20 and 39 years undergoing their first cycle of ICSI in a private in vitro fertilization (IVF) center (Agial IVF Center, Alexandria, Egypt) between June 2015 and July 2016 were included in this study. Patients with abnormal prolactin (PRL) or thyroid-stimulating hormone (TSH) were excluded. All patients had a normal gynecologic examination. Patients with severe endometriosis, fibroids, previous ovarian surgery, or uterine anomalies were excluded. All patients had a basal estimation of serum AMH, PRL, and TSH. The AFC was assessed by a single observer before the start of gonadotropin stimulation.
Transvaginal scans were performed with a Voluson E10, BT 15 ultrasound scanner using a wideband convex-volume endocavity high-resolution (6-12 MHz) transducer. Measurements were performed for antral follicles 2-10 mm in diameter according to the standards of Broekmans et al. All patients were primed with oral contraceptive pills starting on the first day of a spontaneous or induced menses.
Pills were administered for 10 to 16 days then stopped, and recombinant FSH (rFSH) was started after a five-day pill-free period. All patients were started on a fixed dose of 200 IU of rFSH. The patients were scanned on day 5 of stimulation, and the rFSH dose was adjusted if the response was poor or excessive. Starting on the 5th stimulation day, the GnRH antagonist cetrorelix acetate 0.25 mg was administered subcutaneously every day. Follicular monitoring was continued using vaginal ultrasound examinations. Estradiol and progesterone were measured on the trigger day. The criterion for triggering was the presence of at least three follicles 17-20 mm in diameter.
The trigger was accomplished by administering 250 µg of rhCG. Patients with 15-20 follicles >12 mm on the day of hCG were administered a dual trigger of agonist (0.2 mg of triptorelin acetate) and 1,000 IU hCG. Patients with >20 follicles >12 mm on the day of hCG were triggered by an agonist alone (0.2 mg of triptorelin acetate). In the case of agonist or dual trigger, intensive luteal support was adopted. Ovum pickup was scheduled 35-36 hours after rhCG. One or two blastocysts were transferred on day 5 or 6. In 9.7% of patients, three blastocysts were transferred due to old age and/or poor-quality embryos. The primary outcome measures in this study were the number of oocytes retrieved and the clinical pregnancy rate.
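The trigger selection described above can be written as a small rule. This is only a sketch of the thresholds as stated in the text; the function name and the returned labels are hypothetical and are not part of the study's methods.

```python
def choose_trigger(follicles_over_12mm: int) -> str:
    """Map the number of follicles >12 mm on the day of hCG to a trigger type.

    Thresholds are those given in the protocol text; labels are illustrative.
    """
    if follicles_over_12mm < 0:
        raise ValueError("follicle count cannot be negative")
    if follicles_over_12mm > 20:
        # More than 20 follicles: agonist alone
        return "agonist alone (triptorelin acetate 0.2 mg)"
    if follicles_over_12mm >= 15:
        # 15-20 follicles: dual trigger
        return "dual trigger (triptorelin acetate 0.2 mg + hCG 1,000 IU)"
    # Fewer than 15 follicles: standard rhCG trigger
    return "rhCG 250 µg"


for count in (8, 15, 20, 21):
    print(count, "->", choose_trigger(count))
```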
The data were tabulated and analyzed using SPSS, version 21.0. A Kolmogorov-Smirnov test of normality revealed no significant departure from normality in the distribution of the variables, so parametric statistics were adopted. Data were described using minimum, maximum, mean, and standard deviation. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated using MedCalc Software version 14. The Youden index was used to determine the best cutoff value. Conceptually, the Youden index is the vertical distance between the 45-degree line and the point on the ROC curve, that is, J = sensitivity + specificity - 1; the best cutoff is the one that maximizes J.
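As an illustration of how a Youden-index cutoff is obtained, the following minimal sketch scans every observed marker value as a candidate cutoff and keeps the one with the largest J. The study used MedCalc for this step; the function name and the small data set below are hypothetical and synthetic, and a case is assumed to be called positive when the marker exceeds the cutoff.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 = outcome present, 0 = outcome absent.
    A case is classified positive when value > cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = float("nan"), -1.0
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if v > cutoff and y == 1)
        true_neg = sum(1 for v, y in zip(values, labels) if v <= cutoff and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Synthetic marker values and outcomes, for illustration only (not study data)
marker = [1.0, 2.0, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0]
outcome = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(marker, outcome)
print(f"best cutoff: > {cutoff}, Youden J = {j:.2f}")
```

For a marker where low values indicate the outcome (for example, a poor response), the comparison direction would be reversed.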
Comparison of ROC curves using the method of DeLong et al. was adopted to test the statistical significance of the difference between the areas under two dependent ROC curves (derived from the same cases). The alpha level was set to 5% (95% confidence level), and a beta error of up to 20% was accepted (study power of 80%).
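For readers unfamiliar with the comparison of dependent ROC curves, the sketch below follows the usual nonparametric formulation attributed to DeLong et al.: each AUC is decomposed into per-case components, and the covariance of those components gives the variance of the AUC difference. It is an illustrative pure-Python version, not the MedCalc implementation used in the study; all function names and the data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if the positive case scores higher, 0.5 for a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: List[float], neg: List[float]) -> Tuple[List[float], List[float]]:
    """Per-positive (V10) and per-negative (V01) structural components."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(u: List[float], v: List[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(marker_a: Sequence[float], marker_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (auc_a, auc_b, z, two-sided p) for two markers on the same cases."""
    pos_a = [v for v, y in zip(marker_a, labels) if y == 1]
    neg_a = [v for v, y in zip(marker_a, labels) if y == 0]
    pos_b = [v for v, y in zip(marker_b, labels) if y == 1]
    neg_b = [v for v, y in zip(marker_b, labels) if y == 0]
    v10_a, v01_a = _components(pos_a, neg_a)
    v10_b, v01_b = _components(pos_b, neg_b)
    m, n = len(pos_a), len(neg_a)
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    z = (auc_a - auc_b) / math.sqrt(var_a + var_b - 2.0 * cov_ab)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Synthetic data for illustration only (not study data)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
marker_a = [3, 6, 7, 8, 1, 2, 4, 5]
marker_b = [2, 3, 7, 8, 1, 4, 5, 6]
auc_a, auc_b, z, p = delong_test(marker_a, marker_b, labels)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, z = {z:.3f}, p = {p:.3f}")
```

Because both markers are measured on the same patients, the covariance term reduces the variance of the difference, which is why a paired test is preferred over comparing two independent AUCs.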
A total of 700 couples undergoing ICSI were enrolled in the study. The ovarian response categories are defined according to the number of oocytes retrieved: poor (≤ 3), low (4-7), appropriate (8-14), high (15-19), and excessive (≥ 20). The patient characteristics and demographics are shown in Table 1. Patient outcome characteristics are summarized in Table 2. When pregnant and non-pregnant patients were compared, age was significantly higher in the non-pregnant than in the pregnant group (31.47 ± 4.81 vs. 30.73 ± 4.29 years, p(w) = 0.033), while serum AMH and AFC were significantly higher in the pregnant patients (4.49 ± 2.71 vs. 3.98 ± 2.30 ng/mL, p = 0.008; and 37.25 ± 22.82 vs. 32.66 ± 19.44, p(w) = 0.004). Moreover, age, serum AMH, and AFC were significantly different in the various response categories; the exceptions were that age was not significantly different between the low- and appropriate-response groups and that neither AMH nor AFC was significantly different between the poor- and the low-response groups (Figures 1 and 2).
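The response categories defined above translate directly into a lookup by oocyte count. The following sketch uses the thresholds given in the text; the function name is hypothetical.

```python
def response_category(oocytes_retrieved: int) -> str:
    """Classify ovarian response from the number of oocytes retrieved."""
    if oocytes_retrieved < 0:
        raise ValueError("oocyte count cannot be negative")
    if oocytes_retrieved <= 3:
        return "poor"
    if oocytes_retrieved <= 7:
        return "low"
    if oocytes_retrieved <= 14:
        return "appropriate"
    if oocytes_retrieved <= 19:
        return "high"
    return "excessive"


for n in (3, 4, 14, 19, 20):
    print(n, "->", response_category(n))
```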
Figure 1. Mean plot for AMH in different response groups.
Figure 2. Mean plot for AFC in different response groups.
Table 1. Patient characteristics and demographics.

| Characteristic | Value |
|---|---|
| Female age (years) | 31.09 ± 4.56 |
| Infertility duration (years) | 5.65 ± 3.45 |
| Primary cause of infertility, n (%) | |
| Male factor | 349 (49.9%) |
| Tubal and adhesions | 127 (18.1%) |
| AMH (ng/mL) | 4.24 ± 2.53 |
| AFC, n | 35.00 ± 21.34 |
| Total FSH dose, IU | 2904.19 ± 1143.85 |
| E2 on hCG day (pg/mL) | 3649.89 ± 2314.53 |
| P4 on hCG day (ng/mL) | 1.06 ± 0.6 |
| Mode of triggering | |

Values are mean ± standard deviation or n (%). PCOS = polycystic ovary syndrome; AMH = anti-Müllerian hormone; AFC = antral follicle count; hCG = human chorionic gonadotropin; rhCG = recombinant hCG.
Table 2. Patient outcome characteristics.

| Outcome | Value |
|---|---|
| Number of oocytes | 22.83 ± 13.17 |
| Poor (≤ 3 oocytes), (%) | 3% |
| Low (4-7 oocytes), (%) | 5% |
| Appropriate (8-14 oocytes), (%) | 21% |
| High (15-19 oocytes), (%) | 12% |
| Excessive (≥ 20 oocytes), (%) | 59% |
| Number of fertilized oocytes | 13.69 ± 8.38 |
| Blastocysts/oocytes retrieved (%) | 28.32 ± 19.33 |
| Blastulation rate (%) | 46.59 ± 27.21 |
| Expanded blastocyst rate (%) | 62.29 ± 36.43 |
| High-quality blastocyst rate (%) | 28.32 ± 19.33 |
| High-quality blastocyst rate (%) | 24.84 ± 30.49 |
| High-quality blastocyst/oocytes retrieved (%) | 7.55 ± 10.89 |
| Number of embryos transferred | 2.03 ± 0.50 |
| Beta hCG positive | |
| Implantation rate (%) | 27.1% |
The correlation between markers of ovarian reserve (age, AMH, and AFC) and ovarian response (number of oocytes retrieved) was assessed using the Pearson correlation coefficient and was found to be significant at the 0.01 level (Table 3). The authors performed ROC curve analyses to estimate the predictive value of age, AMH, and AFC for ovarian response. A score combining both AMH and AFC (AMHxAFC) was tested together with AFC and AMH separately. AFC, AMH, and AMHxAFC were statistically significant discriminators of the occurrence of an appropriate or a high response, with AUC = 0.752 vs. 0.730 vs. 0.766 (95% CI = 0.719-0.784 vs. 0.695-0.762 vs. 0.733-0.797; Z = 13.331 vs. 12.03 vs. 14.414; p < 0.0001 for each). The diagnostic criterion using the Youden index was a level of < 23 vs. < 3.75 ng/mL vs. < 75, with a sensitivity of 66.67% vs. 69.70% vs. 66.67%, a specificity of 80.81% vs. 73.13% vs. 80.60%, a positive predictive value (PPV) of 63.1% vs. 56.1% vs. 62.9%, and a negative predictive value (NPV) of 83.1% vs. 83.1% vs. 83.1%. Comparing the ROC curves for AMH, AFC, and AMHxAFC showed no significant difference between AMH and AFC, indicating that they are equally effective in predicting a high response, while the combined score AMHxAFC was significantly better than AFC or AMH alone (Figure 3). Age was a statistically significant discriminator of the occurrence of an excessive response, with an AUC = 0.654 (95% CI = 0.618-0.689; Z = 7.071, p < 0.0001). The diagnostic criterion using the Youden index was a level of < 34 years, with a sensitivity of 84.75%, a specificity of 46.34%, a PPV of 69.4%, and an NPV of 67.9%.
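The combined score and the four diagnostic measures reported above can be sketched as follows. The score is the product of AMH and AFC, as described in the text; the patient tuples are synthetic, the cutoff value is a placeholder, and the direction of the comparison (score above cutoff predicts the outcome) is an assumption made for this example only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def combined_score(amh_ng_ml: float, afc: int) -> float:
    """AMHxAFC: serum AMH (ng/mL) multiplied by the antral follicle count."""
    return amh_ng_ml * afc


def diagnostic_metrics(predicted: Sequence[bool], actual: Sequence[bool]) -> DiagnosticMetrics:
    """Compute sensitivity, specificity, PPV, and NPV from paired predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# Synthetic (AMH, AFC, outcome) tuples, for illustration only (not study data)
patients = [
    (5.0, 30, True), (4.0, 25, True), (2.0, 20, True),
    (3.0, 30, False), (1.5, 10, False), (2.5, 12, False), (1.0, 8, False),
]
PLACEHOLDER_CUTOFF = 75.0  # hypothetical threshold for this example
predicted = [combined_score(amh, afc) > PLACEHOLDER_CUTOFF for amh, afc, _ in patients]
actual = [outcome for _, _, outcome in patients]
metrics = diagnostic_metrics(predicted, actual)
print(metrics)
```

Sensitivity and specificity are properties of the cutoff, whereas PPV and NPV also depend on how common the outcome is in the cohort.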
Figure 3. Receiver operating characteristic (ROC) curve analysis showing the predictive value of AMH, AFC, and AMHxAFC for the estimation of the appropriate or high response.
AFC, AMH, and AMHxAFC were statistically significant discriminators of the occurrence of an excessive response, with area under the ROC curve (AUC) = 0.856 vs. 0.828 vs. 0.868 (95% CI = 0.828 to 0.881 vs. 0.798 to 0.855 vs. 0.840 to 0.892). Comparing the ROC curves for age, AMH, AFC, and AMHxAFC showed no significant difference between AMH and AFC (p = 0.1381), indicating that they are equally effective in predicting an excessive response, while the combined score AMHxAFC was significantly better than AFC or AMH alone (p = 0.0426 and 0.0003).
AFC, AMH, and AMHxAFC were all significantly better predictors of an excessive response than age (p < 0.0001 for each) (Figure 4).
Figure 4. Receiver operating characteristic (ROC) curve analysis showing the predictive value of female age, AMH, AFC, and AMHxAFC for the estimation of excessive response.
Age, AFC, AMH, and AMHxAFC were statistically significant discriminators of the occurrence of a poor response, with AUC = 0.881 vs. 0.950 vs. 0.914 vs. 0.942 (95% CI = 0.855 to 0.904 vs. 0.931 to 0.965 vs. 0.891 to 0.934 vs. 0.922 to 0.958; Z = 19.207 vs. 50.772 vs. 34.737 vs. 45.604; p < 0.0001 for each). The diagnostic criterion using the Youden index was a level of > 34 years vs. ≤ 12 vs. ≤ 1.47 ng/mL vs. ≤ 17.64, respectively (Figure 5).
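Each AUC above is reported with a Z statistic testing it against 0.5, the value expected from a marker with no discriminating ability. The sketch below shows one common way to obtain such a statistic: the AUC is computed as the Mann-Whitney probability that a case with the outcome scores higher than a case without it, and the standard error uses the Hanley and McNeil approximation. The text does not state which standard-error method was used, so that choice is an assumption; function names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in pos for y in neg)
    return wins / (len(pos) * len(neg))


def z_against_chance(auc: float, n_pos: int, n_neg: int) -> Tuple[float, float]:
    """Return (z, standard error) for H0: AUC = 0.5, Hanley-McNeil approximation."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    return (auc - 0.5) / se, se


# Synthetic marker values, for illustration only (not study data)
with_outcome = [3, 6, 7, 8]
without_outcome = [1, 2, 4, 5]
auc = auc_mann_whitney(with_outcome, without_outcome)
z, se = z_against_chance(auc, len(with_outcome), len(without_outcome))
print(f"AUC = {auc:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

For a marker where low values indicate the outcome, such as AFC for a poor response, the marker would be negated (or the two groups swapped) before computing the AUC.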
Figure 5. Receiver operating characteristic (ROC) curve analysis showing the predictive value of female age, AMH, AFC, and AMHxAFC for the estimation of the poor response.
A pairwise comparison of ROC curves showed that AFC and AMHxAFC were significantly better than age in the prediction of a poor response (p = 0.0008 and 0.0065), while AMH was not significantly different from age (p = 0.2214). AFC was the best predictor of a poor response, as it was significantly better than AMH and AMHxAFC (p = 0.0001 and 0.0008). This coincides with the findings of Multu et al. These results show that the combined score AMHxAFC is even less accurate than AFC alone in the prediction of a poor response, that age and AMH are interchangeable, and that AFC alone is better than AMH in poor-response prediction.
|Pregnancy||Test of significance|
|Poor (3%)||Low (5%)||Appropriate (21%)||High (12%)||Excessive (59%)|
|Female age (years)|
|± SD||37.00a||34.00b, c||33.90b, c||28.58d||30.05e|
|Anti-Müllerian hormone (ng/mL)|
|± SD||1.30a, b
|± SD||7.67a, b
BF: Brown-Forsythe robust test of equality of means. Different superscripts indicate pairwise significant difference using Games-Howell multiple comparison method.
The present authors performed ROC curve analyses to estimate the predictive value of age, AMH, and AFC for pregnancy.
Age was a statistically significant discriminator of the occurrence of pregnancy, with an AUC = 0.551 (95% CI = 0.513 to 0.588; Z = 2.332, p < 0.0197).
The diagnostic criterion using the Youden index was a level of < 32 years, with a sensitivity of 66.11%, a specificity of 50.59%, a PPV of 58.6%, and an NPV of 58.5%. However, the AUC was only 0.551, indicating the low discriminatory power of age in predicting pregnancy.
AFC and AMH were statistically significant discriminators of the occurrence of pregnancy, with an AUC = 0.545 vs. 0.551 (95% CI = 0.507 to 0.582 vs. 0.514 to 0.589; Z = 2.055 vs. 2.339, p < 0.0399 vs. < 0.0193). With AUCs of only 0.545 and 0.551, both markers had low predictive power for pregnancy.
The diagnostic criterion using the Youden index was a level of > 60 vs. > 3.75 ng/mL, with a sensitivity of 21.39% vs. 67.5%, a specificity of 91.76% vs. 50.0%, a PPV of 73.3% vs. 58.8%, and an NPV of 52.4% vs. 59.2%.
The AMHxAFC score was not significantly different from AMH or AFC alone for the prediction of pregnancy (Figure 6).
The prevention of an excessive response with a consequent risk of OHSS is of paramount importance. Moreover, the prediction of a poor response is of great value in counseling the infertile couple and in the choice of protocol and stimulation dose. Effective prevention depends on reliable tools. Female age and ovarian reserve tests, mainly AMH and AFC, are the mainstays in this regard. For these tools to be clinically useful, clinicians must depend on the most sensitive markers, and proper cutoff levels must be established for the prediction of a poor or an excessive response. In the present work, the mean age was significantly lower in the pregnant patients than in the non-pregnant patients. Similarly, AMH level and AFC were higher in the pregnant than in the non-pregnant patients. Female age was also significantly lower, and AMH levels and AFC were significantly higher, in the high- and excessive-response groups compared to the poor-, low-, and appropriate-response groups.
Figure 6. Receiver operating characteristic (ROC) curve analysis showing the predictive value of female age, AMH, AFC, and AMHxAFC for the estimation of the occurrence of pregnancy.
Both AFC and AMH were found to be good predictors of appropriate, high, and excessive response. This has been repeatedly confirmed by many observers [3-5,15,17,20]. The concordance between AMH and AFC is expected because AMH is derived from the primary, secondary, preantral, and small antral follicles less than 4 mm in diameter [21,22]. To find whether measuring both AMH and AFC adds value to the prediction of a response, the authors computed a combined score by multiplying AMH by AFC, termed AMHxAFC. Comparing the ROC curves for AMH, AFC, and AMHxAFC showed no significant difference between AMH and AFC, indicating that they are equally effective in predicting a high response. This coincides with the findings of the meta-analysis conducted by Broer et al. On the other hand, the combined score AMHxAFC was significantly better than AFC or AMH alone. This indicates that it is more useful to measure both AMH and AFC to predict an excessive response than to depend on only one of them. Moreover, AFC and AMH had better predictive power for a high or excessive response than age.
Arce et al. and Polyzos et al. found similar cutoff values for AMH (3.9 and 3.52 ng/mL) for the prediction of an excessive response. The cutoff value for AFC to predict an excessive response is less clear, and a few studies have reported values ranging from > 9 to > 16 [6,23-26].
AFC is observer-dependent, requires standardization, and is affected by recent advances in ultrasound technology in terms of resolution.
More recent studies are therefore more representative of current practice. In the present study, AFC was the best predictor of a poor response; it was significantly better than age, AMH, and the combined AMHxAFC score. Age and AMH can also predict a poor response, and there was no statistically significant difference between them in this respect. The addition of AMH to AFC did not add to its predictive power.
The cutoff value of AFC in this study for the prediction of a poor response was < 12, reflecting the use of transvaginal ultrasound machines with very high resolution and the counting of both the small (2-5 mm) and the large (6-9 mm) antral follicles. There is considerable disagreement in the literature regarding the cutoff value of AFC for poor-response prediction: values vary from < 3 in older references to < 12 in more recent ones. Most frequently, the accepted cutoff value is in the range of < 5 to < 7 [28,29]. The cutoff value of AMH in the present study for the prediction of a poor response was ≤ 1.47 ng/mL. Values ranging between 0.7 and 1.36 ng/mL are reported in the literature [8,30]. There are a few multifactor prediction models of ovarian response that use female age, FSH, and either AMH or AFC. The present authors suggest that further research is needed to build a prediction model that incorporates both AMH and AFC for the prediction of an excessive response. On the other hand, age, AFC, and AMH could predict pregnancy, but the predictive power, although significant, was poor. AMHxAFC was not significantly different from AMH or AFC alone. The poor prediction of pregnancy has been confirmed in many studies [15,17,31,32].
In conclusion, this work shows that measuring both AMH and AFC adds to their predictive power for a high or an excessive response. Moreover, AFC alone is an excellent predictor of a poor response. AMH has the same predictive power as age for the prediction of a poor response. Good prediction is a key step toward the individualization of ovarian stimulation to optimize outcome and prevent complications and cancellations.