The Current Diagnostic Performance of MRI-Based Radiomics for Glioma Grading: A Meta-Analysis

Lucio De Maria; Francesco Ponzio; Hwan-ho Cho; Karoline Skogen; Ioannis Tsougos; Mauro Gasparini; Marco Zeppieri; Tamara Ius; Lorenzo Ugga; Pier Paolo Panciani; Marco Maria Fontanella; Waleed Brinjikji; Edoardo Agosti

doi:10.31083/j.jin2305100

Academic Editor

Gernot Riedel

Open Access Systematic Review

Lucio De Maria¹, Francesco Ponzio², Hwan-ho Cho³, Karoline Skogen⁴, Ioannis Tsougos⁵, Mauro Gasparini⁶, Marco Zeppieri⁷,*, Tamara Ius⁸, Lorenzo Ugga⁹, Pier Paolo Panciani¹, Marco Maria Fontanella¹, Waleed Brinjikji¹⁰, Edoardo Agosti¹

Affiliation

¹ Division of Neurosurgery, Department of Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, 25123 Brescia, Italy

² Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, 10125 Torino, Italy

³ Department of Medical Artificial Intelligence, Konyang University, 35365 Daejeon, Republic of Korea

⁴ Department of Radiology and Nuclear Medicine, University of Oslo, 0372 Oslo, Norway

⁵ Department of Medical Physics, University of Thessaly, 413 34 Larissa, Greece

⁶ Department of Mathematical Sciences “Giuseppe Luigi Lagrange”, Politecnico di Torino, 10123 Torino, Italy

⁷ Department of Ophthalmology, University Hospital of Udine, 33100 Udine, Italy

⁸ Neurosurgery Unit, Head-Neck and NeuroScience Department University Hospital of Udine, p.le S. Maria della Misericordia 15, 33100 Udine, Italy

⁹ Department of Advanced Biomedical Sciences, University of Naples “Federico II”, 80126 Naples, Italy

¹⁰ Department of Neurosurgery and Interventional Neuroradiology, Mayo Clinic, Rochester, MN 55905, USA

*Correspondence: markzeppieri@hotmail.com (Marco Zeppieri)

J. Integr. Neurosci. 2024, 23(5), 100; https://doi.org/10.31083/j.jin2305100

Submitted: 13 November 2023 | Revised: 28 December 2023 | Accepted: 4 January 2024 | Published: 14 May 2024

This is an open access article under the CC BY 4.0 license.

Abstract

Background: Multiple radiomics models have been proposed for grading glioma using different algorithms, features, and magnetic resonance imaging sequences. This study assesses the current overall performance of radiomics for glioma grading. Methods: A systematic literature review of the Ovid MEDLINE, PubMed, and Ovid EMBASE databases was performed for articles on radiomics for glioma grading published between 2012 and 2023. The systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis criteria. Results: A total of 7654 patients from 40 articles were assessed in the meta-analysis. The R package mada was used to model the joint estimates of specificity (SPE) and sensitivity (SEN). Event rates were pooled across studies with a random-effects meta-analysis. The heterogeneity of SPE and SEN was assessed with the $\chi^{2}$ test. Overall SPE and SEN for the differentiation between high-grade gliomas (HGGs) and low-grade gliomas (LGGs) were 84% and 91%, respectively. For the discrimination between World Health Organization (WHO) grade 4 and WHO grade 3, the overall SPE was 81% and the overall SEN was 89%. Modern non-linear classifiers showed a better performance trend, while textural features tended to be the best-performing and were the most used (29%). Conclusions: Our findings indicate that the current diagnostic performance of radiomics for glioma grading is higher, in terms of both SEN and SPE, for the HGGs vs. LGGs discrimination task than for the WHO grade 4 vs. 3 task.

Keywords

glioma grading

radiomics

magnetic resonance imaging (MRI) features

systematic review

meta-analysis

1. Introduction

With the 5th edition of the World Health Organization (WHO) Classification of Tumors of the Central Nervous System (CNS), published in 2021 [1], a major role has been assigned to molecular patterns in the differential diagnosis of gliomas [2]. These innovations have been followed by improvements in therapeutic strategies, with more targeted and focused therapies [3]. Biopsy remains the gold standard for the diagnosis of gliomas, although it exposes patients to an inevitable risk of procedure-related complications [4]. Over the last 10 years, artificial intelligence (AI) applied to diagnostic imaging has progressively gained popularity [5]. Radiomics refers to the extraction of mineable data from medical imaging through machine learning (ML) and deep learning (DL) techniques, with the aim of boosting its diagnostic capability. Since its first appearance in 2012 as a computer-aided detection and diagnosis system [6], radiomics has grown massively and has spread across the diverse fields of neuro-oncology [4]. Clinical applications of these noninvasive radiomics-based models range from screening and characterization to monitoring and prediction [7]. This has clinical implications for any CNS tumor, from gliomas to meningiomas and pituitary tumors [8, 9, 10].

The grading of glioma has been one of the main focuses of radiomics [11]. To date, multiple radiomics-based models have been proposed for grading glioma, using different features, models, and magnetic resonance imaging (MRI) sequences. Each model reports its own diagnostic accuracy; thus, the actual performance of radiomics depends on the specific model and is not standardized [12]. Given the heterogeneity and rapid expansion of this technique, we aimed to better define its overall potential in grading gliomas. Therefore, based on the current literature, we conducted a systematic review and meta-analysis to evaluate the current performance of radiomics for the grading of brain gliomas.

2. Materials and Methods

2.1 Systematic Review and Inclusion Criteria

The systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) criteria [13]. An expert librarian designed and carried out a thorough literature search of the Ovid MEDLINE, PubMed, and Ovid EMBASE databases with advice from the authors. Three terms were combined with the “AND” operator: “radiomics”, “glioma”, and “grading”. The search was restricted to articles published between 2012 and 2023.

Two authors (L.D.M. and F.P.) assessed the study inclusion criteria during the review process. The inclusion criteria were: (1) articles written in English; (2) case series based on 10 patients or more; (3) studies reporting exclusively histologically proven brain gliomas; (4) studies that included the WHO grade; (5) studies based on radiomics; (6) studies that reported performance data of radiomics for glioma grade prediction.

2.2 Data Abstraction

Three investigators (L.D.M., F.P., and E.A.) independently collected data from the eligible studies using piloted forms. Each study was then validated by a different investigator, and disagreements were resolved by consensus. The following baseline characteristics were obtained from each eligible study: year of publication, number of patients, WHO grade, and MRI protocol. For the radiomics model, we collected information about the AI sub-category (i.e., ML or DL), the classification algorithm (e.g., logistic regression (LR), gradient boost (GB), naïve Bayes (NB), support vector machine (SVM), multilayer perceptron (MLP), elastic net regression (ENR), linear discriminant analysis (LDA), nearest neighbors (NN), random forest (RF), convolutional neural network (CNN), and others), the selected features (e.g., textural, geometrical or morphological, voxel intensity-based, and others), the employed MRI modalities (e.g., T1-weighted [T1W], T2-weighted [T2W], and others), and the application of cross-validation analysis (i.e., yes or no).

2.3 Quality Assessment

We analyzed the clarity of reporting in the eligible studies with the PRISMA checklist. The checklist includes 27 items that are used to assess reporting quality. The PRISMA checklist is provided in the Supplementary Material [13].

2.4 Outcomes

Our primary outcomes were the specificity (SPE), sensitivity (SEN), and Summary Receiver Operating Characteristics (SROC) curve of radiomics for predicting the WHO grade of gliomas of the brain. Bivariate analyses by discrimination task between low-grade gliomas (LGGs; WHO grade 1 and 2) vs. high-grade gliomas (HGGs; WHO grade 3 and 4), and WHO grade 4 gliomas vs. WHO grade 3 gliomas were conducted.
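The two binary tasks defined above can be sketched as simple label mappings. This is a minimal illustration, assuming WHO grades are stored as integers 1 to 4; the function names are hypothetical and are not taken from any of the included studies.

```python
from typing import Optional


def lgg_vs_hgg_label(who_grade: int) -> str:
    """Label for the first task: grades 1-2 are LGG, grades 3-4 are HGG."""
    if who_grade in (1, 2):
        return "LGG"
    if who_grade in (3, 4):
        return "HGG"
    raise ValueError(f"unexpected WHO grade: {who_grade}")


def grade3_vs_grade4_label(who_grade: int) -> Optional[str]:
    """Label for the second task; grades 1-2 fall outside it and map to None."""
    if who_grade == 3:
        return "grade 3"
    if who_grade == 4:
        return "grade 4"
    if who_grade in (1, 2):
        return None
    raise ValueError(f"unexpected WHO grade: {who_grade}")
```

Only patients with grade 3 or 4 tumors contribute to the second task, which is why its patient count is necessarily smaller than that of the first.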

The impact of the following variables on the performance of the proposed radiomics models was evaluated as secondary outcomes: year of development, cohort size, AI sub-category, validated classifiers, selected features, MRI modalities, and cross-validation strategy. These variables were also studied in quantitative terms to picture the current trends of radiomics models for glioma grading.

2.5 Study Risk of Bias Assessment

To evaluate the methodologic quality of the studies included in our meta-analysis, we adapted the Newcastle-Ottawa Scale (NOS) [14]. This tool is intended for comparative investigations. However, because the included studies lacked a control group, we evaluated the methodologic quality of the data using a subset of the scale’s items, paying particular attention to the following questions: (1) Was a random sample used in the study, or did all patients or consecutive patients participate? (2) Was the research prospective or retrospective? (3) Did the clinical follow-up provide enough information to determine every outcome? (4) Were the results published? (5) Were the criteria for inclusion and exclusion well-defined [15]?

2.6 Statistical Analysis

Data from the primary studies were reported in a $2 \times 2$ contingency table consisting of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), based on the concordance between biopsy results and the predictions of the radiomics tool. This table served as input for the R package mada (https://cran.r-project.org/web/packages/mada/index.html) [16], which was used to model the joint estimates of SEN and SPE and their 95% confidence intervals (CIs). A random-effects meta-analysis was used to pool the event rates across studies, and the $\chi^{2}$ test was performed to analyze the heterogeneity of SPE and SEN, with equality across studies as the null hypothesis in each case.
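To make the per-study quantities concrete, the following sketch computes SEN and SPE from one $2 \times 2$ table, together with a 95% interval for each. It is written in Python rather than R, the counts are hypothetical, and the Wilson score interval is an assumption chosen for illustration; it is not necessarily the interval used by mada or by the authors.

```python
import math


def sensitivity_specificity(tp, fn, fp, tn):
    """SEN = TP / (TP + FN); SPE = TN / (TN + FP)."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return sen, spe


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts for a single study (not taken from any included article).
tp, fn, fp, tn = 45, 5, 8, 42
sen, spe = sensitivity_specificity(tp, fn, fp, tn)
sen_ci = wilson_interval(tp, tp + fn)  # interval for sensitivity
spe_ci = wilson_interval(tn, tn + fp)  # interval for specificity
print(f"SEN = {sen:.2f}, 95% CI = {sen_ci[0]:.2f}-{sen_ci[1]:.2f}")
print(f"SPE = {spe:.2f}, 95% CI = {spe_ci[0]:.2f}-{spe_ci[1]:.2f}")
```

Here "positive" denotes the higher-grade class of each task, so SEN is the proportion of biopsy-confirmed higher-grade tumors that the model labels correctly.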

3. Results

3.1 Literature Review

After duplicates were removed, 390 papers were identified. Following the screening of titles and abstracts, 103 articles were selected for full-text review. Forty papers met the eligibility criteria and were included. The remaining 63 articles were excluded for the following reasons: (1) studies not reporting data on radiomics performance for glioma grading (34 articles), (2) studies not reporting the WHO grade (13 articles), (3) studies reporting on AI-based models other than radiomics (8 articles), (4) improper study design (5 articles), and (5) language other than English (3 articles). The authors of three included articles [17, 18, 19] provided missing performance data, and the results were integrated into the data abstraction process. For every study included in the analysis, at least one outcome measure was available for each of the patient groups under consideration. The PRISMA flow chart is shown in Fig. 1.

Fig. 1.

PRISMA flow diagram showing the search process of the literature. Abbreviations: PRISMA, preferred reporting items for systematic reviews and meta-analysis; WHO, World Health Organization.

3.2 Baseline and Radiomics Data

A total of 7654 patients were included in this study. The largest share of studies was published in 2022 (25%), followed by 2021 (20%) and 2020 (17.5%). The smallest study included 26 patients, while the largest included 572. Differentiation between LGG and HGG was reported for 6951 patients (90.8%), of whom 4537 had HGGs (65.3%). Among those with HGGs, discrimination between WHO grade 4 and WHO grade 3 was reported for 2742 patients (60.4%), of whom 1781 had grade 4 gliomas (65%). Each study included different MRI sequences; the most common was contrast-enhanced T1W (CE-T1W; 92.5%), followed by T2W (77.5%), T2W fluid-attenuated inversion recovery (T2-FLAIR; 70%), and T1W (65%). Other MRI modalities were diffusion-weighted imaging (DWI; 20%), perfusion-weighted imaging (PWI; 15%), and proton magnetic resonance spectroscopy (1H-MRS; 5%).

Thirty-one studies reported on ML (77.5%), 3 reported on DL (7.5%), and 6 were hybrid studies (15%). As for the classifiers, LR was the most frequently adopted (55%), followed by SVM (40%), RF (30%), CNN (17.5%), GB (7.5%), and others. Table 1 (Ref. [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]) shows a summary of the studies included.

Table 1. Summary of studies included.

Author, Journal, Year	Patients (n.)	LGG (n.)	HGG (n.)	WHO Grade 3 (n.)	WHO Grade 4 (n.)	MRI Protocol	Method	Classifiers
Chen, Int J Biomed Imaging, 2018 [20]	274	54	220	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR	ML & DL	SVM, CNN
Cheng, IEEE J Biomed Health Inform, 2022 [21]	438	119	319	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR	DL	CNN
Cheng, IEEE/ACM Trans Comput Biol Bioinform, 2022 [22]	350	92	258	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR	ML	LR, SVM, RF, GB
Cho, Annu Int Conf IEEE Eng Med Biol Soc, 2017 [23]	108	54	54	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR	ML	LR
Cho, PeerJ, 2018 [19]	285	75	210	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR	ML	LR, SVM, RF
Ding, Quant Imaging Med Surg, 2022 [24]	50	NA	NA	NA	25	CE-T1W	ML & DL	RF, CNN
Ditmer, J Neurooncol, 2018 [25]	94	14	80	NA	NA	CE-T1W	ML	LR
Gao, Front Oncol, 2020 [26]	369	147	222	116	106	CE-T1W	ML	LR, SVM, RF
Gihr, Front Oncol, 2020 [27]	26	26	0	0	0	T2W, DWI	ML	LR
Guo, Diagn Interv Radiol, 2021 [28]	152	47	105	39	66	T1W, CE-T1W, T2W, T2-FLAIR	ML	LR
Gutta, Am J Neuroradiol, 2021 [29]	220	59	161	46	115	T1W, CE-T1W, T2W, T2-FLAIR	DL	CNN
Hashido, J Comput Assist Tomogr, 2021 [30]	52	18	34	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR, DWI, PWI	ML	LR, SVM, RF
Hashido, Sci Rep, 2020 [31]	46	15	31	4	27	T1W, CE-T1W, T2W, T2-FLAIR, PWI	ML	LR
Hu, Comput Biol Med, 2021 [32]	505	233	272	107	165	T1W, CE-T1W, T2W, T2-FLAIR	ML	SVM
Huang, J Comput Assist Tomogr, 2021 [33]	59	13	46	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR	ML	LR
Kobayashi, Sci Rep, 2021 [34]	355	NA	NA	NA	259	T1W, CE-T1W, T2W, T2-FLAIR	ML & DL	LR, CNN
Li, Cancers, 2022 [35]	212	105	107	0	107	T1W, CE-T1W, T2W, T2-FLAIR	ML	SVM
Lin, Med Phys, 2022 [36]	100	50	50	NA	NA	T1W, CE-T1W, T2W, DWI, 1H-MRS	ML	LR
Liu, Neuroradiology, 2022 [37]	182	63	119	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR, DWI	ML	LR
Lu, Clin Cancer Res, 2018 [38]	214	NA	NA	NA	106	CE-T1W, T2-FLAIR, DWI	ML	SVM
Nakamoto, Sci Rep, 2020 [39]	157	NA	157	55	102	CE-T1W, T2W	ML	LR, SVM, RF, NB, NN
Ning, Ann Transl Med, 2021 [40]	334	112	222	117	105	CE-T1W, T2-FLAIR	ML & DL	SVM, CNN
Park, Korean J Radiol, 2019 [41]	314	213	101	101	NA	CE-T1W, T2W, T2-FLAIR	ML	RF, GB, ENR, LDA
Reza, J Med Imaging, 2019 [42]	285	75	210	NA	NA	CE-T1W, T2W, T2-FLAIR	ML	SVM, RF, GB
Skogen, European Journal of Radiology, 2016 [17]	95	27	68	34	34	T1W, CE-T1W, T2W, T2-FLAIR	ML	LR
Su, Am J Transl Res, 2021 [43]	139	69	70	36	34	T1W, CE-T1W, T2W, T2-FLAIR, DWI	ML	LR
Su, Eur Radiol, 2019 [44]	217	95	122	61	61	T1W, CE-T1W, T2W, T2-FLAIR, DWI, PWI	ML	LR
Sudre, BMC Med Inform Decis Mak, 2020 [45]	333	101	232	74	158	PWI	ML	RF
Takahashi, Int J Radiat Oncol Biol Phys, 2019 [46]	55	14	41	12	29	T1W, CE-T1W, T2W, T2-FLAIR, DWI	ML	LR, SVM
Tian, J Magn Reson Imaging, 2018 [47]	153	42	111	33	78	T1W, CE-T1W, T2W, DWI, PWI	ML	SVM
Vamvakas, Phys Med, 2019 [18]	40	20	20	NA	NA	T1W, CE-T1W, T2W, T2-FLAIR, DWI, PWI, 1H-MRS	ML	SVM
van der Voort, Neuro Oncol, 2023 [48]	238	47	191	59	132	T1W, CE-T1W, T2W, T2-FLAIR	DL	CNN
Wang, J Magn Reson Imaging, 2019 [49]	85	34	51	NA	NA	CE-T1W, T2W, DWI	ML	LR
Xie, J Magn Reson Imaging, 2018 [50]	42	15	27	13	14	T1W, CE-T1W, T2W, T2-FLAIR	ML	NA
Xu, Front Oncol, 2022 [51]	572	190	382	NA	NA	T1W, CE-T1W, T2W	ML & DL	RF
Xu, Quant Imaging Med Surg, 2022 [52]	129	62	67	NA	NA	CE-T1W, T2-FLAIR, DWI	ML	LR, SVM, RF, NB, NN
Zhang, J Digit Imaging, 2020 [53]	108	43	65	NA	NA	DWI	ML & DL	SVM
Zhao, BMC Neurol, 2020 [54]	69	36	33	33	NA	CE-T1W, T2-FLAIR	ML	RF
Zhou, Int J Clin Pract, 2022 [55]	114	35	79	21	58	T1W, CE-T1W, T2W	ML	LR
Zhou, Neuro Oncol, 2017 [56]	84	NA	NA	NA	0	T1W, CE-T1W, T2W, T2-FLAIR	ML	LR

Abbreviations: NA, not available; WHO, World Health Organization; LGG, low-grade glioma; HGG, high-grade glioma; MRI, magnetic resonance imaging; T1W, T1-weighted; CE-T1W, contrast-enhanced T1W; T2W, T2-weighted; T2-FLAIR, T2W fluid-attenuated inversion recovery; DWI, diffusion-weighted imaging; PWI, perfusion-weighted imaging; 1H-MRS, proton magnetic resonance spectroscopy; ML, machine learning; DL, deep learning; SVM, support vector machine; CNN, convolutional neural network; LR, logistic regression; RF, random forest; GB, gradient boost; NB, naïve Bayes; NN, nearest neighbors; ENR, elastic net regression; LDA, linear discriminant analysis.

3.3 Primary Outcomes

The performance of radiomics for the LGGs vs. HGGs and WHO grade 3 vs. 4 categorizations was reported for a total of 3290 patients and 704 patients, respectively. Overall SEN and SPE for differentiation between LGGs and HGGs were 91% (95% CI = 0.86–0.94) and 84% (95% CI = 0.78–0.89), respectively. For the discrimination task between WHO grade 4 and WHO grade 3, the overall SEN was 89% (95% CI = 0.82–0.94) and the overall SPE was 81% (95% CI = 0.66–0.91). Fig. 2 shows the univariate forest plots of the analysis for discrimination between LGGs vs. HGGs and grade 3 vs. grade 4 gliomas. Fig. 3 provides the respective SROC curves for the binary categorization tasks considered in our analysis.

In the canonical receiver operating characteristic (ROC) curve, all data points belong to a single study in which several different diagnostic thresholds are employed to distinguish between two classes of interest (e.g., cases and non-cases). In a single study, changing the threshold yields a plot of the true positive rate (TPR) against the false positive rate (FPR) at each threshold setting. Conversely, in a meta-analysis, the units of analysis are separate, single studies. Hence, the SROC curve aims to represent the relationship between TPR and FPR across studies, recognizing that they may have used different thresholds. As mentioned above, we fitted these curves with the R package mada, which, besides the SROC, provides two further figures of merit: (i) the confidence region of the summary estimate, which provides a measure of the ROC-based performance heterogeneity among the included studies; (ii) the prediction region for a new hypothetical study, which is the zone in the false positive rate vs. sensitivity space in which a new study is likely to fall.
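As a rough illustration of how per-study rates can be pooled under a random-effects model, the sketch below applies the DerSimonian-Laird estimator to logit-transformed proportions. This is a deliberately simplified, univariate stand-in: it treats SEN (or SPE) on its own and does not reproduce the joint SEN/SPE model that the text attributes to mada. The counts are hypothetical, and the 0.5 continuity correction is an assumption.

```python
import math


def pool_logit_random_effects(events, totals):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (pooled proportion, lower 95% bound, upper 95% bound, tau^2).
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        a = e + 0.5          # continuity-corrected events
        b = n - e + 0.5      # continuity-corrected non-events
        ys.append(math.log(a / b))   # logit of the corrected proportion
        vs.append(1 / a + 1 / b)     # approximate within-study variance
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    k = len(ys)
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return inv_logit(y_re), inv_logit(y_re - 1.96 * se), inv_logit(y_re + 1.96 * se), tau2


# Hypothetical true-positive counts and numbers of higher-grade cases in three studies.
pooled, lower, upper, tau2 = pool_logit_random_effects([45, 30, 48], [50, 50, 50])
print(f"pooled SEN = {pooled:.2f} (95% CI = {lower:.2f}-{upper:.2f}), tau^2 = {tau2:.2f}")
```

A bivariate model additionally accounts for the correlation between SEN and SPE across studies, which is what allows a summary point, confidence region, and prediction region to be drawn in the same ROC space.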

Fig. 2.

Forest plots for discrimination between LGGs vs. HGGs (A) and grade 3 vs. grade 4 gliomas (B). The red diamonds represent the overall results.

Fig. 3.

SROC curves for the differentiation between LGGs vs. HGGs (A) and grade 3 vs. grade 4 gliomas (B). Each curve provides: (i) the summary estimate of the considered studies (see the red circle); (ii) the confidence region of the summary estimate (see the red ellipse-like trace), which furnishes a measure of the performance heterogeneity among the included studies; (iii) the prediction region for a new hypothetical study, in which a new study is likely to fall (see the dashed line). Abbreviations: SROC, summary receiver operating characteristic.

3.4 Secondary Outcomes

Our subgroup analysis did not identify any variable that significantly affected the performance of the models. Nonetheless, when looking at the best-performing classifiers, we observed a better trend for non-linear algorithms such as SVM and CNN, as shown in Fig. 4.

Fig. 4.

SROC curve for the two classification tasks considered together. This graph shows that the non-linear classifiers tend to perform better than simpler solutions such as logistic regression classifiers. This can be seen from the position of the non-linear classifiers (CNN, SVM, RF, and CNN + SVM), which lie closer to the top-left corner of the graph than the linear classifier (LR).

We further investigated the following variables, which may significantly affect model performance: (i) the selected features, which are the final data representation, in numerical form, fed to the AI algorithm to perform the given classification task; (ii) the MRI modalities from which the input images are obtained. Regarding features, the most used were textural features (28.6%), followed by deep features (20.4%) and voxel intensity features (12.2%). Fig. 5A provides a visual representation of the distribution of features among the studies included in our review. In this figure, each square corresponds to a single radiomic model. Note that a single study may describe more than one radiomic model, each tailored to a different and specific classification task. Concerning the employed MRI modality, CE-T1W was the most reported MRI sequence (36.5%), followed by T2-FLAIR (15.6%) and DWI (14.6%). Fig. 5B shows the distribution of the MRI modalities among the included studies. A majority of studies (58.1%) used a cross-validated analysis.

Fig. 5.

Radiomics features. (A) A visual representation of the radiomics features distribution among the studies included in our review. Each square corresponds to the features fed as input to a single radiomic model. Note that in a single study, more than one radiomic model may be described, each one tailoring a different and specific classification task. A final remark must be made concerning the so-called “deep features from Imagenet”. This is indeed a widely used data representation in the context of deep learning, as detailed in the corresponding studies. (B) Pie chart showing the distribution of the MRI modalities among the included studies. Note that a single radiomic model may be fed with data obtained from different MRI modalities.

3.5 Study Heterogeneity

The $\chi^{2}$ test suggested substantial heterogeneity of SEN and SPE for both the LGGs vs. HGGs and WHO grade 3 vs. grade 4 categorizations.
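For readers unfamiliar with this kind of test, the sketch below computes a Pearson $\chi^{2}$ statistic for the null hypothesis that a proportion (for example, SEN) is equal across studies. It is an illustration under assumptions: the counts are hypothetical, and the exact form of the test run by the authors' software is not specified in the text.

```python
def equality_of_proportions_chi2(events, totals):
    """Pearson chi-square statistic for equal proportions across k studies.

    Returns (statistic, degrees of freedom), where degrees of freedom = k - 1.
    """
    pooled = sum(events) / sum(totals)  # common proportion under the null
    if not 0 < pooled < 1:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    stat = 0.0
    for e, n in zip(events, totals):
        expected_events = n * pooled
        expected_non_events = n * (1 - pooled)
        stat += (e - expected_events) ** 2 / expected_events
        stat += ((n - e) - expected_non_events) ** 2 / expected_non_events
    return stat, len(events) - 1


# Hypothetical true-positive counts out of 50 higher-grade cases in two studies.
stat, df = equality_of_proportions_chi2([45, 30], [50, 50])
print(f"chi-square = {stat:.1f} on {df} degree(s) of freedom")
```

The statistic is then compared with a $\chi^{2}$ distribution with $k - 1$ degrees of freedom; a large value relative to that distribution argues against equality across studies.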

4. Discussion

We provided an overview of the performance of the current radiomics models for glioma grading prediction. The overall performance was higher for the HGGs vs. LGGs discrimination task than for the WHO grade 3 vs. 4 task, in terms of both SPE and SEN. The FPR was higher than the false negative rate (FNR) for both differentiation tasks, indicating a greater capability to rule out HGGs or WHO grade 4 gliomas than to identify these entities. The studied variables did not significantly affect performance, but we observed a better trend for non-linear classifiers such as SVM and CNN.
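As a worked illustration, the error rates follow directly from the pooled estimates, taking the values stated in the abstract as given (SEN 91% and SPE 84% for LGGs vs. HGGs; SEN 89% and SPE 81% for grade 3 vs. grade 4):

$$\mathrm{FPR} = 1 - \mathrm{SPE}, \qquad \mathrm{FNR} = 1 - \mathrm{SEN}$$

$$\text{LGGs vs. HGGs:}\quad \mathrm{FPR} = 1 - 0.84 = 0.16, \qquad \mathrm{FNR} = 1 - 0.91 = 0.09$$

$$\text{Grade 3 vs. grade 4:}\quad \mathrm{FPR} = 1 - 0.81 = 0.19, \qquad \mathrm{FNR} = 1 - 0.89 = 0.11$$

In both tasks the FPR exceeds the FNR, so a negative prediction is, on these figures, more dependable than a positive one.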

4.1 Radiomics Models

As shown in Fig. 6, there has been an outstanding growth of expertise in AI developments and laboratories, which has brought on a significantly higher number of studies published over the years.

Fig. 6.

Cumulative number of radiomics studies included in our analysis over the years.

Radiomics has captured the interest of neuroradiologists and neuro-oncologists, who have progressively sustained the research field [12, 57]. On the one hand, novel strategies for feature extraction and segmentation have been developed that amplified the impact of radiomics on many fields of neuro-oncology and neuroradiology, multiplying the possible clinical applications [4, 7, 9, 11]. On the other hand, the concept of DL, as a method of representation learning able to learn end to end by itself without requiring any handcrafted features or human-based data representation, originated as an innovative branch of radiomics [58]. Even though a minority of the articles focused on DL, many hybrid studies were included that combine ML classifiers with novel DL algorithms, achieving fully automatic detection processes. All studies included in our meta-analysis extracted features (handcrafted or DL-based) from MRI sequences, but other imaging modalities have been used for different purposes, particularly positron emission tomography in radiogenomics [59, 60].

4.2 Radiomics Performance

Our data confirm the paramount role of radiomics in predicting the grade of cerebral gliomas, as suggested by previous authors [61]. The overall performance reflects the great predictivity of the current models included in the meta-analysis.

We found a higher prediction capability of radiomics for the HGGs vs. LGGs differentiation task than for the WHO grade 3 vs. grade 4 task. In accordance with the current literature, the main prognostic impact of glioma grading is provided by the LGGs vs. HGGs differentiation [62, 63, 64]. Even though the definitive diagnosis of gliomas is histopathological, these data strongly support the potential role of radiomics for an initial diagnostic orientation in defining the glioma grade. This may be particularly important for patients with equivocal neuroradiological imaging, where there is no clear orientation toward a diagnosis of LGG vs. HGG [65, 66, 67, 68].

When looking at the SROC curves, we observed a smaller and better-performing prediction region in the LGGs vs. HGGs differentiation task, while the prediction region of grade 3 vs. grade 4 was larger and shifted toward higher FPRs. This suggests that although we have not witnessed any advancements in performance thus far, we have reason to believe that we may observe progress in the upcoming years in distinguishing between LGGs and HGGs. This is shown by the SROC curve, which illustrates the potential for future improvement.

4.3 Determinants of Performance

Given the blossoming of AI models over the last decade, we expected the year of publication to significantly impact the performance of the models. Yet, we did not find a linear correlation between time of development and performance.

Textural features, along with deep and voxel intensity features, were the best-performing. Feature extraction is driven by algorithms that select the features appropriate for a specific task [9]. For glioma grading, textural features were reported as the best-performing. These common features quantify the spatial variation of grey-level intensity, from which image heterogeneity is inferred [17, 69]. Frequently used textural features include energy, entropy, inertia, and correlation [47, 50]. Zhou et al. [56] identified gray-level co-occurrence matrix (GLCM)-homogeneity as the main texture feature to predict histological grade from CE-T1W images. Likewise, Liu et al. [37] found two GLCM texture features that reflect the glioma grade. The multiparametric texture analysis of Hashido et al. [31] also included GLCM features. Their study found that GLCM-based entropy effectively differentiated between LGGs and HGGs in PWI. Skogen et al. [17] used MRI texture analysis (MRTA) to assess tumor heterogeneity. Extraction of texture features at fine anatomical scales best discriminated LGG and HGG.
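To give a sense of what the GLCM-based features named above measure, the following sketch builds a co-occurrence matrix for a tiny image and derives energy, entropy, homogeneity, and inertia (also called contrast) from it. It is a toy illustration using commonly cited textbook definitions; the single horizontal offset, the symmetric normalization, the base-2 logarithm, and the two-level image are assumptions, and the included studies may have used different software and settings.

```python
import math


def glcm_horizontal(image, levels):
    """Symmetric, normalized GLCM for horizontally adjacent pixel pairs."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # count each pair in both directions
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("image has no horizontally adjacent pixel pairs")
    return [[c / total for c in r] for r in counts]


def glcm_features(p):
    """Energy, entropy, homogeneity, and inertia of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "inertia": sum(v * (i - j) ** 2 for i, j, v in cells),
    }


# Toy two-level images: a perfectly uniform patch and a checkerboard patch.
uniform = [[0, 0, 0, 0], [0, 0, 0, 0]]
checker = [[0, 1, 0, 1], [1, 0, 1, 0]]
print(glcm_features(glcm_horizontal(uniform, levels=2)))
print(glcm_features(glcm_horizontal(checker, levels=2)))
```

The uniform patch yields maximal energy and homogeneity with zero entropy and inertia, whereas the checkerboard shifts all four features in the opposite direction, which is the sense in which these features track image heterogeneity.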

The most reported best-performing MRI sequences were the CE-T1W, T2-FLAIR, and DWI. The CE-T1W and T2-FLAIR modalities were particularly regarded for the HGGs vs. LGGs differentiation task, while the DWI has mostly been reported as the best-performing sequence for the WHO grade 3 vs. grade 4 discrimination task. Apparent diffusion coefficient (ADC) maps calculated from DWI, along with dynamic contrast-enhanced (DCE), dynamic susceptibility contrast (DSC), and arterial spin labeling (ASL) PWI have been increasingly reported as promising alternatives and will perhaps be the sequences most chosen for feature extraction over the next years [27, 30, 43, 45].

Future models should not disregard external validation, which is needed to ascertain a sufficient level of reproducibility and reliability. The lack of standardization and the customizability of radiomics reduce the applicability of AI in daily practice. Clear routes of development should be delineated to overcome the differences among radiomics laboratories. Hopefully, this meta-analysis will orient forthcoming models toward more defined and shared processes that can be implemented in clinical practice.

4.4 The Role of Radiomics in the Prediction of Cellular and Molecular Patterns of Gliomas

Gliomas, a collection of primary brain tumors, result from the abnormal proliferation of glial cells, including ependymal cells, oligodendrocytes, and astrocytes. These tumors display distinct cellular characteristics based on their histological subtype. For example, astrocytomas, which make up a significant portion of gliomas, primarily consist of proliferating astrocytes. Oligodendrogliomas, on the other hand, stem from oligodendrocyte precursor cells and are identified by their “fried-egg” appearance, characterized by round nuclei and clear cytoplasm. Ependymomas, the third major subtype, develop from ependymal cells that line the brain and spinal cord ventricles. Each glioma subtype exhibits unique cellular features crucial for precise diagnosis and classification.

The classification and comprehension of gliomas heavily rely on their cellular characteristics. However, the 2016 WHO classification ushered in a transformative era of understanding these tumors through their molecular traits, carrying substantial implications for diagnosis, prognosis, and therapy selection [70]. A pivotal molecular anomaly identified in gliomas is the isocitrate dehydrogenase (IDH) mutation, especially the IDH1 and IDH2 mutations, which manifest in a subset of gliomas. These mutations are linked to distinct clinical and histological attributes and wield a critical role in glioma classification and management. Another crucial molecular alteration in glioma involves the loss of the tumor suppressor gene TP53, contributing to the pathogenesis of high-grade gliomas. Furthermore, the methylation status of the O-6-methylguanine-DNA methyltransferase (MGMT) gene promoter stands as a noteworthy molecular marker that impacts the response to alkylating chemotherapy agents. Molecular profiling has empowered the refinement of glioma categorization into more precise groups, providing guidance for treatment decisions and enhancing prognostic accuracy [71, 72, 73].

The 2021 WHO classification of gliomas brought significant innovations to the molecular classification of these brain tumors, offering a more detailed and comprehensive approach. It emphasizes the importance of integrating both histological and molecular data to establish an integrated diagnosis, leading to a more precise glioma classification. These molecular markers play a pivotal role in refining glioma classification, offering valuable insights for prognosis, treatment planning, and personalized therapeutic strategies. The inclusion of telomerase reverse transcriptase (TERT) promoter mutation, H3K27M mutation, v-raf murine sarcoma viral oncogene homolog B1 (BRAF)-fusion mutation, and the increased focus on O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation reflects the evolving understanding of glioma biology and the growing need for more accurate diagnosis and management [2].

In this context, radiomics can assume a central role in unveiling the cellular and molecular patterns of gliomas. Advanced radiomic analyses, which are based on quantitative characteristics derived from medical images, offer a non-invasive approach to assessing the tumor’s heterogeneity, microenvironment, and genetic attributes. For instance, Kickingereder et al. [74] demonstrated that radiomic features can capture variations in cell density, vascularity, and necrosis within gliomas, effectively reflecting their histological and cellular diversity. Furthermore, radiomics can aid in the identification of crucial genetic and molecular markers, such as IDH mutations and 1p/19q co-deletion [75]. Radiomic features may also mirror the molecular diversity of gliomas, contributing to more precise diagnoses, treatment planning, and patient stratification, as highlighted by Lambin et al. [6]. However, at present there is still a scarcity of radiomic studies that identify predictive features for the molecular sub-classifications proposed in the WHO 2021 classification.

4.5 Limitations

This meta-analysis is limited in that it was mostly based on retrospective cohort studies, even though the number of patients was considerable. Due to the limited data available, we could not ascertain the performance of radiomics in other discrimination tasks, such as WHO grade 1 vs. grade 2, and grade 3 vs. grade 4. Because the bivariate model of the meta-analysis jointly pools sensitivity and specificity, we did not calculate the overall accuracy for the differentiation tasks. Nonetheless, to the best of our knowledge, this is the first meta-analysis to depict the current performance of radiomics for glioma grading, providing up-to-date conclusions to guide future models.
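One general reason why overall accuracy is often left out of bivariate diagnostic meta-analyses is that, unlike SEN and SPE, accuracy depends on the proportion of positive cases in each study. The following minimal Python sketch illustrates this with two hypothetical 2x2 tables; the class `ConfusionTable`, the helper `accuracy_from_rates`, and all counts are assumptions made for illustration and are not data from the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionTable:
    """2x2 table for one hypothetical study (positive class = HGG)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def sensitivity(self) -> float:
        # Share of true positives among all positive cases.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of true negatives among all negative cases.
        return self.tn / (self.tn + self.fp)

    @property
    def prevalence(self) -> float:
        # Share of positive cases in the study sample.
        return (self.tp + self.fn) / self.total

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total


def accuracy_from_rates(sen: float, spe: float, prevalence: float) -> float:
    """Accuracy as a prevalence-weighted mix of SEN and SPE."""
    return sen * prevalence + spe * (1.0 - prevalence)


# Hypothetical counts: identical SEN (0.90) and SPE (0.80), different case mix.
study_a = ConfusionTable(tp=90, fp=20, fn=10, tn=80)   # 50% positive cases
study_b = ConfusionTable(tp=18, fp=36, fn=2, tn=144)   # 10% positive cases

for name, s in (("A", study_a), ("B", study_b)):
    print(f"study {name}: SEN={s.sensitivity:.2f} SPE={s.specificity:.2f} "
          f"prevalence={s.prevalence:.2f} accuracy={s.accuracy:.2f}")
```

In this sketch both studies share the same SEN and SPE, yet their accuracies differ (0.85 vs. 0.81) solely because of the different case mix, which is why a single pooled accuracy would be difficult to interpret across heterogeneous cohorts.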

Moreover, the generalizability of radiomics studies for classification tasks may be limited by data drift. In radiomics, data drift can occur when the classification used for labeling changes over time, in this specific case the CNS WHO Classification of Tumors [1]. Models trained on the old classification criteria may lose accuracy and reliability because those criteria no longer apply under the new classification [76]. The classification criteria themselves can also affect model performance: a recent study by Moodi et al. [77] found that ML algorithms delivered superior results in grading gliomas based on WHO 2021 criteria compared with the WHO 2016 criteria. To mitigate this type of data drift, radiomics models should be updated and retrained on the new classification criteria. Changes in classification criteria should also be carefully tracked and well documented so that radiomics analyses can be adjusted and validated accordingly [78].
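The recommendation to track classification criteria can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record type `LabeledCase`, the field `who_edition`, the function `edition_mismatch`, and the example cases are hypothetical and do not come from any of the included studies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledCase:
    case_id: str
    grade: int        # WHO grade as recorded in the source dataset
    who_edition: int  # year of the WHO criteria used for labeling (assumed field)


def edition_mismatch(train, test):
    """Return per-edition counts for both sets if their editions differ, else None."""
    train_editions = Counter(case.who_edition for case in train)
    test_editions = Counter(case.who_edition for case in test)
    if set(train_editions) != set(test_editions):
        return dict(train_editions), dict(test_editions)
    return None


# Hypothetical cohorts: training labels follow WHO 2016, test labels WHO 2021.
train_cases = [LabeledCase("tr-01", 2, 2016), LabeledCase("tr-02", 4, 2016)]
test_cases = [LabeledCase("te-01", 4, 2021)]

mismatch = edition_mismatch(train_cases, test_cases)
if mismatch is not None:
    print("Label criteria differ between training and test sets:", mismatch)
```

Storing the edition alongside each label makes a mismatch detectable before evaluation, so that a model can be relabeled and retrained rather than silently tested against criteria it was not trained on.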

5. Conclusions

This meta-analysis suggests that the current radiomics models perform better in distinguishing between LGGs and HGGs than between WHO grade 3 and WHO grade 4 gliomas, in terms of both SPE and SEN. Enhanced future models with increased accuracy can prove to be of clinical use for categorizing HGGs and LGGs.

MRI is the preferred imaging method, and the CE-T1W sequence is found to be the most effective for current radiomics models. Textural features are commonly used, and modern non-linear classifiers show a promising trend.

Abbreviations

1H-MRS, proton magnetic resonance spectroscopy; ADC, apparent diffusion coefficient; AI, artificial intelligence; ASL, arterial spin labeling; CE-T1W, contrast-enhanced T1W; CIs, confidence intervals; CNN, convolutional neural network; CNS, central nervous system; DCE, dynamic contrast-enhanced; DL, deep learning; DSC, dynamic susceptibility contrast; DWI, diffusion-weighted imaging; ENR, elastic net regression; FN, false negative; FNR, false negative rate; FP, false positive; FPR, false positive rate; GAN, generative adversarial network; GB, gradient boost; GLCM, gray-level co-occurrence matrix; HGG, high-grade glioma; IDH, isocitrate dehydrogenase; LDA, linear discriminant analysis; LGG, low-grade glioma; LR, logistic regression; MGMT, O-6-methylguanine-DNA methyltransferase; ML, machine learning; MLP, multilayer perceptron; MRTA, MRI texture analysis; NB, naïve Bayes; NN, nearest neighbors; NOS, Newcastle-Ottawa scale; PRISMA, preferred reporting items for systematic reviews and meta-analyses; PWI, perfusion-weighted imaging; RF, random forest; SEN, sensitivity; SPE, specificity; SROC, summary receiver operating characteristic; SVM, support vector machine; T2-FLAIR, T2W-fluid-attenuated inversion recovery; TN, true negative; TP, true positive; WHO, World Health Organization.

Availability of Data and Materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author Contributions

Conceptualization, EA, MZ, LDM, MMF and PPP; methodology, EA, MZ, LDM, FP, HC, KS, IT, MG, LU and WB; validation, LDM, EA, FP, LU, MG and MMF; formal analysis, LDM, EA, PPP and FP; investigation, LDM, EA, PPP and FP; resources, EA, MZ, LDM, FP, HC, KS, IT, MG, PPP and LU; data curation, LDM, EA, PPP and FP; writing—original draft preparation, LDM, EA, PPP and FP; writing—review and editing, EA, LDM, FP, MG, MMF, WB and LU; visualization, EA, LDM, FP, MG, MMF, WB, TI, MZ, PPP and LU; supervision, EA, LDM, PPP, MMF and WB; project administration, LDM, EA, PPP and MMF. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Ethics Approval and Consent to Participate

Not applicable.

Acknowledgment

Not applicable.

Funding

This research received no external funding.

Conflict of Interest

The authors declare no conflict of interest.

Associated Data

Supplementary Material.docx

References

[1] WHO Classification of Tumours Editorial Board. Central Nervous System Tumours. 5th edn. World Health Organization: Geneva, Switzerland. 2022.