- Academic Editors
Background: DNA damage repair (DDR) related genes are associated with the development, progression, aggressiveness, and heterogeneity of low-grade gliomas (LGG). However, the precise role of DDR in LGG prognosis and molecular subtypes remains to be elucidated. Methods: We analyzed 477 and 594 LGG samples from the Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas (CGGA) to develop a prognostic model using the random forest algorithm and Cox regression. Independent prognostic factors were incorporated into a nomogram, and its performance was assessed using receiver operating characteristic and calibration curves. We also used Connectivity Map analysis to identify potential small molecule drugs targeting DDR. Molecular subtypes based on DDR were identified by consensus cluster analysis, and the clinical characteristics, mutation landscape, immune tumor microenvironment, and drug sensitivity of patients with different subtypes in the TCGA and CGGA datasets were further compared. The Boruta algorithm was used to select features from the differentially expressed genes between clusters to generate DDR scores. Results were further validated in the Glioma Longitudinal AnalySiS consortium dataset. Statistical analysis and tests were implemented using R software version 4.0.2. Results: We developed a prognostic model containing six DDR-related genes, which served as a potential independent prognostic indicator in LGG across three datasets. The area under the curve (AUC) values for 1-, 3-, and 5-year survival in the TCGA dataset were 0.901, 0.832, and 0.771, respectively. The nomogram demonstrated high accuracy in predicting 1-, 3-, and 5-year survival, with AUC values greater than 0.8. Additionally, we identified and validated two molecular subtypes based on DDR genes. These subtypes exhibited significant differences in somatic mutations, clinical prognosis, and immune cell infiltration. One subtype showed higher immune and stromal scores, worse prognosis, and increased sensitivity to common chemotherapeutic agents. Finally, we established a DDR score which served as another promising prognostic predictor for LGG. Conclusions: The prognostic model and molecular subtypes based on DDR genes can help in more detailed classification and provide insights for personalized management of LGG and clinical drug development.
Low-grade gliomas (LGG) are WHO grade II and III gliomas and include oligodendroglioma and anaplastic astrocytoma [1]. Currently, even with the standard treatment strategies, such as maximum safe resection, radiotherapy and chemotherapy, the mean survival varies significantly from 2 to 10 years [2, 3]. Histopathological and clinical results indicate that high heterogeneity and aggressiveness contribute to the great variability in survival of LGG [4]. In fact, residual tumor cells may recur and develop malignant progression soon after surgery, resulting in resistance to treatment and making it more difficult [5]. Based on the above characteristics of LGG, it is crucial to identify biomarkers that may affect the prognosis of LGG and improve the risk stratification of LGG.
Accumulated evidences revealed that molecular diagnosis is more important than tumor grade in the classification and prognosis of LGG [6, 7, 8]. Some biomarkers, including isocitrate dehydrogenase 1 (IDH1) mutation, 1p/19q combined deletion (1p/19q co-deletion) and O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation, revealed the histological characteristics of LGG and could be used to guide LGG treatment regimen [7]. However, in the genetically heterogeneous population, the heterogeneity of LGG determined the low predictive value of these widely used biomarkers, which can neither fully elucidate individual variation nor correctly addressing the problem of LGG risk stratification. With the advanced molecular platforms, this enables the development of new prognostic signatures, accurately identify LGG novel categorized methods and stratify risk based on molecular profiles [9].
In order to maintain genome integrity and stability, cells develop DNA damage repair (DDR) mechanisms that identify and repair damage to the DNA molecules encoded by the genome. Previous studies have shown that DDR, as a widely concerned hallmark in tumor biology, plays a crucial role in tumor genesis, progression and therapeutic response [10, 11, 12, 13]. Defects in DDR prevent cells from properly repairing damaged DNA, leading to the accumulation of genetic changes in normal cells that turn into cancer cells [14]. Consequently, cancer cells generally exhibit a decreased ability to repair DNA compared with normal cells. As a result, DNA replication stress accumulates and DNA damage is superimposed due to dependence on a subset of other repair pathways. Studies have reported the abnormal expression of DNA repair genes contributed to tumorigenesis [15]. Indeed, cells lacking DDR are often associated with sensitivities to DNA damaging agents, so DDR defects associated with cancer may also lead to vulnerabilities that can be exploited therapeutically [16]. This evidence highlights the importance of DDR regulatory molecules in targeted anticancer therapy [17]. The current findings suggest that DDR may be the underlying mechanism of gliomagenesis, rather than a random secondary event [18]. Alterations in DDR pathways have been considered to play essential roles in many aspects of glioma biology such as genesis, progression, therapy resistance, and recurrence [19, 20, 21, 22]. Gliomas usually rely on intrinsic DNA repair pathways to handle the adverse effects of DNA damage such as single- and double-strand DNA breaks caused by radiotherapy and chemotherapy. MGMT, for instance, repairs the cytotoxic DNA damage caused by temozolomide (TMZ), which is considered to be the common mechanism of chemotherapy resistance in glioma [20, 22]. Based on the above knowledge of DDR, a recent glioma treatment strategy is to induce overwhelming DNA damage and inhibit glioma repair mechanisms, and cause fatal cytotoxicity to gliomas, resulting in cell cycle arrest and reduced drug resistance [21, 22]. In addition, targeting DDR with novel agents or therapeutic combinations provides a promising adjuvant therapy for improving the clinical prognosis of glioma [21, 23]. Furthermore, a DDR-related signature had been identified in glioblastoma to predict prognosis [24]. Therefore, the inclusion of DDR-related genes in the study of risk stratification, subtypes, prognosis and treatment response of LGG is promising to provide a new perspective for the clinical treatment of patients.
Based on the Cancer Genome Atlas (TCGA, training set, n = 477) and Chinese Glioma Genome Atlas (CGGA, validation set, n = 594) RNA sequencing datasets, Cox regression and random forest analysis were performed to identify a DDR-related signature and to construct a nomogram for survival prediction of LGG patients. We also demonstrated the underlying mechanism of risk characteristics and screened small molecule drugs targeting for DDR. Moreover, we systematically analyzed DDR-related genes in LGG, and classified LGG subtypes according to DDR-related genes. Mutations of each molecular subtype were further presented, and the relationship between the molecular subtype and the prognosis, immune tumor microenvironment (TME), chemotherapy and clinical characteristics of patients were further explored. Finally, the DDR score which could be served as an independent and effective prognostic factor was constructed. The results of the analyses were validated in an additional validation dataset-Glioma Longitudinal AnalySiS (GLASS) [25] consortium dataset (https://www.glass-consortium.org). Robust prognostic model, DDR-score system and molecular subtype based on DDR-related genes will improve risk stratification in LGG patients and provide more accurate personalized clinical management assessments.
mRNA expression data with clinical information of 506 and 596 LGG patients were collected from TCGA (training dataset) and CGGA (validation dataset), respectively. The corresponding clinical data include: gender, age, histology, subtype, IDH1 mutational status, MGMT promoter status, 1p/19q co-deletion status and survival information. Somatic mutation data were obtained from the TCGA portal. Considering that the GLASS dataset (https://www.glass-consortium.org) contained only the minimal clinical information (n = 330) [25], the GLASS dataset only served as an additional validation dataset to validate a portion of results. A total of 150 genes (Supplementary Table 1) associated with DDR were retrieved from Molecular Signature Database V7.0 (MSigDB) [26] (http://www.broad.mit.edu/gsea/msigdb/, HALLMARK_DNA_REPAIR).
The mRNA expression data for samples were pre-processed according to the following steps. Samples with missing clinical information and those with an overall survival (OS) of less than 30 days were excluded, and 477 LGG samples from TCGA were eventually included in this study for follow-up analysis (Supplementary Table 1). In addition, the GLASS dataset was used as an independent validation dataset, in which samples from TCGA were excluded.
The DDR-related genes shared among the TCGA, CGGA, and GLASS datasets were retained for further analysis. Consequently, the final dataset comprised expression profiles of these common DDR-related genes derived from the aforementioned three datasets.
The CGGA mRNA sequencing data included both mRNAseq_693 (Illumina HiSeq platform) and mRNAseq_325 (Illumina HiSeq 2000 and 2500 platforms). Therefore, the batch effects between these two datasets were removed using the “sva” R package [27] and a total of 594 LGG patients in CGGA were retrieved (Supplementary Table 1).
In order to identify gene signature and
construct risk score, the R package
(http://cran.r-project.org) “survival” and
“survminer” were used to perform univariate Cox proportional hazard regression
on 477 LGGs in TCGA dataset to select DDR-related genes that are highly
correlated with prognosis (p
Risk score
According to the risk score formula, samples from the training and validation datasets were divided into high-risk and low-risk groups based on the median risk scores. Kaplan-Meier (KM) curves were used to estimate the survival divergence between different risk groups. To determine the independent predictive power of each variable, we conducted multivariate and univariate regression analyses using R “survival” package and visualized by the “forestplot” package in R. Additionally, we retrieved the gene protein expression level from the Human Protein Atlas (HPA) database to evaluate the expression of prognostic genes.
A nomogram is a graphical representation of a mathematical model that estimates the outcome or probability of an event based on multiple variables or inputs, serving as a visual tool to simplify complex calculations. Nomograms facilitate clinicians’ decision-making processes, enabling them to tailor treatment strategies for individual patients according to their unique characteristics and risk factors. Univariate and multivariate Cox regression were conducted to identify the independent prognostic risk factors for nomogram construction. Using the “rms” packages, calibration curves were drawn to assess the OS probability for LGG patients at 1-, 3-, and 5-years. In addition, nomogram accuracy was also determined by concordance index (C index).
Gene set enrichment analysis (GSEA) was performed using the GSEA V3 software (Broad Institute, Inc., Massachusetts Institute of Technology, and Regents of the University of California) [28] to identify gene sets that were statistically different between different risk groups in LGG samples of TCGA and CGGA mRNA expression profiles. GSEA V3 software was used to select the MSigDB “c2.cp.kegg.v7.1.symbols.gmt” gene set as a reference to screen out significantly enriched pathways with NOM p-value and false discovery rate (FDR) less than 0.05 and 0.25, respectively.
We consulted the recently updated Connectivity Map (CMap,
http://cmap.topcoder.com/) to screen for potential small molecule compounds
targeting DDR-related genes in the hope of achieving new breakthroughs in LGG
therapy. “limma” R package [29] was applied to identify differential expression
genes (DEGs) with FDR
“ConsensusClusterPlus” [31] R package was performed for consensus clustering of the prognostic DDR-related genes. Similarity distance among samples was calculated using Euclidean distance and clustering was conducted using K-means. Resampling analysis was conducted to estimate the optimal cluster number based on cumulative distribution function (CDF) and consensus matrix. And the clustering results determined by the above methods were also validated in GLASS dataset. The principal component analysis (PCA) was applied to evaluate the distribution of the molecular subtypes in TCGA and GLASS datasets via the “princomp” package. The differences in OS among different clusters in TCGA and GLASS datasets were present using KM curves. We further estimated single nucleotide polymorphism (SNP) distribution of each cluster. The somatic mutation profiles of LGG were firstly obtained from TCGA data portal and were divided into distinct clusters for analysis according to consensus clustering. The somatic mutation data of each cluster were summarized, analyzed, and visualized based on the Mutation Annotation format (MAF) file using the “maftool” [32] R package, and the driver genes were identified. The copy number alterations (CNA) data in each cluster were also generated by GISTIC2.0 from GDAC Firehose (https://gdac.broadinstitute.org) [33].
782 marker genes for predicting the abundance of 28 tumor-infiltrating immune cells (TIICs) were obtained from Broad Institute and Bindea et al. [34]. According to the characteristics of expressed genes, gene set variation analysis (GSVA) software “gsva” [35] R package was used for single-sample gene set enrichment analysis (ssGSEA), and the enrichment scores of 28 different immune cell types in each LGG sample were quantitatively analyzed. The levels and functions of immune infiltration were compared using Kruskal-Wallis test, and heatmaps and boxplots were used to visualize the results. In addition, the immune score and matrix score were calculated through the ESTIMATE algorithm [36].
The chemotherapy response of LGG samples in TCGA and GLASS were predicted using the Genomics of Drug Sensitivity in Cancer (GDSC) database [37]. GDSC is a large-scale research project that aims to identify genomic biomarkers associated with the response of cancer cells to various anti-cancer drugs. We assessed the sensitivity of eight chemotherapy drugs (Cisplatin, Erlotinib, Methotrexate, Vincristine, Carmustine, TMZ, Rapamycin and Doxorubicin) in gliomas. By using the “pRRophetic” [38] R package we estimated LGG’s half maximal inhibitory concentration (IC50) by ridge regression, and 10-fold cross-validation was performed to determine the accuracy.
To construct a DDR-related score, the “limma” [29] R package was applied to
filter DEGs among DDR clusters (adjusted p
Statistical analysis and tests were implemented through R software version 4.0.2
(R Foundation for Statistical Computing, Vienna, Austria). One-way ANOVA with
Tukey’s test (q-value 0.05, ǀlog2FCǀ
An LGG prognostic risk model was constructed based on DDR gene in training set.
Firstly, the relationship between 150 DDR-related genes and survival was analyzed
by univariate Cox regression and 87 DDR-related genes that were associated with
LGG survival (p
Risk Score = TARBP2
Gene | coef | HR | Lower 95% CI | Upper 95% CI | p value | Importance | Relative importance |
POLL | –1.114708732 | 0.328010804 | 0.184609791 | 0.582802716 | 0.000144148 | 0.014227642 | 1 |
POLR3GL | –1.10017873 | 0.332811595 | 0.178848352 | 0.619315506 | 0.000516393 | 0.009700665 | 0.681818182 |
SDCBP | 0.433562531 | 1.542743818 | 0.920800243 | 2.584771783 | 0.099640942 | 0.004018847 | 0.282467532 |
POLD3 | 0.748353899 | 2.113518088 | 1.328505275 | 3.36239441 | 0.001582825 | 0.003972653 | 0.279220779 |
STX3 | 0.300994954 | 1.351202523 | 1.018388798 | 1.792781167 | 0.036954795 | 0.00392646 | 0.275974026 |
TARBP2 | 0.512574212 | 1.66958353 | 1.062467899 | 2.623617303 | 0.026234321 | 0.003603104 | 0.253246753 |
Abbreviations: CI, confidence interval; HR, hazard ratio; coef, coefficient; OS, overall survival.
As shown in Fig. 1, there were more deaths in TCGA cohort with a high-risk score
than low-risk score, while the heatmap (Fig. 1A) showed that changes in
TARBP2, POLR3GL, POLL, STX3, POLD3
and SDCBP gene expression increased the risk value in different samples.
KM curve analysis results indicated that the prognosis of patients in different
risk groups was significantly distinct (p
Construction of prognostic gene signature in the Cancer Genome Atlas (TCGA) dataset. (A) Risk score, survival time, survival state and expression of the 6 genes in TCGA training dataset. The panel consists of three rows: top row showed the risk score distribution for the high- and low-risk score group, color transition from green to red indicates the increasing risk score of corresponding expression level of DNA damage repair (DDR) genes from low to high; middle row represents the low-grade gliomas (LGG) patients’ distribution and survival status; the bottom row shows that the heatmap of 6 prognostic DDR-related genes expression. (B) Kaplan–Meier curves of prognostic models for high- (n = 238) and low-risk (n = 239) subgroups of patients with LGG in TCGA datasets. (C) Receiver Operating Characteristic (ROC) curve and area under the curve (AUC) of the 6-gene signature classification.
To validate the survival risk score, we calculated the risk score of patients in CGGA and GLASS validation datasets with the same regression coefficient and repeated the above analysis, and as expected, the validation datasets also achieved consistent results (Supplementary Fig. 2A–C and Supplementary Fig. 3A,B). We also compared our signature with other previously reported risk models [42, 43, 44, 45], the result showed that our signature had excellent AUC value when compared to the other signature, indicating that our risk model is the optimal (Supplementary Fig. 4). In addition, we also evaluated the protein expression level through the HPA database and found that TARBP2 and POLD3 were highly expressed in the tumor tissue, while POLR3GL and SDCBP showed low expression in normal tissue (Supplementary Fig. 5).
The association between risk score and major clinical characteristics in LGG was investigated. We found a significant divergence of risk score between different IDH1, MGMT promoter methylation, 1p/19q deletion, and pathological grades in TCGA and CGGA datasets. Patients with IDH1-wildtype (WT), 1p/19q non-codel, G3 grade and unmethylated MGMT promoter in LGG had significantly higher risk scores than those with IDH1-Mutant (Fig. 2C), 1p/19q codel (Fig. 2D), G2 grade (Fig. 2F) and methylated MGMT promoter (Fig. 2E) in TCGA and CGGA datasets (Supplementary Fig. 6C–F). The above results were similar to our previous studies [46].
Comparisons of risk score plot between age (A), gender (B), IDH (C), 1p/19q (D), MGMT (E), and grade (F) in TCGA dataset. IDH, isocitrate dehydrogenase; MGMT, O6-methylguanine-DNA methyltransferase.
However, differences in risk scores were not consistent across age stratification and gender in both datasets. In the TCGA dataset, patients over 50 years old had significantly higher risk scores than those aged 50 years or younger (Fig. 2A), and there was no difference between male and female patients (Fig. 2B). However, in CGGA dataset, patients of different ages (Supplementary Fig. 6A) and genders (Supplementary Fig. 6B) did not have significantly different risk scores. Possibly, this is due to differences in basic clinical characteristics between the two datasets, such as bias or ethnicity.
In addition, we validated the prognostic differences between high- and low-risk group in stratified cohorts of different clinical characteristics (e.g., IDH1-WT/-Mutant, MGMT promoter methylated/unmethylated and 1p/19q codel/non-codel cohorts). Results were consistent with expectations. In the stratified cohorts of TCGA (Fig. 3) and CGGA (Supplementary Fig. 7) datasets with different clinical characteristics, except for the IDH1-WT (Fig. 3F) group of TCGA, grade G2 (Supplementary Fig. 7K) and IDH1-WT (Supplementary Fig. 7F) of CGGA, the high-risk groups showed poor prognosis. A possible reason for this due to the small sample size of the IDH1-WT group in TCGA and CGGA.
Kaplan-Meier survival curves for the high- and low-risk groups stratified by clinical factors in TCGA dataset. (A,B) Age. (C,D) Gender. (E,F) IDH. (G,H) 1p/19q. (I,J) MGMT. (K,L) Grade.
Clinical characteristics (age, gender, risk score, grade, IDH1, MGMT promoter
methylated, 1p/19q codeletion) in TCGA and CGGA datasets were analyzed by
conducting univariate and multivariate Cox regression. As shown in the forest
plot (Fig. 4), the risk score constructed by the 6 DDR-related genes was an
independent prognostic risk factor in TCGA (p
DDR signature as an independent prognostic factor shows strong prognostic power. (A) Univariate and (B) multivariate Cox regression analysis of clinical characteristics and DDR signature for survival of LGG in TCGA training dataset.
To further improve the accuracy of prognosis, a nomogram integrating risk score
and independent clinical characteristics was constructed to visualize the
prognostic value of different survival-related variables in TCGA and CGGA
datasets (Fig. 5A and Supplementary Fig. 9A). In addition, to assess the
performance of nomogram, a calibration curve analysis was conducted (“rms” R
package) and we observed that the prediction curve was close to the ideal curve
(Fig. 5B–D and Supplementary Fig. 9B–D), indicating that the nomogram
had excellent prediction function. We discovered that the nomogram showed a high
accuracy in the prognosis of 1-, 3- and 5- year (AUC value
Construction of nomogram. (A) A nomogram constructed based on
independent prognostic factors. The calibration plot for internal validation of
the nomogram in 1- (B), 3- (C) and 5-years (D) OS. (E) ROC curve and AUC of the
Nomogram (F) Comparisons of the ROC curve among the clinical factors and
nomogram. Abbreviations: OS, overall survival; ROC, receiver operating
characteristic; AUC, area under the curve (AUC). * p
GSEA provides valuable insights into the
biological processes and pathways that are potentially dysregulated in the
context of the studied conditions, helping researchers understand the underlying
molecular mechanisms and guiding the development of targeted therapies or
interventions. GSEA analysis of different risk groups in the TCGA cohort revealed
different biological processes. Significant candidate pathways were selected
according to the criteria with FDR
In order to identify candidate compounds that may potentially influence the expression of DDR prognostic genes for the treatment of LGG, DEGs between different risk groups (191 up-regulated genes and 81 down-regulated genes, “limma” R package, Supplementary Table 1) were uploaded to the CMap database. As a result, 35 compounds with 25 mechanisms of action (MoA) were identified (Fig. 6). Results indicated that multiple compounds shared the same MOA. For example, six compounds (scriptaid, ISOX, vorinostat, belinostat, THM-I-94 and HC-toxin) shared the MOA of histone deacetylase (HDAC) inhibitor; four compounds including amsacrine, irinotecan, teniposide and SN-38 shared the MoA of Topoisomerase inhibitor; another two compounds (NU-7441and KU-0060648) shared the MoA of DNA dependent protein kinase inhibitor. In addition, a total of five compounds (TW-37, butein, everolimus, NVP-TAE684 and XMD-892) shared the following five tumor-related MoAs, respectively: B cell lymphoma 2 (BCL) inhibitor, epidermal growth factor receptor (EGFR) inhibitor, mammalian target of rapamycin (MTOR) inhibitor, anaplastic lymphoma kinase (ALK) inhibitor and mitogen-activated protein (MAP) kinase inhibitor. The present research identified potential small molecule compounds that target DDR.
Identification of potential molecular compounds and its MoAs in
LGG. Heatmap showed each compound from the CMap that shares a MoA (rows), sorted
by descending number of compounds with a shared MoA. The above compounds have an
absolute value of enrichment score
We calculated the enrichment levels of 28 immune-related cells for each risk group using ssGSEA algorithm. We found that most immune-related cells have a significant divergence between different risk groups in TCGA and CGGA dataset, such as Memory B cell, Regulatory T cell, Eosinophil, Macrophage, Natural killer T cell. Interestingly, the enrichment level was significantly higher in high-risk group. (Fig. 7 and Supplementary Fig. 11).
The landscape of 28 immune cell infiltration level in high-risk
(n = 238) and low-risk (n = 239) groups in TCGA dataset. ns, not significant; *
p
Consensus clustering was conducted for clustering analysis of 477 samples and 150 DDR-related genes expression profiles in TCGA RNA-seq dataset. All LGG samples were divided into K clusters (K = 2–9) using the “ConsesusClusterPlus” R package. Based on the CDF curve, 477 samples were divided into two clusters after a thorough consideration (Fig. 8A–C). OS significantly differed between the two clusters based on the KM curve analysis (Fig. 8D). PCA results showed that the two clusters were independent of each other (Fig. 8E).
DDR-related genes could distinguish LGG patients in TCGA with different clinical and molecular features. (A) Consensus clustering CDF for k = 2 to k = 9. (B) Delta area plot exhibits relative change in area under CDF curve when at a certain k value compared with k-1. (C) Consensus clustering matrix heatmap plots of 477 samples from TCGA datasets for k = 2. (D) KM analysis of patients among two clusters,the prognosis of Cluster A (n = 361) is better than Cluster B (n = 116). (E) PCA analysis of the DDR-related genes expression when k = 2. Abbreviations: CDF, cumulative distribution function; KM, Kaplan-Meier.
The above analyses were repeated in GLASS dataset and the results obtained were shown in Supplementary Fig. 12. Heatmap of the two clusters (the top 100 variably expressed genes) showed that a significant divergence existed in the IDH1 status, 1p/19q co-deletion status and MGMT methylation status (Fig. 9). Furthermore, we compared the risk score differences among the two different clusters in TCGA. Interestingly, as shown in Supplementary Fig. 13A, Cluster B with the worse prognosis had the higher risk scores. Conversely, Cluster A with the better prognosis had the lower risk scores. Therefore, we identified two LGG clusters based on DDR gene expression. Clustering results showed that there was a wide range of heterogeneity in LGG, and DDR-related genes were closely associated with LGG malignancy grade and prognosis.
The landscape of DDR-related genes among the two subtypes (Cluster A, n = 361; Cluster B, n = 116). Heatmap of two clusters defined by the top 100 variable expression genes.
We further showed the mutant landscape in different clusters of LGG patients. By analyzing the somatic mutation data in TCGA dataset, we evaluated the somatic variation distribution of driver genes among LGG subtypes, to clarify whether there are differences in the frequency of somatic mutation and observe the different patterns of mutations among LGG clusters. LGG driver genes were obtained by Maftools. The analysis of TCGA mutation annotation files showed that there were obvious mutation characteristics between Cluster A and Cluster B. Among them, IDH1, TP53 and ATRX were the top three mutation genes and Cluster A corresponded to more mutation frequency than Cluster B (Supplementary Fig. 14A,B). Our cluster-specific CNAs analysis revealed that chromosome deletions and amplifications, for example, deletion of 9p21.3 were significantly enriched in Cluster B. The 9p21.3 deletions is associated with the poor survival outcome in LGG [47] (Supplementary Fig. 15).
The scores of different clusters were compared, as shown in Fig. 10A–C, Cluster
B with the worst prognosis in TCGA dataset had the higher immune and stromal
scores compared to Cluster A, while tumor purity was lower than the other cluster
(ANOVA, p
The landscape of immune infiltration and TME characteristics in two subtypes. Violin diagram showed the differences in stromal score (A), immune score (B) and tumor purity (C) among 2 clusters. (D) Unsupervised clustering of LGG patients from the TCGA cohort using ssGSEA scores which represent the 28 cell infiltration. The cluster, age, survival status, gender, grade, IDH1 status, 1p/19q status and MGMT status are shown as patient annotations in the lower panel.
The correlation between LGG subtypes and TME features was further explored in our present study. Results of ssGSEA showed that in LGG patients of TCGA cohort, the enrich scores of activated B cells (Ba), eosinophils and monocytes in Cluster B patients with worse prognosis were highly expressed compared to Cluster A. Fig. 10D shows the high expression level of immune cells in Cluster A.
Currently, surgery and chemotherapy are the most common treatments for cancer.
Therefore, we sought to evaluate the differences in estimated IC50 levels between
the clusters for eight chemotherapeutic agents, including Cisplatin, Erlotinib,
Methotrexate, Vincristine, Carmustine, TMZ, Rapamycin and Doxorubicin. Ridge
regression was used to train the prediction model on the dataset of GDSC cell
lines, and 10-fold cross-validation was performed to evaluate prediction
accuracy. The IC50 value of each LGG patient was then calculated according to the
prediction model of the eight chemotherapeutic drugs. Results showed that except
for Erlotinib (Wilcoxon rank test, p = 0.483 (Fig. 11D), the IC50
sensitivity of the seven chemotherapeutic agents was statistically different in
the two clusters. Interestingly, the IC50 estimates for the seven
chemotherapeutic agents in Cluster B were significantly lower than those in the
other cluster, including Carmustine (Wilcoxon rank test, p = 1.927
Cluster B is more sensitive to most chemotherapies. Box plots depicted the differences in the estimated IC50 levels of (A) Carmustine; (B) Cisplatin; (C) Doxorubicin; (D) Erlotinib; (E) Temozolomide; (F) Methotrexate; (G) Vincristine and (H) Rapamycin between two clusters.
A total of 574 DEGs were identified between the two clusters (Supplementary Fig. 16A,B). The GO enrichment analysis exhibited that these DEGs mainly enriched in chromosome segregation, nuclear division, and DNA replication (Supplementary Fig. 16C). The KEGG pathway enrichment analysis result was shown as Supplementary Fig. 16D.
We first performed the univariate cox regression analysis and selected the genes
with p-value
Construction and validation of the DDR score. (A) KM survival curve analysis in the overall survival (A), progression free survival (C) and disease special survival (DSS) (E). ROC analysis was applied to evaluate the specificity and sensitivity of the DDR score in overall survival (B), progression free survival (D) and DSS (F). Abbreviations: KM, Kaplan-Meier; OS, overall survival; DSS, disease special survival.
We further evaluated the response to immune checkpoint blockade (ICB) of DDR score in the IMvigor210 cohort (bladder cancer). Of note, survival outcome was significantly different from LGG, with low DDR scores associated with poorer survival outcomes (Supplementary Fig. 18).
In the present study, we conducted a comprehensive and systematic data mining analysis on LGG, focusing on DDR-related genes, using two large-scale cohort datasets. Our research involves the development of a prognostic model, molecular subtyping, immune landscape, chemotherapy response, and drug development based on DDR, as well as the establishment of a DDR scoring system that shows promising potential as an independent prognostic factor. The validity of these findings has been confirmed through verification using additional datasets and the GLASS consortium dataset. The prognostic model, DDR scoring system, and molecular subtyping framework based on DDR-related genes hold promising potential to improve risk stratification, prognostic assessment, and treatment response prediction for LGG patients. This research offers more precise personalized clinical management decisions for patients and clinicians and has the potential to advance the development of small-molecule drugs for LGG.
In recent years, the continuous development of high-throughput whole genome sequencing technology has made it possible to further explore the molecular pathogenesis of LGG. It has been shown that DDR-related genes are related to the genesis, development, and prognosis of gliomas by some researchers. For example, Jin et al. [24] identified 5 prognostic DDR-related genes in primary GBM (pGBM) patients according to the risk scoring model. However, Jin’s study had a small sample size, with only 89 pGBM cases in CGGA dataset. Similarly, Zeng et al. [45] developed and verified DDR risk signatures (CRY2, HDAC1, DCLRE1B, KPNA2) related to the prognosis of LGG by Cox and Lasso regression in two independent datasets: CGGA (172 samples) and TCGA (451 samples). In order to quantify the prognosis of LGG patients, a nomogram prediction model was developed. Pang et al. [48] further suggested that IDH mutation may affect the prognosis of LGG by regulating the DDR pathway. The above studies all focused on the value of DDR in gene prognostic prediction of LGG, and some studies had insufficient sample size to prove the robustness and effectiveness of signature in population samples. In this study, in comparison with other studies, we further systematically and comprehensively analyzed the differences of DDR gene-based LGG molecular subtypes, the relationship between subtypes and immune microenvironment and their mutation patterns, to assist in targeting LGG chemotherapy. Finally, we also constructed a molecular subtype classifier. These differences and innovations set our study apart.
Here, we systematically identified 6 prognosis-related DDR-related genes
including POLL, POLR3GL, SDCBP, POLD3,
STX3 and TARBP2. Multiple studies underscored the tumor-driver
roles of the hub genes identified in present study. These hub genes were
potential to be considered as novel therapeutic targets and prognostic predictors
in LGG. POLL, also known as Pol or Pol
Because of the heterogeneity of tumors, patients with the same cancer may
respond differently to specific drug treatments. The identification of individual
predictive biomarkers of drug sensitivity is the key to improve the accuracy of
cancer medicine [97, 98]. TMZ is the standard first-line chemotherapy for glioma.
Despite the use of a uniform treatment regimen, the prognosis of glioma varies
greatly. Some patients have excellent results, while others do not respond to TMZ
[7, 99]. Cisplatin is a commonly used
chemotherapy drug in clinical settings, and its main target is DNA. By forming
platinum-DNA adduct in cells, it inhibits DNA replication and transcription, thus
inducing DNA damage and apoptosis in tumor cells. Cisplatin is an important agent
for clinical treatment of a variety of solid tumors. Due to its high sensitivity
to glioma [100, 101], it is also an effective chemotherapy agent for the
treatment of LGG in children [101, 102]. In addition, TMZ combined with platinum
drugs exhibits a better effect in the treatment of recurrent malignant tumors,
which can effectively improve the clinical symptoms of patients, control cancer
progression and prolong the survival time of patients [100]. In addition, the
final results of the phase 3 randomized clinical trial RTOG9802 [103] showed that compared with
adjuvant radiotherapy(RT) and RT+ PCV (Procarbazine, Lomustine, and Vincristine)
Owing to their high diversity and plasticity, immune cells can play a variety of roles in TME, such as anti-tumor or pro-tumor [104]. The interaction among TIIc, malignant cells and stromal cells leads to the formation of TME [105]. In tumor progression and response to immunotherapy, tumor infiltrating immune cells exhibit crucial effects [106]. There is growing evidence that genetic perturbations in cancer cells determine the immune environment of a tumor [104]. Tumor cells can circumvent anti-tumor immune responses or immune checkpoint blockade therapy by regulating TME, for example by down-regulating antigen-presentation features and up-regulating repressor proteins or cytokine release [107]. DDR gene mutations are associated with increased tumor-infiltrating lymphocytes, genomic instability, and tumor mutation burden in cancer [108]. Additionally, DDR and immune response systems work together by activating each other, leading to a bidirectional connection. Disruption of DDR–immune response cross talk compromises (multi)cellular integrity, leading to cell-cycle-related and immune defects [109]. Therefore, a growing number of studies have revealed the potential therapeutic significance of the interaction between DDR deficiency and immune TME components [104]. Here, we quantified immune and stromal cells via ESTIMATE and ssGSEA algorithm of each LGG samples in TCGA and CGGA datasets. Our data suggested that different LGG clusters were characterized by different immune and stromal scores as well as tumor purity. Furthermore, we found that the LGG cluster with a worse prognosis featured by increased infiltration levels of activated B cells (Ba), eosinophils and monocytes. And we have observed a higher enrichment of immune cells in the high-risk group of LGG patients. This finding is consistent with previous studies that have indicated the significance of immunobiology as a dominant factor in malignant processes, particularly in gliomas [110, 111, 112]. Additionally, the immune infiltrating cell signature has been recognized as a prognostic marker in gliomas [113, 114]. These observations suggest that immune processes play a crucial role in the development and progression of LGG.
While it may seem counterintuitive that immune cells, which are typically associated with an anti-tumor response, are enriched in the high-risk group, it is important to note that immune cells have complex and diverse functions in TMEs. The specific roles of immune cells in cancer are context-dependent and can vary based on their subtypes and functional states [115, 116]. In our study, we identified specific immune cell subtypes that were positively correlated with a higher risk score, including Memory B cell, Regulatory T cell, Eosinophil, Macrophage, Natural killer T cell, myeloid-derived suppressive cell (MDSC). The enrichment of specific immune cell subtypes in the high-risk group of LGG patients suggests their potential involvement in LGG progression and aggressiveness. For instance, MDSCs have been reported to promote B-cell-mediated immunosuppression in gliomas [117]. Infiltrated tumor-associated macrophages have been associated with poor survival in glioma patients [118]. T cells have been found to mediate immunosuppression and resistance to treatment in gliomas [119]. NK-cell-targeted therapies have also been highlighted in combating immune escape in IDHmut gliomas [120]. By interactions with cancer cells, macrophages offer a cancer-favorable microenvironment, which induce immune escape and LGG proliferation and metastases [121]. Therefore, targeting these immune cells could potentially benefit LGG patients with an unfavorable prognosis. However, the precise mechanisms through which immune cells influence LGG development and progression are still being actively investigated.
Although the results of our study contribute to the clinical treatment of gliomas, there are inevitably some limitations that need to be acknowledged. First, this study is a retrospective analysis based on open datasets and rigorous validation, but more external datasets, multi-center prospective studies even experimental studies are needed to verify the clinical value and prognostic robustness of the signature. It should be noted that while our findings are promising, the limited amount of data analyzed in a specific type of cancer (i.e., LGG) calls for cautious interpretation and further validation. Furthermore, basic experiments are necessary to verify our results and elucidate the molecular mechanism that underlies them, thereby helping us better understand DDR-related genes.
In summary, we identified the risk signatures of DDR-related genes and combined risk score with clinical information to construct a nomogram to optimize OS risk assessment. And we further analyzed and compared enrichment pathways between different risk groups and identified potential small molecule drugs targeting DDR. We systematically analyzed DDR-related genes in LGG to provide a comprehensive understanding of the molecular subtypes, mutations, clinical prognosis, chemotherapy sensitivity, and patient characteristics of LGG. Our study suggests that the DDR score system developed in this research could potentially serve as an independent prognostic indicator, providing valuable insights into the prognosis of LGG patients. In conclusion, the signature model and molecular typing based on DDR genes are helpful for more detailed classification and provide a useful clue for promoting personalized management of LGG and clinical drug development.
The datasets analyzed for this study are available from TCGA (http://cancergenome.nih.gov/), CGGA (http://www.cgga.org.cn/index.jsp), and GLASS (https://www.glass-consortium.org).
ZH conceived and designed the study. ML significantly contributed to the study design and the interpretation of the data. ZH and YX carried out data analysis and completed the image visualization. ZH drafted the manuscript, and all authors critically revised it for important intellectual content. All authors contributed to editorial changes in the manuscript, read and approved the final version, and agreed to be accountable for all aspects of the work, ensuring that questions related to any part of the work are appropriately investigated and resolved.
TCGA, CGGA, and GLASS are public datasets. The patients involved in these open-source datasets have obtained ethical approval. Since our research is based on open source data, there are no ethical concerns or conflicts of interest.
We sincerely acknowledge TCGA, CGGA, GLASS and the developers of these datasets.
This research was supported by Qingdao Clinical Research Center for Rare Diseases of Nervous System (22-3-7-lczx-3-nsh), and Qingdao Key Health Discipline Development Fund.
The authors declare no conflict of interest.
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.