†These authors contributed equally.
Academic Editor: Andrzej Semczuk
Background: Previous studies have identified hundreds of constantly changing metabolic genes in cervical cancer, however, their prognostic effect remains to be explored. Methods: In this paper, Cox univariate regression and Lasso regression models were used to identify metabolic genes associated with squamous cervical cancer prognosis, and developed a prognostic risk score. Next, on the basis of the median risk score, cervical squamous cancer patients were divided into two groups: high- and low-risk patients. Kaplan-Meier analysis and receiver operating characteristic (ROC) curves were used to evaluate the predictive efficacy of the metabolic gene prognostic risk model. In addition, we analysed the correlation between drug sensitivity, immune cell infiltration, and Gene set variation analysis (GSVA) and the metabolic gene prognostic risk model. Results: The results showed that the prognosis of patients in the high-risk group was worse. The metabolic gene prognostic model was correlated with immune cell infiltration. It is also correlated with sensitivity to common chemotherapeutic drugs. In addition, gene set enrichment analysis results revealed several significantly enriched pathways, which may help to explain the underlying mechanisms of cervical carcinogenesis. Conclusions: The proposed prediction model can be potentially used for prognosis prediction of cervical cancer.
Cervical cancer is one of the most common malignant tumors in women. It is reported that there are approximately 530,000 new cases and 270,000 deaths per year worldwide [1, 2, 3]. Approximately 70% of cervical cancers are SCCs (squamous cell carcinomas), the most common type [4]. To treat the early-stage cervical cancer, the standard approach is through surgery, while concurrent chemo-radiation are the main treatments for advanced cervical cancer. Even though the people with early stage or localized lesions can undergo surgery to improve their chances of long-term survival. However, treatment is limited due to drug resistance and recurrence [5], and the five-year survival rate for advanced cervical cancer (stages II–IV) is only 15%–69% [6], which seriously endangers the physical and mental health of women. Numerous studies have demonstrated the tight connection between metabolic disorders and an increased risk of cancer, and malignant tumors are capable of metabolic reprogramming [7]. With the deepening understanding of tumor biology and tumor metabolism, it is found that the metabolic abnormalities in tumor tissues are far more complex than previously recognized. In addition to abnormal energy metabolism, tumor tissues also exhibit defective carbon and amino acid metabolism [8, 9]. Compared with normal tissues, different tumor cells and tumor tissues have metabolic heterogeneity. Local tumors also show a dependence on metabolic reprogramming during tumor metastasis [7]. The functions of metabolic-related mechanisms in cervical cancer patients, however, remain unclear. To build a more accurate prediction model, we used Cox univariate regression model and Lasso regression algorithm. Furthermore, the model’s clinical predictive value was also studied using more bioinformatics analysis methods. In cervical squamous cell carcinoma (CESC) patients, the metabolic gene-associated model accurately predicts survival outcomes, which can lead to individualized treatment.
From the TCGA database (https://portal.gdc.cancer.gov/), we downloaded the raw
mRNA expression data of CESC, including the normal group (n = 3) and the CESC
group (n = 306). To analyze differentially expressed genes, FPKM data of mRNA
level 3 were integrated and normalized using the Limma package. The differential
gene screening conditions were
ClusterProfiler (R3.6, R Core Team, Vienna, Austria) was used to annotate the functions of differential genes and explore their functional relevance. Relevant functional categories were identified using GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes). We considered a statistical difference if p value and q value all less than 0.05.
The prognostic correlation models were constructed using Lasso regression once differential metabolism genes were selected. For each patient, a risk score formula was constructed by incorporating the expression values of each specific gene and weighting them with their regression coefficients in Lasso regression analysis. The risk score formula is shown below.
Where Metabolic gene i is the identifier of the i-th selected core metabolic gene. According to this formula, median risk score was used to distinguish low-risk and high-risk patients. A Kaplan-Meier survival curve was computed and then compared using log-rank statistics. Using Lasso regression and hierarchical multiple regression, we examined the role of risk score in predicting patient prognosis. We evaluated the model prediction performance using ROC curves.
We predicted each tumor sample’s chemotherapy susceptibility using the “pRRophetic” function in R software (R3.6, R Core Team, Vienna, Austria) based on the GDSC Cancer Drug Susceptibility Genomics Database. Based on the GDSC dataset, 10 cross-validations were conducted to determine the accuracy of the IC50 estimates for each specific chemotherapy drug treatment. We set all the parameter to default settings, including the “combat”, which removes batch effects, and the average of duplicate gene expression.
RNA-seq data from different subgroups of CESC patients was analyzed using the
CIBERSORT algorithm to determine the relative proportions of immune infiltrating
cells. Pearson correlation analysis is performed for gene expression and immune
cell content, and statistical significance is defined as p
By thoroughly scoring the gene sets of interest, Gene set variation analysis (GSVA) turns changes at the gene level into changes at the pathway level and establishes the biological function of the samples. In this study, molecular signatures database (v7.0) gene sets were downloaded, and GSVA scores were used to assess possible biological functional changes between samples.
R software (R3.6, R Core Team, Vienna, Austria) and SPSS 19.0 (SPSS Inc.,
Chicago, IL, USA) were used for all statistical analyses. Kaplan-Meier survival
curves were generated and compared using log-rank. Multivariate analysis is
performed using Cox proportional risk models. All statistical tests were
two-sided, and we consider is as statistically significant if p
We extracted a collection of 2235 metabolism-related regulatory factors from the
downloaded mRNA expression data (FPKM). The differential expression between
cervical cancer patients and control patients are investigated using the Wilcox
test. According to the results, 378 metabolism-related genes, including 177
up-regulated and 201 down-regulated genes, were differentially expressed (logFC
The identification of metabolism-related genes with differential expression. (A) Heat map of the differentially expressed genes (DEGs). (B) Volcano plot of the DEGs.
As a result of GO and KEGG pathway enrichment analysis, a large number of pathways were significantly enriched for differentially expressed genes, such as coenzyme metabolic process, cytoplasmic vesicle lumen, coenzyme binding, etc. (Fig. 2A). In the KEGG enrichment process, there are central carbon metabolism in cancer, biosynthesis of amino acids and other metabolism-related pathways as illustrated in Fig. 2B. At the same time, through the Metascape database, we further analyzed candidate gene pathways. According to the results, these candidates were mainly enriched in hormone, organic hydroxy compound metabolic, and small molecule biosynthetic pathways (Fig. 2C). Fig. 2D shows the co-expression network analysis result of genes in the differential gene set.
Functional analysis of metabolism-related genes. (A) DEGs enriched by GO. (B) KEGG enrichment analysis of DEGs. (C) Related pathways of DEGs. (D) Protein network analysis of DEGs.
In order to identify the key metabolic genes, we collected clinical information
from CESC patients. According to Fig. 3A–C, we used Lasso regression and Cox
univariate regression algorithms to identify cervical cancer signature genes. By
using Cox univariate regression, 59 prognosis-related genes were identified:
ISCU, TCN2, FOXP3, PGK1, UCP2, SPINT2, FASN, MOCS1, TFRC, ADH1B, TNF,
MIR200A, GAPDH, MMP1, PFKFB4, GCH1, CDIPT, ACOT4, LDHA, LEPR, BCL2, SPP1,
TNFRSF1B, TGFA, CPE, CA9, NPL, SPTBN1, GALNT3, DHCR7, FGFR2, PFKM, NME4, SDS,
SQLE, SLC25A42, JUN, NT5E, TRPV4, HSPG2, PLA2G7, SLC25A10, CKB, APOC1, GART,
SLC2A3, SCD, VDAC1, ENO1, APOD, SELP, HMGCS1, NR1D2, TREM2, PLIN2, FGFR3,
PIP4K2B, COX5A and HK2. After randomly dividing TCGA patients into
training and validation sets in an equal ratio, Lasso regression analysis was
used to determine the best risk score. The metabolic gene prognostic risk score
model was constructed as the formula below: Risk Score =
FGFR2
Prognostic risk model construction. (A,B) Lasso regression. (C) The regression coefficient of differentially expressed genes. (D,E) Training and test set survival analysis.
Validation of the risk model. (A) ROC curves of the training cohort (1-year AUC = 0.88, 3-year AUC = 0.88, 5-year AUC = 0.89). (B) ROC curves of the testing cohort (1-year AUC = 0.83, 3-year AUC = 0.86, 5-year AUC = 0.74).
The relationship between the tumor immune infiltration and the risk score was
further investigated. The results showed that risk score was significantly and
positively correlated with Macrophages M0 (p
Multiple omics maps of the risk model. (A,D) Immune infiltration and risk score. (B) Relationship between risk score and drug sensitivity. (C) Relationship between model and SNP mutation. (E,F) TMB and MSI were significantly different between the two groups.
We then studied two groups’ specific signaling pathways. Results of the GSVA revealed that ADIPOGENESIS, TNFA_SIGNALING_VIA_NFKB, ANDROGEN_RESPONSE, MYOGENESIS, and KRAS_SIGNALING_DN were the most enriched pathways for the two groups of patients, as shown in Fig. 6. It has been shown that altering these signaling pathways can affect the prognosis of cervical cancer patients.
Signaling mechanisms for prognostic model.
Data on processed CESC patients with survival information was downloaded from GEO (GSE44001, GSE52903). And Kaplan-Meier survival analysis was used to assess survival differences. Compared to the low-risk group, OS was significantly lower in the high-risk group (Fig. 7A,B). ROC curve analysis was performed on external data sets to verify the model’s accuracy. The results showed that the model has a strong predictive effect on the prognosis of patients (GSE44001-C-index = 0.68, and GSE52903-C-index = 0.73), as shown in Fig. 7C,D.
Robustness of the prognostic model. (A,B) Overall survival (OS) in the external data sets (GSE44001, GSE52903). (C,D) ROC curve analysis in the external data sets.
The samples were divided into high and low risk groups based on the risk core. Univariate and bivariate analyses of CESC patients showed that the risk score was an independent prognostic factor, as shown in Fig. 8A, B. Nomograms were created to present the results of the regression analysis, the different staging patterns of cervical cancer were significantly correlated with the distribution of risk score values obtained by our model, shown in Fig. 9A. At the same time, the three-year and five-year OS of CESC patients were also predicted and analyzed, and the results showed that there was little difference between the predicted OS and the observed OS, suggesting that the nomogram model had a good prediction effect (Fig. 9B).
The model has an independent prognostic value. (A) Univariate analysis. (B) Multivariate Cox analyses.
Nomogram. (A) Indicators of clinical risk are correlated with the risk score. (B) The model accurately predicted 3- and 5-year OS of CESC.
Approximately 604,127 new cervical cancer cases are diagnosed in 2020 worldwide [10]. The current Federation International of Gynecology and Obstetrics (FIGO) staging can determine the prognosis initially, and FIGO stage II patients have an overall survival rate of 65%–69%, stage III patients are 40%–43%, and stage IV patients are 15%–20% [11]. The key to treating cervical cancer is early detection, early diagnosis, and early treatment. However, relying on FIGO staging alone to assess and judge the prognosis is not enough. From the perspective of tumor driver genes, abnormal expression of some genes is often associated with poorer prognostic outcome. By establishing risk models, we screened metabolism-related prognostic genes that may be involved in the onset, progression and malignant transformation of cervical cancer, which affect the survival of cervical cancer patients. This provides more information to decipher the role that metabolic reprogramming plays in tumor progression.
The role of abnormal energy metabolism in cancer genesis and progression has been reported by previous studies [12]. As metabolic networks in tumor cells are reprogrammed, this leads to reorganization and redirection of nutrient fluxes. We developed and validated a prognostic model based on 25 metabolic genes in this paper. First, these genes may reflect cervical carcinogenesis and contribute to the early diagnosis of cervical cancer. For example, fibroblast growth factor receptor 2 (FGFR2) has been widely studied in a variety of human cancers and may be a potential tumor marker for early screening of cervical cancer [13, 14]. Of course, there are some genes that need to be further explored in the pathogenesis of cervical cancer. Secondly, these genes are expected to be new targets for antitumor therapy targeting cellular metabolic enzymes. For example, PIP4K2A (phosphatidylinositol-5-phosphate 4-kinase type 2 alpha), the protein encoded by this gene is a family of enzymes that catalyze the phosphorylation of phosphatidylinositol-5-phosphate at the fourth hydroxyl group of the myo-inositol ring to form phosphatidylinositol-5, 4-bisphosphate. The results of the study [15] showed that through p85/p110 component degradation in PTEN-deficient glioblastoma, PIP4K2A can negatively regulate phosphoinositide 3-kinase (PI3K) signaling. In addition, Jones et al. [16] demonstrated that PIP4K2A overexpression reduced clonogenic growth. Interestingly, we found that PIP4K2A downregulation in cervical cancer was associated with better OS, which is consistent with other cancer reports and needs further investigation.
The multi-omics study shows that the chemotherapeutic drug sensitivity of tumors
has different expressions in different samples. The drug sensitivity analysis we
used provides a new scheme to distinguish different chemotherapeutic drug
combinations in the high-risk and low-risk groups. Since the tumor
microenvironment (TME) is generally associated with drug resistance in patients,
we further analyzed the relationship between risk score and immune infiltration
to provide new ideas for differentiating drug resistance mechanisms in patients
in both high-risk and low-risk groups. The characteristics of low oxygen and
nutrient deficiency in TME lead to the establishment of metabolic competition
between tumor cells and immune cells, and the accumulation of toxic metabolites
will have a negative impact on immune response [17]. Mitochondrial dysfunction in
CD8
We first used Cox regression to screen genes before constructing the risk prediction model by Lasso regression. In addition to using the test dataset to validate the predictive efficacy, we also used the GEO dataset to validate the model robustness, both of which achieve accurate prediction performance. Moreover, our study can provide insights for identifying potential diagnostic and prognostic biomarkers for other biologically heterogeneous cancers.
In summary, we identify a novel metabolic genetic prognostic model for cervical cancer prognosis prediction based on the TCGA dataset. According to our systematic and comprehensive studies, prognostic models could provide more accurate evaluations of cervical cancer patients’ prognoses.
Data are contained within the article.
XL conceived and designed the experiments, performed the experiments, analyzed the data, contributed materials/analysis tools, prepared figures, authored or reviewed drafts of the paper, approved the final draft. RG analyzed the data, authored or reviewed drafts of the paper, approved the final draft. CW and SC analyzed and interpreted the data, supervised and contributed to the writing. All authors have read and approved the manuscript.
Not applicable.
We thank all members of our study team for their whole-hearted cooperation and all included participants for their wonderful cooperation.
This project was funded by the Henan Medical Science and Technique Foundation (Grant no. LHGJ20190353).
The authors declare no conflict of interest.