Identification of hypoxia-related prognostic signature for ovarian cancer based on Cox regression model

Objective : The purpose of this study is to establish a good prognostic risk assessment model of hypoxia genes to evaluate the 3-year and 5-year survival rates of patients with high-grade serous ovarian cancer. Methods : We performed differential analysis of hypoxia genes in the GSE26712 data set. The differential genes were then, analyzed in the TCGA ovarian cancer data set for risk regression analysis and verified in the GSE26712 data set. In addition, we performed a functional enrichment analysis on the genes in the signature of hypoxia, and further analyzed the level of hypoxia risk and immune infiltration. Finally, a nomogram combining the hypoxia risk score, clinical stage, pathological grade, 3-year and 5-year survival rate was constructed. Results : A signature containing 12 hypoxia-related genes was identified as a Cox regression model for predicting the prognosis of ovarian cancer, and verified it in an independent data set. Subsequent enrichment analysis revealed that the signature is related to the immune system. We have also demonstrated a significant relationship between the signature of hypoxia and the infiltration of certain immune cells. Finally, the nomogram shows the accuracy of hypoxia characteristics in predicting ovarian cancer prognosis. Conclusion : We have established a good prognostic risk assessment model for ovarian cancer related to hypoxia risk, which provides personalized survival predictions and possible targeted treatment strategies.


Introduction
Ovarian cancer is the seventh most common cancer in women and the eighth most common cause of cancer death [1]. Ovarian cancer comprises a wide range of subtypes, which has been classified into epithelial ovarian cancers (90% of cases) and the non-epithelial subgroups. Different ovarian cancers present different prognoses. Epithelial ovarian cancer (EOC) is a lethal and aggressive gynecologic malignancy with poor long-term prognosis [2]. There is a lack of clinical means for early screening, and most screenings were at an advanced stage when while diagnosed [3]. The pathogenesis of ovarian cancer is still unclear, and it is currently believed to involve endocrine, genetic changes, immune inflammation and many other aspects [4].
In terms of treatment, although great progress has been made in the general treatment of ovarian cancer, such as surgery, radiotherapy, chemotherapy, etc. Recent studies have shown that ovarian cancer often presents genome instability, most of them in homologous recombination (HR). Poly (ADP-ribose) polymerase (PARP) inhibitors inhibit DNA repair pathways and cause apoptosis of cancer cells, especially in homologous recombination (HR)deficient cells. PARP inhibitors have drawn increasing amount of attention due to their remarkable efficacy in treating BRCA1/2 mutated ovarian cancers. To date, three PARP inhibitor drugs have been approved for treating ovarian cancer by FDA in United States, namely Olaparib, Rucaparib, and Niraparib [5][6][7]. However, the best treatment strategy remains controversial.
Better survival is achieved through complete debulking surgery and chemotherapy. Historically, neoadjuvant chemotherapy (NAC) has been introduced for unresectable disease to decrease tumor load and perform a unique complete surgery [8]. Studies have shown that at least 70% of ovarian cancer patients require combined chemotherapy after surgery, but most patients still have recurrence or distant metastasis, and the 5-year survival rate is only 27% [9]. Metastasis and drug resistance are the main causes of high mortality in patients with ovarian cancer. The exploration of markers that can be used for early diagnosis and clinical models that predict metastasis and prognosis of tumor patients are important for the treatment of ovarian cancer and improving the survival rate of patients. Although current research shows that genetic risk assessment and tumor & genetic molecular analysis have been reported as the prognostic factors for ovarian cancer in the literature [10], the prognostic biomarkers and predictors of therapeutic responses for ovarian cancer are still not clearly elucidated.
Mounting studies have confirmed that the tumor microenvironment (TME) promotes cancer progression in many aspects, especially therapeutic resistance [11,12]. TME decreases drug penetration and confers the advantage of proliferation and anti-apoptosis to surviving cells, facilitating resistance and common modifications in disease morphology [13,14]. The typical tumor microenvironment includes insufficient blood supply or insufficient oxygen sup-ply for energy metabolism due to the rapid growth of tumor cells. More and more studies have shown that hypoxia, as a marker of the tumor microenvironment, exists in most tumors, and continuous hypoxia can cause tumor cells in the part far away from blood vessels to necrosis due to hypoxia [15,16]. The presence of hypoxic regions is one of the independent prognostic factors for human cancer. Tumor cells, while adapting to hypoxia, lead to more aggressive and therapeutically resistant tumor phenotypes. Hypoxia induces changes in gene expression and subsequent proteomic changes that have many important effects on various cellular and physiological functions, ultimately limiting patient prognosis [17][18][19].
The aim of this article is to enrich the analysis of hypoxia risk characteristic genes on the TCGA ovarian cancer dataset and Genecard database hypoxia-related genes, and establish a survival analysis model of the level of hypoxia risk and the degree of immune infiltration in patients with ovarian cancer. In addition, a new detection marker and drug treatment target for ovarian cancer can be obtained to provide a new perspective for early detection and prognosis analysis of ovarian cancer.

Data acquisition and processing
The transcription expression data set GSE26712 was obtained from NCBI Gene Expression Omnibus (GEO). The Cancer Genome Atlas (TCGA) high-grade serous ovarian cancer transcriptome dataset is downloaded from the GDC data portal (https://portal.gdc.cancer.gov/). We performed RMA background correction and log2 conversion through the R packages affy and affy PLM to complete the normalized GEO raw gene expression data. Before bioinformatics analysis, gene annotation data was used to match probes with similar gene symbols. Genes with multiple missing expression values (expression level = 0 and expression level 20% higher) were deleted. These data were obtained from open access sources, so ethical approval was not required. Human high-grade serous ovarian cancer tissues and normal ovarian tissues were obtained from the Obstetrics and Gynecology Department of Shijitan Hospital, Capital Medical University, China. All the tissue donors had signed biological sample donation consent forms, and the project had been approved by the hospital ethics committee.

Gene expression differential analysis
GSE26712 contains the transcriptome expression data of 185 cases of high-grade serous ovarian epithelial carcinoma and 10 cases of normal ovarian epithelium. We searched for hypoxia keywords in the Genecard database and obtained 5000 hypoxia-related genes. Based on the processed data, we used the R package limma to further analyze these hypoxia-related genes and obtain differentially expressed genes (DEGs). DEGs standard for subsequent analysis: Fold change (FC) ≥1.5, corrected p value < 0.05, and corrected by the Benjamini-Hochberg method.

Prognostic risk prediction model construction verification
The prognostic data is formed by matching the transcriptome expression matrix of the differentially expressed genes with the follow-up data. Firstly, we used univariate Cox regression analysis to screen hypoxia genes related to the prognosis of ovarian cancer. Secondly, the Least absolute shrinkage and selection operator (LASSO) Cox regression analysis was used to determine the prognostic value of hypoxia genes. Based on the regression results of LASSO Cox and the expression level of hypoxia genes, we generated a risk score for each patient. The following formula was used to calculate the risk score: risk score = ∑ n i=1 coef * id [20]. Then, the patients were divided into high-risk groups and low-risk groups according to their scores relative to the median risk score. Kaplan-Meier analysis was used, and the log-rank test was used to compare the survival rates between the two groups. Finally, the receiver operating characteristic (ROC) analysis was used to estimate the predictive power of the signature over the 3-year and 5-year period. In addition, we verified the predictive ability of the prognostic model in the data set GSE26712.

Feature-rich analysis using literature keywords
GenCLiP3 (http://ci.smu.edu.cn/genclip3/analysis.p hp) was used to perform functional enrichment analysis in differentially expressed hypoxia genes to identify functions related to hypoxia in ovarian cancer. Keywords (pre-defined or user-supplied) from the enrichment of gene-related literature were used to annotate the gene list.

Evaluation of tumor infiltrating immune cells (TIIC)
CIBERSORTx (https://cibersortx.stanford.edu/), an online analysis tool, was used. It uses RNA expression data to determine the proportion of specific cells, thereby determining the proportion of 22 TIICs in the data set. We used R language to visualize the results and evaluate the correlation between immune cells and prognosis.

Construction of nomogram
After univariate Cox regression analysis was performed, all independent prognostic factors were screened by multivariate Cox regression analysis to construct a prognostic nomogram, and the rms R software package was used to evaluate the probability of 3 and 5 years overall survival of TCGA ovarian cancer patients. The discriminative performance of the nomogram was quantitatively evaluated by C index and AUC. The calibration chart was also used to graphically evaluate the discriminative ability of the nomogram. All analyses were performed using R (version 4.1.0, TUNA Team, Tsinghua University, China), and a p value of <0.05 was considered statistically significant. Report haz-ard ratios (HRs) and 95% confidence intervals (CIs) when necessary.

Identification of hypoxia-related differentially expressed genes (DEGs)
After differential expression analysis, in the data set GSE26712, compared with normal ovarian epithelial tissues, 544 hypoxia-related differentially expressed genes (DEGs) were identified in ovarian cancer tissues (Supplementary Table 1. Fold change (FC) ≥2, adjusted p Value < 0.05 is the standard). Among them, there are 224 up-regulated genes and 320 down-regulated genes.

Construction and verification of prognostic risk prediction model for hypoxia gene
We performed univariate Cox regression analysis on the 544 differentially expressed hypoxia-related genes to screen the prognostic genes, and found 52 hypoxia genes related to the prognosis of ovarian cancer (Supplementary Table 2). The LASSO Cox regression analysis of these 52 genes determined the best prognostic characteristics consisting of 12 genes (Fig. 1A). The following formula was used to calculate the risk score to determine the prognostic risk of these 12 genes: score = expFOXO1 × 0.182663494 + expCFI × -0.123861038 + expCLDN4 × 0.143347629 + expUBB × -0.073397997 + expTCEAL4 × -0.415789139 + expEPCAM × -0.219868476 + ex-pECI2 × -0.170319867 + expSEC22B × -0.29666949 + expMIF × -0.146635764 + expMCM3 × -0.208898617 + expOGN × 0.174011841 + expCXCL13 × -0.204790419 (Note: expFOXO1 = expression level of FOXO1, and so on). According to the critical value of the thematic risk score, patients were divided into low-risk groups and highrisk groups. Kaplan-Meier analysis showed that the survival rate of OC patients in the high-risk group was worse than that in the low-risk group (Fig. 1B). In order to determine the predictive effect of the 12 gene markers, time-dependent ROC curve analysis and the area under the curve (AUC) value were used as indicators. The analysis showed that the 3-year AUC was 0.681 and the 5-year AUC was 0.719, highlighting the great potential of this function in predicting the prognosis of OC (Fig. 1C). Based on the signatures of the above 12 genes, we verified the model in the data set GSE26712. Similarly, in the GSE26712 data set, the survival rate of OC patients in the high-risk group was worse than that in the low-risk group (p = 0.0004369, Fig. 1D). In addition, the 3-year AUC in the GSE26712 data set was 0.662 and the 5-year AUC was 0.749 (Fig. 1E). In addition, we also showed the differential expression of 12 signature hypoxia-related genes in 10 normal ovarian epithelium and 185 serous ovarian epithelial carcinoma in Table 1.

Functional enrichment analysis in predictive models
In order to further understand the possible functions of genes in the prediction model in ovarian cancer, we performed a function enrichment analysis using the GenCLiP3 online tool. Based on the analysis of 12 genes in the prediction model, more than 10 keywords with statistical significance were identified (Fig. 2). These results established multiple different clusters, indicating that ovarian cancer was related to multiple biological functions. These keywords were based on vascular cell, endothelium, monocyte, leukocyte, hypoxia, transcriptase, estrogen receptor, keratinocyte, dendritic cell, CD4+ T cell, CD8+ T cell, T cell proliferation (Supplementary Table 3).

Assessment of hypoxia-related tumor infiltrating immune cells (TIIC)
After preprocessing the TCGA data, we obtained 378 RNA expression matrix samples, which were then processed by the CIBERSORTx online tool to obtain the proportion of immune cells in each sample. There was a significant difference in the proportion of immune cells in each sample (Fig. 3A). Since our prediction model found that hypoxia genes were related to tumor immune infiltration, we divided the tumor samples into high-risk groups and lowrisk groups according to the level of hypoxia risk score and the median value of the score, and compared them. The infiltration of immune cells in the two groups was analyzed. The results showed that there were significant differences between the two groups at high and low risk of hypoxia in a variety of immune cell infiltration, including: plasma cells, T cells CD4 memory resting, T cells CD4 memory active, T cells follicular helper, and Macrophages M1 (Fig. 3B). We also explored the relationship between immune cells and prognosis. The ratio of macrophages M1 and plasma cells in ovarian cancer samples was significantly correlated with the patient's prognosis (Fig. 3C,D).

Nomogram
In order to establish a clinically applicable method for predicting the prognosis of patients with ovarian cancer, we established a prognostic nomogram based on the TCGA data set to predict the 3-year and 5-year survival rates of patients. The results of univariate and multiple Cox regression analysis were showed in Supplementary Table 4 and Supplementary Table 5. After screening, 4 independent prognostic parameters including age, clinical stage, pathological grade, and hypoxia risk were involved in the prediction model (Fig. 4A). The C index of the nomogram was 0.668 (95% CI, 0.649 to 0.687; p = 1e-12). The calibration chart (Fig. 4B-C) shows that there was a slight deviation between the 3-year survival rate nomogram in the TCGA cohort and the actual observation; however, the 5-year survival rate nomogram had excellent results consistency. These results all showed the reliability of the nomogram.

Immunohistochemistry of Hub genes
After analysis, we determined that 4 genes CXCL13, EPCAM, FOXO1, UBB were the Hub genes, and these 4 genes also occupied an important position in the risk assessment model we established before. In order to verify the accuracy of the study, we performed immunohistochemistry (IHC) on the Hub genes. The results showed that compared with normal ovarian tissue, the expression level of CXCL13 and EPCAM in High-grade serous ovarian cancer tissue was significantly increased, while the expression of FOXO1 and UBB was significantly decreased (Fig. 5). What was interesting was that the results of immunohisto- chemistry were consistent with the results of our previous analysis (Table 1). This showed that our results were accurate and reliable.

Discussion
The latest research in oncology has found that the obvious abnormality of the metabolism and redox state of cancer cells is one of the essential characteristics of cancer cells. The instability of the cancer cell genome and the excessive proliferation of cancer cells have led to an increase in the demand for ATP by cancer cells. Due to the defects in the mitochondrial DNA, protein and membrane lipid structure of cancer cells, the efficiency of the respiratory chain and ATP synthesis is low, so additional energy supply is required through glycolysis and other pathways [21,22]. The lack of oxygen supply in cancer tissues and its oxidative oxygen consumption promote competition among cancer cell populations, so that single cancer cells that are conducive to the production of high levels of mitochondrial oxidation and anaerobic glycolysis can be screened out, or they can be oxidized separately [23]. This makes the energy metabolism of cancer tissues abnormal, and further leads to abnormal redox status of cancer cells. Cancer cells can tolerate the hypoxic environment, so that it can invade and metastasize, induce angiogenesis, and even resist apoptosis and reprogramming. The overexpression of hypoxiainducible factors in cancer cells is another important discovery that the redox state of cancer cells is obviously abnormal [24][25][26]. Studies have shown that hypoxia inducible factor (HIF) plays an important role in the excessive proliferation, immortalization, invasion and metastasis of cancer cells, and immune evasion [27]. The MAPK family, especially extracellular signal-regulated kinase (ERK) was the classical anti-apoptotic pathways [28]. Activated ERK signaling pathway promotes the occurrence and development of cancer by up-regulating the expression of anti-apoptotic proteins, proliferation-related proteins and migration and invasion-related proteins [29]. The inhibition of ERK signalling transduction and the induction of apoptosis are associated with inhibited tumour progression [30]. PARP1 inhibition causes a loss of ERK2 stimulation by decreasing the activity of critical pro-angiogenic factors, including vascular endothelial growth factor (VEGF) and HIF [7].
In our study, based on the relationship between tumor and hypoxia, we used univariate Cox regression analysis to screen hypoxia genes (FOXO1, CF1, CLDN4, UBB, TCEAL4, EPCAM, EC12, SEC22B, MIF) that are related to the prognosis of ovarian cancer. This model has important value in predicting the prognosis of patients with ovarian cancer and provides possible therapeutic targets for the treatment of ovarian cancer. The results of immunohistochemistry also confirmed that the Hub genes in our predicted model were significantly different between highgrade serous ovarian cancer and normal ovarian tissue. The tumor microenvironment of ovarian cancer is an important condition for the progression, metastasis and drug resistance of ovarian cancer, and the immune cell population is an important part of the tumor microenvironment [31,32]. In primary ovarian tumors and ascites, the main population of immune cells is composed of macrophages. They have two main sources, one is tissue-resident macrophages, and the other is infiltrating macrophages recruited from bone marrow-derived monocytes. Macrophages in the tumor microenvironment are transformed into tumor-associated macrophages, which can exhibit M1 and M2 types according to different stimuli [33,34]. In our research, we also found that most of the hypoxia genes closely related to the prognosis of ovarian cancer are related to immune cell infiltration. Among them, the M1 type of macrophages and plasma cells are significantly related to the prognosis of ovarian cancer. The higher the ratio of M1 macrophages to plasma cells, the better the prognosis of ovarian cancer. This result is same to the previous anti-tumor effect on tumor-associated macrophages M1 type. Liu et al. [35] analyzed 13 independent studies involving 2218 high-grade serous ovarian carcinoma (HGSOC) patients, and the results showed that a high proportion of the M1 phenotype is associated with a favorable overall survival rate. A study of 112 ovarian cancer patients clearly showed that the higher the tumor microenvironment M1/M2 ratio in tumor specimens, the higher the survival time [36]. Another cohort study of patients with high-grade serous ovarian cancer showed that a high M2/M1 ratio is associated with reduced progression-free survival and poor overall survival [37]. This shows that the ratio of M1/M2 may be more related to the overall survival rate of patients with ovarian cancer.

Conclusions
All in all, our research had constructed a prognostic risk prediction model for ovarian cancer related to hypoxia genes. This model has important value in predicting the prognosis of patients with ovarian cancer and may propose therapeutic targets for ovarian cancer. In addition, we also found that this model may affect the prognosis of ovarian cancer by regulating immune cell infiltration, which prompted us to further understand the regulatory mechanism of ovarian cancer prognosis. Finally, we also established a new prognostic nomogram model, which combines the risk of hypoxia and clinical parameters to provide personalized survival predictions and more targeted treatment strategies. In addition, we also confirmed by immunohistochemistry that the Hub genes in the model were different in expression.

Author contributions
WS and WB designed the research study. WS performed the research, analyzed the data and wrote the manuscript. WB provided help and advice on the study. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript.

Ethics approval and consent to participate
The study had been approved and agreed by the Medical Ethics Committee of Beijing Shijitan Hospital, Capital Medical University (approval number: sjtkyll-lx-2020(06)). The transcription expression data set GSE26712 was obtained from NCBI Gene Expression Omnibus (GEO). The TCGA ovarian cancer transcriptome dataset is downloaded from the GDC data portal (https://portal.gdc.c ancer.gov/), and the data presented in this manuscript are available from the corresponding author on reasonable request. All original research have been ethically approved and agreed.