RBP7 Regulated by EBF1 Affects Th2 Cells and the Oocyte Meiosis Pathway in Bone Metastases of Bladder Urothelial Carcinoma

Background : Bladder urothelial carcinoma (BLCA) is a malignancy with a high incidence worldwide. One-third of patients may experience aggressive progression later on, and 70% of patients who have undergone surgical intervention will still suffer from metastasis. Materials and Methods : RNA sequencing profiles of BLCA samples were obtained from The Cancer Genome Atlas (TCGA) database. Differential expression and univariate Cox regression analyses were performed to identify prognosis-related differentially expressed immune genes (DEIGs). Subsequently, a proportional hazards model of DEIGs was then constructed by univariate regression analysis. Differential expression and correlation analyses, CIBERSORT, Single Sample Gene Set Enrichment Analysis (ssGSEA), GSVA were conducted on transcription factors (TFs), immune cells/pathways and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The regulation network was then constructed. Eventually, ATAC-seq, ChIP-seq, scRNA-seq, and multiple online databases were employed for further validation. Results : A proportional hazards model of 31 DEIGs was constructed and risk score was calculated and proven to be a independent prognostic factor. Then 5 immune genes were characterized to be significantly correlated with bone metastasis, stage and TF expression simultaneously. 4 TFs were identified to be significantly correlated with prognosis and RBP7 expression. 5 immune cells/pathways were revealed to be significantly correlated with RBP7 expression. Only 1 KEGG pathway was identified to be significant in Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) analyses. The regulatory relationship was then constructed, in which the correlation between EBF1 and RBP7 (R = 0.677, p < 0.001), Th2 cells and RBP7 (R = 0.23, p < 0.001), the oocyte meiosis pathway and RBP7 (R = 0.14, p = 0.042) were the most statistically significant. The results were further confirmed by Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATAC-seq), Chromatin Immunoprecipitation sequencing (ChIP-seq), single-cell RNA sequencing (scRNA-seq), and multiple online databases validation. Conclusions : This study revealed that the EBF1-RBP7 regulatory relationship had potential importance in the bone metastasis in BLCA through Th2 cells and the oocyte meiosis pathway.


Introduction
Bladder cancer has the highest incidence among urologic malignancies in China and is the sixth most common cancer globally [1,2]. One third of patients might have aggressive progression later, and 70% of patients who undergo surgical intervention will still experience metastasis [3]. The pathological type of most bladder cancer patients is urothelial carcinoma, and patients with the same pathological features usually have a considerable difference in prognostic outcome [4]. Its morbidity has risen from 2000 to 2011, and patients with metastasis were reported to have a higher mortality rate [1]. After radical cystectomy, nearly 50% of patients will still develop varying degrees of progression. Around 5-10% of patients have already metastasized at the time of diagnosis [5]. Bone metastasis occurs frequently in bladder urothelial carcinoma (BLCA), which has a relative incidence of 40% and only 6-9 months of median survival from diagnosis [6,7]. Thus, there is still an unmet clinical need for further research to reveal the potential mechanism of bone metastasis in BLCA.
High throughput sequencing data analysis has been widely used for the identification of significant molecular and cellular biomarkers in the metastasis and prognosis of BLCA [8][9][10][11][12]. However, little previous research has focused on immune genes and relative networks in the bone metastasis of BLCA.
In this study, we identified differentially expressed immune genes (DEIGs) and TFs in patients with BLCA. To obtain valuable prognostic biomarkers, univariate Cox regression was applied. Based on the DEIGs, we constructed a proportional hazards model and validated it to have good prognosis-predictive performance. We employed CIBER-SORT and Single Sample Gene Set Enrichment Analysis (ssGSEA) method to quantify tumor-infiltrating immune cells/pathways, which were then co-analyzed with the DEIGs. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was carried out employing the in silico tools Gene Set Variation Analysis (GSVA) and Gene Set Enrichment Analysis (GSEA). Subsequently, KEGG signaling pathways with statistically significant correlation relationships were selected. Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATACseq), Chromatin Immunoprecipitation sequencing (ChIPseq), single-cell RNA sequencing (scRNA-seq), and multiple online databases were utilized for external validation. Taken together, we proposed a scientific hypothesis for bone metastasis in BLCA.

Differential Gene Expression and Univariate Cox Regression Analyses
Differentially expressed genes (DEGs) were identified between bone-metastatic and non-metastatic samples with the cutoff value of false discovery rate (FDR) <0.05 and |log2(fold change)| >1 through the edgeR package. And DEIGs were subsequently recognized by matching DEGs with the 2498 immune-related genes downloaded from the ImmPort database. Gene expression values were pre-processed by z-score before fold change and p value were calculated. In addition, Gene Oncology (GO) and KEGG enrichment analysis was conducted. Subsequently, univariate Cox proportional hazards model was constructed to identify DEIGs significantly correlated with overall survival. Disease-free survival analysis of these identified DEIGs was also obtained from the Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia. cancer-pku.cn/) [16].

Construction of a Proportional Hazards Model
Using univariate and multivariate Cox regression analyses, DEIGs with statistical significance were included in the proportional hazards model. Risk score for prognosis was calculated according to the following formula: In the formula, "i" stands for the number of the patient, "n" stands for the number of significant DEIGs included in the model, and "βa" stands for the regression coefficient of the DEIGs. The median cut point of the risk scores was used as the cutoff value for high and low risk grouping. Model predictions were evaluated by plotting the Kaplan-Meier curve. Predictive performance was quantified using area under the curve (AUC). In order to further estimate the independent prognostic value of risk score, the univariate and mulitivariate Cox regression analyses were utilized. Moreover, 196 BLCA patients cohort from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) database (https://ocg.cancer.g ov/programs/target) were extracted for further validation. Differential enrichment of tumor infiltrating immune cells and immune pathways in BLCA was evaluated using the CIBERSORT and ssGSEA methods.

Differential TFs Expression and Correlation Analyses
Differentially expressed TFs were identified by the edgeR package. FDR was set to be less than 0.05 while |log2(fold change)| greater than 1. Then, correlation analysis was performed to co-analyze DEIGs and TFs, illustrating their regulatory relationships with correlation coefficient >0.300 and p < 0.001, in which case if the variables conformed to normal distribution and homogeneity of variance, Pearson correlation analysis was conducted, or else Spearman correlation analysis was used.

Immune Cells/Pathways and KEGG Pathways Analyses
Differential enrichment of tumor infiltrating immune cells/pathways in BLCA was evaluated using the CIBER- SORT and ssGSEA tools. The correlation between key DEIGs and immune cells/pathways was evaluated by Pearson/Spearman correlation coefficient. In addition, GSVA was applied to score KEGG pathways in each sample. To explore the potential KEGG pathways correlated with prognosis and metastasis in BLCA patients, Pearson/Spearman correlation analysis and univariate Cox regression were conducted. Moreover, the Venn diagram was plotted using the VennDiagram package [17].

Construction of the Regulatory Network
Focusing on the immune genes, TFs and downstream immune cells/pathways and KEGG pathways were identified which were involved in the bone metastasis in BLCA. Therefore, we constructed a regulatory network to further elucidate the mechanism of bone metastasis in BLCA by exploiting the Cytoscape package [18].

ATAC-seq, ChIP-seq and scRNA-seq Validation
To further validate the transcriptional regulation pattern between EBF1 and RBP7, data related to EBF1 acquired from chromatin immunoprecipitation sequencing (ChIP-seq) of in vitro human cell lines were downloaded from Gene Expression Omnibus (GEO) (GEO assession: samples GSM1958038 and GSM1958039 from GSE75503, GSM935375 from GSE31477, and GSM803386 from GSE32465) [19,20]. ENCODE Transcription Factor Targets [21] and JASPAR Predicted Transcription Factor Targets [22] algorithms were then applied to re-predict the transcriptional regulation pattern between EBF1 and RBP7. Additionally, ATAC-seq data of BLCA samples were obtained from the TGCA project of chromatin accessibility landscape of primary human cancers (https://gdc.cancer.g ov/about-data/publications/ATACseq-AWG), which were then utilized to explore the chromatin accessibility in specific locations of EBF1 and RBP7. What is more, the scRNA-seq profiles of BLCA samples from GEO were obtained to successively go through quality control, normalization, cell cycle assignment, feature selection, principal component analysis (PCA), Uniform Manifold Approximation and Projection (UMAP)/t-distributed Stochastic Neighbor Embedding (t-SNE) for dimensionality reduction and visualization, unsupervised clustering, differential expression analysis, and cell communication analysis.

Statistical Analysis
Survival profiles were analyzed by Kaplan-Meier survival analysis. Additionally, correlation analysis were performed using Pearson/Spearman correlation analysis. In addition, Biomarkers with prognostic value were distinguished by Cox univariate and multivariate analyses. All bioinformatic and statistical analyses were implemented with R software (Version 3.6.1; Institute for Statistics and Mathematics, Vienna, Austria; https://www.r-project.org). A significance level of a two-sided p value less than 0.05 was adopted.

Identification of DEIGs
The analysis procedure of this study was demonstrated in Fig. 1. Clinical information (Table 1) and RNA-seq data of 209 BLCA samples were accessed through TCGA database. Between bone-metastatic and non-metastatic samples, 1090 DEGs were identified ( Fig. 2A,B). Additionally, GO and KEGG enrichment analyses, visualized as bubble plots, were performed to uncover that these DEGs actively participated in pathways such as axonogenesis, metal ion transmembrane transporter activity, and neuroactive ligand-receptor interaction which might play a role in the bone metastasis of BLCA patients ( Supplementary  Fig. 1A,B). In addition, using differential analysis, 86 of 1090 DEGs were found in the list of 2498 immune genes, which were defined as DEIGs and displayed in a heatmap plot (Fig. 2C) and a volcano plot (Fig. 2D). Moreover, in univariate Cox regression analysis, 31 DEIGs were exhibited to have a significant association with overall survival, in which 29 risk factors and 2 protective factors were identified in a forest plot (Fig. 2E). What is more, the differences of disease free survival between low and high expression of the 31 DEIGs were displayed in survival plots (Supplementary Fig. 2).

Construction of a Multivariate Prognostic Model
Based on the 31 prognosis-related DEIGs, we constructed a multivariate prognostic prediction model. The receiver operating characteristic (ROC) curve of the model was exhibited in Fig. 3A, in which the AUC was 0.793, indicating a good predictive value. Sub-group survival analysis was conducted according to the median risk score, displaying a significant difference (p = 1.04 × 10 −7 ) between the low risk and high risk groups (Fig. 3B). The clinical fea-tures of risk score were represented by a risk curve (Fig. 3C) and a scatter plot (Fig. 3D). The expression levels of the 31 DEIGs between low-risk and high-risk groups were demonstrated by a heatmap (Fig. 3E). Together with four clinical features (age, gender, M and stage), the risk score in our model proved to be an independent prognostic predictor by univariate (HR, 1.011; 95% CI, 1.005-1.017; p < 0.001) and multivariate (HR, 1.009; 95% CI, 1.004-1.015; p < 0.001) Cox regression analyses (Fig. 4A,B). Moreover, significant differences of censor (p < 0.001), grade (p < 0.05), T (p < 0.05), N (p < 0.001), and stage (p < 0.001) between low risk and high risk groups in a 196 BLCA patients cohort from the TARGET database (Fig. 4C). Additionally, the distribution of different immune cell components in low risk and high risk groups was demonstrated in Supplementary Fig. 3A. What is more, the comparison of different immune cell fractions (Supplementary Fig. 3B) and ssGSEA scores (Supplementary Fig. 3C) between low risk and high risk group were displayed, which revealed that the risk score was correlated with expression of CD8 T cells (p < 0.001), T cells follicular helper (p < 0.05), and macrophages M0 (p < 0.01), along with immune functions of NK cells (p < 0.001) and Th2 cells (p < 0.001).

Regulatory Relationship between TFs and DEIGs
8 differentially expressed TFs (|log2FC| >1 and FDR < 0.05) were illustrated by a heatmap (Fig. 5A) and a volcano plot (Fig. 5B). Additionally, a venn diagram was utilized to identify 5 DEIGs which were KLRK1, IL9R, CNTFR, RBP7, and TNFSF4 related to metastasis, stage, and TF co-expression simultaneously (Fig. 5C). Aiming to identify the regulatory relationships between the identified TFs and key DEIGs, Pearson correlation analysis was utilized and 30 regulatory relationships were identified with |correlation coefficient| >0.300 and p value < 0.001. Eventually, EBF1 positively regulating RBP7 was selected as the most potential regulatory relationship (R = 0.677, p value < 0.001). In addition, the significant differences of RBP7 expression between metastatic and non-metastatic BLCA samples (p = 0.018), as well as among stage i, ii, iii, and iv of BLCA samples (p = 0.018) were demonstrated (Fig. 5D,E).

Correlation Analysis of DEIGs, Immune Cells/Pathways and KEGG Pathways
For the purpose of exploring the potential signaling pathways related to bone metastasis and BLCA prognosis, CIBERSORT, ssGSEA, and GSVA algorithms were utilized. The correlation between expression of immune cells/pathways and RBP7 expression was analyzed by Pearson's correlation (Supplementary Fig. 4A). Moreover, 5 immune cells/pathways were identified to be statistically significant (Supplementary Fig. 4B   = -0.179, p value = 0.009) and MHC class I (R = -0.148, p value = 0.032). What is more, 9 prognosis-related KEGG pathways were characterized to be significantly correlated with RBP7 expression (Supplementary Fig. 5A,B).

Identification of Potential KEGG Pathways
Using the GSEA, 23 metastasis-related KEGG pathways were identified. Thus, as illustrated in the Venn plot (Fig. 6A), the oocyte meiosis pathway was identified to be the key KEGG pathway both significant in GSEA and GSVA co-expression (Fig. 6B). What is more, the correlation relationship between the oocyte meiosis pathway and RBP7 (R = 0.14, p = 0.042) was shown in Fig. 6C. Besides, GSEA result of the oocyte meiosis was demonstrated by the enrichment plot in Fig. 6D.

Construction of the Regulation Network
The regulatory network based on TFs, RBP7, immune genes/pathways and KEGG pathways was constructed in Fig. 7A. Taken together, we proposed that the EBF1-RBP7 axis (representing TF-immune gene relationship) played an important role in bone metastasis of BLCA through Th2 cells and the oocyte meiosis pathway.

ATAC-seq Validation
ATAC-seq analysis was performed to explore the correlation between EBF1 and RBP7. Open chromatin loci on different chromosomes were displayed in Supplementary  Fig. 6A. Additionally, the venn pie and upsetplot showed the distribution and intersection of different pick types such as genic, intergenic, intron, and exon in Supplementary  Fig. 6B. Moreover, the distribution of binding loci rela- tive to TSS was visualized in Supplementary Fig. 6C. Additionally, the correlation between EBF1 and RBP7 was analyzed, which turned out that the expression of EBF1 was positively correlated with RBP7 (R = 0.71, p < 0.001) (Fig. 7B). What is more, multiple binding peaks in BLCA samples at promoters of EBF1 and RBP7, as well as vari-ous regulatory elements binding areas in the introns and in introns of neighboring genes were identified, which indicated that these regions might function as potential regulatory elements on upstream of EBF1 and RBP7 sequences (Fig. 7C,D).

ChIP-seq Validation
To further illustrate the co-expression patterns between EBF1 and RBP7, ENCODE and JASPAR algorithms were utilized. In EBF1 ChIP-seq data of activated GM12878, LCL and Mutul cell line (homo sapiens), binding peaks were all found in RBP7 sequences (Fig. 8A), and 2 motifs of EBF1 were identified (Fig. 8B,C). Consequently, the transcriptional regulation pattern between EBF1 and RBP7 could be determined.

scRNA-seq Validation
Furthermore, in the scRNA-seq analysis of BLCA tumor cell lines, 12 seurat clusters and bulk labels were shown in t-SNE dimension reduction analysis (Fig. 9A). Additionally, the top 5 marker genes of each cluster was demonstrated in a heatmap (Fig. 9B). In addition, up-and downregulated marker genes of each cluster were revealed in a point plot (Fig. 9C). More importantly, the distribution of 5 marker genes expression including EBF1, RBP7, CD24, MKI67, and CD44 was illustrated in Fig. 9D, which indicated that EBF1 and PBP7 had a co-expression relationship in clusters 0, 1, 2, 3 and 4, and they might also be coexpressed with tumor stemness and proliferation markers CD24, MKI67, and CD44. What'r more, the distribution of different stages in a cell cycle and relevant scores were visualized in Fig. 9E, which suggested that BLCA cells with high EBF1 and RBP7 expression had a relative high fraction of stages G2M and S cells, including clusters 2, 3 and 4. Last but not least, the cell communication among different clusters presented that clusters 1, 2 and 4 had the strongest interaction with other clusters (Fig. 9F). Moreover, in the scRNA-seq analysis of BLCA CD4 and CD8 T cell lines (Supplementary Fig. 7), a co-expression relationship between EBF1 and RBP7 was also identified.

External Validation with Multiple Online Databases
In order to access the external validation of our scientific hypothesis, multiple online databases were employed.
Three biomarkers of Th2 cells (IL1RL1, GATA3, and IL33) and the top five genes related to the oocyte meiosis pathway (ADCY3, ADCY5, ADCY9, MAPK3, and IGF1) were presented using the CellMarker and PathCards databases, respectively. All the external validation results are summarized in Table 2.

Discussion
BLCA is a common cancer globally, requiring further investigation since patients with metastatic lesions still lack good prognosis and effective treatments. Around 25% of new patients are diagnosed with either muscle invasiveness or metastasis [32]. Locally advanced and metastatic cases usually respond poorly to current treatments and have poorer survival outcomes [33,34]. High-throughput technologies have been widely used to explore the molecular mechanisms of BLCA. Several biomarkers with different clinical utilities have been identified, and some of them are already capable of capturing prominent tumors and their molecular biology characteristics [35]. Immune genes have been identified as prognostic biomarkers in BLCA patients, which provide new opportunities to explore the bonemetastatic mechanisms in BLCA [36].
In this study, we identified 31 DEIGs with prognostic value in bone metastatic BLCA samples. Based on the 31 prognosis-related DEIGs, we developed and validated a proportional hazards model with a high effectiveness in Kaplan-Meier survival analysis (p < 0.01) and ROC curve (AUC = 0.793). The Pearson/Spearman correlation analysis was applied between DEIGs and TFs, in which EBF1 (TF) positively regulated RBP7 (DEIG) represented the most significantly correlated relationship.
As a member of early B-cell factors (EBFs) family, EBF1 was reported to play a pivotal role in the development and differentiation of B cell [37]. Genetic alteration in EBF1 was found in many B-lineage acute leukemia patients [38]. Recent research have revealed that several clinical features of acute lymphoblastic leukemia, such as pathogenesis, are related with the mutation of EBF1 [39]. Additionally, genomic loss of EBF1 has been reported in solid tumors such as breast cancer and pancreatic cancer [39,40]. In this study, we found that EBF1 was low-expressed in BLCA patients, and the difference in expression levels be- tween normal and tumor tissue was statistically significant. Besides, high-expressed EBF1 in BLCA predicts a worse survival outlook. It is reasonable to believe that the dysfunction of EBF1 can clarify the mechanism of tumorigenesis and bone metastasis in BLCA to some extent. It has been documented that gene-based signatures related to immunology have significant prognostic value in patients diagnosed with papillary thyroid cancer and nonsquamous non-small cell lung cancer [41,42]. Retinol binding protein 7 (RBP7), is a member of the cellular retinol-binding protein (CRBP) family, whose members are required for vitamin A stability and metabolism [43,44]. More importantly, RBP7 has been reported by several recent studies to be a part of immune-related gene signatures to indicate clinical significance and predict prognosis of BLCA patients [36,45,46]. What is more, it was revealed that RBP7 was a significant biomarker whose high expression indicated poor prognosis of colon cancer patients, as well as being related to tumor invasion and epithelial mesenchymal transition [47]. Additionally, RBP7 was also discovered to be an immune-related gene signature in gastric cancer [48]. In this study, we used bioinformatics analysis and identified that RBP7 was statistically correlated with stage, bone metastasis, and overall survival in BLCA patients, which may provide us with a better understanding of bone metastasis in BLCA. Although no previous study has reported the relationship between EBF1 and RBP7, Pearson correlation analysis demonstrated that they are significantly correlated with each other (R = 0.677, p value < 0.001). What is more, the transcriptional regulation pattern between EBF1 and RBP7 was further confirmed by ATAC-seq and ChIP-esq analysis. Last but not least, utilizing scRNA-seq analysis in BLCA tumor cell lines and T cell lines, EBF1 and RBP7 was revealed to be co-expressed and related to tumor stemness and proliferation.
An increase in bladder cancer metastasis has been confirmed both in humans and mice by previous study [36]. Additionally, Th2 cells can also function as a promote factor in tumor growth [49]. In addition, aberrant high levels of cytokines were found within the immune microenvironment mediated by Th2 cells in several cancer types [50]. Moreover, one of the cytokines, IL-10, was correlated with the low efficacy of immunotherapy of Bacillus Calmette Guérin (BCG) in bladder cancer [51]. Employing CIBER-SORT, ssGSEA and Pearson/Spearman correlation analysis, Th2 cells were identified to be the immune cells which were actively involved in bone metastasis in BLCA. What is more, the biomarkers of Th2 cells, including IL1RL1, GATA3, and IL33 were also tested for the prognostic values by multiple online databases.
For further exploration of the mechanism of bone metastasis in BLCA, we performed GSVA and GSEA to find metastasis-related KEGG pathways, in which the oocyte meiosis pathway was identified. The oocyte meiosis pathway was also significantly correlated with RBP7 expression using Pearson/Spearman correlation analysis. Previous studies have demonstrated that the oocyte meiosis pathway might be related to BLCA development and bone metastasis [52,53]. Additionally, multiple online databases validated that biomarkers of the oocyte meiosis pathway, including ADCY3, ADCY5, ADCY9, MAPK3 and IGF1, were significantly correlated with bone metastasis and prognosis in BLCA patients. However, only in silico analysis was performed in this study, further validation of our hypothesis demands supplementary experiments. Unfortunately, for lack of corresponding clinical samples and with limited experimental conditions, we couldn't perform supplementary experiments to further validate our hypothesis immediately. Nevertheless, multi-omics data analysis was utilized to make up for it. More importantly, we are now working with several clinical departments to collect corresponding clinical samples, and it is believed that relevant experiments for the validation of our findings will soon be performed in our follow-up research.

Conclusions
Taken together, we applied an integrated bioinformatics analysis to identify DEIGs and related pathways involved in bone metastases of BLCA and improved our understanding of the molecular mechanisms. We proposed that the EBF1-RBP7 relationship has potential importance in bone metastatic BLCA through Th2 cells and the oocyte meiosis pathway.

Funding
This study was supported in part by the Shanghai Rising-Star Program (Sailing Special Program) (No. 23YF1458400); the Shanghai Municipal Health Commission (No.201940306); Henan medical science and technology research project (No. 201602031); Key project of provincial and ministerial co-construction of Henan Medical Science and Technology (No. SBGJ202002031). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.