Expression Pattern and Prognostic Analysis of Branched-Chain Amino Acid Catabolism-Related Genes in Non-Small Cell Lung Cancer

Background: This study aimed to analyze the expression pattern and prognostic value of catabolism-related enzymes of branched-chain amino acids (BCAAs) in non-small cell lung cancer (NSCLC). Methods: Differential expression, mutation, copy number variation (CNV), methylation, and survival analyses of BCAAs catabolism-related enzymes in NSCLC were performed using the Cancer Genome Atlas (TCGA) database. Results: Six and seven differentially expressed genes were obtained in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), respectively. IL4I1 was located at the core regulatory nodes of the gene co-expression networks of both LUAD and LUSC. The AOX1 mutation rate was the highest in both LUAD and LUSC. For CNV, IL4I1 was up-regulated in both LUAD and LUSC with an increase in copy number, whereas AOX1 and ALDH2 were differentially regulated in the two subtypes of lung cancer. In patients with NSCLC, high expression of IL4I1 was associated with lower overall survival (OS), and low expression of ALDH2 predicted shorter disease-free survival (DFS). ALDH2 expression was related to LUSC survival. Conclusions: This study explored biomarkers of BCAAs catabolism related to the prognosis of NSCLC, providing a theoretical foundation to guide the clinical diagnosis and treatment of NSCLC.


Introduction
Lung cancer is one of the malignant tumors with the highest morbidity and mortality worldwide and has become a major public health concern [1,2]. Despite considerable progress in therapeutic strategies, the 5-year survival rate of lung cancer in China has remained between 10% and 20% over the past decade [3]. Therefore, improving lung cancer survival requires not only better treatment but also better screening and a broader range of analytical methods to find biomarkers closely related to the development and prognosis of lung cancer.
Branched-chain amino acids (BCAAs) include leucine, isoleucine, and valine. Plasma levels of BCAAs and the expression of their metabolic enzymes vary across multiple cancers and are closely related to tumor occurrence and development. They are considered important markers for early tumor screening and prognosis, and they offer a meaningful research prospect for the development of novel drugs targeting amino acid metabolism enzymes [4-10]. However, there have been no systematic studies on the expression pattern of BCAAs catabolic enzymes in non-small cell lung cancer (NSCLC) and its correlation with prognosis. Therefore, it is crucial to screen for key BCAAs catabolic enzymes to identify new biomarkers for the prognosis of NSCLC.
In this study, sets of catabolic enzyme genes related to BCAAs were established using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Transcriptome and clinical data of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) were obtained from the Cancer Genome Atlas (TCGA) database. Based on multidimensional bioinformatic analysis, the expression pattern of BCAAs catabolic enzymes in NSCLC and its correlation with prognosis were explored to identify novel prognostic biomarkers for NSCLC and to provide a reference for the future development of new therapeutic targets.

Data Collection
The KEGG database (https://www.kegg.jp/kegg/pathway.html) was used to search the catabolism pathways of human BCAAs. The valine, leucine and isoleucine degradation pathway is displayed in Supplementary Fig. 1. In total, 44 genes encoding related metabolic enzymes were identified as the main study objects (Supplementary Fig. 2). The transcriptome profiles and corresponding clinical information of LUAD, LUSC, and adjacent normal tissues were downloaded from the TCGA (http://tcga-data.nci.nih.gov/) dataset using the RTCGAToolbox 2.28.0 package in R 4.2.

Expression Pattern of BCAAs Catabolism-Related Enzymes in NSCLC
Expression data were obtained as fragments per kilobase of exon per million mapped reads (FPKM) values, and the log2-transformed FPKM value was used as the measure of gene expression level. Differentially expressed genes in NSCLC tissues compared to those in adjacent normal tissues were calculated using the edgeR package. The Benjamini-Hochberg multiple testing procedure was applied to determine the false discovery rate (FDR). FDR <0.05 and |log2 fold change| >1 were selected as the cutoff criteria. The heatmap.2 function of the R gplots 3.1.3 package was utilized to generate a hierarchical cluster analysis of differentially expressed genes. The Pearson correlation coefficient was used to obtain gene co-expression network pairs in LUAD and LUSC versus normal controls. The Cytoscape software was applied to visualize the gene co-expression network.
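The cutoff criteria above can be illustrated with a minimal Python sketch. It is not the edgeR workflow used in this study; it only shows how Benjamini-Hochberg adjusted p-values are computed and how the FDR <0.05 and |log2 fold change| >1 filter is applied. The function names, gene labels, and numeric values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with FDR < fdr_cutoff and |log2 fold change| > lfc_cutoff."""
    fdr = benjamini_hochberg(p_values)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fc, fdr)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    ]


# Hypothetical toy input, not data from this study.
toy_genes = ["geneA", "geneB", "geneC", "geneD"]
toy_log2_fc = [2.0, -1.5, 0.5, 3.0]
toy_p = [0.01, 0.04, 0.03, 0.5]
toy_degs = select_degs(toy_genes, toy_log2_fc, toy_p)
```

In this toy input only geneA passes both thresholds; geneB has a large fold change but its adjusted p-value rises above 0.05 after correction.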

Mutation, Copy Number Variation (CNV) and Methylation Analysis of BCAAs Catabolism-Related Enzymes
The mutation data were processed and visualized using the maftools 2.14.0 R package (https://github.com/PoisonAlien/maftools). For CNV, the loss and gain of copy numbers were identified using the Genomic Identification of Significant Targets in Cancer (GISTIC) algorithm. A 5-valued spectrum (-2, -1, 0, 1, 2) was used to indicate changes in CNV, where -2, -1, 0, 1, and 2 represent homozygous deletion, heterozygous deletion, no variation in copy number, low-level amplification, and high-level amplification, respectively. Because low-level amplification and deletion calls are noisy, we mainly referred to copy number changes of 2 and -2 in the analysis, taking 2 as amplification and -2 as deletion, and treating all other values as unchanged copy number. For methylation analysis, the ChAMP 3.8 R package was used to filter the data, impute the missing values, and identify differentially methylated probes and differentially methylated regions. Differential methylation sites between lung cancer and normal lung tissues were obtained using the limma 3.38.2 R package. The Benjamini-Hochberg multiple testing procedure was applied to obtain the FDR, and FDR <0.05 was considered statistically significant. Hierarchical cluster analysis and waterfall plots of differential methylation sites were visualized with the gplots R package.
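The copy number rule described above (2 as amplification, -2 as deletion, everything else unchanged) can be written as a short Python sketch. The function names and the example values are hypothetical; the sketch assumes the input is already the GISTIC 5-valued call for one gene across samples.

```python
from collections import Counter

VALID_GISTIC_VALUES = (-2, -1, 0, 1, 2)


def cnv_call(gistic_value):
    """Map a GISTIC 5-valued call to the three classes used in the analysis."""
    if gistic_value not in VALID_GISTIC_VALUES:
        raise ValueError(f"unexpected GISTIC value: {gistic_value!r}")
    if gistic_value == 2:
        return "amplification"
    if gistic_value == -2:
        return "deletion"
    # Low-level gains and losses (1, -1) are treated as noise.
    return "unchanged"


def summarize_cnv(gistic_values):
    """Count amplification, deletion, and unchanged calls for one gene."""
    return Counter(cnv_call(v) for v in gistic_values)


# Hypothetical calls for one gene across six samples.
toy_summary = summarize_cnv([2, 1, 0, -1, -2, 2])
```

A gene would then be described as mainly amplified or mainly deleted according to which of the two counts is larger.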

Survival Analysis of BCAAs Catabolism-Related Enzymes
SPSS 22.0 statistical software (IBM Corp., Armonk, NY, USA) was used for the survival analysis. Kaplan-Meier curves and the log-rank test were used to compare overall survival (OS) and disease-free survival (DFS) between the high and low gene expression groups. The Cox proportional hazard regression model was used to perform univariate and multivariate analyses of independent risk factors related to postoperative OS in lung cancer patients and to calculate the hazard ratio (HR) and 95% confidence interval (CI). Statistical results with FDR <0.05 were considered significant.
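For readers unfamiliar with the Kaplan-Meier estimator, a minimal Python sketch follows. It is an illustration only and does not reproduce the SPSS analysis; the function name and the toy follow-up data are hypothetical, and the log-rank test and Cox model are not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (death or recurrence) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up data for five patients (months, event indicator).
toy_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Computing one such curve for the high expression group and one for the low expression group gives the two curves that a log-rank test would then compare.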

Expression Pattern of BCAAs Catabolism-Related Enzymes in NSCLC
Gene expression profiles and corresponding clinical data for NSCLC were obtained from the TCGA database. In this study, 505 LUAD tissues and 59 adjacent normal samples from patients with LUAD, and 501 LUSC tissues and 51 adjacent normal samples from patients with LUSC were included. We analyzed the expression of 44 BCAAs catabolic enzymes between NSCLC and normal lung tissues. Hierarchical clustering analysis of the 44 BCAAs catabolism-related enzymes in LUAD and LUSC is shown in Fig. 1A,B, respectively. As shown in Fig. 2A,B, compared to normal lung tissues, there were six differentially expressed genes (ALDH1B1, ACAD8, IL4I1, OXCT2, ALDH2, and AOX1) and seven differentially expressed genes (OXCT1, EHHADH, IL4I1, ALDH2, ACAA2, AOX1, and HMGCS2) in LUAD and LUSC, respectively, with statistical significance (FDR <0.05 and |log2 fold change| >1). ALDH1B1, ACAD8, IL4I1, and OXCT2 were up-regulated, while ALDH2 and AOX1 were down-regulated in LUAD. OXCT1, EHHADH, and IL4I1 were up-regulated, whereas ALDH2, ACAA2, AOX1, and HMGCS2 were down-regulated in LUSC. Among these, IL4I1, ALDH2, and AOX1 were differentially expressed in both LUAD and LUSC (Fig. 2C). We also built the gene co-expression network for the differentially expressed genes. Three genes (IL4I1, ACAD8, and ALDH2) and four genes (IL4I1, HMGCS2, ACAA2, and OXCT1) were in the core regulatory nodes of the gene co-expression network in LUAD and LUSC, respectively (Fig. 3).
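The co-expression pairs underlying a network of this kind can be sketched in Python as follows. This is an assumption-laden illustration: the correlation threshold of 0.8 is hypothetical (the cutoff used in this study is not stated here), and the gene labels and expression values are invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold.

    expression : dict mapping gene name to a list of per-sample values
    threshold  : hypothetical cutoff, chosen only for illustration
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical log2 expression values across four samples.
toy_expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 3, 2, 3],
}
toy_edges = coexpression_edges(toy_expression)
```

Genes that appear in many edges are the candidates for core nodes when the edge list is loaded into a network viewer such as Cytoscape.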

Mutation, CNV, and Methylation Analysis of BCAAs Catabolism-Related Enzymes
The mutation information for each gene in each sample is displayed in a waterfall plot, where the colors annotated at the bottom represent the different types of mutations (Fig. 4). Somatic mutations were found in 66 (28.7%) of 230 LUAD samples, and among these 66 patients, somatic mutations occurred in 36 (81.8%) of the 44 BCAAs catabolism-related enzymes (Fig. 4A). Among these, the AOX1 mutation rate was the highest (3%, 7/230). As shown in Fig. 4B, somatic mutations were found in 59 (33.15%) of 178 LUSC samples, and among these 59 patients, somatic mutations occurred in 37 (84.1%) of the 44 BCAAs catabolism-related enzymes. Among these, the AOX1 mutation rate was the highest (6.7%, 12/178). We then evaluated the CNV patterns of the differentially expressed genes in LUAD and LUSC. We found that OXCT2, AOX1, ACAD8, ALDH2, and IL4I1 mainly showed copy number amplification in LUAD, whereas ALDH1B1 mainly showed copy number deletion (Table 1). In LUSC, EHHADH, OXCT1, AOX1, IL4I1, and HMGCS2 showed copy number amplification, whereas ACAA2 and ALDH2 displayed copy number deletions (Table 2). According to the correlation analysis of CNV and gene expression levels, IL4I1, ACAD8, and OXCT2 were up-regulated in LUAD with an increase in copy number, whereas ALDH2, ALDH1B1, and AOX1 were oppositely regulated (Fig. 5A). IL4I1, OXCT1, and EHHADH were up-regulated in LUSC with increasing copy number; ACAA2 and ALDH2 were down-regulated with deletion of copy number; and HMGCS2 and AOX1 were down-regulated with increased copy number (Fig. 5B).
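The mutation frequencies reported above follow directly from the stated counts, as the short Python check below shows. The helper name is hypothetical; the counts are those given in the text.

```python
def percent(count, total, digits=1):
    """Return count / total as a percentage rounded to `digits` decimals."""
    return round(100.0 * count / total, digits)


# Counts taken from the text above.
luad_mutated_samples = percent(66, 230)       # LUAD samples with a mutation
luad_mutated_genes = percent(36, 44)          # enzymes mutated in LUAD
luad_aox1 = percent(7, 230)                   # AOX1 mutation rate in LUAD
lusc_mutated_samples = percent(59, 178, 2)    # LUSC samples with a mutation
lusc_mutated_genes = percent(37, 44)          # enzymes mutated in LUSC
lusc_aox1 = percent(12, 178)                  # AOX1 mutation rate in LUSC
```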

Survival Analysis of BCAAs Catabolism-Related Enzymes
To determine the prognostic value of enzymes related to BCAAs catabolism, we evaluated the effects of differentially expressed genes on the OS and DFS of patients with LUAD and LUSC. In LUAD patients, the expression of five of the six differentially expressed genes, namely ALDH1B1 (p = 0.958), ALDH2 (p = 0.077), OXCT2 (p = 0.617), IL4I1 (p = 0.492), and AOX1 (p = 0.288), was not significantly correlated with DFS. The DFS of patients with LUAD in the high ACAD8 expression group was significantly longer than that of patients with low ACAD8 expression (p < 0.001, Fig. 8A). Furthermore, as shown in Fig. 8B-E, high expression of ALDH1B1 (p = 0.029) and low expression of ACAD8 (p < 0.001), ALDH2 (p = 0.011) and OXCT2 (p = 0.017) were associated with poor OS in patients with LUAD. However, IL4I1 (p = 0.149) and AOX1 (p = 0.378) did not correlate with OS in patients with LUAD. In patients with LUSC, the expression of ALDH2 (p = 0.837), EHHADH (p = 0.359), AOX1 (p = 0.059), ACAA2 (p = 0.183) and OXCT1 (p = 0.779) was not significantly associated with DFS, whereas high expression of IL4I1 (p = 0.012) and HMGCS2 (p = 0.010) was associated with poor DFS (Fig. 9A,B). Furthermore, as shown in Fig. 9C-E, in addition to the expression of ACAA2 (p = 0.805), OXCT1 [...]
Table notes: Log2 FC indicates the differential expression multiple after transformation. Cancer AVG represents the methylation value of the probe in cancer samples; Normal AVG indicates the methylation value of the probe in normal samples; Delta beta represents the absolute differential methylation value. * marks the probe with the maximum absolute Beta value among multiple probes corresponding to the same gene, together with the corresponding Log2 FC value.
In addition, we integrated the survival data of LUAD and LUSC and analyzed the effect of the BCAAs catabolic enzyme genes that were differentially expressed in both LUAD and LUSC on the OS and DFS of patients with NSCLC. The DFS of patients with NSCLC and low ALDH2 expression was relatively poor (p < 0.001, Fig. 10A). However, IL4I1 (p = 0.059) and AOX1 (p = 0.322) were not significantly associated with DFS in NSCLC patients. ALDH2 (p = 0.077) and AOX1 (p = 0.228) were not significantly correlated with OS, whereas the high IL4I1 expression group had a worse prognosis than the low expression group (Fig. 10B).
Finally, Cox regression analysis was performed for genes with significant effects on the OS of LUAD and LUSC. Univariate Cox regression analysis showed that ACAD8 and OXCT2 expression, lymph node metastasis, and Tumor Node Metastasis (TNM) stage were predictors of poor prognosis in patients with LUAD (Table 5). In multivariate Cox regression analysis, ACAD8 expression, lymph node metastasis, and TNM stage were independent predictors of prognosis in patients with LUAD (Table 5). The expression of ALDH2 and the stage were related to LUSC survival in both the univariate and multivariate Cox regression analyses (Table 6).

Discussion
The occurrence and development of tumors are complex processes. Amino acid catabolic enzymes are overexpressed in a variety of cancers, providing not only cellular energy and metabolites for anabolic processes but also serving as a mechanism for cancer cells to escape immunity [6,11]. Multiple studies have indicated that branched-chain aminotransferase (BCAT), an enzyme that catalyzes the first step of BCAA catabolism, is overexpressed in many malignant tumors [7,10,12,13]. BCAT is highly expressed in lung cancer and promotes the proliferation of lung cancer cells [10]. Therefore, it is necessary to systematically study the expression patterns of BCAAs catabolic enzymes in NSCLC and their correlation with disease prognosis.
Through multidimensional bioinformatic analysis, we found that the expression of the BCAAs metabolic enzymes IL4I1, ALDH2, and AOX1 was specific to NSCLC and correlated with prognosis. IL4I1 is a secreted L-amino acid oxidase that is induced by interleukin 4. IL4I1 is highly expressed in lymphomas and associated with the prognosis of lymphomas [14]. IL4I1 is a novel immunomodulatory enzyme produced by mature dendritic cells that inhibits the proliferation of effector T lymphocytes and promotes the development of regulatory T cells [15,16]. Local secretion of IL4I1 in the immune synaptic cleft and its binding to CD3+ lymphocytes may be important for the immunosuppressive mechanism of IL4I1 [15]. In the current study, IL4I1 was highly expressed in both LUAD and LUSC, with an increase in copy number. In the co-expression network analysis, IL4I1 was in a key regulatory position in both LUAD and LUSC, indicating that IL4I1 may play a key role in the development of lung cancer. In survival analysis, high expression of IL4I1 in LUSC was associated with poor DFS. We integrated survival data from LUAD and LUSC and found that IL4I1 expression had a significant effect on OS, with high expression indicating a lower OS. Our study revealed that IL4I1 is closely related to the occurrence, development, and prognosis of NSCLC and is expected to become an important biomarker in the field of NSCLC immunotherapy and an effective predictor of the prognosis of NSCLC.
AOX1 is a protein in the molybdoflavin family and an important enzyme involved in purine catabolism. A growing number of studies show that AOX1 is involved in the pathophysiology of many clinical diseases [17,18]. AOX1 promotes liver cell damage and fibrosis by increasing reactive oxygen species, which in turn may affect the metabolism and activity of drugs in the liver [19]. AOX1 expression is reduced in hepatocellular carcinoma and correlates with a higher tumor stage, distant metastasis, or lymph node positive status [20]. The beneficial role of Nrf2 in cancer prevention depends on strict control of its activity, and dysregulation of Nrf2 is a key determinant of tumorigenesis that is found in many types of cancer [21]. Previous studies have shown that AOX1 plays a critical role in the occurrence and development of tumors by regulating the Nrf2 pathway [22]. In the present study, AOX1 was poorly expressed in both LUAD and LUSC, indicating that AOX1 may play an inhibitory role in the development and progression of NSCLC. The AOX1 mutation rate was the highest in both LUAD and LUSC, especially LUSC, which has not been reported in previous studies. AOX1 mainly showed copy number amplification in LUAD and LUSC, but its expression decreased with increasing copy number. AOX1 may play an important role in the occurrence and progression of NSCLC, but the underlying mechanisms need to be determined in further clinical and basic research.
ALDH2 catalyzes the transformation of toxic methylmalonate semialdehyde into non-toxic methylmalonate in the valine catabolic pathway. ALDH2 is mainly involved in liver metabolism and has been reported to play a role in liver diseases, especially alcoholic liver disease [23-25]. Among the aldehyde dehydrogenase subtypes, ALDH2 plays the principal role in the detoxification of acetaldehyde [26]. Acetaldehyde, a substrate of ALDH2, is closely related to a variety of tumors, and low expression of ALDH2 in lung and liver cancer is associated with a poor prognosis [26,27]. Furthermore, aldehyde dehydrogenases are differentially expressed in lung cancer: ALDH2 is poorly expressed, while ALDH1A1 and ALDH3A1 are highly expressed in NSCLC [28]. Increasing evidence indicates that lung cancer may originate from tumor stem cells and that aldehyde dehydrogenase is a functional marker of lung cancer stem cells [29]. Several studies have reported that ALDH2 is also a functional marker of lung cancer stem cells [29-31]. In the present study, ALDH2 expression was down-regulated in both LUAD and LUSC. Low ALDH2 expression was associated with poor OS in LUAD patients, which is consistent with previous reports [26]. However, high expression of ALDH2 was associated with a poor prognosis in LUSC. ALDH2 expression was related to LUSC survival in both the univariate and multivariate Cox regression analyses. The prognostic effect of ALDH2 expression therefore differed between the pathological subtypes of NSCLC; the underlying mechanisms and possible confounding factors are unknown and need to be confirmed in further studies.

Conclusions
Our study revealed the expression pattern and prognostic value of differentially expressed BCAAs catabolism-related enzymes in NSCLC at multiple levels based on the TCGA database. First, we analyzed the expression of 44 BCAAs catabolic enzymes in NSCLC and normal lung tissues. A total of six differentially expressed genes (ALDH1B1, ACAD8, IL4I1, OXCT2, ALDH2, and AOX1) and seven differentially expressed genes (OXCT1, EHHADH, IL4I1, ALDH2, ACAA2, AOX1, and HMGCS2) were identified in LUAD and LUSC, respectively. Among them, IL4I1, ALDH2, and AOX1 were differentially expressed in both LUAD and LUSC. IL4I1 participates in the first step of L-isoleucine catabolism, metabolizing L-isoleucine to (S)-3-methyl-2-oxopentanoate and producing ammonia and hydrogen peroxide. In valine catabolism, AOX1 mainly works with the aldehyde dehydrogenase family members ALDH2 and ALDH1B1 to oxidize methylmalonate semialdehyde to methylmalonate. Mutation, CNV, and methylation analyses of the differentially expressed BCAAs catabolism-related enzymes were then performed. Finally, a survival analysis of the differentially expressed BCAAs catabolism-related enzymes was performed. This study has some limitations. Our results were not verified in clinical NSCLC samples. This is a pilot study, and more experiments are needed to uncover the role of differentially expressed BCAAs catabolism-related enzymes in the pathogenesis of NSCLC.

Availability of Data and Materials
The data sets used and analyzed during the present study are available from the Cancer Genome Atlas (TCGA) public database.