Bioinformatic Analysis and Experimental Validation of Ubiquitin-Proteasomal System-Related Hub Genes as Novel Biomarkers for Alzheimer’s Disease

Background : Alzheimer’s disease (AD) is a common progressive neurodegenerative disease. The Ubiquitin-Protease system (UPS), which plays important roles in maintaining protein homeostasis in eukaryotic cells, is involved in the development of AD. This study sought to identify differential UPS-related genes (UPGs) in AD patients by using bioinformatic methods, reveal potential biomarkers for early detection of AD, and investigate the association between the identified biomarkers and immune cell infiltration in AD. Methods : The differentially expressed UPGs were screened with bioinformatics analyses using the Gene Expression Omnibus (GEO) database. A weighted gene co-expression network analysis (WGCNA) analysis was performed to explore the key gene modules associated with AD. A Single-sample Gene Set Enrichment Analysis (ssGSEA) analysis was peformed to explore the patterns of immune cells in the brain tissue of AD patients. Real-time quantitative PCR (RT-qPCR) was performed to examine the expression of hub genes in blood samples from healthy controls and AD patients. Results : In this study, we identified four UPGs ( USP3 , HECW2 , PSMB7 , and UBE2V1 ) using multiple bioinformatic analyses. Furthermore, three UPGs ( USP3 , HECW2 , PSMB7 ) that are strongly correlated with the clinical features of AD were used to construct risk score prediction markers to diagnose and predict the severity of AD. Subsequently, we analyzed the patterns of immune cells in the brain tissue of AD patients and the associations between immune cells and the three key UPGs. Finally, the risk score model was verified in several datasets of AD and showed good accuracy. Conclusions : Three key UPGs are identified as potential biomarker for AD patients. These genes may provide new targets for the early identification of AD patients.


Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disease characterized by memory deficits and cognitive impairments.Approximately 55 million people worldwide were diagnosed with AD in 2021.Most AD cases occur after the age of 65 and the average living period of AD patients over 65 is 4 to 8 years.From the pathological point of view, AD is characterized by amyloid-beta (Aβ) accumulation and tau aggregates [1,2].Numerous studies suggest that Aβ accumulation in the brain starts more than ten years prior to the onset of AD symptoms [3][4][5].Thus, clearing Aβ deposits is a promising therapeutic strategy to mitigate AD [6][7][8].
The Ubiquitin-Protease system (UPS), which degrades dysfunctional/misfolded proteins, plays an important role in maintaining protein homeostasis in eukaryotic cells [9].The UPS is also involved in several physiological processes, such as cell survival, differentiation, and innate immunity [10][11][12].Dysfunction of the UPS triggers var-ious diseases including tumors, cardiac pathophysiology, Parkinson's disease, and AD [13][14][15][16].A previous study reported that UPS dysfunction promoted Aβ accumulation via increasing α-secretase in neurons of AD [17].Additionally, proteasomal activity in the AD brain is evidently decreased [18] and UPS dysfunction leads to synaptic plasticity impairments in AD [19,20].Thus, clarifying the molecular mechanism underlying UPS dysfunction in AD is essential for finding new diagnostic markers for AD and developing potential therapeutic drugs.
With the ability to quickly and accurately analyze large data sets, bioinformatics has been extensively used to analyze disease characteristics and identify early diagnostic markers of diseases [21,22].For example, through bioinformatic analysis, a prognostic risk model for head and neck squamous cell carcinoma based on eight UPS-related genes (UPGs) was established [23].A high-risk group of lung adenocarcinoma patients showed higher mutation and tumor mutation burden, analyzed by bioinformatics [24].However, no study has focused on which UPGs are critical for AD development.In addition, immune cell infiltration has been found in the brains of clinical AD patients [25] and depletion of natural killer (NK) cells relieved cognitive impairment in 3xTg-AD mice [26].However, the relationship between UPGs and immune cell infiltration in AD patients' brains is not clear.
In the present study, we explore the differential UPGs in AD patients using bioinformatic methods, identify potential biomarkers for early detection of AD, and analyze the correlation between potential biomarkers and immune cell infiltration in AD.

Analysis of Differentially Expressed Genes
Data was downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo).According to AD patients' ID, patients' clinical data was screened using the following criteria: x AD was diagnosed via NINCDS-ADRDA and DSM-IV criteria, y available expression profile, and z available year data.GSE33000 included the mRNA expression profiles of brain tissue from 310 AD patients and 157 healthy controls.GSE5281, GSE26972, GSE29378, GSE36980, GSE48350, and GSE63060 were used as validation sets.GSE5281 included the transcriptional profiles of 74 AD brain samples and 87 healthy control samples.GSE26972 included the transcriptional profiles of 3 AD brain samples and 3 healthy control samples.GSE29378 included the transcriptional profiles of 31 AD brain samples and 32 healthy control samples.GSE36980 included the transcriptional profiles of 33 AD brain samples and 47 healthy control samples.GSE48350 included the transcriptional profiles of 80 AD brain samples and 173 healthy control samples.GSE63060 included the blood RNA profiles of 49 AD samples and 64 healthy control samples.All of the above data was obtained from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/gds).We downloaded the genes of the Ubiquitin-Proteasome pathway from the Path-Cards (https://pathcards.genecards.org/) by using keywords "Ubiquitin".Raw data from GEO data sets were annotated in line with their respective platform files and probes were converted to gene symbols.The software Perl and R (version 4.0.2) (https://www.r-project.org/)were used for data preprocessing.The "limma" [27] R-package was utilized to determine the differentially expressed genes (DEGs) from GSE33000 using |logFC| >0.1 and Adj.p. Val < 0.05.Venn diagrams were used to analyze the UPS-related differentially expressed genes (UPDEGs).Volcano plots and heatmaps were used for visualization.

Weighted Gene Co-Expression Network Analysis (WGCNA)
The "Weighted Gene Co-Expression Network Analysis (WGCNA)" [28] R-package was utilized to analyze module identification.A topological overlap matrix [29] was performed to analyze the connection strength between genes (Dynamic Tree Cut: minModuleSize = 50; Cluster Cut: MEDissThres = 0.25).Candidate modules were defined as significantly correlated with AD.

Functional Enrichment Analysis
The "clusterProfiler" R-package was utilized to analyze the biologic processes (BP), cellular components (CC), and molecular functions (MF) enrichment analysis of DEGs.The potential enriched signals were analyzed via Kyoto Encyclopedia of Genes and Genomes (KEGG) (http s://www.kegg.jp/).False Discovery Rate (FDR) <0.05 was considered significant.

Construction and Validation of Predictive Model
Least absolute shrinkage and selection operator (LASSO) regression, random forest, and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) [30] were used to establish the diagnostic model using the most representative genes.The "glmnet","random Forest", "caret" packages were applied for this study.A Venn diagram was used to visualize the intersection of the above three methods.The screened genes were used for construction of the diagnostic model.Riskscores were obtained using the following formula: riskscore = Ʃ (βi × Expi), where βi represented the corresponding regression coefficients of each candidate prognostic gene and Expi was the candidate gene's expression value.The "pROC" R-package was utilized to analyze the receiver operating curve (ROC) curve.

Correlation of Clinical Characteristics and UPG
GSE106241 was used to study the correlation between clinical features and UPGs in AD patients.GSE106241 included the gene expression profiles of 60 AD brain samples with different clinical traits (Braak stages, alpha-, beta-, gamma-secretase activity, and amyloid-beta 42 levels).The R-packages "ggplot2" and "ggpubr" were used to visualize the correlation between clinical features and UPG on a violin plot.

Consensus Clustering
Key UPGs were selected to analyze the different subpopulations in AD by using "ConsensusClusterPlus" [31] packages.The cumulative distribution function (CDF) and total CDF curve area (delta area) were used to select the optimal cluster number.A Gene Set Variation Analysis (GSVA) analysis was performed using the "h.all.v7.5.1.symbols" files downloaded from the MSigDB online database (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp)."GSVA" packages was used to analyze the different pathways among the three clusters.The expression of 28 immune cell types in each sample were analyzed using single-sample Gene Set Enrichment Analysis (ssGSEA).

Sample Collection
The study protocol was approved by the Ethics Committee of the Guangdong Second Provincial Hospital (20191121-01-YXKXYJ-SZRLH2020) and adhered to the tenets of the Declaration of Helsinki.10 blood samples were collected from 5 healthy controls and 5 patients with AD.

Statistical Analysis
Spearman's correlation coefficient was performed for correlation analyses.The Kruskal-Wallis test was used to analyze the continuous variables between the three groups.The Wilcoxon test was performed to analyze the continuous variables between the two groups.A p < 0.05 was regarded as statistically significant.

Identification of UPDEGs in AD
Through the analysis of differentially expressed mRNA profiles from 310 AD patients and 107 healthy controls obtained from the GSE33000 dataset, 1593 significantly upregulated and 1974 significantly downregulated mRNAs were identified.Furthermore, by intersecting UPG and DEG, 5 significantly upregulated and 25 significantly downregulated UPDEGs were identified, as shown in Fig. 1A,B.

Identification of Key Modules through WGCNA
As shown in the Supplementary Fig. 1A, the soft threshold power value was set at 9 for scale-free network construction.A dendrogram was conducted to analyze module similarity via WGCNA (Supplementary Fig. 1B).The blue module showed a high connection with AD (r = -0.62,p = 2 × 10 −50 ) (Supplementary Fig. 1C).The cor-relationship of combined gene significance and the blue module membership is shown in Supplementary Fig. 1D.Therefore, the genes in the blue module were selected for subsequent analysis.

Enrichment Analysis of the Intersected Genes
Subsequently, 2410 genes were obtained through the intersection of the blue module and DEGs (Fig. 2A), and were further used for enrichment analysis.The BP analysis showed that signal transduction and nervous system development were enriched (Fig. 2B).The CC analysis showed that the genes were mainly related to the cytosol, cytoplasm, membrane, and extracellular exosome (Fig. 2B).The MF results showed that protein binding and calcium ion binding were enriched (Fig. 2B).The KEGG enrichment analysis revealed that the genes were predominantly enriched in metabolic, neurodegeneration-multiple diseases, cyclic adenosine monophosphate (cAMP) signaling, and oxidative phosphorylation (Fig. 2C).

Relationships of the UPS-Signature Genes and Clinical Characteristics
Next, we explored expression of the UPS-signature genes and found that only UPS3 was highly expressed in the AD group.The expression of HECW2, PSMB7, and UBE2V1 were all decreased in the AD group compared to the control group (Fig. 4A).Furthermore, we investigated the relationship between the UPS-signature genes and clinical characteristics in GSE106241 and found that USP3 was significantly associated with the Braak stages (Fig. 4B).While there were no significant differences among different Braak stages, the expression of HECW2 and PSMB7 were decreased in the high Braak stages (Fig. 4B).UBE2V1 expression showed no evident change in different Braak stages (Fig. 4B).With regards to clinical manifestations, USP3 was strongly correlated with amyloid-beta 42 levels and beta secretase activity, HECW2 was negatively correlated with beta secretase activity, and PSMB7 was negatively correlated with alpha secretase activity, gamma secretase activity, and beta secretase activity (Fig. 4C).Given that UBE2V1 was not associated with alpha secretase or beta secretase activity and that there was a lack of data on the correlation between UBE2V1 and amyloid-beta 42 and gamma secretase activity, we constructed the model with these three genes (USP3, HECW2 and PSMB7) through logistic regression.Riskscore = 0.1933 -9.4795 × USP3 + 5.8659 × HECW2 + 5.2057 × PSMB7.As shown in Fig. 4D, the riskscores were all negatively correlated to AD clinical features, including alpha secretase activity, amyloid-beta 42 level, gamma secretase activity, and Braak stages.

Evaluation of Immune Cell Infiltration
Considering that immune cell infiltration is associated with cognition and AD pathology [25,26], we then analyzed different immune cell populations identified between AD patients and healthy control from GSE33000.The expres-sion of most immune cells was up-regulated in AD patients, with the exception of activated CD4 T cells, effector memory CD4 T cells, and type 2 T helper cells (Fig. 5A).The expression of most immune cells was positively correlated with one another, with the exception of effector memory CD4 T cells (Fig. 5B).Furthermore, HECW2 and PSMB7 expression were negatively correlated with the expression of most immune cells and USP3 was positively correlated (Fig. 5C).

Consensus Clustering Analysis of UPG Clusters
A clustering analysis was performed based on the three UPGs (USP3, HECW2 and PSMB7).The optimal number of clusters was set as 3 via CDF and total CDF curve area (delta area) analyses (Fig. 6A-C).The principal component analysis (PCA) showed that the three clusters were distinct (Fig. 6D).The expression levels of USP3, HECW2, and PSMB7 are shown in the boxplot and heatmap (Fig. 6E,F).UPS3 had higher expression levels in cluster 2 than the other two clusters, while HECW2 and PSMB7 had higher expression levels in cluster 1 than the other two clusters (Fig. 6E,F).

GSVA of Pathways among Subclusters of UPGs
The GSVA analysis revealed several pathways with differential expression that were enriched.As shown in Supplementary Fig. 2A, TGF-β signaling, interferonα response, inflammatory response, interferon-γ response, TNF-α signaling via NF-κB, IL2-STAT5, and IL6-JAK-STAT3 signaling were higher in clusters 2 and 3, while DNA repair, glycolysis, and PI3K-AKT-mTOR signaling were lower.Furthermore, ssGSEA analysis demonstrated that the expression of most immune cells was low in cluster 1 and high in clusters 2 and 3. Macrophages and neutrophils were highly expressed in cluster 2, while eosinophils and mast cells were highly expressed in cluster 3 (Supplementary Fig. 2B).

Verification of the Diagnostic Efficacy of the Riskscore.
We further validated the diagnostic efficacy of the riskscores using GEO data sets.We found the area under the curve (AUC) of riskscores for GSE33000, GSE5281, GSE26972, GSE29378, GSE36980, and GSE63060 were 0.92, 0.8, 1.0, 0.734, 0.857, and 0.69, respectively.This was higher than that of age or gender (Fig. 7), suggesting the riskscore predictive model for AD was more advantageous than other clinical characteristics.

Validation of UPGs
RT-qPCR was used to detect the mRNA expression levels of the three UPGs in peripheral blood collected from the healthy controls and patients with AD.As shown in Fig. 8 and Supplementary Fig. 3, the expression level of USP3 was significantly increased.PSMB7 and UBE2V1 were decreased in the AD group in comparison to controls, confirming the accuracy of our study.

Discussion
AD is the most common cause of dementia, with a high incidence and a heavy burden on society and families [32].Unfortunately, there is no cure for AD.Thus, early prevention and diagnosis of AD is urgently needed.Some pathological features of AD have been identified, such as amyloid-beta (Aβ) peptide in cerebrospinal fluid [33], plasma p-Tau181 [34], and serum miR-24-3p [35].Emerging studies report that the UPS is abnormally expressed in AD patients, resulting in dysregulation of protein degradation, which may promote AD progression.In this study, we screened the differentially expressed UPGs that were specifically associated with AD in GEO data sets and identified four UPGs (USP3, HECW2, PSMB7, and UBE2V1) by multiple bioinformatic analyses.Subsequently, three UPGs (USP3, HECW2, PSMB7), which are strongly correlated with the clinical features of AD, were used to construct a riskscore prediction model to diagnose AD.Furthermore, we explored immune cell infiltration in the brain tissue of AD patients and the associations between immune cells and the three key UPGs.Finally, the riskscore model was verified in several data sets of AD and showed good accuracy.
UPS dysfunction has been identified in the AD brain [18] and is associated with the Aβ production [17] and synaptic plasticity impairments seen in AD patients [19,20].First, through KEGG analyses, we found that several pathways played important roles in the pathogenesis of AD.Recent studies have shown that HIF-1 signaling is a potential therapeutic target for AD [36].HIF-1 signaling inhibits tau hyperphosphorylation and nerve filament formation [37].Then, we identified three UPGs (USP3, HECW2, PSMB7), which are strongly correlated to the clinical features of AD.USP3 is responsible for protein deubiquitination and metabolism.It has also been identified to deubiquitinate and stabilize Kruppel-Like Factor 5 (KLF5), promoting the proliferation of breast cancer cells [38,39].KLF5 could bind to the Beta-Site APP Cleaving Enzyme 1 (BACE1) promoter to promote Aβ synthesis and accelerate the progression of AD [40].Consistently, our study found that USP3 is highly expressed in the brain of AD patients and is positively associated with amyloid-beta 42 levels.It is probable that USP3 may induce AD by deubiquitinating KLF5 to promote Aβ production.Thus, targeting USP3 may be a potential therapeutic strategy, which deserves further investigation.
HECW2, an E3 ubiquitin ligase and ubiquitin protein transferase, has an essential role in regulating neural crest cell development [41].Studies have found that HECW2 positively regulates the proliferation of intestinal neuroprecursors by Glial Cell Derived Neurotrophic Factor (GDNF) [42].GDNF could reduce the toxicity induced by amyloid beta [43], however the serum GDNF level in AD patients is significantly decreased [44].Recently, a HECW2 mutation was reported to be related to neurodevelopmental disorders associated with hypotonia, seizures, and absent language [45].In this study, we found that HECW2 was significantly decreased in the brain of AD patients and we speculate the reduction of HECW2 in brain may inhibit the expression of GDNF and aggravate the severity of AD.However, further investigation is needed.
PSMB7, a subunit of the proteasome, is reported to be involved in anthracycline resistance in breast cancer and bortezomib resistance in multiple myeloma [46,47].Downregulation of PSMA7 has been found to be involved in amyloid precursor protein-induced neural stem cell proliferation impairment, thereby promoting AD pathogenesis [48].Consistently, our study found that PSMA7 was significantly decreased in the brain of AD patients and was negatively related to alpha secretase and gamma secretase activity.Overall, our study identifies three key UPGs (USP3, HECW2, PSMB7) that are strongly correlated to the clinical features of AD and established a riskscore prediction model based on these three UPGs, which shows good accuracy for predicting the severity of AD.The detailed roles of these three UPGs in AD development is worth further investigation.
Based on the three key UPGs, we performed unsupervised cluster analyses to identify three distinct clusters.A GSVA analysis showed DNA repair, glycolysis, and PI3K-AKT-mTOR signaling pathways were enriched in cluster 1 and TGF-β signaling, interferon-α response, inflammatory response, interferon-γ response, TNF-α signaling via NF-κB, IL2-STAT5, and IL6-JAK-STAT3 signaling pathways were enriched in clusters 2 and 3.In addition, ss-GSEA analysis showed activated B cells, CD56dim natural killer cells, effector memory CD8 T cells, and Type 17 T helper cells had low expression in cluster 1. Macrophages and neutrophils [49,50] were highly expressed in cluster 2, and mast cells and eosinophils were highly expressed in cluster 3.By analyzing the relationship between the gene expression of the three clusters and clinical characteristics, cluster 1 was found to have the mildest clinical characteristics, while cluster 2 had the heaviest clinical characteristics.Previous studies found that AD patients' brain tissue had infiltrated macrophages, neutrophils, mast cells, and eosinophils [51,52].The pathological changes and inflammatory mechanism of AD caused by the infiltration of these immune cells need to be further studied.
This study had some limitations.For example, the Ubiquitin-Proteasomal system-related hub genes as novel biomarkers for AD were screened using the GEO public database with relatively small sample sizes.More samples are needed in the future studies.Second, the hub genes expressed in the blood of AD patients' needs to be validated with more clinical samples.Third, the cor-relationship between the expressions of hub genes and the severity of AD should be analyzed with patients' clinical characteristics and imaging data.Further large-scale basic studies could be carried out to verify the conclusions of this study.

Conclusions
Our study identified 3 key UPGs that are specifically associated with AD and established a nomogram to predict the probability of AD.Furthermore, we explored the patterns of immune cells in the brain tissue of AD patients and the associations between immune cells and three key UPGs as potential biomarkers of AD.However, the specific roles of these key UPGs in AD still needs to be further investigated through molecular experiments.

Fig. 2 .
Fig. 2. Analysis of enrichment.(A) Overlapping genes of the blue module and DEGs.(B) GO analyses of intersection genes.(C) KEGG analysis of intersection genes.WGCNA, weighted gene co-expression network analysis; DEGs, differentially expressed genes; BP, biologic processes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; CC, cellular components; MF, molecular functions; FDR, False Discovery Rate.

Fig. 3 .
Fig. 3. Identification of key UPG.(A) Overlapping genes of the blue module and UPGs.(B-E) Screening the potential biomarkers via LASSO regression, SVM, and RF algorithm.(F) Venn diagram demonstrating the intersection of key UPGs obtained by three machine learning.LASSO, Least absolute shrinkage and selection operator; SVM, Support Vector Machine; SVM-RFE, Support Vector Machine-Recursive Feature Elimination; RF, Random forest.

Fig. 7 .
Fig. 7. Verification of the diagnostic efficacy of the riskscore.ROC curve for the riskscore.ROC, Receiver operating curve; TPR, true positive rate; FPR, false positive rate.