Disease Markers and Therapeutic Targets for Rheumatoid Arthritis Identified by Integrating Bioinformatics Analysis with Virtual Screening of Traditional Chinese Medicine

Objective : The aim of this study was to identify potentially important Rheumatoid arthritis (RA) targets related to immune cells based on bioinformatics analysis, and to identify small molecules of traditional Chinese medicine (TCM) associated with these targets that have potential therapeutic effects on RA. Methods : Gene expression profile data related to RA were downloaded from the Gene Expression Omnibus (GSE55235, GSE55457, and GSE77298), and datasets were merged by the batch effect removal method. The RA key gene set was identified by protein-protein interaction network analysis and machine learning-based feature extraction. Furthermore, immune cell infiltration analysis was carried out on all DEGs to obtain key RA markers related to immune cells. Batch molecular docking of key RA markers was performed on our previously compiled dataset of small molecules in TCM using AutoDock Vina. Moreover, in vitro experiments were performed to examine the inhibitory effect of screened compounds on the synovial cells of an RA rat model. Results : The PPI network and feature extraction with machine learning classifiers identified eight common key RA genes: MYH11 , CFP , LY96 , IGJ , LPL , CD48 , RAC2 , and CSK . RAC2 was significantly correlated with the infiltration and expression of five immune cells, with significant differences in these immune cells in the normal and RA samples. Molecular docking and in vitro experiments also showed that sanguinarine, sesamin, and honokiol could effectively inhibit the proliferation of RA rat synovial cells, also could all effectively inhibit the secretion of TNF-α and IL-1 β in synovial cells, and had a certain inhibitory effect on expression of the target protein RAC2 . Conclusions : The core gene set of RA was screened from a new perspective, revealing biomarkers related to immune cell infiltration. Using molecular docking, we screened out TCM small molecules for the treatment of RA, providing methods and technical support for the treatment of RA with TCM.


Introduction
Rheumatoid arthritis (RA) is a chronic, systemic, inflammatory progressive autoimmune disease occurring in the synovial joints and other organ systems.Although the underlying pathogenesis remains unclear, RA is generally considered to be caused by infections and inflammatory mediators [1,2].The incidence of RA increases with age, which most commonly manifests in adults aged 35-50 years.The global prevalence rate is approximately 0.5%-1.0%and the annual average incidence rate is 0.02%-0.05%[3], which is approximately 3 times greater in women than that in men [4].At present, western medicines are the main treatment for RA in clinical practice; however, these drugs are associated with high toxicity and side effects, as well as high treatment costs [5,6].Traditional Chinese medicine (TCM) has a long history of treating RA, and rich experience has accumulated in clinical application [7].
In recent years, studies have shown that anti-RA traditional Chinese medicinal materials and compound prescriptions have anti-inflammatory, analgesic, immunomodulatory, as well as multi-level and multi-link therapeutic effects, with additional advantages such as high safety, less adverse reactions, and low price [8,9].
With recent developments of molecular biology, immunology, and genetics, the pathogenesis of RA has been shown to be related to the changes and influences of genetic, bacterial, and viral factors, as well as associated immune cells (e.g., T lymphocytes, B lymphocytes) and cytokines.Indeed, immune cell infiltration plays an important role in the development and progression of RA.For example, CD4+ T cells account for the main inflammatory cell type invading the synovial tissue, which participate in the pathological process of RA [10].Therefore, from the viewpoint of the immune system, it is of great value to clarify the molecular mechanism of RA and identify new therapeutic targets by evaluating the infiltration degree of immune cells and determining the differences in components among infiltrating immune cells.
Moreover, given the above-mentioned advantages of many TCMs and their components, it is of great practical significance to identify potential effective anti-RA components from a large number of TCMs reported to have pharmacological effects, and directly or indirectly correlate these components with RA immune cell infiltration therapy.In other words, the core problem to be solved at present is to identify new key therapeutic targets related to immune infiltration in RA, and to screen out the effective components of TCMs that act on these targets with potential therapeutic effects.
Toward this end, in this study, we analyzed public gene expression profile data from the Gene Expression Omnibus (GEO), and screened out a differentially expressed gene (DEG) set for RA.We further applied bioinformatics approaches, including protein-protein interaction (PPI) network analysis, machine learning, and immune cell infiltration calculation, to identify new potential targets of RA.Based on our previously compiled active ingredient set of TCMs, associated small molecules in TCM were screened out according to molecular fingerprint similarity and molecular docking technology.Finally, these targets were verified and analyzed by in vitro cell experiments using synovial cells of an RA rat model.The flow chart of the study design is provided in Fig. 1.

DEGs Analysis of RA
The search term "Rheumatoid Arthritis" was queried in the GEO to obtain expression profile data of genes related to RA from three datasets GSE55235, GSE55457, and GSE77298 based on 10 normal samples and 10 RA samples, 10 normal and 13 RA samples, and 7 normal and 16 RA samples, respectively.
The raw gene expression profile data of all three RA GEO datasets were downloaded and subjected to preprocessing, such as standardization, correction, and gene name annotation using the limma package (version 3.42.2).The data were combined, and the sva package (version 3.36.0)was used to perform batch-removal correction on all chip data.DEGs in RA were screened according to |log 2 fold change (FC)| >1.0 and adjusted p value < 0.05.The ggplot2 package (version 3.3.1)and pheatmap package (version 1.0.12)were used to plot the heat map and volcano map corresponding to the screened DEGs.All the data analysis was done in R language (version 4.0.2,https://www.r-project.org/).

Functional Enrichment Analysis of DEGs in RA
The clusterProfiler package (version 3.14.3)was used for Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of all DEGs according to p < 0.05, q < 0.05, and Cytoscape (version 3.7.2,https://cytoscape.org/)software was used to construct an RA gene-pathway network.

Key RA Genes Based on PPI Network Analysis
All DEGs identified in RA were imported into the STRING (version 11.0, https://string-db.org/)database to construct a PPI network of RA targets, with the parameters set as the following: organism, Homo sapiens; combined score threshold, 0.7.The PPI network for RA was plotted using Gephi (version 0.9.2, https://gephi.org/).
To identify key RA genes in the PPI network, we used two methods for considering the network node degree, which differ from previous methods in that they focus on examining the efficiency of nodes in the disease network.
The maximum connected reduction (MCR i ) refers to the relative maximum connected reduction of the network after deletion of node v i in the network, which can be used as an index for measuring the importance of the deleted node, calculated as follows: where N represents the total number of nodes connected to the network before node deletion, and N i represents the number of nodes in the maximum connected branch after deletion of node v i .Similarly, the efficiency relative reduction (ERR i ) of node v i in the network is calculated as: where E is the average value of the reciprocal of the shortest distance between network nodes, d ij is the shortest distance between two nodes in the network, E i is the efficiency of the remaining networks after deleting node v i and k i edges connected to v i .Thus, ERR i represents the relative reduction of network efficiency after deleting node v i ; the larger the ERR i , the greater the disturbance degree of node v i to the PPI network.We included the top genes in terms of the above two indicators into the RA key gene set based on PPI network efficiency analysis.

Key RA Gene Discovery Based on Feature Extraction
According to the sample classification of the normal group and RA group, we used the RFECV function in the sklearn package (version 0.24.2) and adopted three different machine learning algorithms, namely least absolute shrinkage and selection operator (lasso) regression, random forest (RF) classifier, and linear support vector machine (SVM) classification, to perform feature extraction on the gene expression data of RA samples to screen out the key genes based on classification features.The gene sets screened out by these three different machine learning classifiers were intersected, and the characteristic genes common to more than two classifiers were selected as the key RA gene set based on the feature extraction method.
The key RA gene sets based on PPI network analysis and feature extraction were intersected again to obtain the core gene set of RA.To verify the disease classification effect of these core genes, the three different machine learning classifiers mentioned above were used to predict RA classification, which was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) value.All classification prediction analyses were performed on the Anaconda3 (https://www.anaconda.com/).

Immune Cell Infiltration Evaluation and RA Gene Correlation Analysis
Based on GEO gene expression profile data, CIBER-SORT Toolkit (version 1.03) was used to analyze immune cell infiltration in RA, and the abundances of 22 immune cell types in RA were calculated.The abundance difference heat map, correlation heat map, and violin plot of immune cells in each sample were plotted using the pheatmap (version 1.0.12),corrplot (version 0.84), and vioplot (version 0.3.5)packages, respectively.Spearman correlation analysis was used to analyze the correlations between 22 types of infiltrating immune cells and key genes to identify genes closely correlated with immune cell infiltration.

Screening of Small Molecules in TCMs Based on Molecular Fingerprint Similarity
The chemical information and data of each component in our previously compiled active ingredient set of natural traditional Chinese medicines proven to have clear pharmacological effects, including 432 small molecules [11], were obtained through the PubChem database (https://pubc hem.ncbi.nlm.nih.gov/),including the English name, CAS, PubChem CID, molecular formula, canonical Simplified Molecular-Input Line-Entry System (SMILES), and SDF file.
Before molecular docking of TCM small molecules according to known target receptors, the receptor file containing the original ligands was downloaded from the PDB database (https://www.rcsb.org/);molecules with a similar structure to the original ligand were considered as the potential and optimal candidates for further screening using molecular fingerprint similarity.
According to canonical SMILES, the molecular fingerprint of each compound was calculated using DRAGON (version 7.0, Kode, Pisa, Italy) software, and the similarity between the original ligands and TCM small molecules was calculated by the Tanimoto coefficient: Among them, a is the number of descriptors of the basic fragment in compound A, b is the number of descriptors of the basic fragment in compound B, and c is the number of descriptors of the basic fragment shared by compounds A and B.
Further, the molecular fingerprint similarity coefficient S(A, B) between any pair of original ligand and TCM small molecule was obtained.Taking the overall Tanimoto coefficient value T (A, B) as the background similarity value, the Z value was calculated according to Eqn. 5 to measure and screen whether the molecular fingerprint similarity between the original ligand and TCM small molecule was significantly greater than the mean value of overall background similarity, as follows: where S (A, B) is the overall mean and sd(S(A, B)) is the overall standard deviation.

Screening Active Components in TCMs Based on Molecular Docking
The ligand SDF file of all collected TCM small molecules was downloaded from the PubChem, wherein the three-dimensional (3D) structure file was directly converted into a PDB file using PyMOL (version 1.6, https: //pymol.org/2/)and the two-dimensional file was converted into a PDB file in 3D format using Chem3D (version 19.0, PerkinElmer, Waltham, Massachusetts, USA).The PDB file of target receptors was downloaded from the PDB database, and key target proteins were pre-processed using PyMOL to remove impurities such as water molecules.The PDB file of receptors and ligands was then converted to a PDBQT file using AutoDockTools (version 1.5.6,https: //autodock.scripps.edu/).
The coordinates of the central position for molecular docking were determined by referring to the binding site of the protein receptor and the original ligand, and the docking center parameter was defined as a box size of 30 × 30 × 30.AutoDock Vina (version 1.1.2,https://vina.scripps.edu/) software was used for semi-flexible molecular docking, the affinity values (kcal/mol) of all small molecules with the key targets were calculated, and judgment and analysis were based on the affinity values and docking conformations.

In Vitro Verification in RA Rat Synovial Cells 2.8.1 Preparation for Normal Membrane Cells and RA Synovial Cells
Eight female clean-grade Wistar rats (each weigh = 120 ± 10 g), were purchased from Shanghai Model Organisms Center, Inc (Shanghai, China).And they were kept in the company.Rats were divided into 2 groups of 4 animals each, one for normal and the other was for CIA model.
3 mg/mL of bovine CII in 0.1 M acetic acid was emulsified with an equal volume of complete Freund's adjuvant (CFA) to create a stable CII/CFA emulsion (10 mg/mL).Four rats were injected (i.d.) with the CII/CFA emulsion in the back (0.1 mL) and tail (0.1 mL); the day of administration was referred to as "day-0".On day-7, all rats received 0.1 mL of a CII/CFA booster via tail injection (i.d.).
After 14 days of modeling, the degree of posterior plantar redness and swelling of CIA rats was significantly different from that of the normal group according to measurement of the plantar volume and Arthritis Score (p < 0.05).These observations were in accordance with the standard required for an arthritis model, which suggested that modeling had been successful.
All rats were killed, and soaked by 75% alcohol for 2 minutes.Fur was removed from the rats knee and muscle was separated.The synovial layer tissue was removed and was placed in physiological saline with Penicillin-Streptomycin for 5 minutes.The synovial layer tissue from different group was rinsed by PBS, and then cut into pieces on petri dish.We used collagenase II mixed with 10% FCS to digest the pieces of synovial layer, and then added DMEM to mix all evenly.After the synovial tissue passing through 200-mesh sieve, we added 4 mL DMEM to mix these synovial cells well.After Synovial cells were cultured in the incubator, cells were harvested and adjusted to a concentration of 1 × 10 7 /mL/well.

Influence of Small Molecules on the Proliferation of Normal Membrane Cells and RA Synovial Cells
Rat RA synovial cells were cultured in RPMI-1640 medium containing 10% fetal bovine serum (Bio IND, Kibbutz Beit Haemek, Israel), inoculated in 96-well culture plates at 2000 cells/well, and then various compounds containing sanguinarine, chelidonine, sesamin, honokiol, and tripterygium glycoside (National Institutes for Food and Drug Control, Beijing, China) were added at different concentrations (1000, 500, 250, 125, 62.5, 31.25,0 µg/mL), followed by incubation at 37 °C for 72 h.After discarding the culture medium, 10 µL of Cell Counting Kit-8 (CCK-8, DOJINDO, Kumamoto, Japan) reagent was added to each well for incubation at 37 °C for 4 h, and the absorption values at 450 nm were measured on an EPOCH multifunctional microplate reader (BioTek, Winooski, Vermont, USA).The half-maximal inhibitory concentration (IC 50 ) values were calculated using GraphPad Prism (version 8, GraphPad Software, San Diego, CA, USA).

Determination of TNF-α and IL-1β Contents
The cells of CIA rats were subjected to maintenance culture in RPMI-1640 medium containing 10% FBS on 6well plates at 4 × 10 5 cells/well, and then RPMI-1640 medium containing 10% FBS was added at 1800 µL per well for culture, followed by the addition of 200 µL of the compound solutions: sanguinarine (350 µg/mL), che-lidonine (500 µg/mL), sesamin (750 µg/mL), and honokiol (350 µg/mL).Tripterygium glycoside was used as the control drug and the non-treated group was used as the arthritis model cell control group.The cells were cultured in an incubator (Model 3111 CO 2 incubator; Thermo Fisher, USA) with 5% CO 2 at 37 °C for 72 h.The culture medium was collected and centrifuged at 400 × g for 5 min, and the supernatant was collected for detection of rat TNF-α and IL-1β by enzyme-linked immunosorbent assay (ELISA) kits (Multi Sciences) according to the manufacturer instructions.

Changes in Cell Target Proteins Detected by Western Blotting
The cell samples treated with the various compounds as described above were collected, subjected to protein determination by the Bradford method, and boiled at 100 °C for 5 min in loading buffer.The samples were then loaded onto a 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis gel at equivalent volumes for electrophoresis, and the proteins on the gel were transferred onto cellulose acetate membranes using transfer buffer solution.The samples were blocked with 5% skimmed milk powder for 1 h, and rabbit derived anti rat RAC2 antibody (Abcam, Cambridge, England) was added at 1:1000.The samples were placed in a refrigerator at 4 °C overnight, and the goat anti rabbit IgG (H+L) (Beyotime, ShangHai, China) labeled with horseradish peroxidase was added at 1:1000.After washing in Tris-buffered saline with Tween, the samples were subjected to color development by enhanced chemiluminescence and images were captured and analysis using Fluorescence imaging system (CLiNX, Shanghai, China).In addition, Image J (version 1.80, https://imagej.nih.gov/ij/) was used to measure and analyze the gray level of WB expression.

Statistical Analysis
One-way ANOVA for multiple comparisons, and Student's t test for two groups comparisons were utilized to analyze the significance of differences.Data were analyzed by SPSS software (version 18.0, IBM Corp., Chicago, IL, USA), results were considered as statistically significant if the p-value was < 0.05.

DEGs in RA
After the combination and batch removal correction of the GSE55235, GSE55457, and GSE77298 datasets, a total of 527 DEGs in RA were screened out, including 348 upregulated genes and 179 down-regulated genes.The heat map and volcano plot for the expression levels of the DEGs are shown in Fig. 2.

Functional Annotation and Enrichment Analysis Results of DEGs in RA
Functional annotation and enrichment analysis suggested that the DEGs in RA are mainly involved in various immune-related functions and pathways.Among the 1233 associated GO biological functions, the DEGs were mainly concentrated in 24 GO functions, as shown in Fig. 3A., namely leukocyte migration (GO:0050900), T cell activation (GO:0042110), immune response-activating cell surface receptor signaling pathway (GO:0002429), immune response-activating signal transduction (GO:0002757), etc.
KEGG pathway enrichment analysis suggested that the DEGs are mainly involved in 39 signaling pathways, including cytokine-cytokine receptor interaction (hsa04060), chemokine signaling pathway (hsa04062), Epstein-Barr virus infection (hsa05169), viral protein interaction with cytokine and cytokine receptor (hsa04061), and cell adhesion molecules (hsa04514); the top 30 main signaling pathways with enrichment of the highest number of genes are shown in Fig. 3B and the gene-enrichment pathway network is shown in Fig. 3C.

Identification and Analysis of Key RA Genes
We imported the 527 DEGs into the STRING to obtain the PPI network of RA, as shown in Fig. 4A.Two types of importance degree values of all genes in the PPI network were obtained using the MCR and ERR algorithms, respectively, as shown in Fig. 4B.
Using the RFECV tool to extract features and combining with each of the three machine learning algorithms, 486 genes related to classification were screened out by the lasso regression model, 111 were obtained by the RF classification model, and 35 were extracted by the linear SVM model.After taking the intersection of the genes extracted by the three classification models, a total of 134 genes with classification characteristics were obtained (Fig. 5A).After further intersecting with the core gene set obtained by PPI network analysis, eight common genes were found: MYH11, CFP, LY96, IGJ, LPL, CD48, RAC2, and CSK (Table 1).
Further, we took these eight genes as the classification feature indicators of data samples, used the three machine learning classification models (Lasso, RF, and SVM) in the sklearn package again for 50% cross-validation, and calculated and plotted AUC values and ROC curves of the three classifier models, as shown in Fig. 5B.The average classification prediction AUC value of these eight genes in the  three classifiers was 0.63, and the highest value was 0.69.Even though the sample size of RA data included in this analysis was relatively small, the AUC values of the three classification models were all greater than 0.6, indicating that these eight core genes do have certain classification potential for the existing RA samples, suggesting that these genes have a certain correlation with RA.

Results of Immune Cell Infiltration Evaluation and Gene Correlation Analysis
The results of immune cell infiltration analysis are shown in Fig. 6.Among the 22 types of immune cells in RA, correlations were found between neutrophils and activated dendritic cells (ρ = 0.58), neutrophils and resting natural killer (NK) cells (ρ = 0.59), and resting NK cells and CD4+ naïve T cells (ρ = 0.50); however, the correlation coefficients of the infiltration degrees between most other immune cells were generally low.

Among the 22 types of immune cells, the infiltration of memory B cells, plasma cells, CD8+ T cells, CD4+ resting memory T cells, CD4+ activated memory T cells, follicular helper T cells, gamma delta T cells, activated NK cells, M1
macrophages, resting dendritic cells, activated mast cells, and eosinophils was significantly different between the normal group and RA group (p < 0.05).
Moreover, the five types of immune cells correlated with RAC2 showed significant differences between the nor-mal and RA samples.Therefore, we selected RAC2 protein for the subsequent screening and verification of TCM small molecules.

Screening of TCM Small Molecules Based on Molecular Fingerprint Similarity
A PDB file (PDB ID: 2W2V) [12] of RAC2 protein was downloaded from the PDB database.All four chains of RAC2 protein had the known ligand guanosine-5'triphosphate (GTP).Therefore, we screened for TCM small molecules with high similarity to GTP in terms of the chemical structure.We collected one SMILES string of GTP (PubChem CID: 135398633) from the PubChem.The TCM small molecules and the SMILES strings of GTP were imported into DRAGON software to calculate the 1024-dimensional extended-connectivity molecular fingerprints of all small molecules.
A total of 31 TCM small molecules with similar structures to GTP were screened out according to a threshold of Z > 1.0 for subsequent molecular docking screening.

Screening Results Based on Molecular Docking
Since RAC2 protein has four chains (A, B, C, and D) and each chain has a corresponding ligand-binding site, we performed batch molecular docking on four sites in the four chains of RAC2.Among them, the A-chain binding site coordinate is (5.711, -4.855, 41.735), the B-chain binding site coordinate is (-10.673,-5.506, 23.999), and the C-chain binding site coordinate is (10.654,9.806, -11.671), the Dchain binding site coordinate is (14.897,58.007, 29.835).According to the scores of docking the 31 small molecules with the molecules on the different chains of RAC2, the top 14 TCM small molecules with average affinity values less than -7.0 kcal/mol and with relatively good scores were as follows: jatrophon, sanguinarine, miroestrol, withaferin A, pristimerin, chelidonine, picroside I, sesamin, gnidimacrin, cephalotaxine, honokiol, oleuropein, atractylenolide I, and dracorhodin.
Finally, based on the molecular docking results and the reported literature, we selected four TCM small molecular compounds for subsequent experimental verification: sanguinarine, chelidonine, sesamin, and honokiol.It should be noted that since Jatrophon was out of stock, we chosen sanguinarine with the second-ranked molecular docking score.The other three components were all reported small molecules of TCM with potential to treat RA (Table 3, Ref. ).

Verification with RA Rat Model Synovial Cells
The in vitro CCK-8 assays showed that none of the screened out TCM compounds had a significant effect on the proliferation of normal rat synovial cells; however, the proliferation of RA rat synovial cells to sanguinarine, chelidonine, sesamin, and honokiol was significantly decreased, with IC 50 values of 344.3 ± 3.2 µg/mL, 511.7 ± 4.7 µg/mL, 756.8 ± 7.0 µg/mL, and 345.1 ± 3.2 µg/mL, respectively (Supplementary Material: The curves used to determine IC 50 of these 4 small molecules).
Compared with the normal control group, the levels of TNF-α and IL-1β produced by rat RA synovial cells significantly increased, whereas the levels of TNF-α and IL-1β in cell the culture supernatants of the sanguinarine-, sesamin-, and honokiol-treated cells significantly decreased, suggesting that these compounds could significantly inhibit the hypersecretion of TNF-α and IL-1β in rat RA synovial cells (Fig. 8A).
Western blotting confirmed that the expression of RAC2 protein in RA rat synovial cells was significantly higher than that in normal synovial cells; however, treatment with sanguinarine, sesamin, and honokiol could significantly suppress the expression of RAC2 protein (Fig. 8B), and the significant differences in the WB gray values of the RA group and the three small molecules were shown in Fig. 8C.

Discussion
RA is a chronic inflammatory autoimmune disease characterized by infiltration of immune cells into the affected synovium, release of inflammatory cytokines and degradation of mediators, and subsequent joint damage [35,36].With the rapid development of molecular biology and analysis tools, bioinformatics has provided powerful strategic markers for the screening of molecules [37,38], and the CIBERSORT tool has promoted disease analysis with patterns of immune cell infiltration [39].
Given the limited sequencing data available on RA patients in existing public databases, we performed DEG analysis by combining RA chip data from three datasets.Further, to ensure the reliability of the results, batch correction was first performed.Nearly 66% of the DEGs in RA were up-regulated and 34% were down-regulated.Among the  Many reports have shown that nodes with relatively large degree values in some networks often do not truly reflect their actual roles in PPI networks.Moreover, if the efficiency of the entire disease network will be substantially reduced after a node is removed from the network, this can indicate that the node plays a more significant role in the network.If drugs can interfere with these targets, the protein regulation relationship within the disease would be destroyed, offering an effective treatment target.Therefore, in this study, we used two methods for analysis of network node degree values to identify important nodes in the PPI networks, which differ from the methods conventionally used for such analyses.The MCR and ERR algorithms adopted in this study were used to measure the actual efficiency of nodes in the networks to evaluate their importance.
Based on the combined data of the RA gene expression profiles from 66 samples, we used feature selection and adopted three machine-learning classification models (Lasso, RF, and SVM) to identify the gene set with a significant classification effect on RA samples.Finally, through these two analysis methods, we could ensure that the key genes obtained are not only the important nodes in the disease protein regulatory network but also characteristic genes in RA sample classification.
According to the six key genes we finally obtained with this approach, CD48 is a member of the signaling lymphocyte activation molecule family, with a primary role in the adhesion and activation of immune cells [40].Other studies revealed that CD48 is a key immunomodulatory molecule that affects the prognosis of glioma [41], and it was also suggested as a potential new target for asthma treatment [42].Clinical research has shown that another one of our key genes in RA, IGJ, is a marker of antibodysecreting plasma blasts, and 25% of the treated subjects had an increased mRNA expression level of IGJ at baseline [43].Polymorphism of LY96 was correlated with susceptibility to inflammatory bowel disease in the Danish population [44], and also emerged as a significant predictor of a poorly induced treatment response and the worst survival rate.A single nucleotide repeat (C8) in the coding sequence of MYH11 was proposed as a mutation target of cancer showing microsatellite instability [45].CSK plays an important role in regulating intestinal epithelial barrier function and preventing colitis [46], and can also enhance the innate immune response to DNA viruses by phosphorylating MITA [47].
At present, there have been some reports on the identification of potential targets of RA based on bioinformatics methods [48,49].In the work of [41], it was identified STAT1, RAC2 and KYNU as hub proteins.Consistent with our findings, RAC2 was also found to be a potential therapeutic target in RA.
RAC2 activation is necessary for macrophages to polarize to the fibrosis phenotype and is associated with the progression of pulmonary fibrosis [50].RAC2 can regulate the calcification in atherosclerosis by regulating the production of IL-1β in macrophages [51].RAC2 mutation was reported to promote immune dysfunction by promoting RAC2 overactivation, changing GEF specificity, and impairing GAP function, while retaining key effector in-teractions [52].RAC2 was also identified as a prognostic biomarker and was found to promote the progression of clear cell renal cell carcinoma [53].Another study showed that RAC2 participates in bleomycin-induced lung inflammation contributing to pulmonary fibrosis [54].Moreover, some reports have indicated the involvement of RAC2 in the inflammation-related process of RA [55], and the interactions between STAT1 and RAC2 in nitric oxide regulation were proposed as candidate therapeutic targets for RA [56].
In the present study, the enrichment analysis showed that the GO functions of RAC2 mainly involved processes related to T cell function, such as T cell activation, regulation of immune effector process, regulation of T cell activation, and T cell proliferation.In line with this finding, previous studies showed that RAC2 plays a unique role in the production of common lymphoid progenitors, plays diverse and indispensable roles in the later stage of T cell development by regulating survival and proliferation signals, and plays a very important role in mediating transcription and cytoskeleton changes during T cell activation [57].
In addition, our KEGG pathway enrichment analysis showed that RAC2 mainly participates in regulation of the chemokine signaling pathway, B cell receptor signaling pathway, viral myocarditis, leukocyte transendothelial migration, NK cell-mediated cytotoxicity, and other signaling pathways.Among these pathways, previous studies have confirmed that the chemokine signaling pathway is involved in the expression of CCL2 in patients with RA [58], and that the B cells and the B cell signaling pathway are associated with the pathogenesis and therapeutic targets of RA [59,60].Collectively, these findings indicate that RAC2 may be a potential new therapeutic target for RA.
We therefore screened for TCM small molecules as RAC2 targets focusing on GTP small molecules in the original ligands based on molecular fingerprint similarity, followed by molecular docking based on the reference positions on the four different chains of the original ligands to screen candidates with high treatment potential.Based on this analysis, four small molecules were screened out with high affinity: sanguinarine, sesamin, and honokiol, which were verified to have anti-inflammatory effects using in vitro assays with synovial cells of a collagen-induced RA rat model.
To our knowledge, this is the first study that a multidisciplinary approach has been integrated to discover potential targets of RA and screen small molecules of TCM for their treatment, and we have established a feasible anti-RA drug discovery process.However, due to the complexity of RA and the limitations of experimental conditions, we did not conduct a comprehensive verification of the molecular mechanisms and functions of all eight potential RA targets, but only selected RAC2 for in vitro experimental verification.Therefore, the precise functions of targets should be further elucidated in the future.Although this study found four small molecules of TCM targeting RAC2 through in vitro experiments, we did not perform the corresponding in vivo verification.Further studies are warranted to the actual therapeutic effect of these small molecules in RA by in vivo experiments.

Conclusions
In summary, we attempted to identify new potential diagnostic markers of RA with the combination of PPI network analysis and machine learning, and further used computer-aided drug design methods such as chemoinformatics and molecular docking to screen out therapeutic TCM small molecules with therapeutic potential for RA.The results of this study can provide strong support for further appropriate development of TCMs in the treatment of RA.

Fig. 2 .
Fig. 2. DEGs analysis results of RA.Heat map (A) and volcano plot (B) of expression analysis of 527 differentially expressed genes in rheumatoid arthritis (RA) compared with normal control (NC) samples after combination and batch removal of three groups of GEO chip datasets (GSE55235, GSE55457, and GSE77298).

Fig. 3 .
Fig. 3. Functional annotation and enrichment analysis results of DEGs in RA. (A) GO functional enrichment analysis of differentially expressed genes in RA. (B) KEGG pathway enrichment analysis of differentially expressed genes in RA. (C) Gene-pathway enrichment network; the dots represent genes and the squares represent pathways.

Fig. 4 .
Fig. 4. Identification key RA genes by PPI network.(A) Protein-protein interaction network of RA. (B) Screening of key RA genes based on the MCR (top) and ERR (bottom) algorithms.

Fig. 5 .
Fig. 5. Identification key RA genes by machine learning.(A) Results of key RA gene screening based on the lasso regression model, random forest classifier (RFC), and support vector classifier (SVC) feature extraction, and RA core gene discovery based on PPI network analysis and machine learning feature extraction.(B) Receiver operating characteristic (ROC) curves of disease prediction classification of the eight core genes.

Fig. 6 .
Fig. 6. Results of immune cell infiltration evaluation and gene correlation analysis.(A) Infiltration and abundance of 22 types of immune cells in all RA samples.(B) Heat map of the infiltration and abundance of 22 types of immune cells in RA. (C) Correlation heat map of 22 types of immune cells in RA. (D) Differences in the infiltration of 22 types of immune cells between RA and normal samples.

Fig. 7 .
Fig. 7.The molecular docking results of these 4 small molecules and RAC2.The 2D conformation diagram of (A) Sanguinarine, (B) Chelidonine, (C) Sesamin, and (D) Honokiol in the 4 chains of RAC2, from left to right shown the ligand binding poses in chain A, chain B, chain C, and chain D.