Academic Editor

Article Metrics

  • Fig. 1.

    View in Article
    Full Image
  • Fig. 2.

    View in Article
    Full Image
  • Fig. 3.

    View in Article
    Full Image
  • Fig. 4.

    View in Article
    Full Image
  • Fig. 5.

    View in Article
    Full Image
  • Fig. 6.

    View in Article
    Full Image
  • Fig. 7.

    View in Article
    Full Image
  • Information

  • Download

  • Contents

Abstract

Background:

Hepatocellular carcinoma (HCC) is the leading cause of cancer-related mortality worldwide. Despite advances in therapeutic approaches, the lack of effective biomarkers continues to limit early detection and prognostic evaluation. Pseudogenes, once considered nonfunctional, have emerged as regulators of biological processes in tumors and as potential biomarkers. This study aimed to identify and validate BMS1 Pseudogene 8 (BMS1P8) as a liver-specific, clinically relevant diagnostic and prognostic biomarker in HCC.

Methods:

A comprehensive survey of pseudogene expression across different stages of liver disease was performed and validated using clinical HCC samples. Correlation, enrichment, and competing endogenous RNA (ceRNA) analyses integrating matched microRNA (miRNA)-seq and mRNA-seq were used to explore the functional networks surrounding BMS1P8. Public RNA-seq datasets (GSE114564, The Cancer Genome Atlas-Liver Hepatocellular Carcinoma (TCGA_LIHC)) were used to delineate differentially expressed pseudogenes, and 98 paired tumor and non-tumor tissues were assessed using quantitative reverse transcription polymerase chain reaction. Diagnostic and prognostic performances were evaluated using receiver operating characteristic curves and Kaplan–Meier statistics.

Results:

BMS1P8 was markedly upregulated in HCC and was overexpressed in 25 other cancer types. Receiver operating characteristics analysis yielded an area under the curve of 0.81, underscoring the diagnostic utility. High BMS1P8 expression and enrichment of cell cycle pathways were associated with poor survival. ceRNA screening revealed an inverse BMS1P8–miR-30c-2-3p correlation and concordant NME/NM23 nucleoside diphosphate kinase 6 (NME6) upregulation, with the BMS1P8/miR-30c-2-3p/NME6 triad further stratifying patient outcomes.

Conclusion:

Our findings highlight BMS1P8 as a novel liver-specific biomarker with substantial diagnostic and prognostic value in HCC. Its diagnostic utility suggests its potential application in early detection and personalized treatment strategies, contributing to improved patient outcomes.

1. Introduction

Hepatocellular carcinoma (HCC) is one of the most common and lethal malignancies worldwide, representing a major global health burden owing to its high incidence and mortality rates [1]. It typically develops in the context of chronic liver disease, including hepatitis B virus (HBV) or hepatitis C virus (HCV) infection, alcoholic liver disease, and non-alcoholic steatohepatitis (NASH) [2, 3, 4]. Despite advances in the management of early-stage HCC with surgical resection, transplantation, and local ablative therapies, most cases are diagnosed at an advanced stage, contributing to poor long-term outcomes [5].

Thus, the early and accurate detection of HCC is imperative to improve patient survival. Although several diagnostic modalities, including imaging and serological tests, exist, these approaches have inherent limitations. Alpha-fetoprotein (AFP), the most commonly used serum biomarker, lacks the sensitivity and specificity required for reliable screening in the general population [6]. This shortcoming underscores the urgent need for more effective molecular biomarkers that can reliably detect HCC at an early stage and provide prognostic insights.

Recently, pseudogenes have attracted increasing interest as potential cancer biomarkers. Originally viewed as nonfunctional remnants of gene duplication or retrotransposition events, pseudogenes can be transcribed and exhibit regulatory functions like non-coding RNAs [7]. Growing evidence suggests that pseudogenes are involved in diverse biological processes, including cell cycle regulation, signal transduction, and epigenetic control [8]. For example, the phosphatase and tensin homolog pseudogene 1 (PTENP1) has gained attention for its capacity to regulate the tumor-suppressor gene PTEN by functioning as a microRNA (miRNA) decoy, thereby influencing cancer cell proliferation and survival in several malignancies [9, 10, 11]. Similarly, POU class 5 homeobox 1B (POU5F1B), a pseudogene of POU5F1/OCT4, is upregulated in gastric cancer and contributes to oncogenic behaviors [12]. Above these and other studies have underscored how pseudogenes can actively shape tumor biology and serve as potential diagnostic or prognostic markers across diverse cancer types.

In this study, we conducted a comprehensive analysis of pseudogene expression across multiple stages of liver disease and HCC, using publicly available RNA-seq datasets (GSE114564 and The Cancer Genome Atlas - Liver Hepatocellular Carcinoma (TCGA_LIHC)) and clinical samples. Our findings revealed that BMS1 Pseudogene 8 (BMS1P8) is highly upregulated in HCC with strong diagnostic performance and potential prognostic relevance. We further investigated its functional relationships using correlation analyses and pathway enrichment, which implicated BMS1P8 in cell cycle regulation and underscored its importance as a candidate biomarker for HCC. Moreover, competing endogenous RNA (ceRNA) analysis suggested that BMS1P8 may act as a molecular sponge for tumor-suppressive miR-30c-2-3p, thereby reducing the post-transcriptional repression of the oncogenic effector NME/NM23 nucleoside diphosphate kinase 6 (NME6). These findings provide a foundation for future studies to validate BMS1P8’s clinical utility and explore its mechanism of action in hepatocarcinogenesis.

2. Materials and Methods
2.1 Expression and Prognosis in Public Omics Data

We developed the GSE114564 dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114564) by integrating RNA-seq data from previously published HCC-related studies, capturing a range of phenotypes, including normal liver tissues and various HCC subtypes [13, 14, 15]. This comprehensive dataset, comprising 39,864 genes and 7913 pseudogenes, was used to characterize the progression from normal liver (NL) to advanced HCC (aHCC), encompassing intermediate stages such as chronic hepatitis (CH), liver cirrhosis (LC), dysplastic nodules (DN), and early-stage HCC (eHCC). Differential expression analysis was performed to identify pseudogenes significantly upregulated (log2 fold change [FC] 0.5 and *p < 0.05) during the progression from NL to HCC.

Expression levels in non-tumor (NT) versus tumor (T) samples were further assessed using TCGA_LIHC dataset (https://xenabrowser.net/datapages/?cohort=GDC%20TCGA%20Liver%20Cancer%20(LIHC)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443), followed by overall survival (OS) and disease-free survival (DFS) analyses to evaluate prognostic relevance. Here, OS was defined as the time from HCC diagnosis to death from any cause, and DFS was defined as the time from curative treatment to disease recurrence. Additionally, pan-cancer expression data from the TCGA database were examined to assess whether BMS1P8 expression was specific to liver cancer or also present in other malignancies. These data were downloaded from the Genomic Data Commons (GDC) hub of UCSC Xena (https://xena.ucsc.edu/) to ensure comprehensive coverage and accessibility [16].

2.2 Quantitative Reverse Transcription PCR (qRT-PCR)

Total RNA was extracted from frozen tissues using QIAzol Reagent (Qiagen, Cat# 79306, Hilden, Germany), following the manufacturer’s instructions. cDNA was synthesized from 500 ng of total RNA using 5× PrimeScript™ RT Master Mix (Takara Bio, Cat# RR036A, Shiga, Japan) under the following conditions: 37 °C for 15 min, 85 °C for 5 s, and 4 °C. qRT-PCR was performed with amfiSure qGreen Q-PCR Master Mix (GenDEPOT, Barker, Cat# Q5602, TX, USA) on a CFX Connect Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA, USA). Expression levels were normalized to hydroxymethylbilane synthase (HMBS) as an internal control. The amplification BMS1P8 primer sequences were 5-GCACATTCCAAAAGCCTTGC-3 (forward) and 5-TGTGCACCATACTCAGTGCA-3 (reverse); and the HMBS primer sequences were 5-ACGGCTCAGATAGCATACAAGAG-3 (forward) and 5-GTTACGAGCAGTGATGCCTACC-3 (reverse). The PCR conditions were as follows: 95 °C for 2 min, 40 cycles of 95 °C for 15 s, 58 °C for 34 s, and 72 °C for 30 s, followed by a dissociation stage at 95 °C for 10 s, 65 °C for 5 s, and 95 °C for 5 s. The relative standard curve method (2-Δ⁢Δ⁢Ct) was used to determine the relative expression. All experiments were performed at least three times.

2.3 Clinical Sample Collection

To validate the BMS1P8 expression patterns identified from public omics data, 98 paired HCC and corresponding non-cancerous liver tissues were obtained from the Biobank of Ajou University Hospital (Suwon, South Korea). qRT-PCR analysis was carried out as described above. Demographic and clinical information, including age, sex, etiology of liver disease, body mass index (BMI), platelet count, serum albumin, total bilirubin, international normalized ratio (INR), creatinine, sodium, aspartate aminotransferase (AST), alanine aminotransferase (ALT), AFP, protein induced by vitamin K absence-II (PIVKA-II), hemoglobin, glucose, total cholesterol levels, and the presence of ascites, was recorded (Table 1).

Table 1. Clinical characteristics of the HCC cohort.
Variables HCC (n = 98)
Age (years), mean ± SD (range) 56 ± 10.2 (29–75)
Male sex, n (%) 74 (75.5)
Etiology, n (%) ––
HBV 91 (92.9)
HCV 4 (4.1)
Alcohol 2 (2)
HCV + Alcohol 1 (1)
Cirrhosis, n (%) 71 (72.4)
Ascites, n (%) 15 (15.3)
BMI (kg/m2), mean ± SD (range) 24.1 ± 3.8 (16.6–36.0)
Platelet, ×109/L, mean ± SD (range) 178.4 ± 68.5 (58–534)
Albumin (g/dL), mean ± SD (range) 4.5 ± 0.6 (0.3–5.4)
Total bilirubin (mg/dL), mean ± SD (range) 0.8 ± 1.3 (0.1–12.9)
INR, mean ± SD (range) 1.1 ± 0.1 (0.91–1.94)
Creatinine (mg/dL), mean ± SD (range) 0.92 ± 0.2 (0.5–2.01)
Sodium (mmol/L), mean ± SD (range) 139.3 ± 2.2 (130–143)
AST (U/L), mean ± SD (range) 43.5 ± 57.8 (15–541)
ALT (U/L), mean ± SD (range) 40.5 ± 42.1 (8–332)
AFP (ng/mL), mean ± SD (range) 3185.2 ± 9989.2 (1.3–60,500)
PIVKA-II (mAU/mL), mean ± SD (range) 8711.3 ± 34,086.5 (13–300,000)
Hemoglobin (g/dL), mean ± SD (range) 14.4 ± 1.6 (8.2–17.1)
Glucose (mg/dL), mean ± SD (range) 123.0 ± 53.5 (72–411)
Total cholesterol (mg/dL), mean ± SD (range) 170.4 ± 41.0 (97–289)

AFP, alpha-fetoprotein; ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; HBV, hepatitis B virus; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; INR, international normalized ratio; PIVKA-II, protein induced by vitamin K absence-II.

2.4 Enrichment Analysis Using Databases

Gene Ontology (GO) analyses of the enriched genes were performed using the enrichGO function in the R clusterProfiler package (v3.18.1, Bioconductor; https://bioconductor.org/packages/clusterProfiler). Enrichment analysis of the MSigDB Hallmark 2020 database (Broad Institute, Cambridge, MA, USA; https://www.gsea-msigdb.org/gsea/msigdb) was conducted using the enrichr function in the R enrichR package (v3.0; https://cran.r-project.org/package=enrichR). Additional pathway analyses, including the Kyoto Encyclopedia of Genes and Genomes (KEGG) 2021 Human (Kanehisa Laboratories, Kyoto, Japan; https://www.genome.jp/kegg/) and Reactome Pathways 2024 (Ontario Institute for Cancer Research, Toronto, Canada; https://reactome.org), were also carried out with clusterProfiler to identify biological processes and signaling pathways potentially associated with BMS1P8.

2.5 Risk Score Calculation for the 7-Gene Signature

A composite risk score was calculated by standardizing expression levels of the seven genes (BMS1P8, CCNB2, CDC20, CDC45, ESPL1, PLK1, and PTTG1) using Z-score transformation across the TCGA_LIHC cohort. Z-scores were computed for each gene as follows:

Z = ( X - μ ) σ

where X is the individual gene expression, µ is the mean expression, and σ is the standard deviation across the cohort. The individual risk score for each patient was then defined as the arithmetic mean of the Z-scores of the seven genes:

Risk score = ( 1 7 ) i = 1 7 Z i

This composite metric integrates the combined expression pattern into a single prognostic variable. Receiver operating characteristic (ROC) curve analysis was subsequently performed using these risk scores to evaluate the diagnostic and prognostic performance of the 7-gene signature.

2.6 miRNA Expression Profiling and ceRNA Network Construction

Raw mature-miRNA read counts for TCGA_LIHC were downloaded from the GDC using the TCGAbiolinks pipeline (v2.30.1, Bioconductor; https://bioconductor.org/packages/TCGAbiolinks) [17, 18]. Counts were filtered to retain miRNAs expressed at 1 count per million (CPM) in 30% of samples, then TMM-normalized and transformed to log2-CPM with the edgeR package (v4.2, Bioconductor; https://bioconductor.org/packages/edgeR). Differential expression between non-tumor (NT) and tumor (T) tissues was assessed with limma-voom (v3.58, Bioconductor; https://bioconductor.org/packages/limma); miRNAs with |log2 FC| 0.5 and false discovery rate (FDR) < 0.05 were considered significantly deregulated.

For ceRNA screening, Pearson correlation coefficients (r) were calculated between BMS1P8 and each miRNA across matched mRNA- and miRNA-seq tumor samples. Candidate tumor-suppressive miRNAs were defined as (i) downregulated in tumors (log2 FC –0.5, FDR < 0.05) and (ii) inversely correlated with BMS1P8 (r –0.20, *p < 0.05). Putative mRNA targets of each candidate miRNA were retrieved from miRDB (https://mirdb.org/) with a target score 70 [19]. Predicted targets were filtered to keep transcripts that were (i) upregulated in tumors (log2 FC 0.5, FDR < 0.05) and (ii) positively correlated with BMS1P8 (r 0.20, *p < 0.05) while showing a negative correlation with the cognate miRNA (r –0.20, *p < 0.05).

2.7 Statistical Analysis

All results are expressed as mean ± standard deviation (SD). Unpaired Student’s t-tests (GraphPad Software, version 10.0, San Diego, CA, USA) were used to determine differences between groups. Kaplan–Meier survival curves were generated for both OS and DFS, with significance assessed via the log-rank test. ROC curve analyses were conducted using MedCalc (MedCalc Software Ltd., version 22.0, Ostend, Belgium), providing the area under the curve (AUC), 95% confidence intervals (CIs), sensitivity, and specificity. Statistical significance was defined as p < 0.05. All experiments were repeated at least three times.

3. Results
3.1 Comprehensive Analysis of Pseudogene Expression in Liver Disease and HCC Progression

Using the multistage liver disease and cancer dataset GSE114564, RNA sequencing analysis was conducted to systematically evaluate gene expression changes across different stages of liver disease and cancer progression. A total of 39,864 genes were analyzed, classified into eight categories based on their coding potential: protein-coding genes (46.20%), long non-coding RNA (lncRNA) genes (13.09%), pseudogenes (19.85%), antisense transcripts (10.35%), miscellaneous RNA (2.58%), sense intronic RNA (2.16%), and processed transcripts (1.10%) (Fig. 1A, left pie chart). Among these, 7913 pseudogenes were identified and further categorized based on their biogenesis and characteristics. The majority, 71.06%, were processed pseudogenes, which are retro-transposed copies of functional genes that have lost their coding potential. The second-largest group, 11.03%, included unprocessed pseudogenes that retain introns and resemble their parent genes. Additionally, 6.46% were transcribed unprocessed pseudogenes, and 4.40% were transcribed processed pseudogenes. Smaller fractions included 3.48% generic pseudogenes, 1.58% IG_V_pseudogenes, and 1.36% unitary pseudogenes. The remaining 0.63% were classified as etc., encompassing pseudogenes with ambiguous or uncommon features (Fig. 1A, right pie chart).

Fig. 1.

Identification of differentially expressed pseudogenes in HCC. (A) Gene composition of the GSE114564 dataset, including 39,864 genes, with 7913 pseudogenes. (B) Heatmap of 171 pseudogenes showing differential expression across liver disease stages including non-cancerous liver (NL, normal liver; CH, chronic hepatitis; LC, liver cirrhosis; DN, dysplastic nodule) and tumor (eHCC, early-stage HCC; aHCC, advanced HCC) tissues, with 98 pseudogenes significantly upregulated during HCC progression. (C) Key upregulated pseudogenes, including BMS1P8, RP11-390F10.3, RP11-443P15.2, and ZNF192P1, identified as potential HCC diagnostic markers.

Based on these classifications, the hepatic tissue samples were categorized into six groups to represent the different stages of liver disease and cancer progression: NL, CH, LC, DN, eHCC, and aHCC (Fig. 1B). Heatmap analysis of 171 differentially expressed pseudogenes revealed dynamic expression patterns across these stages, with 98 pseudogenes significantly upregulated in advanced HCC (Fig. 1B). Notably, BMS1P8, RP11-390F10.3, RP11-443P15.2, and ZNF192P1 showed consistent upregulation during disease progression, highlighting their potential role in driving HCC development (Fig. 1C). Although PTGES3P1 clustered with other upregulated pseudogenes in early- and late-stage HCC (Fig. 1C), further analysis indicated it may not be a robust liver cancer–specific biomarker. PTGES3P1 showed elevated expression in the chronic hepatitis (CH) group, suggesting non-specific upregulation. Analysis of variance (ANOVA) with Tukey’s multiple comparisons test revealed significant differences only between normal liver and aHCC and between LC and aHCC, without consistent stepwise increases in HCC progression. Moreover, ROC analysis yielded an AUC below 0.7 for distinguishing tumor from non-tumor tissues, indicating limited diagnostic potential. Based on these findings, we concluded that PTGES3P1 does not meet the criteria for a promising HCC-specific biomarker (Supplementary Fig. 1A). These findings underscore the importance of pseudogene expression in the molecular landscape of liver cancer progression.

3.2 Diagnostic and Prognostic Significance of the Candidate Pseudogenes in HCC Progression

The expression patterns of four pseudogenes, BMS1P8, RP11-390F10.3, RP11-443P15.2, and ZNF192P1, were analyzed across six stages of liver disease progression (NL, CH, LC, DN, eHCC, and aHCC) using the GSE114564 dataset. Among these, BMS1P8, RP11-443P15.2, and ZNF192P1 exhibited progressive and statistically significant increases in expression as the disease advanced, reaching peak levels in advanced HCC. In contrast, RP11-390F10.3 showed a less pronounced and statistically non-significant increase across the stages (Fig. 2A, left panels for each pseudogene).

Fig. 2.

Diagnostic and prognostic significance of four pseudogenes in hepatocellular carcinoma (HCC) progression. (A) Relative expression levels of BMS1P8, RP11-390F10.3, RP11-443P15.2, and ZNF192P1 in NL, CH, LC, DN, eHCC, and aHCC, based on the GSE114564 dataset (the left panels for each pseudogene). Receiver operating characteristic (ROC) curves demonstrate the diagnostic performance of each pseudogene in distinguishing HCC from non-tumor liver tissue, with area under the curve (AUC) values and 95% confidence intervals (CIs) (the right panels for each pseudogene). (B) Validation of tumor-specific expression for the four pseudogenes in paired non-tumor (NT) and tumor (T) tissues from The Cancer Genome Atlas - Liver Hepatocellular Carcinoma (TCGA_LIHC) dataset (n = 421). The y-axis represents fragments per kilobase of transcript per million mapped reads (FPKM) on a log2(x + 1) scale, highlighting fold changes (FC). (C) Kaplan–Meier overall survival (OS) analyses for BMS1P8 and RP11-443P15.2 in the TCGA_LIHC dataset. High-expression groups (purple) show worse survival compared to low-expression groups (green). Hazard ratios (HRs), 95% CIs, and log-rank p values are presented. Statistically significant differences were determined using the log-rank test; *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. Data are shown as mean ± SD.

The diagnostic potential was assessed through ROC curve analysis. Among the four pseudogenes, BMS1P8 showed strong diagnostic performance with an AUC of 0.81 (95% CI: 0.73–0.89, p < 0.0001), and RP11-443P15.2 exhibited the highest AUC value of 0.84 (95% CI: 0.77–0.92, p < 0.0001). ZNF192P1 also demonstrated strong diagnostic capability with an AUC of 0.81 (95% CI: 0.73–0.89, p < 0.0001), while RP11-390F10.3 had moderate diagnostic potential (AUC = 0.66, 95% CI: 0.56–0.76, p = 0.003) (Fig. 2A, right panels for each pseudogene).

To validate tumor-specific expression, these pseudogenes were analyzed in T (n = 371) versus NT (n = 50) tissues using the TCGA_LIHC dataset. BMS1P8 and RP11-443P15.2 were significantly upregulated in tumor tissues, with fold changes (FCs) of 2.1 and 2.3, respectively (p < 0.001 for both). ZNF192P1 showed a mild but statistically significant increase (FC = 1.1, p < 0.001), whereas RP11-390F10.3 did not show significant differential expression (Fig. 2B).

Kaplan–Meier survival analysis revealed the prognostic significance of BMS1P8 and RP11-443P15.2. High expression of BMS1P8 was significantly associated with poorer OS, with a hazard ratio (HR) of 1.64 (95% CI: 1.16–2.31, p = 0.005) (Fig. 2C, left). While RP11-443P15.2 showed a trend toward worse survival, its result was not statistically significant (HR = 1.36, 95% CI: 0.96–1.91, p = 0.08) (Fig. 2C, right). These findings highlight BMS1P8 as a strong candidate for further investigation owing to its diagnostic and prognostic value in liver disease and HCC progression.

3.3 Comparative Analysis of BMS1 Pseudogenes in HCC and Identification of BMS1P8 as a Leading Diagnostic Candidate

To evaluate the diagnostic potential of BMS1-derived pseudogenes in HCC, matched T and NT tissue pairs (n = 50) were examined to identify the pseudogenes that were significantly upregulated in cancerous liver tissues. Among the 17 tested genes, eight, including BMS1P1, BMS1P2, BMS1P4, BMS1P8, BMS1P10, BMS1P16, BMS1P17, and BMS1P20, exhibited markedly higher expression levels in tumor tissues than those in their NT counterparts (Fig. 3A).

Fig. 3.

Comparative expression and diagnostic evaluation of BMS1 pseudogenes in matched HCC and non-tumor liver tissues. (A) Expression patterns of 17 BMS1 pseudogenes in paired non-tumor (NT) and tumor (T) samples (n = 50). Each connected line represents an individual patient sample. The y-axis shows fragments per kilobase of transcript per million mapped reads (FPKM) on a log2(x + 1) scale. (B) Receiver operating characteristic (ROC) curves evaluating the diagnostic performance of the eight most upregulated BMS1 pseudogenes (BMS1P1, BMS1P2, BMS1P4, BMS1P8, BMS1P10, BMS1P16, BMS1P17, and BMS1P20) in distinguishing tumor from non-tumor tissues. Area under the curve (AUC), 95% confidence interval (CI), and corresponding p values are shown for each pseudogene. Higher AUC values indicate stronger diagnostic potential. Statistical significance levels (*p < 0.05, **p < 0.01, ***p < 0.001) are indicated where applicable.

ROC curve analyses were then performed to assess the diagnostic performance of each pseudogene. While several BMS1 pseudogenes demonstrated moderate diagnostic capabilities (AUC = 0.58–0.74), both BMS1P8 and BMS1P20 showed particularly high AUC values of 0.80 (95% CI: 0.71–0.89) and 0.84 (95% CI: 0.76–0.92), respectively, indicating strong potential for HCC detection (Fig. 3B).

Although BMS1P20 appeared to be a strong candidate based on the preliminary findings, further validation using a dataset encompassing multistage liver disease and HCC progression (GSE114564) revealed limited diagnostic relevance. BMS1P20 did not exhibit a statistically significant differential expression between non-cancerous and cancerous tissues, and its AUC value for distinguishing these groups was relatively low (AUC = 0.60, p = 0.06) (Supplementary Fig. 1). In contrast, BMS1P8 consistently demonstrated robust diagnostic performance across both datasets and maintained significant differences in expression between the T and NT samples (Fig. 2A). Collectively, these findings highlight BMS1P8 as the most promising BMS1 pseudogene marker for HCC diagnosis, underscoring its potential utility for early detection and guiding future investigations into the clinical implications of pseudogene-based biomarkers.

3.4 BMS1P8 is Overexpressed in HCC and Exhibits Liver-Specific Diagnostic Potential

To investigate the potential cancer-specific expression of BMS1P8, we initially performed a broad pan-cancer survey using TCGA database, spanning 25 different tumor types. BMS1P8 expression was largely undetectable or remained at very low levels in most cancer types. In contrast, LIHC samples exhibited a pronounced increase in BMS1P8 expression relative to NT tissues (Fig. 4A). This discrepancy underscores the possibility that BMS1P8 plays a role in the oncogenic processes of the liver rather than in a broad spectrum of malignancies.

Fig. 4.

BMS1P8 exhibits liver-specific overexpression and robust diagnostic potential in an independent HCC cohort. (A) Pan-cancer analysis of BMS1P8 expression across 25 tumor types in the TCGA dataset. The y-axis indicates FPKM on a log2 scale, highlighting that BMS1P8 is predominantly overexpressed in LIHC compared with other malignancies. (B) Fold change (log2) of BMS1P8 expression in 98 paired HCC and non-tumor liver tissues from the Ajou University cohort. Bars above the x-axis indicate samples with upregulated BMS1P8, while those below represents downregulation. (C) ROC curve evaluating BMS1P8 as a diagnostic marker for distinguishing HCC from NT tissues in the Ajou University cohort. The area under the curve (AUC) is 0.81 (95% CI: 0.74–0.89, p < 0.001), underscoring its potential clinical utility. Data are shown as mean ± SD. LIHC, Liver hepatocellular carcinoma; GBM, Glioblastoma multiforme; SARC, Sarcoma; CESC, Cervical squamous cell carcinoma and endocervical adenocarcinoma; KIRC, Kidney renal clear cell carcinoma; KIRP, Kidney renal papillary cell carcinoma; ESCA, Esophageal carcinoma; LUSC, Lung squamous cell carcinoma; LUAD, Lung adenocarcinoma; THCA, Thyroid carcinoma; CHOL, Cholangiocarcinoma; BLCA, Bladder urothelial carcinoma; PAAD, Pancreatic adenocarcinoma; SKCM, Skin cutaneous melanoma; OV, Ovarian serous cystadenocarcinoma; THYM, Thymoma; PCPG, Pheochromocytoma and paraganglioma; BRCA, Breast invasive carcinoma; HNSC, Head and neck squamous cell carcinoma; PRAD, Prostate adenocarcinoma; STAD, Stomach adenocarcinoma; COAD, Colon adenocarcinoma; UCEC, Uterine corpus endometrial carcinoma; READ, Rectum adenocarcinoma; KICH, Kidney chromophobe.

To confirm these pan-cancer observations at the clinical level, paired tumor and NT liver tissues were collected from 98 patients with HCC undergoing hepatectomy. The relevant clinical information is detailed in Table 1. qRT-PCR revealed that 80 of 98 (82%) patient samples demonstrated significantly elevated BMS1P8 expression (Fig. 4B), thus reinforcing the findings from both the TCGA and GSE114564 datasets. The high proportion of overexpressing cases suggests that BMS1P8 may be functionally relevant to HCC pathogenesis or tumor progression.

In alignment with these expression data, BMS1P8 exhibited robust diagnostic performance in distinguishing HCC tissues from non-tumor tissues. Specifically, ROC curve analysis yielded an AUC of 0.81 (95% CI: 0.74–0.89, p < 0.001) (Fig. 4C), indicating a high degree of sensitivity and specificity. This diagnostic potential complements the observed overexpression patterns, indicating that BMS1P8 may serve as a practical biomarker for the early detection or clinical monitoring of HCC.

3.5 BMS1P8 May Regulate HCC Prognosis Through Interactions With Cell Cycle–Related Genes

To elucidate the functional importance of BMS1P8 in HCC, correlation analysis in the TCGA_LIHC dataset identified 1784 genes (|r| 0.2), including 1175 protein-coding genes, that were associated with BMS1P8. Pathway enrichment analysis using EnrichR and referencing the MSigDB Hallmark 2020, KEGG 2021 Human, and Reactome Pathways 2024 databases consistently highlighted the G2–M checkpoint and cell cycle pathways among the top-ranked categories (Fig. 5A–C). This overarching pattern suggests that BMS1P8 orchestrates pivotal cell-cycle processes involved in HCC progression.

Fig. 5.

BMS1P8 and its cell cycle–related 7-gene signature predict prognosis and accurately distinguish HCC tissues. Pathway enrichment analyses of BMS1P8-correlated genes (|r| 0.2) using three different databases: (A) MSigDB Hallmark 2020, (B) KEGG 2021 Human, and (C) Reactome Pathways 2024. Bar graphs depict the top enriched pathways based on combined scores and adjusted p values. Of note, G2–M checkpoint and cell cycle processes consistently emerge among the highest-ranked categories. (D) Venn diagram analysis illustrating the overlap of cell cycle–associated genes correlated with BMS1P8 across the three databases. Six genes, including CCNB2, CDC20, CDC45, ESPL1, PLK1, and PTTG1, are commonly identified. (E) Scatter plots showing significant positive correlations (r > 0.2, p < 0.001) between BMS1P8 and the six overlapping cell cycle genes in the TCGA_LIHC dataset, suggesting BMS1P8 as a regulatory node in cell cycle control. (F) Kaplan–Meier curves for overall survival (OS) and disease-free survival (DFS) in patients stratified by a 7-gene signature (comprising BMS1P8 plus the six correlated genes including CCNB2, CDC20, CDC45, ESPL1, PLK1, and PTTG1). High 7-gene signature expression (purple) is associated with markedly poorer OS and DFS. Log-rank p values and hazard ratios (HR) are presented, highlighting the prognostic relevance of BMS1P8 and its cell cycle–related partners. (G) Receiver operating characteristic (ROC) curves illustrating the diagnostic accuracy of the 7-gene signature, calculated as the mean of standardized expression (Z-scores) of BMS1P8 and six correlated cell cycle genes (CCNB2, CDC20, CDC45, ESPL1, PLK1, PTTG1). (Top) ROC curve comparing all non-tumor samples (NT, n = 50) to all tumor samples (T, n = 371) in the TCGA_LIHC cohort. (Bottom) ROC curve for 50 paired NT and T tissues from the same patients.

A Venn diagram analysis focusing on cell cycle–related genes correlated with BMS1P8 across the three databases uncovered six overlapping genes, such as CCNB2, CDC20, CDC45, ESPL1, PLK1, and PTTG1 (Fig. 5D). Subsequent validation within the TCGA_LIHC dataset confirmed significant positive correlations (r > 0.2) between BMS1P8 and each of these genes (Fig. 5E), reinforcing the notion that BMS1P8 functions as a central node in cell cycle regulation.

To assess prognostic implications, BMS1P8 was combined with the six correlated cell cycle genes to form a 7-gene signature (7 sigs), comprising BMS1P8 plus the six correlated genes including CCNB2, CDC20, CDC45, ESPL1, PLK1, and PTTG1. Kaplan–Meier analyses of OS and DFS revealed significantly worse outcomes in patients exhibiting high expression of this signature (Fig. 5F). Specifically, OS analysis yielded a log-rank p = 6.2 × 10-6 with a HR = 3, while DFS analysis produced a log-rank p = 9.2 × 10-5 and an HR = 2.3. Collectively, these results underscore the critical role of BMS1P8 in modulating cell cycle–associated pathways and highlight the potential of this 7-gene signature as a robust prognostic marker for HCC. To further evaluate the diagnostic capability of the 7-gene signature, ROC curve analyses were performed in two clinically relevant comparisons. When comparing all available non-tumor samples (NT, n = 50) with all tumor samples (T, n = 371) in the TCGA_LIHC cohort, the signature achieved an AUC of 0.92 (95% CI: 0.89–0.95, p < 0.0001). Additionally, analysis of 50 paired NT and T tissues from the same patients demonstrated an AUC of 0.98 (95% CI: 0.95–1.00, p < 0.0001), indicating strong discriminatory performance even in matched samples (Fig. 5G). When comparing the 7-gene signature with BMS1P8 alone, the 7-gene signature showed significantly improved diagnostic performance, with a higher AUC of 0.92 compared to BMS1P8’s AUC of 0.80 (Fig. 3B). Collectively, these results underscore the critical role of BMS1P8 in modulating cell cycle–associated pathways and highlight the potential of this 7-gene signature as both a robust prognostic marker and a promising diagnostic biomarker panel for HCC.

3.6 BMS1P8 Functions as a ceRNA Modulating the miR-30c-2-3p–NME6 Axis in HCC

Because pseudogenes, like lncRNAs, are well known to function as ceRNAs modulating miRNA availability and downstream gene expression, we investigated whether BMS1P8 might engage in a ceRNA regulatory network influencing HCC progression [10]. This analysis aimed to explore a potential mechanistic link by identifying miRNAs that could interact with BMS1P8 and affect expression of relevant oncogenic targets. Following the analytical workflow (Fig. 6A), miRNAs sharing complementary sequences with BMS1P8 were identified using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and miRNA-target prediction from miRDB. Among the seven downregulated miRNAs that showed negative correlations with BMS1P8 in HCC (Supplementary Table 1, Supplementary Fig. 2), we focused on hsa-miR-30c-2-3p, which showed the most significantly reduced expression in HCC and the strongest negative correlation with BMS1P8. miRDB prediction yielded 267 target genes with a target score 70, and among them, 46 genes were found to be upregulated in tumors and positively correlated with BMS1P8 in the TCGA_LIHC cohort (Supplementary Table 2). Among these, the only gene that showed both a significant negative correlation with hsa-miR-30c-2-3p (r –0.2, p < 0.05) and a significant positive correlation with BMS1P8(r 0.3, p < 0.05) was NME6 (Fig. 6B). Expression analysis showed that hsa-miR-30c-2-3p was significantly downregulated in HCC (p < 0.0001), with a diagnostic AUC of 0.79 (95% CI: 0.75–0.84), while NME6 was markedly upregulated (p < 0.0001) with an AUC of 0.97 (95% CI: 0.95–0.99) (Fig. 6C). Furthermore, to validate these findings in an independent in-house cohort, we analyzed RNA-seq and miRNA-seq data from the KOSIN cohort. Consistent with TCGA results, hsa-miR-30c-2-3p expression was significantly reduced in HCC (p = 0.004), yielding a diagnostic AUC of 0.66 (95% CI: 0.56–0.76), while NME6 expression was significantly elevated (p = 0.01) with a diagnostic AUC of 0.64 (95% CI: 0.53–0.74) (Fig. 6D). Correlation analysis in the TCGA_LIHC dataset revealed significant inverse associations between miR-30c-2-3p and both NME6 (r = –0.26, p < 0.001) and BMS1P8 (r = –0.29, p < 0.001), along with a positive correlation between BMS1P8 and NME6 (r = 0.33, p < 0.001). These findings suggest a potential ceRNA regulatory network among the three molecules. To independently validate these associations, we performed correlation analysis using RNA-seq and miRNA-seq data from the KOSIN cohort, which confirmed similar trends: miR-30c-2-3p was inversely correlated with both NME6 (r = –0.36, p < 0.001) and BMS1P8 (r = –0.20, p = 0.03), while BMS1P8 showed a significant positive correlation with NME6 (r = 0.29, p < 0.001) (Fig. 6E). Although correlation coefficients in the KOSIN cohort were comparable to those observed in the TCGA dataset, this additional analysis provides robust, independent support for the proposed ceRNA axis involving BMS1P8, miR-30c-2-3p, and NME6 in HCC. In survival analysis, high expression of NME6 was significantly associated with worse OS (HR = 2.02, 95% CI: 1.41–2.89, p < 0.001), while low expression of miR-30c-2-3p showed a non-significant trend toward poor prognosis (HR = 1.27, p = 0.19) (Fig. 6F). Notably, patients with high expression of both BMS1P8 and NME6 showed markedly poorer OS and DFS compared to other groups (OS: HR = 2.75, 95% CI: 1.74–4.34; DFS: HR = 2.28, 95% CI: 1.53–3.39, both p < 0.001) (Fig. 6G). A mechanistic model summarizing these findings is presented in Fig. 7, where in normal hepatocytes, miR-30c-2-3p is sufficiently expressed to repress NME6 mRNA via RNA-induced silencing complex (RISC) complex formation, whereas in HCC, increased BMS1P8 sequesters miR-30c-2-3p, preventing RISC-mediated repression and thereby releasing NME6 from post-transcriptional inhibition, potentially contributing to tumor progression.

Fig. 6.

A BMS1P8–miR-30c-2-3p–NME6 competing-endogenous-RNA (ceRNA) axis in HCC. (A) Analytical workflow showing the five-step in-silico pipeline that identified miR-30c-2-3p and its target NME6 within the BMS1P8 ceRNA network. (B) Funnel diagram summarizing progressive filtering of miR-30c-2-3p targets: 267 predicted targets 46 targets upregulated in tumors and positively correlated with BMS1P8 NME6, the only transcript also inversely correlated with miR-30c-2-3p (r –0.2, p < 0.05). (C) Box plot (left) and receiver operating characteristic (ROC) curve (right) for miR-30c-2-3p and NME6 expression in TCGA cohort. (D) Box plot (left) and ROC curve (right) for miR-30c-2-3p and NME6 expression in KOSIN cohort. Box plots show log2-transformed transcripts per million (TPM) levels comparing NT and T tissues; ROC curves show sensitivity versus 1 – specificity for distinguishing T from NT samples. (E) Pair-wise Pearson correlation scatterplots in both TCGA (n = 368) and KOSIN (n = 112) cohort: miR-30c-2-3p versus BMS1P8, BMS1P8 versus NME6, and miR-30c-2-3p versus NME6; correlation coefficients (r) and two-tailed p values are indicated. (F) Kaplan–Meier overall survival (OS) curves comparing low- versus high-expression groups (median split) for miR-30c-2-3p and NME6; hazard ratio (HR) with 95 % CI and log-rank p values are shown. (G) Kaplan–Meier curves for OS (left) and disease-free survival (DFS, right) stratified by combined BMS1P8 and NME6 expression status (bothLow, eitherHigh, bothHigh). Statistical comparisons of expression used unpaired Student’s t-tests; survival differences used log-rank tests. *p < 0.05, **p < 0.01, ***p < 0.001. Data are presented as mean ± SD unless otherwise stated.

Fig. 7.

Proposed mechanistic model of the BMS1P8–miR-30c-2-3p–NME6 ceRNA axis in HCC. In normal hepatocytes (left), miR-30c-2-3p binds to NME6 mRNA, recruiting the RNA-induced silencing complex (RISC) complex and leading to translational repression and mRNA degradation of NME6. In HCC cells (right), elevated BMS1P8 sequesters miR-30c-2-3p, preventing RISC formation on NME6 mRNA, thereby relieving translational repression and resulting in increased NME6 expression. The figure was created using BioRender.com (Agreement number: MG28JOSDOF). The original figure source is available at https://BioRender.com/5bss1pr.

4. Discussion

Pseudogenes are traditionally considered junk DNA or nonfunctional remnants arising from gene duplication or retrotransposition events [20]. Their sequences typically harbor premature stop codons or frame shifts, preventing the production of functional proteins [21]. However, with the advent of high-throughput technologies, including next-generation RNA sequencing, single-cell transcriptomics, and clustered regularly interspaced short palindromic repeat-based functional screens, our understanding of these genetic elements has evolved substantially [22, 23]. This paradigm shift has been supported by emerging research data: many pseudogenes are transcriptionally active and can exert regulatory functions analogous to non-coding RNAs, influencing fundamental cellular processes including proliferation, apoptosis, and metastasis [8]. For instance, the pseudogene-expressed lncRNA small ubiquitin-like modifier 1 pseudogene 3 (SUMO1P3) has been identified as upregulated in gastric cancer tissues compared with adjacent nontumorous tissues, and its expression associated with tumor size, differentiation, lymphatic metastasis, and invasion [24]. In parallel, recent work has delineated additional pseudogene-driven circuits that shape the key hallmarks of HCC. Double homeobox A pseudogene 8 (DUXAP8) sponges miR-490-5p, releasing budding uninhibited by benzimidazoles 1 (BUB1) and intensifying phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta (PI3K)/AKT serine/threonine kinase 1 (AKT)-driven proliferation, while misato family member 2, pseudogene (MSTO2P) simultaneously boosts E-cadherin and activates the PI3K/AKT/mechanistic target of rapamycin kinase (mTOR) axis to sustain tumor growth [25, 26]. Also, methyltransferase 3, N6-adenosine-methyltransferase complex catalytic subunit (METTL3)-mediated N6-methyladenosine modification stabilizes glucosylceramidase beta 1 like, pseudogene (GBAP1); the resultant GBAP1 overexpression sequesters miR-22-3p, up-regulates bone morphogenetic protein receptor type 1A (BMPR1A), and activates both BMP/Smad family member and PI3K/AKT cascades, promoting hepatocarcinogenesis [27, 28]. Small nuclear ribonucleoprotein polypeptide F pseudogene 1 (SNRPFP1) is markedly up-regulated in HCC, associates with poor prognosis, and promotes proliferation, motility, and apoptosis resistance by sponging the tumor-suppressive miR-126-5p [29], whereas oncogenic ubiquitin conjugating enzyme E2 M pseudogene 1 (UBE2MP1) promotes proliferation and apoptosis resistance by sponging miR-145-5p to de-repress regulator of G protein signaling 3 (RGS3) [30]. Collectively, these mechanistic insights—highlighting the emerging theme that pseudogene exert their influence primarily through miRNA-sponge activity—have driven the development of multi-pseudogene prognostic signatures that outperform conventional clinicopathologic variables, thereby reinforcing the regulatory and clinical relevance of pseudogenes in HCC and broader cancer biology.

In this study, we identified BMS1P8 as a liver cancer–specific pseudogene with diagnostic and prognostic implications. By conducting a comprehensive analysis of RNA-seq datasets and validating the findings in clinical samples, we observed that BMS1P8 expression was minimal in most other tumor types but prominently upregulated in HCC. Notably, BMS1P8 remained barely detectable in CH and LC but rose modestly in DN and surged in both early- and advanced-stage HCC (Fig. 2A), indicating that its induction is tumor-specific rather than a generic response to chronic liver injury. The robust diagnostic performance of BMS1P8 underscores its potential as a biomarker for early detection or patient stratification. Moreover, our correlation and pathway enrichment analysis implicated BMS1P8 in cell cycle regulation, a critical pathway often dysregulated in HCC progression. These insights were supported by the positive associations between BMS1P8 expression and multiple cell cycle–related genes as well as the adverse prognostic outcomes linked to a BMS1P8-based 7-gene signature including CCNB2, CDC20, CDC45, ESPL1, PLK1, and PTTG1. Although the threshold of correlation coefficient (r |0.2|) applied in this study may appear relatively low, it reflects a practical consideration when analyzing pseudogenes or non-coding RNAs with inherently low basal expression levels and high variability in large-scale datasets like TCGA. Prior study of ceRNA networks has also demonstrated that modest correlations (r = 0.2–0.3) can reveal biologically meaningful interactions, especially when these findings are further supported by independent pathway enrichment results [31]. This integrated approach provides additional confidence in the biological relevance of BMS1P8–cell cycle associations despite the moderate r values Additionally, our ceRNA analysis suggested that BMS1P8 sponge the tumor-suppressive miR-30c-2-3p, thereby derepressing NME6. This interaction could disrupt the normal regulatory network of miR-30c-2-3p, thereby promoting the expression of NME6, which is involved in tumor progression. Also, this axis was tightly linked to poorer OS and DFS in TCGA, suggesting a dual contribution of BMS1P8 to HCC progression through both cell-cycle promotion and post-transcriptional regulation of oncogenic transcripts.

NME/NM23 nucleoside diphosphate kinase 6 (NME6) is a member of the NME gene family, which plays important roles in cellular processes such as nucleoside diphosphate kinase activity, maintenance of nucleotide pools, and regulation of cell proliferation and differentiation [32, 33]. While several NME family members, such as NME1 and NME2, have been extensively studied as metastasis suppressors in various cancers, the specific biological functions of NME6 remain less well characterized [34]. Nonetheless, emerging studies have reported that NME6 expression is elevated in multiple malignancies, including breast, colorectal, and lung cancer, where it has been associated with increased tumor cell proliferation, enhanced metastatic potential, and poorer clinical outcomes [35, 36, 37]. Although direct experimental research on NME6 in HCC is limited, data mining analyses have indicated that NME6 expression is elevated in HCC tissues and inversely correlated with patient survival, suggesting a potential oncogenic role for NME6 in liver cancer progression [38]. In our analysis, we observed significant upregulation of NME6 in HCC tumor tissues compared to non-tumor samples, accompanied by a strong association with poor prognosis. Furthermore, our ceRNA network analysis indicated that BMS1P8 may sequester miR-30c-2-3p, leading to derepression of NME6, thereby implicating this axis as a potential driver of HCC progression through dysregulation of cell cycle–related and oncogenic pathways.

One of the most notable findings is the liver specificity of BMS1P8, which helps distinguish it from other pseudogenes that may be broadly upregulated across multiple malignancies. This tissue specificity may allow BMS1P8 to serve as a more targeted biomarker for HCC, potentially reducing false-positive results that can occur with conventional markers such as AFP [6]. Furthermore, the strong prognostic value observed for BMS1P8 supports its clinical utility not only in diagnosing HCC but also in risk stratification and treatment decision-making. Despite these promising results, our study has certain limitations. Although the correlation and enrichment data suggest a functional role for BMS1P8 in cell cycle regulation, we did not perform in vitro or in vivo functional assays to validate the mechanistic underpinnings of how BMS1P8 might drive tumorigenesis.

In conclusion, our findings demonstrate the diagnostic and prognostic importance of BMS1P8 in HCC and highlight its potential tissue specificity. Moreover, our identification of links between BMS1P8 expression and cell cycle-related pathways provides a foundation for understanding its role in liver cancer progression. Although this study derived key insights primarily from tissue-based RNA analysis, we recognize that tissue sampling is invasive and may limit clinical applicability. However, with the recent advancements in liquid biopsy technologies, including extracellular vesicle (EV)-based RNA analysis, there is potential for BMS1P8 to be detected in serum or plasma-derived EVs. This raises the possibility of applying BMS1P8 as a non-invasive biomarker in clinical practice. Therefore, further studies evaluating the detectability of BMS1P8 in patient blood samples and its correlation with tissue expression will be essential. Future mechanistic studies of this pseudogene could lead to improved early diagnosis and therapeutic strategies for HCC patients.

5. Conclusion

This study identified BMS1P8 as a liver-specific pseudogene biomarker with a strong diagnostic and prognostic value in HCC. Its distinct upregulation in liver cancer compared with other malignancies underscores its potential clinical utility for early detection and patient stratification. Furthermore, the correlation between BMS1P8 and cell cycle–related pathways highlight its relevance to disease progression. Our results also revealed a potential BMS1P8/miR-30c-2-3p/NME6 ceRNA circuit that may amplify oncogenic signaling in HCC. Elucidating and targeting this newly defined axis could broaden the therapeutic possibilities for BMS1P8. These findings not only enhance our understanding of the molecular landscape of HCC but also provide a foundation for future translational research aimed at integrating BMS1P8 into diagnostic workflows and exploring its potential as a target for therapeutic intervention.

Abbreviations

AFP, alpha-fetoprotein; ALT, alanine aminotransferase; AST, aspartate aminotransferase; AUC, area under curve; aHCC, advanced hepatocellular carcinoma; BMI, body mass index; cDNA, complementary deoxyribo nucleic acid; CH, chronic hepatitis; CI, confidence interval; DFS, disease-free survival; DN, dysplastic nodule; eHCC, early staged hepatocellular carcinoma; FC, fold changes; GO, gene ontology; HBV, hepatitis B virus; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; HMBS, hydroxymethylbilane synthase; HR, hazard ratio; INR, international normalized ratio; LC, liver cirrhosis; LIHC, liver hepatocellular carcinoma; lncRNA, long non-coding RNA; MSigDB, Molecular Signatures Database; NASH, non-alcoholic steatohepatitis; NL, normal liver; NT, non-tumor; OS, overall survival; PIVKA-II, prothrombin-induced by vitamin K absence or antagonist-II; qRT-PCR, quantitative real-time polymerase chain reaction; RNA-seq, RNA-sequencing; ROC, receiver operating characteristics; SD, standard deviation; T, tumor tissue; TCGA, The Cancer Genome Atlas.

Availability of Data and Materials

Data supporting the findings of this study are presented in the paper and supplementary files. All other data are available from the corresponding authors upon reasonable request.

Author Contributions

Conceptualization: HSK, JWE; data curation: HSK, JYChoi, GOB, SSK, JYCheong; methodology: MGY, JYJ; investigation: SHJ, JEH, JWE; resources: SSK, JYCheong; visualization: HSK, JYChoi; funding acquisition: HSK, JYCheong, JYJ, JWE; project administration: JYJ, JWE; supervision: JYJ, JWE; writing–original draft: HSK, JWE; writing – review & editing: JYCheong, JWE. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Ethics Approval and Consent to Participate

The study was carried out in accordance with the guidelines of the Declaration of Helsinki. This study was approved by the Institutional Review Board of the Ajou University Hospital (Suwon, South Korea) (approval number: AJOUIRB-EX-2024-332 and AJOUIRB-EX-2024-389). The requirement for informed consent was waived.

Acknowledgment

The biospecimens and data used in this study were provided by the Biobank of Ajou University Hospital, a member of the Korea Biobank Network. In addition, we thank all the members of MOAGEN (Daejeon, South Korea) for providing bioinformatics analysis and guidance. Fig. 7 was created using BioRender.com (Agreement number: MG28JOSDOF). The original figure source is available at https://BioRender.com/5bss1pr.

Funding

This work was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number HR21C1003); by the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (MSIT), Republic of Korea (grant numbers NRF-2022R1A2C1092155, RS-2022-NR070489, RS-2023-00210847, RS-2025-00562556 and RS-2025-00521818).

Conflict of Interest

The authors declare no conflict of interest.

Declaration of AI and AI-Assisted Technologies in the Writing Process

ChatGPT (version 1o pro) was used for error checking and language refinement during the manuscript preparation. After using this tool, the authors reviewed and edited the content as needed and took full responsibility for the content of the publication.

Supplementary Material

Supplementary material associated with this article can be found, in the online version, at https://doi.org/10.31083/FBL41684.

References

Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.