IMR Press / FBL / Volume 28 / Issue 12 / DOI: 10.31083/j.fbl2812346
Open Access Original Research
im5C-DSCGA: A Proposed Hybrid Framework Based on Improved DenseNet and Attention Mechanisms for Identifying 5-methylcytosine Sites in Human RNA
1 School of Information Engineering, Jingdezhen Ceramic University, 333403 Jingdezhen, Jiangxi, China
*Correspondence: jjh163yx@163.com (Jianhua Jia); lulu9825@163.com (Lulu Qin)
Front. Biosci. (Landmark Ed) 2023, 28(12), 346; https://doi.org/10.31083/j.fbl2812346
Submitted: 26 May 2023 | Revised: 14 August 2023 | Accepted: 28 August 2023 | Published: 26 December 2023
Copyright: © 2023 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.
Abstract

Background: 5-methylcytosine (m5C) is a key post-transcriptional modification that plays a critical role in RNA metabolism. Although the number of identified m5C modification sites in organisms has increased greatly, their epigenetic roles remain largely unknown. Therefore, it is crucial to precisely identify m5C modification sites to gain more insight into cellular processes and other mechanisms related to biological functions. Researchers have proposed traditional computational methods and machine learning algorithms, but limitations remain. In this study, we propose a more powerful and reliable deep-learning model, im5C-DSCGA, to identify novel RNA m5C modification sites in humans.

Methods: The im5C-DSCGA model first uses three feature encoding methods, namely one-hot, nucleotide chemical property (NCP), and nucleotide density (ND), to extract the original features from RNA sequences and concatenate them; next, the original features are fed into an improved densely connected convolutional network (DenseNet) with a Convolutional Block Attention Module (CBAM) to extract advanced local features; then, a bidirectional gated recurrent unit (BGRU) captures long-term dependencies from the advanced local features, and Self-Attention extracts global features; finally, ensemble learning and a fully connected layer are used to classify and predict m5C sites.

Results: On the independent test dataset, with the three feature encoding methods combined, the im5C-DSCGA model achieved sensitivity (Sn), specificity (Sp), accuracy (Acc), Matthews correlation coefficient (MCC), and area under the curve (AUC) values of 81.0%, 90.8%, 85.9%, 72.1%, and 92.6%, respectively.

Conclusions: We critically evaluated the performance of im5C-DSCGA using five-fold cross-validation and independent testing and compared it to existing methods. The MCC reached 72.1% on the independent test, which is 3.0% higher than that of the current state-of-the-art prediction method, Deepm5C. The results show that the im5C-DSCGA model achieves more accurate and stable performance and is an effective tool for predicting m5C modification sites. To the authors’ knowledge, this is the first time that the improved DenseNet, BGRU, CBAM attention mechanism, and Self-Attention mechanism have been combined to predict novel m5C sites in human RNA.
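As an illustration of the three feature encodings named in the Methods, the following minimal Python sketch encodes an RNA sequence position by position and concatenates the results. It is not the authors' implementation: the function name `encode_sequence`, the column order, and the one-hot ordering (A, C, G, U) are assumptions made here, and the NCP triples follow the convention commonly used in the RNA-modification literature (ring structure, functional group, hydrogen-bond strength), which may differ from the paper's exact choice. ND is taken as the frequency of the current nucleotide in the prefix ending at that position.

```python
from typing import Dict, List, Tuple

# Assumed one-hot ordering: A, C, G, U.
ONE_HOT: Dict[str, Tuple[int, ...]] = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Commonly used NCP convention (an assumption, not taken from the paper):
# (purine ring, amino group, weak hydrogen bonding).
NCP: Dict[str, Tuple[int, ...]] = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_sequence(seq: str) -> List[List[float]]:
    """Return one row per nucleotide: 4 one-hot + 3 NCP + 1 ND values."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts: Dict[str, int] = {}
    rows: List[List[float]] = []
    for position, base in enumerate(seq, start=1):
        if base not in ONE_HOT:
            raise ValueError(f"unsupported nucleotide {base!r} at position {position}")
        counts[base] = counts.get(base, 0) + 1
        # Nucleotide density: occurrences of this base so far / prefix length.
        density = counts[base] / position
        row = [float(v) for v in ONE_HOT[base]]
        row += [float(v) for v in NCP[base]]
        row.append(density)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in encode_sequence("ACCA"):
        print(row)
```

Each sequence of length L becomes an L × 8 matrix under these assumptions, a shape that a convolutional front end such as DenseNet could consume directly.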
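For readers less familiar with the reported metrics, the sketch below computes Sn, Sp, Acc, and MCC from the four cells of a binary confusion matrix using their standard definitions. The function name `binary_metrics` is hypothetical, and the abstract does not give the confusion-matrix counts behind the reported values, so no attempt is made to reproduce them here. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both positive and negative samples are required")
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

MCC ranges from -1 to 1, so a value written as 72.1% corresponds to a coefficient of 0.721.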

Keywords
RNA
5-methylcytosine site identification
DenseNet
BGRU
improved CBAM attention
self-attention
deep learning
ensemble learning
Funding
61761023/National Natural Science Foundation of China
62162032/National Natural Science Foundation of China
31760315/National Natural Science Foundation of China
20202BABL202004/Natural Science Foundation of Jiangxi Province
20202BAB202007/Natural Science Foundation of Jiangxi Province
GJJ190695/Scientific Research Plan of the Department of Education of Jiangxi Province