IMR Press / FBL / Volume 27 / Issue 6 / DOI: 10.31083/j.fbl2706177
Open Access Original Research
Sequence-Based Prediction with Feature Representation Learning and Biological Function Analysis of Channel Proteins
1 Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054 Chengdu, Sichuan, China
2 School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, 518055 Shenzhen, Guangdong, China
3 Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 324022 Quzhou, Zhejiang, China
4 Genetics Department, Faculty of Agriculture, Beni-Suef University, 62511 Beni-Suef, Egypt
5 School of Electronic and Communication Engineering, Shenzhen Polytechnic, 518055 Shenzhen, Guangdong, China
6 Department of Dentistry, Beidahuang Industry Group General Hospital, 150088 Harbin, Heilongjiang, China
*Correspondence: sunmingai6@126.com (Mingai Sun); c7zlj@szpt.edu.cn (Lijun Zhang)
These authors contributed equally.
Academic Editor: Graham Pawelec
Front. Biosci. (Landmark Ed) 2022, 27(6), 177; https://doi.org/10.31083/j.fbl2706177
Submitted: 30 December 2021 | Revised: 7 April 2022 | Accepted: 19 April 2022 | Published: 2 June 2022
(This article belongs to the Special Issue Computational biomarker detection and analysis)
Copyright: © 2022 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.
Abstract

Background: Channel proteins are membrane proteins that allow molecules to cross the plasma membrane by free diffusion. Because experimental identification of channel proteins is labor-intensive and costly, a computational tool for identifying them is needed to support biological research on this protein class. Methods: We used 17 feature encoding methods and four machine learning classifiers to generate 68-dimensional probability features (17 × 4 = 68). A two-step feature selection strategy was then used to optimize these features, and the final prediction model, M16-LGBM (light gradient boosting machine), was built on the optimal 16-dimensional feature vector. Results: A new predictor, CAPs-LGBM, was proposed to identify channel proteins effectively. Conclusions: CAPs-LGBM is the first machine learning predictor of channel proteins whose final prediction model is constructed from protein primary sequences. The classifier performed well on both the training and test sets.
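The Methods step, in which 17 encodings and four classifiers yield a 68-dimensional probability vector, can be illustrated with a scaled-down sketch. Everything in it is an assumption for illustration, not the authors' implementation: two toy encodings (amino acid composition and a hypothetical `hydrophobic_fraction`) stand in for the 17 encodings, and two small hand-written classifiers (`NearestCentroid`, `Logistic`) stand in for the four classifiers. The two-step feature selection and the final LightGBM model are not shown. Each fitted encoding/classifier pair contributes one positive-class probability, so the output length is the product of the two counts.

```python
import math
from typing import Callable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> List[float]:
    """Amino acid composition: frequency of each of the 20 standard residues."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def hydrophobic_fraction(seq: str) -> List[float]:
    """Hypothetical 1-D encoding: fraction of residues in an assumed hydrophobic set."""
    n = max(len(seq), 1)
    return [sum(seq.count(a) for a in "AILMFVW") / n]


class NearestCentroid:
    """Turns distances to the two class centroids into a positive-class probability."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x) -> float:
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        # Softmax over negative distances: closer to the positive centroid -> higher score.
        return math.exp(-d1) / (math.exp(-d0) + math.exp(-d1))


class Logistic:
    """Plain logistic regression trained by per-sample gradient descent."""

    def __init__(self, lr: float = 0.5, epochs: int = 200):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                g = self.predict_proba(x) - t  # gradient of log-loss w.r.t. the logit
                self.w = [w - self.lr * g * xi for w, xi in zip(self.w, x)]
                self.b -= self.lr * g
        return self

    def predict_proba(self, x) -> float:
        z = self.b + sum(w * xi for w, xi in zip(self.w, x))
        z = max(min(z, 30.0), -30.0)  # clamp the logit to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))


def probability_features(seqs: Sequence[str], labels: Sequence[int],
                         encodings: List[Callable[[str], List[float]]],
                         classifier_factories: List[Callable[[], object]]):
    """Fit one classifier per (encoding, classifier) pair and return a function mapping a
    sequence to a probability vector of length len(encodings) * len(classifier_factories)."""
    models = []
    for enc in encodings:
        X = [enc(s) for s in seqs]
        for make in classifier_factories:
            models.append((enc, make().fit(X, labels)))

    def transform(seq: str) -> List[float]:
        # One positive-class probability per fitted (encoding, classifier) pair.
        return [model.predict_proba(enc(seq)) for enc, model in models]

    return transform


# Toy data, not real channel-protein sequences: 1 = "channel-like", 0 = other.
train = ["AILMFVWAIL", "LLVVAAIIFF", "DEKRDEKRST", "NQSTDEKRGH"]
labels = [1, 1, 0, 0]
transform = probability_features(train, labels, [aac, hydrophobic_fraction],
                                 [NearestCentroid, Logistic])
vec = transform("AILVVFDEKA")
print(len(vec))  # 4 = 2 encodings x 2 classifiers
```

With 17 encodings and four classifiers, the same construction would give 17 × 4 = 68 features, which a downstream selection step could then reduce. A practical pipeline would usually compute the training-set probabilities out-of-fold (for example with cross-validation) so the downstream model does not see probabilities from classifiers fitted on the same samples; this sketch skips that for brevity.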

Keywords
channel protein
computational prediction
light gradient boosting machine
PPI network
feature selection
Figures
Fig. 1.