Open Access Original Research
Annotating whole genome variants and constructing a multi-classifier based on samples of ADNI
1 School of Software, East China Jiaotong University, 330013 Nanchang, Jiangxi, China
*Correspondence: lixiong@ecjtu.edu.cn; lx_hncs@163.com (Xiong Li)
Academic Editor: Leyi Wei
Front. Biosci. (Landmark Ed) 2022, 27(1), 37; https://doi.org/10.31083/j.fbl2701037
Submitted: 11 June 2021 | Revised: 29 November 2021 | Accepted: 21 December 2021 | Published: 19 January 2022
Copyright: © 2022 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.
Abstract

Introduction: Alzheimer’s disease (AD) is the most common progressive neurodegenerative disorder in the elderly and eventually leads to dementia in the absence of effective prevention and treatment. AD is a typical complex disease, and the mechanisms underlying its onset and progression remain poorly understood. Research design and methods: In this study, we aim to directly analyze the relationship between DNA variants and phenotypes using whole genome sequencing data. First, to make the results biologically interpretable, we annotate deleterious variants and map them to the nearest protein-coding genes. Then, to eliminate redundant features and reduce the burden of downstream analysis, we apply a multi-objective evaluation strategy based on entropy theory to rank all candidate genes. Finally, we use XGBoost as a multi-class classifier on unbalanced data composed of 46 AD samples, 483 mild cognitive impairment (MCI) samples and 279 cognitively normal (CN) samples. Results: Experiments on real whole genome sequencing data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) show that our method not only achieves satisfactory classification performance but also finds a significant correlation between AD and RIN3, a known AD susceptibility gene. In addition, pathway enrichment analysis was carried out using the top 20 feature genes, and three pathways were confirmed to be significantly related to the formation of AD. Conclusions: The experimental results demonstrate that the proposed method is effective and of practical significance.
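
To give a sense of how unbalanced the three classes are, the following minimal sketch computes inverse-frequency class weights and the Shannon entropy of the class distribution from the sample counts stated above. The abstract does not say how the imbalance is handled or how the entropy-based ranking is defined, so the weighting scheme and the helper names `balanced_class_weights` and `shannon_entropy` are illustrative assumptions, not the authors' method.

```python
import math

# Class counts as reported in the abstract.
CLASS_COUNTS = {"AD": 46, "MCI": 483, "CN": 279}


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * n_class_samples).

    This is one common heuristic for unbalanced data (an assumption here,
    not a scheme confirmed by the source).
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical class distribution."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log2(n / total) for n in counts.values() if n > 0
    )


weights = balanced_class_weights(CLASS_COUNTS)
entropy_bits = shannon_entropy(CLASS_COUNTS)

for label, weight in weights.items():
    print(f"{label}: count={CLASS_COUNTS[label]}, weight={weight:.3f}")
print(f"class entropy = {entropy_bits:.3f} bits (maximum {math.log2(3):.3f})")
```

Under this scheme the rare AD class receives the largest weight and the MCI class the smallest. Weights of this kind could, for example, be expanded to one value per sample and passed through the `sample_weight` argument of `XGBClassifier.fit`; whether the study did anything similar is not stated.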

Keywords
Unbalanced data
Multi-class classification
Multi-objective optimization
Funding
20192ACB21004/Jiangxi Provincial natural science fund
20204BCJL23035/Jiangxi Provincial natural science fund
20212ABC03A32/Jiangxi Provincial natural science fund
20202BAB212004/Jiangxi Provincial natural science fund
20212BAB202007/Jiangxi Provincial natural science fund
20YJAZH142/MOE (Ministry of Education in China) Project of Humanities and Social Sciences
GJJ190356/Technological Research Project of Education Department in Jiangxi Province
GJJ210624/Technological Research Project of Education Department in Jiangxi Province