
Frontiers in Bioscience-Landmark (FBL) is published by IMR Press from Volume 26 Issue 5 (2021). Previous articles were published by another publisher on a subscription basis, and they are hosted by IMR Press on imrpress.com as a courtesy and upon agreement with Frontiers in Bioscience.

Article
Emergent unsupervised clustering paradigms with potential application to bioinformatics
1 Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802
2 Department of ECE, Virginia Polytechnic Institute and State University, Arlington, VA 22203
3 Departments of EE and CSE, Pennsylvania State University, University Park, PA 16802
Front. Biosci. (Landmark Ed) 2008, 13(2), 677–690; https://doi.org/10.2741/2711
Published: 1 January 2008
Abstract

In recent years, there has been a great upsurge in the application of data clustering, statistical classification, and related machine learning techniques to molecular biology, in particular to the analysis of DNA microarray expression data. Clustering methods can be used to group co-expressed genes, shedding light on gene function and co-regulation. Alternatively, they can group samples or conditions to identify phenotypic groups or disease subgroups, or to help identify disease pathways. A rich variety of unsupervised techniques has been applied, including partitional, hierarchical, graph-based, model-based, and biclustering methods. While a number of machine learning problems and tools have found mainstream applications in bioinformatics, in this article we identify some challenging problems that, though clearly relevant to bioinformatics, have not been extensively investigated in this domain. These include i) unsupervised clustering with unsupervised feature selection, ii) semisupervised learning, iii) unsupervised (and supervised) learning in the presence of confounding variables, and iv) stability of clustering solutions. We review recent methods that address these problems and argue that they are well suited to some common scenarios that occur in bioinformatics.
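To make problem iv) concrete, the following is a minimal sketch of one common way to measure the stability of a clustering solution: cluster two random subsamples of the data and check how consistently the points they share are grouped. It assumes plain k-means with farthest-first initialization as the clustering method and the Rand index (pairwise co-membership agreement) as the agreement score; these choices, and the names `kmeans`, `stability`, `rand_index`, and the parameters `trials` and `frac`, are illustrative assumptions, not the specific methods reviewed in the article. Each point stands in for a gene's expression profile across conditions.

```python
import random
from itertools import combinations


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((p - q) ** 2 for p, q in zip(x, y))


def init_centers(points, k, rng):
    """Farthest-first initialization (an assumed choice): a random first
    center, then repeatedly the point farthest from all chosen centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda x: min(sq_dist(x, c) for c in centers)))
    return centers


def kmeans(points, k, rng, iters=100):
    """Lloyd's k-means on a list of tuples; returns one cluster label per point."""
    centers = init_centers(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in points]
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different
    cluster). Invariant to how the cluster labels are numbered."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, trials=20, frac=0.8, seed=0):
    """Average Rand index between clusterings of two random subsamples,
    evaluated only on the points the two subsamples share."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(trials):
        idx1 = sorted(rng.sample(range(n), m))
        idx2 = sorted(rng.sample(range(n), m))
        shared = sorted(set(idx1) & set(idx2))
        if len(shared) < 2:
            continue  # not enough overlap to compare pairs
        lab1 = dict(zip(idx1, kmeans([points[i] for i in idx1], k, rng)))
        lab2 = dict(zip(idx2, kmeans([points[i] for i in idx2], k, rng)))
        scores.append(rand_index([lab1[i] for i in shared], [lab2[i] for i in shared]))
    if not scores:
        raise ValueError("subsamples never overlapped; increase frac or the data size")
    return sum(scores) / len(scores)
```

A score near 1 means the partition is reproduced across subsamples; running `stability` over several values of `k` and preferring the most stable one is one way such scores are used for model selection. In practice, a chance-corrected variant such as the adjusted Rand index is often preferred, because the raw Rand index can be high even for unrelated partitions when there are many clusters.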
