DOI: 10.31083/j.fbl2706188
Open Access Original Research
An Inverse QSAR Method Based on Linear Regression and Integer Programming
1 Department of Applied Mathematics and Physics, Kyoto University, 606-8501 Kyoto, Japan
2 Graduate School of Advanced Integrated Studies in Human Survivability (Shishu-Kan), Kyoto University, 606-8306 Kyoto, Japan
3 Bioinformatics Center, Institute for Chemical Research, Kyoto University, 611-0011 Uji, Japan
*Correspondence: Jianshen Zhu
These authors contributed equally.
Academic Editors: Agnieszka Kaczor and Graham Pawelec
Front. Biosci. (Landmark Ed) 2022, 27(6), 188;
Submitted: 16 February 2022 | Revised: 28 March 2022 | Accepted: 7 April 2022 | Published: 10 June 2022
Copyright: © 2022 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.

Background: Drug design is one of the important applications of biological science. Extensive studies have been conducted on computer-aided drug design based on inverse quantitative structure-activity relationship (inverse QSAR) analysis, which aims to infer chemical compounds from given chemical activities and constraints. However, most existing methods do not guarantee exact or optimal solutions.

Method: Recently, a novel framework based on artificial neural networks (ANNs) and mixed integer linear programming (MILP) has been proposed for designing chemical structures. This framework consists of two phases: first, an ANN is used to construct a prediction function; then, an MILP formulated on the trained ANN and a graph search algorithm are used to infer the desired chemical structures. In this paper, we use linear regression instead of ANNs to construct the prediction function. To this end, we derive a novel MILP formulation that simulates the computation process of a prediction function by linear regression.

Results: For the first phase, we performed computational experiments using 18 chemical properties, and the proposed method achieved good prediction accuracy for a relatively large number of these properties compared with the ANNs used in our previous work. For the second phase, we performed computational experiments on five chemical properties, and the method inferred chemical structures with up to around 50 non-hydrogen atoms.

Conclusions: The combination of linear regression and integer programming is a potentially useful approach to computational molecular design.
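The two-phase idea described in the abstract can be illustrated with a toy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the descriptors are two hypothetical integer counts rather than descriptors computed from a chemical graph, the regression is plain ordinary least squares (the regression variant used in the paper may differ), and the inverse step replaces the MILP solver and the graph search with brute-force enumeration over a small box of integer values. The helper names `solve_linear_system`, `fit_linear_regression`, and `infer_descriptors` are hypothetical. The point the sketch makes is that a linear prediction function turns the requirement "predicted value lies in [y_lo, y_hi]" into two linear inequalities on integer variables, which is exactly the kind of constraint an integer program can express.

```python
from __future__ import annotations

from itertools import product


def solve_linear_system(M: list[list[float]], v: list[float]) -> list[float]:
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [M | v] so row operations act on both sides.
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("matrix is singular")
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_linear_regression(X, y):
    """Phase 1 (sketch): ordinary least squares y ~ w.x + b via the normal equations."""
    A = [list(row) + [1.0] for row in X]  # constant column models the intercept b
    k = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    Aty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(k)]
    theta = solve_linear_system(AtA, Aty)
    return theta[:-1], theta[-1]


def infer_descriptors(w, b, bounds, y_lo, y_hi):
    """Phase 2 (sketch): integer vectors x in the box with y_lo <= w.x + b <= y_hi.

    The conditions are linear inequalities on integer variables (an integer
    program); brute-force enumeration stands in for an MILP solver here.
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return [
        x for x in product(*ranges)
        if y_lo <= sum(wi * xi for wi, xi in zip(w, x)) + b <= y_hi
    ]


# Hypothetical training data: two integer descriptors per compound,
# with targets generated exactly from y = 2*x1 - x2 + 0.5.
X = [[1, 0], [2, 1], [3, 0], [4, 2], [5, 1]]
y = [2.5, 3.5, 6.5, 6.5, 9.5]

w, b = fit_linear_regression(X, y)
print([round(c, 6) for c in w], round(b, 6))

# Request descriptor vectors whose predicted property lies in [6.9, 8.1].
candidates = infer_descriptors(w, b, bounds=[(0, 5), (0, 3)], y_lo=6.9, y_hi=8.1)
print(candidates)  # → [(4, 1), (5, 3)]
```

Enumeration is only feasible because the box is tiny; it grows exponentially with the number of descriptors, which is one reason to hand the same linear constraints to an MILP solver instead. A realistic formulation would also need constraints ensuring that the descriptor values correspond to an actual chemical graph, which this toy example omits.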

machine learning
linear regression
integer programming
materials informatics
molecular design
Fig. 1.