Obtaining transferable chemical insight from solving machine-learning classification problems: Thermodynamical properties prediction, atomic composition as good as Coulomb matrix
Leon Alday-Toledo, Roberto Bernal-Jaquez, Saul Zapotecas-Martinez, Jose L. Mendoza-Cortes

TL;DR
This paper introduces a classification-based methodology to extract transferable chemical insights and evaluate molecular representations for predicting thermodynamical properties, demonstrating that atomic composition can rival Coulomb matrix descriptors.
Contribution
The authors propose a simple, transferable classification approach to explore molecular problems, test descriptors, and gain physicochemical insights, with applications to the QM9 database.
Findings
Atomic composition descriptor achieves >90% classification accuracy, comparable to Coulomb matrix.
The methodology aids in estimating prediction difficulty and refining molecular representations.
Physicochemical insights can be derived from classification experiments.
Abstract
Machine learning (ML) can be used to construct surrogate models for the fast prediction of a property of interest. ML can thus be applied to chemical projects, where the usual experimentation or calculation techniques can take hours or days for just one sample. In this manner, the most promising candidate samples could be extracted from an extensive database and subjected to further in-depth analysis. Despite their broad applicability, it can be challenging to apply ML methods to a given chemical problem since a multitude of design decisions must be made, such as the molecular descriptor to use or the optimizer to train the model. Here we present a methodology for the meaningful exploration of a given molecular problem through classification experiments. This conceptually simple methodology results in transferable insight on the selected problem and can be used as a platform from…
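The abstract does not specify how a continuous thermodynamical property becomes a classification target. The sketch below is a minimal illustration under stated assumptions. It bins the property into equal-count (quantile) classes, fits a nearest-centroid classifier on descriptor vectors such as atomic composition, and reports accuracy. The helpers `quantile_edges`, `to_classes`, `fit_centroids`, `predict` and `accuracy` are hypothetical, and the nearest-centroid classifier is a stand-in chosen for brevity, not the model used in the paper.

```python
import math
from bisect import bisect_right


def quantile_edges(values, n_classes):
    """Interior cut points that split values into n_classes roughly equal-count bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_classes] for k in range(1, n_classes)]


def to_classes(values, edges):
    """Map each continuous value to a class index in 0..len(edges)."""
    return [bisect_right(edges, v) for v in values]


def fit_centroids(features, labels):
    """Mean descriptor vector per class (nearest-centroid classifier, illustrative stand-in)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy usage: one-dimensional descriptors and a property that grows with them.
prop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feats = [[v] for v in prop]
edges = quantile_edges(prop, n_classes=3)
labels = to_classes(prop, edges)
centroids = fit_centroids(feats, labels)
preds = [predict(centroids, x) for x in feats]
score = accuracy(labels, preds)
```

For brevity, the toy example scores the classifier on its own training data. A real experiment would compare descriptors by their accuracy on held-out molecules.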
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Various Chemistry Research Topics
