Improved decision making with similarity based machine learning: Applications in chemistry
Dominik Lemm, Guido Falk von Rudorff, O. Anatole von Lilienfeld

TL;DR
This paper introduces similarity-based machine learning (SML) for chemistry, enabling effective decision making in data-scarce scenarios by selecting relevant training data dynamically, thus reducing data requirements.
Contribution
The paper presents a novel SML approach that adapts training data selection on-the-fly for specific queries, improving performance in chemical applications with limited data.
Findings
SML reaches competitive performance with only a fraction of the training data
Applications to quantum-mechanics-based molecular design and organic synthesis planning demonstrate its effectiveness
The paper derives a relationship between feature-space properties and model accuracy
Abstract
Despite the fundamental progress in autonomous molecular and materials discovery, data scarcity throughout chemical compound space still severely hampers the use of modern ready-made machine learning models as they rely heavily on the paradigm, 'the bigger the data the better'. Presenting similarity based machine learning (SML), we show an approach to select data and train a model on-the-fly for specific queries, enabling decision making in data scarce scenarios in chemistry. By solely relying on query and training data proximity to choose training points, only a fraction of data is necessary to converge to competitive performance. After introducing SML for the harmonic oscillator and the Rosenbrock function, we describe applications to scarce data scenarios in chemistry which include quantum mechanics based molecular design and organic synthesis planning. Finally, we derive a…
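The selection step described in the abstract (choose training points purely by their proximity to the query, then fit a model for that query alone) can be illustrated with a minimal sketch on the Rosenbrock function, one of the toy problems the abstract names. This is not the authors' implementation. The Euclidean distance, the Gaussian kernel ridge regression model, and the values of `k`, `sigma`, and `reg` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def rosenbrock(X, a=1.0, b=100.0):
    """Rosenbrock function evaluated row-wise on an (n, 2) array."""
    x, y = X[:, 0], X[:, 1]
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def sml_predict(query, X_pool, y_pool, k=50, sigma=0.5, reg=1e-8):
    """Fit a local model on the k pool points closest to `query`.

    Assumption: similarity is Euclidean distance in the raw feature space,
    and the local model is kernel ridge regression with a Gaussian kernel.
    Returns the prediction and the indices of the selected pool points.
    """
    # 1. Rank the whole pool by distance to this specific query.
    dist = np.linalg.norm(X_pool - query, axis=1)
    idx = np.argsort(dist)[:k]
    X_loc, y_loc = X_pool[idx], y_pool[idx]

    # 2. Train on the fly, using only the selected neighbours.
    y_mean = y_loc.mean()
    K = gaussian_kernel(X_loc, X_loc, sigma)
    alpha = np.linalg.solve(K + reg * np.eye(len(idx)), y_loc - y_mean)

    # 3. Predict for the query and discard the model afterwards.
    k_q = gaussian_kernel(query[None, :], X_loc, sigma)[0]
    return float(k_q @ alpha + y_mean), idx


rng = np.random.default_rng(0)
X_pool = rng.uniform(-2.0, 2.0, size=(2000, 2))  # synthetic data pool
y_pool = rosenbrock(X_pool)

query = np.array([0.5, 0.5])
pred, used = sml_predict(query, X_pool, y_pool, k=50)
true = float(rosenbrock(query[None, :])[0])
print(f"prediction: {pred:.4f}  reference: {true:.4f}  points used: {len(used)}")
```

Each query triggers its own selection and fit, so the cost per prediction scales with `k` rather than with the size of the pool. In a data-scarce setting the same ranking could be used the other way round: to decide which few candidate points are worth labelling for a given query before any model is trained.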
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Analytical Chemistry and Chromatography
