Genetic optimization of training sets for improved machine learning models of molecular properties
Nicholas J. Browning, Raghunathan Ramakrishnan, O. Anatole von Lilienfeld, Ursula Röthlisberger

TL;DR
This paper demonstrates that genetic algorithms can optimize training set composition for molecular property prediction models, significantly improving their accuracy over randomly selected training sets.
Contribution
It introduces a genetic algorithm-based method for selecting training molecules, leading to more accurate machine learning models of molecular properties.
Findings
Reduced mean absolute errors to ~25% for enthalpies, free energies, and zero-point vibrational energy, relative to small randomly selected training sets
Optimized training sets outperform random sampling
Class-based training sets enable transferability to similar molecules
Abstract
The training of molecular models of quantum mechanical properties based on statistical machine learning requires large datasets which exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible to achieve as prior knowledge may be sparse or unavailable. Ordinarily representative selection of training molecules from such datasets is achieved through random sampling. We use genetic algorithms for the optimization of training set composition consisting of tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate with respect to small randomly selected training sets: mean absolute errors for out-of-sample predictions are reduced to ~25% for enthalpies, free energies, and zero-point vibrational energy, to ~50% for heat-capacity, electron-spread,…
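The idea of evolving training-set composition can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian-kernel ridge regression model, the hyperparameters `sigma` and `lam`, the synthetic one-dimensional data, and the particular selection, crossover, and mutation operators. Each individual in the population is a fixed-size set of indices into a candidate pool, and its fitness is the mean absolute error of a model trained on those indices and evaluated on separate validation points.

```python
import numpy as np


def krr_mae(train_idx, X, y, eval_idx, sigma=1.0, lam=1e-3):
    """Fit Gaussian-kernel ridge regression on train_idx; return MAE on eval_idx."""
    Xt = X[train_idx]
    d2 = ((Xt[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    alpha = np.linalg.solve(K + lam * np.eye(len(train_idx)), y[train_idx])
    d2_eval = ((X[eval_idx][:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    pred = np.exp(-d2_eval / (2 * sigma**2)) @ alpha
    return float(np.abs(pred - y[eval_idx]).mean())


def optimize_training_set(X, y, pool_idx, val_idx, n_train,
                          pop_size=20, n_gen=30, mut_rate=0.1, seed=0):
    """Evolve index sets of size n_train drawn from pool_idx (illustrative GA).

    Returns the best index set and the best validation MAE per generation.
    Assumes the pool holds at least 2 * n_train candidates.
    """
    rng = np.random.default_rng(seed)
    pool_idx = np.asarray(pool_idx)
    # Initial population: random training sets, i.e. the random-sampling baseline.
    population = [rng.choice(pool_idx, size=n_train, replace=False)
                  for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        scores = np.array([krr_mae(ind, X, y, val_idx) for ind in population])
        order = np.argsort(scores)
        history.append(float(scores[order[0]]))
        # Truncation selection with elitism: the better half survives unchanged.
        survivors = [population[i] for i in order[:pop_size // 2]]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(len(survivors), size=2, replace=False)
            # Crossover: sample the child from the union of both parents.
            genes = np.union1d(survivors[a], survivors[b])
            child = rng.choice(genes, size=n_train, replace=False)
            # Mutation: swap some members for unused pool candidates.
            mask = rng.random(n_train) < mut_rate
            unused = np.setdiff1d(pool_idx, child)
            child[mask] = rng.choice(unused, size=int(mask.sum()), replace=False)
            children.append(child)
        population = survivors + children
    scores = np.array([krr_mae(ind, X, y, val_idx) for ind in population])
    best = int(np.argmin(scores))
    history.append(float(scores[best]))
    return population[best], history


if __name__ == "__main__":
    # Synthetic stand-in for a structure-to-property map (not molecular data).
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = np.sin(2 * X[:, 0])
    pool, val, test = np.arange(0, 180), np.arange(180, 240), np.arange(240, 300)
    best, history = optimize_training_set(X, y, pool, val, n_train=10, n_gen=10)
    random_set = rng.choice(pool, size=10, replace=False)
    print("random-set test MAE   :", krr_mae(random_set, X, y, test))
    print("optimized-set test MAE:", krr_mae(best, X, y, test))
```

Because the fitness is computed on validation points, the final comparison against a random training set is made on a third, held-out split; otherwise the reported improvement would partly reflect fitting the selection to the validation data. Keeping the best half of each generation unchanged means the best validation error cannot get worse from one generation to the next.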
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Various Chemistry Research Topics
