Random projections and Kernelised Leave One Cluster Out Cross-Validation: Universal baselines and evaluation tools for supervised machine learning for materials properties
Samantha Durdy, Michael Gaultois, Vladimir Gusev, Danushka Bollegala, and Matthew J. Rosseinsky

TL;DR
This paper evaluates kernelized leave-one-cluster-out cross-validation (LOCO-CV) as a tool for assessing machine learning models in materials science, and presents it as a universal evaluation framework that applies across representations and algorithms.
Contribution
It introduces a kernelized LOCO-CV framework that improves the evaluation of machine learning models for materials properties, regardless of the representation or algorithm used.
Findings
Applying a radial basis function improves data separability in all tested datasets.
Kernel approximation functions enhance LOCO-CV performance.
Domain knowledge does not significantly improve prediction accuracy in most cases.
Abstract
With machine learning being a popular topic in current computational materials science literature, creating representations for compounds has become commonplace. These representations are rarely compared, as evaluating their performance (and the performance of the algorithms that they are used with) is non-trivial. With many materials datasets containing bias and skew caused by the research process, leave-one-cluster-out cross-validation (LOCO-CV) has been introduced as a way of measuring the performance of an algorithm in predicting previously unseen groups of materials. This raises the question of the impact, and control, of the range of cluster sizes on the LOCO-CV measurement outcomes. We present a thorough comparison between composition-based representations, and investigate how kernel approximation functions can be used to better separate data to enhance LOCO-CV applications.…
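The evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn, uses `RBFSampler` as one possible kernel approximation function, assumes k-means clustering with a random forest regressor, and runs on synthetic data. The function name `kernelised_loco_cv` and all parameter values are hypothetical, and the choice to train on the original features while clustering on the mapped ones is also an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneGroupOut


def kernelised_loco_cv(X, y, n_clusters=5, gamma=1.0, seed=0):
    """Hypothetical helper: return cluster labels and one held-out MAE per cluster."""
    # Approximate RBF feature map, used here only to decide cluster membership.
    sampler = RBFSampler(gamma=gamma, n_components=100, random_state=seed)
    X_mapped = sampler.fit_transform(X)
    groups = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_mapped)

    scores = []
    # Each fold holds out one whole cluster and trains on all the others.
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X[train_idx], y[train_idx])  # assumption: train on original features
        scores.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))
    return groups, scores


# Synthetic stand-in for a composition-based representation and a target property.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = X @ rng.random(8)
groups, scores = kernelised_loco_cv(X, y)
print(np.bincount(groups), np.round(scores, 3))
```

Printing the cluster sizes alongside the per-cluster errors makes it easy to see how uneven cluster sizes could affect the LOCO-CV measurement, which is the question the abstract raises.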
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · X-ray Diffraction in Crystallography
