TL;DR
This paper introduces a framework to compare the feature spaces used in atomistic machine learning, focusing on how the choice of descriptors, and their transformation by kernels and metrics, affects the information retained and the structure of the resulting space.
Contribution
It provides diagnostic tools to evaluate the equivalence and distortion of feature spaces, and assesses the impact of various descriptors, kernels, and metrics in atomistic learning.
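To make the idea of comparing information content concrete, here is a minimal sketch of one possible diagnostic: the held-out error of the best linear map from one feature matrix to another. This is an illustration under stated assumptions, not the paper's definition. The names `standardize` and `reconstruction_error`, the train/test split, and the normalization are all hypothetical choices made for this example.

```python
import numpy as np


def standardize(X):
    """Center each column and scale the whole matrix to unit total variance."""
    X = X - X.mean(axis=0)
    scale = np.sqrt((X ** 2).sum() / X.shape[0])
    return X / scale if scale > 0 else X


def reconstruction_error(X_a, X_b, train_fraction=0.5, seed=0):
    """Held-out error of the best linear map from features X_a to features X_b.

    Hypothetical helper for illustration only. A value near 0 suggests that
    X_b can be linearly recovered from X_a; a value near 1 suggests that X_a
    carries little linearly accessible information about X_b.
    """
    X_a = standardize(np.asarray(X_a, dtype=float))
    X_b = standardize(np.asarray(X_b, dtype=float))

    # Random split so the error is measured on samples not used for the fit.
    n_samples = X_a.shape[0]
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(train_fraction * n_samples)
    train, test = order[:n_train], order[n_train:]

    # Least-squares weights mapping X_a onto X_b, fitted on the training half.
    weights, *_ = np.linalg.lstsq(X_a[train], X_b[train], rcond=None)

    # Root-mean-square residual per sample on the held-out half.
    residual = X_b[test] - X_a[test] @ weights
    return np.sqrt((residual ** 2).sum() / len(test))


# Toy data (assumed, not from the paper): a 3-column feature matrix and a
# truncated version that keeps only its first column.
rng = np.random.default_rng(42)
X_full = rng.normal(size=(200, 3))
X_low = X_full[:, :1]

err_full_to_low = reconstruction_error(X_full, X_low)
err_low_to_full = reconstruction_error(X_low, X_full)
print("full -> low:", err_full_to_low)
print("low -> full:", err_low_to_full)
```

The measure is asymmetric by design: the truncated features can be rebuilt from the full set almost exactly, while the reverse direction leaves a large residual, which is the signature of information loss in the smaller feature space.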
Findings
Low-order features lead to information loss.
Non-linear kernels and Wasserstein metrics significantly alter feature space structure.
Basis function choices and hyperparameters impact descriptor effectiveness.
Abstract
Efficient, physically-inspired descriptors of the structure and composition of molecules and materials play a key role in the application of machine-learning techniques to atomistic simulations. The proliferation of approaches, as well as the fact that each choice of features can lead to very different behavior depending on how they are used, e.g. by introducing non-linear kernels and non-Euclidean metrics to manipulate them, makes it difficult to objectively compare different methods, and to address fundamental questions on how one feature space is related to another. In this work we introduce a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels, in terms of the structure of the feature space that they induce. We define diagnostic tools to determine whether alternative feature spaces contain equivalent amounts of…
