Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies

Vivin Vinod; Peter Zaspel

arXiv:2410.11392 · physics.chem-ph · March 26, 2025

TL;DR

This paper explores how modifying the scaling factor in multifidelity machine learning (MFML) affects the efficiency and accuracy of predicting vertical excitation energies, and introduces compute-time-informed scaling factors and a new error metric to make better use of training data.

Contribution

It introduces QC-compute-time-informed scaling factors ($θ$) and a novel error metric, error contours of MFML, which together clarify the role of data hierarchies in multifidelity ML for quantum chemistry.
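To make the idea of compute-time-informed scaling concrete, here is a minimal sketch under an explicit assumption: the listing does not give the paper's exact definition of $θ$, so this sketch takes $θ_f$ to be the ratio of per-sample QC compute times between fidelity $f+1$ and fidelity $f$. Under that reading, each cheaper fidelity receives enough extra samples to balance the cost of the next more expensive one. The function names `theta_factors` and `samples_from_theta` are hypothetical, and the timing values in the usage line are arbitrary placeholders, not measurements from the paper.

```python
import math


def theta_factors(times: list[float]) -> list[float]:
    """Hypothetical theta_f = t_{f+1} / t_f, fidelities ordered cheapest first.

    ASSUMPTION: this ratio-of-compute-times definition is one plausible
    reading of "QC compute time informed scaling factors", not the paper's.
    """
    return [times[i + 1] / times[i] for i in range(len(times) - 1)]


def samples_from_theta(n_target: int, thetas: list[float]) -> list[int]:
    """Number of training samples per fidelity, cheapest fidelity first.

    Starts at the target (most expensive) fidelity and multiplies by the
    theta factor at each step down the hierarchy; values are accumulated as
    floats and rounded up only at the end so rounding does not compound.
    """
    counts = [float(n_target)]
    for theta in reversed(thetas):
        counts.append(counts[-1] * theta)
    return [math.ceil(c) for c in reversed(counts)]


# Placeholder per-sample times (arbitrary units) for three fidelities.
thetas = theta_factors([1.0, 2.0, 8.0])
print(thetas)                         # [2.0, 4.0]
print(samples_from_theta(2, thetas))  # [16, 8, 2]
```

Unlike a single fixed factor, the step between fidelities here changes wherever the compute-time gap between neighbouring fidelities changes.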

Findings

1. High accuracy is achieved with only 2 training samples at the target fidelity.
2. Larger lower-fidelity datasets improve model performance.
3. The $\Gamma$-curve demonstrates efficiency gains in the use of training data.

Abstract

Recent progress in machine learning (ML) has made high-accuracy quantum chemistry (QC) calculations more accessible. Of particular interest are multifidelity machine learning (MFML) methods where training data from differing accuracies or fidelities are used. These methods usually employ a fixed scaling factor, $γ$, to relate the number of training samples across different fidelities, which reflects the cost and assumed sparsity of the data. This study investigates the impact of modifying $γ$ on model efficiency and accuracy for the prediction of vertical excitation energies using the QeMFi benchmark dataset. Further, this work introduces QC compute time informed scaling factors, denoted as $θ$, that vary based on QC compute times at different fidelities. A novel error metric, error contours of MFML, is proposed to provide a comprehensive view of model error contributions…
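The fixed scaling factor $γ$ can be illustrated with a small calculation. A minimal sketch, assuming the common MFML convention that each next-cheaper fidelity has $γ$ times as many training samples as the one above it, so that $N_f = γ^{F-f} N_F$ for fidelities $f = 1, \dots, F$ with $F$ the target fidelity. The helper name `samples_per_fidelity` is hypothetical and not taken from the authors' repository.

```python
def samples_per_fidelity(n_target: int, gamma: float, n_fidelities: int) -> list[int]:
    """Training-set size at each fidelity, cheapest (f = 1) first.

    ASSUMPTION: N_f = n_target * gamma**(F - f), i.e. a fixed ratio gamma
    between neighbouring fidelities; F = n_fidelities is the target fidelity.
    """
    return [
        int(round(n_target * gamma ** (n_fidelities - f)))
        for f in range(1, n_fidelities + 1)
    ]


# Five fidelities, gamma = 2, and 2 samples at the target fidelity.
print(samples_per_fidelity(2, 2.0, 5))  # [32, 16, 8, 4, 2]
```

Raising $γ$ in this scheme grows the cheap-fidelity sets geometrically while leaving the target-fidelity budget unchanged, which is the trade-off the study varies.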

Code & Models

Repositories

SM4DA/MFML_DataHierarchy (official)

Taxonomy

Topics: Machine Learning in Materials Science