Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies
Vivin Vinod, Peter Zaspel

TL;DR
This paper explores how modifying the scaling factor in multifidelity machine learning affects the prediction of excitation energies, introducing new metrics and concepts to optimize data efficiency and model accuracy.
Contribution
It introduces QC compute time informed scaling factors and a novel error metric, enhancing the understanding of data hierarchies in multifidelity ML for quantum chemistry.
Findings
High accuracy achieved with only 2 target fidelity samples
Larger lower fidelity datasets improve model performance
The $\Gamma$-curve demonstrates efficiency gains in training data use
Abstract
Recent progress in machine learning (ML) has made high-accuracy quantum chemistry (QC) calculations more accessible. Of particular interest are multifidelity machine learning (MFML) methods, in which training data of differing accuracies, or fidelities, are used. These methods usually employ a fixed scaling factor, $\gamma$, to relate the number of training samples across different fidelities, which reflects the cost and assumed sparsity of the data. This study investigates the impact of modifying $\gamma$ on model efficiency and accuracy for the prediction of vertical excitation energies using the QeMFi benchmark dataset. Further, this work introduces QC compute time informed scaling factors, denoted as $\theta$, that vary based on QC compute times at different fidelities. A novel error metric, error contours of MFML, is proposed to provide a comprehensive view of model error contributions…
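To make the role of a fixed scaling factor concrete, the sketch below computes training-set sizes across a fidelity hierarchy. It assumes a geometric relation, where each step down in fidelity multiplies the number of training samples by the scaling factor; this is a common MFML convention, and the abstract does not spell out the exact relation. The function name `samples_per_fidelity` and its parameters are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def samples_per_fidelity(n_target: int, gamma: int, n_fidelities: int) -> list[int]:
    """Return training-set sizes ordered from lowest fidelity to target fidelity.

    Assumption (not taken from the paper): sizes grow geometrically, so each
    step down in fidelity multiplies the sample count by `gamma`.
    """
    if n_target < 1 or gamma < 1 or n_fidelities < 1:
        raise ValueError("n_target, gamma and n_fidelities must all be >= 1")
    # Index 0 is the cheapest fidelity; the last index is the target fidelity.
    return [n_target * gamma ** (n_fidelities - 1 - f) for f in range(n_fidelities)]


if __name__ == "__main__":
    # Hypothetical hierarchy of five fidelities with 2 target-fidelity samples.
    for gamma in (2, 4, 10):
        print(gamma, samples_per_fidelity(n_target=2, gamma=gamma, n_fidelities=5))
```

Under this assumption, increasing the scaling factor leaves the number of costly target-fidelity samples unchanged and enlarges only the cheaper lower-fidelity training sets, which is the trade-off the study examines.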
Taxonomy
Topics: Machine Learning in Materials Science
