Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method
James Oreluk, Zhenyuan Liu, Arun Hegde, Wenyu Li, Andrew Packard, Michael Frenklach, Dmitry Zubarev

TL;DR
This paper evaluates the semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification, showing where the method fails to match experimental data and under what conditions its predictions remain chemically accurate.
Contribution
It applies Bound-to-Bound Data Collaboration, an uncertainty quantification framework, to test whether PM7 is consistent with its training data and to propagate training-data uncertainty to model predictions.
Findings
PM7 is inconsistent with the full dataset of experimental heats of formation.
PM7 is consistent for certain subsets of the training data.
Uncertainty propagation maintains chemical accuracy for molecules of similar size.
Abstract
We report an evaluation of a semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification. Specifically, we apply Bound-to-Bound Data Collaboration, an uncertainty quantification framework, to characterize a) variability of PM7 model parameter values consistent with the uncertainty in the training data, and b) uncertainty propagation from the training data to the model predictions. Experimental heats of formation of a homologous series of linear alkanes are used as the property of interest. The training data are chemically accurate, i.e., they have very low uncertainty by the standards of computational chemistry. The analysis does not find evidence of PM7 consistency with the entire data set considered as no single set of parameter values is found that captures the experimental uncertainties of all training data. Nevertheless, PM7 is found to be…
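The notion of consistency used in the abstract can be illustrated with a deliberately small sketch. A data set is consistent with a model when at least one parameter value reproduces every measurement within its uncertainty bounds, and a prediction interval is the range the model takes over all such parameter values. The code below is not PM7 and not the authors' Bound-to-Bound Data Collaboration implementation: it assumes a hypothetical one-parameter linear model y_i = a_i * x with positive slopes a_i, and all names and numbers are invented for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def feasible_interval(
    slopes: List[float], bounds: List[Interval], prior: Interval
) -> Optional[Interval]:
    """Return the parameter interval consistent with all data bounds.

    Assumes a toy model y_i = a_i * x with a_i > 0, so each constraint
    L_i <= a_i * x <= U_i becomes L_i / a_i <= x <= U_i / a_i.
    Returns None when no parameter value satisfies every constraint.
    """
    lo, hi = prior
    for a, (lower, upper) in zip(slopes, bounds):
        if a <= 0:
            raise ValueError("this sketch assumes positive slopes")
        lo = max(lo, lower / a)
        hi = min(hi, upper / a)
    return (lo, hi) if lo <= hi else None


def predict_interval(slope: float, feasible: Interval) -> Interval:
    """Range of the prediction slope * x over the feasible interval."""
    ends = (slope * feasible[0], slope * feasible[1])
    return (min(ends), max(ends))


# Hypothetical training data: slopes and [lower, upper] measurement bounds.
slopes = [1.0, 2.0, 3.0, 4.0]
bounds = [(0.9, 1.1), (1.9, 2.1), (2.9, 3.1), (4.4, 4.6)]
prior = (0.0, 2.0)  # assumed prior range for the single parameter

# The first three data points share a feasible parameter interval.
subset_feasible = feasible_interval(slopes[:3], bounds[:3], prior)

# Adding the fourth point leaves no parameter value that fits everything.
full_feasible = feasible_interval(slopes, bounds, prior)

# Propagate the subset's uncertainty to a new, unmeasured quantity.
prediction = predict_interval(5.0, subset_feasible)
```

In this toy setting the full data set is inconsistent while a subset is consistent, which mirrors the qualitative pattern reported in the findings. The real analysis involves many parameters and a nonlinear model, so the feasible set is not a simple interval and finding it requires constrained optimization rather than interval intersection.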
