Critical Benchmarking of the G4(MP2) Model, the Correlation Consistent Composite Approach and Popular Density Functional Approximations on a Probabilistically Pruned Benchmark Dataset of Formation Enthalpies
Sambit Kumar Das, Sabyasachi Chakraborty, Raghunathan Ramakrishnan

TL;DR
This paper benchmarks various quantum chemistry methods, including G4(MP2), ccCA, CBS-QB3, and 23 DFAs, on a large, diverse dataset of over 1600 molecules to evaluate their accuracy in predicting formation enthalpies.
Contribution
It introduces a new extensive benchmark dataset, PPE1694, and systematically compares the accuracy of multiple computational methods for larger, diverse molecules.
Findings
G4(MP2) and ccCA show high accuracy for diverse molecules.
Systematic errors increase with molecular size.
Higher-level corrections improve G4(MP2) predictions.
Abstract
First-principles calculation of the standard formation enthalpy, $\Delta H_f^{\circ}$ (298 K), on the large scale required by chemical space explorations is feasible only with density functional approximations (DFAs) and some composite wave function theories (cWFTs). However, the accuracies of popular range-separated hybrid, "rung-4" DFAs, and of the cWFTs that offer the best accuracy-vs.-cost trade-off, have so far been established only for datasets predominantly comprising small molecules; hence, their transferability to larger datasets remains unclear. In this study, we present an extended benchmark dataset of over 1600 values of $\Delta H_f^{\circ}$ for structurally and electronically diverse molecules. We apply quartile-ranking based on boundary-corrected kernel density estimation to filter outliers and arrive at Probabilistically Pruned Enthalpies of 1694 compounds (PPE1694). For this dataset,…
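The abstract names quartile-ranking on a boundary-corrected kernel density estimate as the outlier filter but does not spell out the procedure here. The following is a minimal sketch of one way such a filter could look, not the authors' implementation. It rests on several assumptions that are not in the source: the filtered quantity is non-negative (for example an absolute deviation), the boundary correction is the reflection method at zero, the bandwidth follows Silverman's rule, and outliers are values above the Tukey fence Q3 + 1.5 IQR. All function names are hypothetical.

```python
import math
import numpy as np

# Element-wise error function built from the standard library (no SciPy needed).
_erf = np.vectorize(math.erf)


def _norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth (an assumed choice)."""
    samples = np.asarray(samples, dtype=float)
    q25, q75 = np.percentile(samples, [25, 75])
    spread = min(samples.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * samples.size ** (-0.2)


def reflected_kde_quartiles(samples, n_grid=2000):
    """Q1 and Q3 of a Gaussian KDE reflected at the lower boundary 0.

    Reflection adds a mirror-image kernel at -x_i for every sample x_i,
    so no probability mass leaks below zero. The resulting CDF for x >= 0 is
    F(x) = mean_i[ Phi((x - x_i)/h) + Phi((x + x_i)/h) - 1 ].
    """
    samples = np.asarray(samples, dtype=float)
    h = silverman_bandwidth(samples)
    grid = np.linspace(0.0, samples.max() + 5.0 * h, n_grid)
    x = grid[:, None]
    xi = samples[None, :]
    cdf = (_norm_cdf((x - xi) / h) + _norm_cdf((x + xi) / h) - 1.0).mean(axis=1)
    # Guard against round-off so the CDF is non-decreasing before inversion.
    cdf = np.maximum.accumulate(cdf)
    q1, q3 = np.interp([0.25, 0.75], cdf, grid)
    return q1, q3


def prune_outliers(samples, k=1.5):
    """Return (keep_mask, fence) with fence = Q3 + k * IQR (assumed rule)."""
    samples = np.asarray(samples, dtype=float)
    q1, q3 = reflected_kde_quartiles(samples)
    fence = q3 + k * (q3 - q1)
    return samples <= fence, fence
```

As a usage example, calling `prune_outliers([0.2, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 1.8, 2.0, 25.0])` keeps the ten small values and flags the entry at 25.0, which lies above the fence. Taking the quartiles from the smoothed, reflected density rather than from the raw sample avoids the bias an uncorrected kernel estimate would show near zero.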
