THEMol dataset: Torsion, Hessian, and Energy of Molecules
Jiashu Liang, Tianze Zheng, Yu Xia, Xingyuan Xu, Xu Han, Zhi Wang, Siyuan Liu, Ailun Wang, Yu Liu, Shiqian Tan, Dongfei Liu, Zhichen Pu, Yuanheng Wang, Qiming Sun, Xiaojie Wu, Wen Yan

TL;DR
THEMol is a comprehensive open-source dataset of quantum mechanical properties for organic molecules, including geometries, Hessians, torsion scans, and electron densities, designed to advance molecular potential modeling.
Contribution
THEMol provides a large, chemically diverse dataset with extensive conformational sampling and Hessian information, filling a gap in the resources available for developing molecular potentials.
Findings
Contains over 3 million Hessian matrices at relaxed geometries.
Includes nearly 100 million constrained relaxed geometries with energies and forces.
Encompasses about 3 billion DFT calculations across diverse molecular architectures.
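One common use of a Hessian at a relaxed geometry is the harmonic (second-order Taylor) estimate of the energy for a small displacement. Because the gradient vanishes at a relaxed geometry, the linear term drops and E(x0 + dx) ≈ E0 + ½ dxᵀ H dx. The pure-Python sketch below illustrates this. It is not THEMol tooling: the function name `harmonic_energy`, the flat [x1, y1, z1, x2, ...] coordinate layout, and the use of plain nested lists are assumptions, and the dataset's actual file format and units are not described here.

```python
from typing import Sequence


def harmonic_energy(
    e0: float,
    hessian: Sequence[Sequence[float]],
    dx: Sequence[float],
) -> float:
    """Harmonic energy estimate around a relaxed (zero-gradient) geometry.

    Hypothetical helper, not part of any THEMol release.
    e0      -- energy at the relaxed reference geometry
    hessian -- 3N x 3N Cartesian second-derivative matrix
    dx      -- flat displacement vector of length 3N (x1, y1, z1, x2, ...)
    All quantities are assumed to be in one consistent unit system.
    """
    n = len(dx)
    if n % 3 != 0:
        raise ValueError("displacement length must be a multiple of 3 (x, y, z per atom)")
    if len(hessian) != n or any(len(row) != n for row in hessian):
        raise ValueError("Hessian must be a square 3N x 3N matrix matching dx")
    # Quadratic form dx^T H dx; the linear gradient term is zero at a minimum.
    quad = sum(dx[i] * hessian[i][j] * dx[j] for i in range(n) for j in range(n))
    return e0 + 0.5 * quad


# Toy example: one atom in an isotropic well with force constant 2.0.
print(harmonic_energy(-1.0, [[2.0, 0, 0], [0, 2.0, 0], [0, 0, 2.0]], [0.1, 0.0, 0.0]))
```

The toy call evaluates to approximately -0.99. For a molecule with N atoms the Hessian has 3N rows and columns, so an illustrative molecule of about 100 atoms (heavy atoms plus hydrogens) gives a 300 × 300 matrix.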
Abstract
We present THEMol (Torsion, Hessian, Energy of Molecules), a massive open-source collection of quantum mechanical properties tailored to closed-shell organic molecules with up to 50 heavy atoms. THEMol includes a Hessian subset with more than 3 million relaxed geometries with Hessian matrices, a TorsionScan subset with nearly 100 million constrained relaxed geometries with energies and forces, and relaxation-trajectory subsets (HessianRelax and TorsionScanRelax) that together comprise about 3 billion DFT calculations. The chemical space sampling is comprehensive, spanning twelve essential elements and diverse molecular architectures relevant to drug discovery, electrolytes, ionic liquids, and beyond. The dataset also features exhaustive conformational sampling through the TorsionScan and TorsionScanRelax subsets, including comprehensive in-ring and non-ring torsional scans.…
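In a torsion scan, one dihedral angle is held at a series of target values while the remaining coordinates are relaxed, which yields constrained relaxed geometries like those in the TorsionScan subset. The minimal sketch below computes that dihedral from four Cartesian atom positions with the standard atan2 formula, giving a value in degrees in the range [-180, 180]. It is an illustration only, not code from the THEMol release; `dihedral_deg` and its helpers are hypothetical names.

```python
import math
from typing import Sequence


def _sub(a: Sequence[float], b: Sequence[float]) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> tuple:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees (hypothetical helper)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)  # central bond about which the torsion is measured
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    if _dot(n1, n1) < 1e-12 or _dot(n2, n2) < 1e-12:
        raise ValueError("three consecutive atoms are collinear; dihedral undefined")
    # atan2 of (|b2| * b1 . (b2 x b3), (b1 x b2) . (b2 x b3)) keeps the sign of the torsion.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The example call gives 90 degrees; mirroring the last atom to (0, -1, 1) gives -90, and moving it to (-1, 0, 1) gives the trans value of ±180. Using atan2 rather than arccos of a normalized dot product preserves the sign of the torsion and stays numerically stable near 0 and 180 degrees.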
