Endowing Molecular Language with Geometry Perception via Modality Compensation for High-Throughput Quantum Hamiltonian Prediction
Zhenzhong Wang, Yongjie Hou, Chenggong Huang, Yuxuan Du, Dacheng Tao, Min Jiang

TL;DR
This paper introduces a geometry-aware molecular language model that predicts quantum Hamiltonians directly from SMILES strings. Modality compensation and weak supervision let it bypass expensive molecular geometries, yielding large speedups while maintaining accuracy.
Contribution
The novel modality compensation strategy enables accurate Hamiltonian prediction from SMILES alone, reducing reliance on costly geometric data and improving data efficiency through weak supervision.
Findings
Achieves up to 100x speedup over traditional methods
Maintains comparable accuracy with reduced geometric data dependence
Proves theoretical bounds on prediction error without explicit geometry
Abstract
The quantum Hamiltonian is a fundamental property that governs a molecule's electronic structure and behavior, and its calculation and prediction are paramount in computational chemistry and materials science. Accurate prediction is highly reliant on extensive training data, including precise molecular geometries and the Hamiltonian matrices, which are expensive to acquire via either experimental or computational methods. Towards a fast yet accurate method for Hamiltonian prediction, we first introduce a geometry information-aware molecular language model to bypass the use of expensive molecular geometries by only using the readily available molecular language -- simplified molecular input line entry system (SMILES). Our method employs multimodal alignment to bridge the relationship between SMILES strings and their corresponding molecular geometries. Recognizing that the molecular…
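To make the idea of multimodal alignment concrete, here is a minimal sketch of one common way to align two embedding spaces: a symmetric contrastive (InfoNCE-style) loss. The abstract does not say which alignment objective the authors use, so the loss choice, the function name `symmetric_info_nce`, the `temperature` value, and the random stand-in embeddings are all assumptions made for illustration, not the paper's method.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(smiles_emb, geom_emb, temperature=0.1):
    """Hypothetical alignment loss: row i of each input is the same molecule.

    Matching pairs are pulled together and mismatched pairs pushed apart,
    in both the SMILES-to-geometry and geometry-to-SMILES directions.
    """
    s = l2_normalize(smiles_emb)
    g = l2_normalize(geom_emb)
    logits = s @ g.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(s))                 # positives sit on the diagonal
    loss_s2g = -log_softmax(logits)[idx, idx].mean()
    loss_g2s = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_s2g + loss_g2s)

# Stand-in embeddings (assumption): in practice these would come from a
# SMILES encoder and a geometry encoder, respectively.
rng = np.random.default_rng(0)
geom = rng.normal(size=(8, 16))
aligned = geom + 0.01 * rng.normal(size=(8, 16))   # nearly matching pairs
shuffled = aligned[::-1].copy()                    # every pair mismatched

loss_aligned = symmetric_info_nce(aligned, geom)
loss_shuffled = symmetric_info_nce(shuffled, geom)
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is low when each SMILES embedding sits next to its own geometry embedding and high when the pairing is scrambled, which is the property an alignment objective of this kind is meant to encourage.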
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Computational Drug Discovery Methods
