Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials

Cong Fu; Yuchao Lin; Zachary Krueger; Haiyang Yu; Maho Nakata; Jianwen Xie; Emine Kucukbenli; Xiaofeng Qian; Shuiwang Ji

arXiv:2507.00407·physics.chem-ph·February 25, 2026

TL;DR

This paper trains machine learning interatomic potential (MLIP) models on a large molecular relaxation dataset to predict energies and forces, then uses them to obtain approximate molecular geometries and improve property prediction without relying on expensive quantum chemistry methods such as DFT.

Contribution

The paper introduces a large-scale molecular relaxation dataset and pre-trained MLIP models that can generate approximate low-energy geometries and improve downstream molecular property prediction.

Findings

01. Pre-trained models can generate approximate low-energy geometries that improve downstream property prediction.

02. Fine-tuning the pre-trained models improves accuracy on downstream molecular property tasks.

03. Models trained on relaxation data transfer well to a range of molecular tasks.

Abstract

Accurate molecular property predictions require 3D geometries, which are typically obtained using expensive methods such as density functional theory (DFT). Here, we attempt to obtain molecular geometries by relying solely on machine learning interatomic potential (MLIP) models. To this end, we first curate a large-scale molecular relaxation dataset comprising 3.5 million molecules and 300 million snapshots. MLIP models are then pre-trained with supervised learning to predict energy and forces given 3D molecular structures. Once trained, the pre-trained models can be used in different ways to obtain geometries, either explicitly or implicitly. First, they can be used to obtain approximate low-energy 3D geometries via geometry optimization. While these geometries do not consistently reach DFT-level chemical accuracy or convergence, they can still improve downstream…
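
The geometry-optimization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes only that some model returns an energy and per-atom forces for a set of 3D positions. The harmonic-bond function `toy_energy_and_forces`, the steepest-descent update in `relax`, and the `step`, `fmax`, and `max_steps` parameters are all hypothetical stand-ins chosen for illustration. The excerpt does not specify the paper's models, optimizer, or convergence criteria.

```python
import math

def toy_energy_and_forces(positions, bonds, r0=1.0, k=1.0):
    """Hypothetical stand-in for a trained MLIP.

    Uses harmonic bonds, E = sum over bonds of 0.5 * k * (r - r0)^2,
    and returns (energy, forces) with forces = -dE/dpositions.
    """
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j in bonds:
        d = [positions[j][a] - positions[i][a] for a in range(3)]
        r = math.sqrt(sum(x * x for x in d))
        energy += 0.5 * k * (r - r0) ** 2
        for a in range(3):
            grad_j = k * (r - r0) * d[a] / r  # dE/d(positions[j][a])
            forces[j][a] -= grad_j
            forces[i][a] += grad_j
    return energy, forces

def relax(positions, bonds, step=0.1, fmax=1e-4, max_steps=1000):
    """Steepest-descent relaxation: move atoms along the predicted forces
    until the largest per-atom force norm falls below fmax."""
    pos = [list(p) for p in positions]
    steps_taken = 0
    for _ in range(max_steps):
        _, forces = toy_energy_and_forces(pos, bonds)
        largest = max(math.sqrt(sum(f * f for f in fa)) for fa in forces)
        if largest < fmax:
            break
        for p, f in zip(pos, forces):
            for a in range(3):
                p[a] += step * f[a]
        steps_taken += 1
    # Re-evaluate so the returned energy matches the returned positions.
    energy, _ = toy_energy_and_forces(pos, bonds)
    return pos, energy, steps_taken

# Usage: a stretched diatomic relaxes toward the equilibrium bond length r0.
start = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
relaxed, final_energy, n_steps = relax(start, bonds=[(0, 1)])
bond_length = math.dist(relaxed[0], relaxed[1])
```

In practice the toy function would be replaced by a call to a trained model, and a quasi-Newton optimizer would usually converge in fewer steps than plain steepest descent. The loop structure (evaluate forces, check a force threshold, update positions) stays the same.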
