LightPFP: A Lightweight Route to Ab Initio Accuracy at Scale

Wenwen Li; Nontawat Charoenphakdee; Yong-Bin Zhuang; Ryuhei Okuno; Yuta Tsuboi; So Takamoto; Junichi Ishida; Ju Li

arXiv:2510.23064 · cond-mat.mtrl-sci · November 7, 2025


TL;DR

LightPFP introduces a data-efficient distillation framework that creates accurate, lightweight machine learning interatomic potentials (MLIPs), enabling large-scale atomistic simulations with DFT-level accuracy at a fraction of the computational cost.

Contribution

The paper presents a novel knowledge distillation approach in which universal MLIPs (u-MLIPs) efficiently generate high-quality training data for task-specific MLIPs (ts-MLIPs), removing the need for extensive DFT calculations.
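The distillation idea can be illustrated with a deliberately tiny Python sketch. It is not LightPFP's code or training procedure: the teacher is a Morse pair potential standing in for a u-MLIP, the student is a degree-6 polynomial standing in for a lightweight ts-MLIP, and every name (`teacher_energy`, `PolynomialStudent`, `distill`) is hypothetical. What it does show is the core substitution: the student's training labels come from the teacher rather than from DFT, and the student only has to be accurate on one task-specific region.

```python
import math
import random


# Hypothetical stand-in for a u-MLIP "teacher": a Morse dimer energy E(r).
# A real u-MLIP labels arbitrary atomic structures; this only mimics the role.
def teacher_energy(r, depth=1.0, alpha=2.0, r0=1.0):
    y = math.exp(-alpha * (r - r0))
    return depth * ((1.0 - y) ** 2 - 1.0)


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class PolynomialStudent:
    """Hypothetical lightweight student, valid only on [r_min, r_max]."""

    def __init__(self, r_min, r_max, degree=6):
        self.r_min, self.r_max, self.degree = r_min, r_max, degree
        self.coef = [0.0] * (degree + 1)

    def _features(self, r):
        # Rescale r to x in [-1, 1] so the normal equations are better conditioned.
        x = (2.0 * r - self.r_min - self.r_max) / (self.r_max - self.r_min)
        return [x ** k for k in range(self.degree + 1)]

    def fit(self, rs, energies):
        # Ordinary least squares via the normal equations (A^T A) w = A^T y.
        feats = [self._features(r) for r in rs]
        p = self.degree + 1
        ata = [[sum(f[i] * f[j] for f in feats) for j in range(p)] for i in range(p)]
        aty = [sum(f[i] * e for f, e in zip(feats, energies)) for i in range(p)]
        self.coef = solve_linear(ata, aty)
        return self

    def predict(self, r):
        return sum(c * v for c, v in zip(self.coef, self._features(r)))


def distill(r_min=0.8, r_max=1.6, n_samples=200, seed=0):
    rng = random.Random(seed)
    # 1) Sample configurations from the task-specific region only.
    rs = [rng.uniform(r_min, r_max) for _ in range(n_samples)]
    # 2) Label them with the teacher instead of running DFT.
    labels = [teacher_energy(r) for r in rs]
    # 3) Fit the cheap student to the teacher's labels.
    return PolynomialStudent(r_min, r_max).fit(rs, labels)


if __name__ == "__main__":
    student = distill()
    grid = [0.8 + 0.8 * i / 100 for i in range(101)]
    max_err = max(abs(student.predict(r) - teacher_energy(r)) for r in grid)
    print(f"max |student - teacher| on the grid: {max_err:.2e}")
```

Running the script prints the student's largest deviation from the teacher on an evenly spaced grid. A realistic setup would sample full atomic structures rather than a single bond length and would usually fit forces as well as energies; the polynomial student only mirrors the ts-MLIP trade-off of being valid on the sampled interval and nowhere else.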

Findings

01. Achieves model development three orders of magnitude faster than DFT-based methods.
02. Maintains accuracy comparable to first-principles predictions across a range of materials.
03. Corrects systematic errors efficiently using only a small amount of high-accuracy data.
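The third finding does not say how the correction is performed, so the Python sketch below uses an assumed, simplified scheme rather than the paper's: a two-parameter affine recalibration, fitted on four high-accuracy reference points, that removes a synthetic systematic error. The reference curve, the error model (energies scaled by 0.9 and shifted by +0.05), and all function names are hypothetical.

```python
import math


# Hypothetical high-accuracy reference energy for a dimer (DFT-like role).
def reference_energy(r):
    y = math.exp(-2.0 * (r - 1.0))
    return (1.0 - y) ** 2 - 1.0


# Hypothetical distilled model with a synthetic systematic error:
# energies scaled by 0.9 and shifted by +0.05 relative to the reference.
def model_energy(r):
    return 0.9 * reference_energy(r) + 0.05


def fit_affine_correction(model_vals, ref_vals):
    """Least-squares fit of ref ~ a * model + b (simple linear regression)."""
    n = len(model_vals)
    mx = sum(model_vals) / n
    my = sum(ref_vals) / n
    sxx = sum((x - mx) ** 2 for x in model_vals)
    sxy = sum((x - mx) * (y - my) for x, y in zip(model_vals, ref_vals))
    a = sxy / sxx
    return a, my - a * mx


# Only four high-accuracy calculations are used to fit the correction.
anchor_rs = [0.85, 1.0, 1.2, 1.5]
a, b = fit_affine_correction(
    [model_energy(r) for r in anchor_rs],
    [reference_energy(r) for r in anchor_rs],
)


def corrected_energy(r):
    return a * model_energy(r) + b


if __name__ == "__main__":
    test_rs = [0.8 + 0.05 * i for i in range(17)]
    before = max(abs(model_energy(r) - reference_energy(r)) for r in test_rs)
    after = max(abs(corrected_energy(r) - reference_energy(r)) for r in test_rs)
    print(f"max error before: {before:.3f}, after: {after:.1e}")
```

Because the synthetic error is exactly affine, four points determine it completely and the corrected error drops to floating-point noise. Real systematic errors are rarely this simple, and in practice one would more likely fine-tune the model itself on the new data; the example only isolates why a low-dimensional error needs few accurate calculations to pin down.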

Abstract

Atomistic simulation methods have evolved through successive computational levels, each building upon more fundamental approaches: from quantum mechanics to density functional theory (DFT), and subsequently to machine learning interatomic potentials (MLIPs). While universal MLIPs (u-MLIPs) offer broad transferability, their computational overhead limits large-scale applications. Task-specific MLIPs (ts-MLIPs) achieve superior efficiency but require prohibitively expensive DFT data generation for each material system. In this paper, we propose LightPFP, a data-efficient knowledge distillation framework. Instead of relying on costly DFT calculations, LightPFP produces a distilled ts-MLIP by using a u-MLIP to generate high-quality training data tailored to specific materials and by using a pre-trained lightweight MLIP to further improve data efficiency. Across a broad spectrum of…
