Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

Yuanchang Zhou; Hongyu Wang; Yiming Du; Yan Wang; Mingzhen Li; Siyu Hu; Xiangyu Zhang; Weijian Liu; Chen Wang; Zhuoqiang Guo; Long Wang; Jingde Bu; Yutong Lu; Guangming Tan; Weile Jia

arXiv:2604.15821 · cs.DC · April 20, 2026

TL;DR

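This paper presents MatRIS-MoE, a billion-parameter Mixture-of-Experts uMLIP, and Janus, a distributed training framework for uMLIPs. Together they enable efficient billion-parameter uMLIP training on Exascale supercomputers, cutting training time from weeks to hours and setting a new benchmark for AI foundation models in scientific research.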

Contribution

Introduces MatRIS-MoE, a billion-parameter Mixture-of-Experts model, and Janus, a distributed training framework, which together enable scalable, hardware-aware training of billion-parameter uMLIPs on Exascale supercomputers.

Findings

01. Achieved peak performance of 1.2/1.0 EFLOPS on two Exascale supercomputers, with over 90% efficiency.

02. Reduced training time of billion-parameter uMLIPs from weeks to hours.

03. Set new benchmarks for AI4S (AI for Science) foundation models at Exascale.

Abstract

Universal Machine Learning Interatomic Potentials (uMLIPs), pre-trained on massively diverse datasets encompassing inorganic materials and organic molecules across the entire periodic table, serve as foundational models for quantum-accurate physical simulations. However, uMLIP training requires second-order derivatives, for which no corresponding parallel training frameworks exist; moreover, scaling to the billion-parameter regime causes explosive growth in computation and communication overhead, making training a tremendous challenge. We introduce MatRIS-MoE, a billion-parameter Mixture-of-Experts model built upon an invariant architecture, and Janus, a pioneering high-dimensional distributed training framework for uMLIPs with hardware-aware optimizations. Deployed across two Exascale supercomputers, our code attains a peak performance of 1.2/1.0 EFLOPS (24%/35.5% of theoretical peak)…
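
The abstract's remark about second-order derivatives can be illustrated with a toy example. The sketch below is not the paper's model or the Janus framework. It assumes the common setup in which forces are obtained as the negative gradient of the predicted energy with respect to atomic positions and the loss includes a force term. Under that assumption, the gradient of the loss with respect to the parameters contains mixed second derivatives of the energy. The potential `energy`, the parameters `w`, and the reference force `f_ref` are all hypothetical.

```python
# Toy illustration only: a 1-D "potential" with two learnable parameters.
# Nothing here is taken from MatRIS-MoE or Janus; all names are hypothetical.

def energy(x, w):
    """E(x; w) = w[0] * x**2 + w[1] * x**4 (hypothetical toy potential)."""
    return w[0] * x**2 + w[1] * x**4

def force(x, w):
    """F = -dE/dx, the first derivative of the energy w.r.t. position."""
    return -(2.0 * w[0] * x + 4.0 * w[1] * x**3)

def force_loss(x, w, f_ref):
    """Squared error between the predicted and the reference force."""
    return (force(x, w) - f_ref) ** 2

def force_loss_grad(x, w, f_ref):
    """dL/dw_k = 2 * (F - f_ref) * (-d2E/(dx dw_k)).

    The mixed second derivatives are d2E/(dx dw0) = 2x and
    d2E/(dx dw1) = 4x^3, so a force loss needs second-order terms.
    """
    residual = force(x, w) - f_ref
    mixed = (2.0 * x, 4.0 * x**3)
    return tuple(2.0 * residual * (-m) for m in mixed)

def finite_difference_grad(x, w, f_ref, eps=1e-6):
    """Central-difference check of dL/dw, independent of the formula above."""
    grads = []
    for k in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[k] += eps
        w_minus[k] -= eps
        diff = force_loss(x, w_plus, f_ref) - force_loss(x, w_minus, f_ref)
        grads.append(diff / (2.0 * eps))
    return tuple(grads)

x, w, f_ref = 0.7, (1.5, -0.3), -1.0
analytic = force_loss_grad(x, w, f_ref)
numeric = finite_difference_grad(x, w, f_ref)
print("analytic dL/dw:", analytic)
print("numeric  dL/dw:", numeric)
```

In an automatic-differentiation framework the same structure appears as a backward pass through a backward pass: the force is computed by differentiating the energy, and the optimizer step then differentiates the force loss again with respect to the parameters.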
