JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials
Hongyu Wang, Weijian Liu, Hongtao Xu, Yan Wang, Mingzhen Li, Weile Jia, Guangming Tan

TL;DR
JanusPipe is a novel 3D-parallel training system designed to efficiently scale conservative machine learning interatomic potentials (MLIPs) for molecular dynamics simulations, overcoming the limitations of existing distributed training systems.
Contribution
It introduces SymFold and WaveK to enable memory-efficient pipeline parallelism and reduce pipeline bubbles specifically for conservative MLIPs.
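Conservative MLIPs predict forces as the negative gradient of the energy with respect to atomic positions, and this is why they fit awkwardly into pipeline parallelism. The sketch below uses a simplified model, which is an assumption for illustration and not JanusPipe's SymFold or WaveK schedule. In this model the network's layers are split across K pipeline stages. Computing forces needs a backward sweep through every stage before the loss even exists. Training then differentiates through both the force sweep and the energy sweep, so one micro-batch crosses the pipeline four times instead of the usual two. The helper `double_backward_stage_order` and its phase names are hypothetical.

```python
def double_backward_stage_order(num_stages):
    """Hypothetical helper (simplified model): list the (phase, stage) visits one
    micro-batch makes when a conservative MLIP is split across `num_stages` stages."""
    down = list(range(num_stages))   # stage 0 -> stage K-1
    up = down[::-1]                  # stage K-1 -> stage 0
    order = [("energy_forward", s) for s in down]          # E = f(R; theta)
    order += [("force_backward", s) for s in up]           # F = -dE/dR (first backward)
    order += [("grad_through_forces", s) for s in down]    # reverse of the force sweep
    order += [("grad_through_energy", s) for s in up]      # reverse of the energy sweep
    return order


for phase, stage in double_backward_stage_order(3):
    print(f"{phase:<20} stage {stage}")
```

With three stages this simplified model yields twelve stage visits per micro-batch, compared with six for an ordinary forward/backward pass. More sweeps mean more cross-stage dependencies and more chances for stages to sit idle, which is the pipeline-bubble problem the contribution targets.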
Findings
JanusPipe improves throughput by 1.51x on 32 GPUs.
It achieves a 1.45x throughput increase over previous methods.
Experimental results demonstrate enhanced scalability for MLIPs.
Abstract
Discovering atom-level phenomena requires molecular dynamics (MD) simulations with ab initio accuracy. Machine learning interatomic potentials (MLIPs) enable stable, high-accuracy MD simulations, and their models exhibit scaling-law trends similar to those of large language models. However, the lack of scalable and efficient distributed training systems for conservative MLIPs makes them difficult to scale. This is because conservative MLIPs inherently follow a double-backward execution pattern, which involves computing gradients during the forward pass. This pattern creates a mismatch with existing distributed training systems, especially for pipeline parallelism. Therefore, we present JanusPipe, an efficient 3D-parallel (PP/DP/GP) training system tailored for conservative MLIPs. It integrates SymFold to enable memory-efficient pipeline parallelism for conservative MLIPs, and WaveK to reduce…
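The double-backward pattern can be seen in a few lines of JAX. This is a hypothetical toy model, not JanusPipe's code: `energy` is an assumed Gaussian pair potential with learnable `a` and `b` that stands in for an MLIP. The forces come from `jax.grad` of the energy with respect to positions, which is the gradient computed during the forward pass. Taking the parameter gradient of a loss on those forces then requires differentiating through that first gradient.

```python
import jax
import jax.numpy as jnp


def energy(params, positions):
    """Hypothetical toy conservative potential:
    E(R) = sum_{i<j} a * exp(-b * |r_i - r_j|^2) with learnable a, b."""
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 3) pair vectors
    d2 = jnp.sum(diff ** 2, axis=-1)                      # (N, N) squared distances
    n = positions.shape[0]
    pair_mask = jnp.triu(jnp.ones((n, n)), k=1)           # count each pair once
    return jnp.sum(pair_mask * params["a"] * jnp.exp(-params["b"] * d2))


def forces(params, positions):
    # First backward pass, taken inside the model's "forward": F = -dE/dR.
    return -jax.grad(energy, argnums=1)(params, positions)


def loss(params, positions, e_ref, f_ref, force_weight=1.0):
    e_err = energy(params, positions) - e_ref
    f_err = forces(params, positions) - f_ref
    return e_err ** 2 + force_weight * jnp.mean(f_err ** 2)


# Second backward pass: differentiate a loss that already contains a gradient
# with respect to the model parameters (the double-backward pattern).
param_grad = jax.jit(jax.grad(loss))

positions = jax.random.normal(jax.random.PRNGKey(0), (4, 3))
params = {"a": jnp.array(1.0), "b": jnp.array(0.5)}
e_ref, f_ref = jnp.array(0.0), jnp.zeros((4, 3))

grads = param_grad(params, positions, e_ref, f_ref)
print(grads["a"], grads["b"])
```

In a real MLIP `energy` is a deep network. Training therefore has to keep the graph of the first backward pass alive in addition to the forward activations. This extra state is what a pipeline-parallel system must schedule and store across stages.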
