DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
Kevin Han, Bowen Deng, Amir Barati Farimani, Gerbrand Ceder

TL;DR
DistMLIP is a novel distributed inference platform that enables efficient multi-device inference of machine learning interatomic potentials, significantly increasing simulation scale and speed for atomistic modeling.
Contribution
The paper introduces DistMLIP, a graph partitioning-based distributed inference platform for MLIPs, allowing scalable, multi-GPU atomistic simulations with flexible model architectures.
Findings
Simulates 3.4x larger atomic systems
Achieves up to 8x faster inference on multiple GPUs
Performs near-million-atom calculations in seconds
Abstract
Large-scale atomistic simulations are essential to bridge computational materials and chemistry to realistic materials and drug discovery applications. In the past few years, rapid developments of machine learning interatomic potentials (MLIPs) have offered a solution to scale up quantum mechanical calculations. Parallelizing these interatomic potentials across multiple devices is a challenging but promising approach to further extending simulation scales to real-world applications. In this work, we present DistMLIP, an efficient distributed inference platform for MLIPs based on zero-redundancy, graph-level parallelization. In contrast to conventional spatial partitioning parallelization, DistMLIP enables efficient MLIP parallelization through graph partitioning, allowing multi-device inference on flexible MLIP model architectures like multi-layer graph neural networks. DistMLIP…
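To make the idea of graph-level partitioning concrete, the following is a minimal sketch in plain Python. It is not DistMLIP's implementation or API: the function names (`build_edges`, `slab_partition`, `border_nodes`), the slab rule along the x axis, the random positions, and the absence of periodic boundaries are all assumptions chosen for illustration. The sketch assigns each atom to exactly one partition, assigns each directed edge to the partition that owns its destination atom, and lists the "border" atoms whose features a partition would need to receive from other devices before a message-passing layer.

```python
import random

def build_edges(positions, cutoff):
    """Return directed edges (src, dst) for all atom pairs closer than cutoff."""
    edges = []
    cutoff_sq = cutoff ** 2
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            if i == j:
                continue
            dist_sq = sum((a - b) ** 2 for a, b in zip(pi, pj))
            if dist_sq < cutoff_sq:
                edges.append((i, j))
    return edges

def slab_partition(positions, num_parts, box_length):
    """Assign each atom to one slab along x (hypothetical rule for illustration)."""
    width = box_length / num_parts
    return [min(int(p[0] / width), num_parts - 1) for p in positions]

def border_nodes(edges, owner, num_parts):
    """For each partition, collect atoms owned elsewhere that send messages into it."""
    border = [set() for _ in range(num_parts)]
    for src, dst in edges:
        if owner[src] != owner[dst]:
            # The edge lives on the partition owning dst, so src must be received.
            border[owner[dst]].add(src)
    return border

# Toy system: 200 random atoms in a 20 x 20 x 20 box, no periodic boundaries.
random.seed(0)
box = 20.0
num_parts = 4
positions = [
    (random.uniform(0, box), random.uniform(0, box), random.uniform(0, box))
    for _ in range(200)
]
edges = build_edges(positions, cutoff=3.0)
owner = slab_partition(positions, num_parts, box)
border = border_nodes(edges, owner, num_parts)

for part in range(num_parts):
    print(f"partition {part}: owns {owner.count(part)} atoms, "
          f"needs {len(border[part])} border atoms")
```

In this toy setup every atom and every edge is held by exactly one partition, so no edge computation is duplicated, and only the border-atom features cross partition boundaries. The ratio of border atoms to owned atoms gives a rough feel for how communication grows as the number of partitions increases.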
Peer Reviews
Decision: ICLR 2026 Poster
### High-Impact Problem: The paper tackles a critical and timely bottleneck in computational science. Scaling MLIPs to the meso-scale (millions of atoms) is essential for bridging quantum-accurate simulations with real-world applications in materials science, chemistry, and biology.
### Sound and Novel Method: The graph-level parallelization approach is fundamentally better suited for GNN-based MLIPs than traditional spatial partitioning. The paper clearly articulates the "zero-redundancy"
### Single-Node Limitation: The paper states the current implementation only supports "single-node multi-GPU inference". This is a significant limitation for scaling to truly massive systems (tens of millions+ atoms), which would require a multi-node, multi-GPU setup. The paper would be stronger if it discussed the roadmap and key challenges (e.g., managing communication overhead of border node features across a network interconnect) for a multi-node implementation.
### Clarification of "8x F
- Scaling MLIP simulations to biologically and materially relevant sizes (millions of atoms) is a major challenge. DistMLIP provides a much-needed solution specifically tailored for efficient distributed inference of modern GNN-based MLIPs.
- DistMLIP is designed as a model-agnostic, plug-in platform. This allows researchers to apply it to their existing, pre-trained MLIPs with minimal adaptation (as demonstrated with four different models), significantly lowering the barrier to large-scale sim
- Graph partitioning inherently requires communication between GPUs after each message-passing layer to exchange border node information. The paper acknowledges scaling isn't always ideal (Fig 2b, 2c) partly due to overheads, but a more detailed analysis of communication cost vs. computation cost, and how it scales with the number of GPUs, graph density, and partition quality, would be valuable.
- The paper appears to be lacking comparisons against several relevant baselines. For instance, a cri
1. Works with popular MLIPs (MACE, TensorNet, CHGNet, eSEN) with minimal adaptation, so we don’t need model-specific rewrites.
2. The “vertical” partition rule is reported up to 8x faster than standard graph partitioners (e.g., METIS/RCMK). And against SevenNet’s distributed inference, DistMLIP has up to 10x higher max capacity and is 4x faster.
Majors:
1. The authors say the design keeps backprop intermediates in their contribution claims, but they only benchmark inference; there’s no distributed-training result or accuracy/stability study over long MD runs.
2. All inference timing is on one cluster of 8x A100-80GB; there’s no multi-node or NVLink study to justify capability of large scale simulation.
Minors:
1. Line 39: "CHARM" -> "CHARMM"
2. Line 44: "coupled clustering" -> "coupled cluster"
3. Line 132: Citation format
4. Line 201:
Taxonomy
Topics: Biomedical Text Mining and Ontologies
