DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials

Kevin Han; Bowen Deng; Amir Barati Farimani; Gerbrand Ceder

arXiv:2506.02023·cs.DC·February 3, 2026·3 cites

TL;DR

DistMLIP is a novel distributed inference platform that enables efficient multi-device inference of machine learning interatomic potentials, significantly increasing simulation scale and speed for atomistic modeling.

Contribution

The paper introduces DistMLIP, a graph partitioning-based distributed inference platform for MLIPs, allowing scalable, multi-GPU atomistic simulations with flexible model architectures.

Findings

1. Simulates 3.4x larger atomic systems
2. Achieves up to 8x faster inference on multiple GPUs
3. Performs near-million-atom calculations in seconds

Abstract

Large-scale atomistic simulations are essential to bridge computational materials and chemistry to realistic materials and drug discovery applications. In the past few years, rapid developments of machine learning interatomic potentials (MLIPs) have offered a solution to scale up quantum mechanical calculations. Parallelizing these interatomic potentials across multiple devices is a challenging but promising approach to further extending simulation scales to real-world applications. In this work, we present DistMLIP, an efficient distributed inference platform for MLIPs based on zero-redundancy, graph-level parallelization. In contrast to conventional spatial partitioning parallelization, DistMLIP enables efficient MLIP parallelization through graph partitioning, allowing multi-device inference on flexible MLIP model architectures like multi-layer graph neural networks. DistMLIP…
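To make the idea of graph-level partitioning concrete, the following is a minimal sketch, not DistMLIP's implementation. It assumes a non-periodic cubic box, a brute-force neighbor search, and a simple slab split along x. The helper names (`build_edges`, `slab_partition`, `border_nodes`) are hypothetical and chosen only for illustration. Each atom has exactly one owning partition, and each partition records which foreign atoms it would need features from.

```python
import itertools
import math
import random


def build_edges(positions, cutoff):
    """Directed edges (i, j) for every atom pair closer than `cutoff`.

    Brute force, no periodic images: an illustrative assumption only.
    """
    edges = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            edges.append((i, j))
            edges.append((j, i))
    return edges


def slab_partition(positions, num_parts, box_length):
    """Assign each atom to exactly one slab along the x axis."""
    width = box_length / num_parts
    return [min(int(p[0] / width), num_parts - 1) for p in positions]


def border_nodes(edges, owner, num_parts):
    """For each partition, the foreign atoms whose features it must receive."""
    needed = [set() for _ in range(num_parts)]
    for src, dst in edges:
        if owner[src] != owner[dst]:
            # The message src -> dst is computed by dst's owner,
            # which therefore needs src's features.
            needed[owner[dst]].add(src)
    return needed


random.seed(0)
box = 20.0
atoms = [tuple(random.uniform(0, box) for _ in range(3)) for _ in range(400)]
edges = build_edges(atoms, cutoff=3.0)
owner = slab_partition(atoms, num_parts=4, box_length=box)
halo = border_nodes(edges, owner, num_parts=4)

for part in range(4):
    print(f"partition {part}: owns {owner.count(part)} atoms, "
          f"needs {len(halo[part])} border atoms")
```

In this toy setup every atom is owned once, and only the border sets would have to be exchanged between devices. How DistMLIP actually chooses partitions and exchanges features is described in the paper and repository, not here.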

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

### High-Impact Problem

The paper tackles a critical and timely bottleneck in computational science. Scaling MLIPs to the meso-scale (millions of atoms) is essential for bridging quantum-accurate simulations with real-world applications in materials science, chemistry, and biology.

### Sound and Novel Method

The graph-level parallelization approach is fundamentally better suited for GNN-based MLIPs than traditional spatial partitioning. The paper clearly articulates the "zero-redundancy"

Weaknesses

### Single-Node Limitation

The paper states the current implementation only supports "single-node multi-GPU inference". This is a significant limitation for scaling to truly massive systems (tens of millions+ atoms), which would require a multi-node, multi-GPU setup. The paper would be stronger if it discussed the roadmap and key challenges (e.g., managing communication overhead of border node features across a network interconnect) for a multi-node implementation.

### Clarification of "8x F

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- Scaling MLIP simulations to biologically and materially relevant sizes (millions of atoms) is a major challenge. DistMLIP provides a much-needed solution specifically tailored for efficient distributed inference of modern GNN-based MLIPs.
- DistMLIP is designed as a model-agnostic, plug-in platform. This allows researchers to apply it to their existing, pre-trained MLIPs with minimal adaptation (as demonstrated with four different models), significantly lowering the barrier to large-scale sim

Weaknesses

- Graph partitioning inherently requires communication between GPUs after each message-passing layer to exchange border node information. The paper acknowledges scaling isn't always ideal (Fig 2b, 2c) partly due to overheads, but a more detailed analysis of communication cost vs. computation cost, and how it scales with the number of GPUs, graph density, and partition quality, would be valuable.
- The paper appears to be lacking comparisons against several relevant baselines. For instance, a cri
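The communication-versus-computation question raised above can be illustrated with a toy count. This is a rough proxy under stated assumptions, not an analysis from the paper. It assumes a simple cubic lattice with nearest-neighbor edges, open boundaries, and equal-width slabs along x. The function name `comm_vs_compute` is hypothetical. "Communication" is counted as border-node features received per message-passing layer, and "computation" as directed edges processed.

```python
def comm_vs_compute(n, num_parts):
    """Count border-node transfers and directed edges on an n x n x n lattice.

    Assumptions (illustrative only): nearest-neighbour edges, open
    boundaries, equal-width slab partitions along the x axis.
    """
    def owner(x):
        return x * num_parts // n

    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    directed_edges = 0
    border = set()  # (receiving partition, foreign node) pairs
    for x in range(n):
        for y in range(n):
            for z in range(n):
                for dx, dy, dz in steps:
                    nx, ny, nz = x + dx, y + dy, z + dz
                    if not (0 <= nx < n and 0 <= ny < n and 0 <= nz < n):
                        continue
                    directed_edges += 1
                    if owner(nx) != owner(x):
                        # The owner of (x, y, z) needs the neighbour's features.
                        border.add((owner(x), (nx, ny, nz)))
    return len(border), directed_edges


for parts in (2, 4, 8):
    transfers, work = comm_vs_compute(8, parts)
    print(f"{parts} partitions: {transfers} border features, "
          f"{work} edges, ratio {transfers / work:.3f}")
```

In this toy model the total edge count is fixed while the number of border features grows with the number of slabs, so the per-layer communication share rises as partitions are added. Real behavior also depends on graph density, interconnect bandwidth, and partition quality, which this count ignores.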

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Works with popular MLIPs (MACE, TensorNet, CHGNet, eSEN) with minimal adaptation, so we don’t need model-specific rewrites.
2. The “vertical” partition rule is reported up to 8x faster than standard graph partitioners (e.g., METIS/RCMK). And against SevenNet’s distributed inference, DistMLIP has up to 10x higher max capacity and is 4x faster.

Weaknesses

Majors:

1. The authors say the design keeps backprop intermediates in their contribution claims, but they only benchmark inference; there’s no distributed-training result or accuracy/stability study over long MD runs.
2. All inference timing is on one cluster of 8x A100-80GB; there’s no multi-node or NVLink study to justify capability of large scale simulation.

Minors:

1. Line 39: "CHARM" -> "CHARMM"
2. Line 44: "coupled clustering" -> "coupled cluster"
3. Line 132: Citation format
4. Line 201:

Code & Models

Repositories

AegisIK/DistMLIP · PyTorch · Official

Taxonomy

Topics: Biomedical Text Mining and Ontologies