Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies

Deep Pankajbhai Mehta

arXiv:2601.02311 · cs.DC · January 6, 2026

TL;DR

This paper introduces placement semantics, a framework that specifies each parallelism strategy (data, tensor, pipeline, ZeRO) by how it places training state across devices, and derives memory consumption and communication volume from that placement alone.

Contribution

It formalizes placement semantics for distributed training, enabling prediction of memory and communication costs without implementation details, and unifies multiple parallelism strategies under a common framework.

Findings

01. Predictions match published results exactly.

02. ZeRO-3 uses 8x less memory than data parallelism at 1.5x communication cost.

03. Two conditions, gradient integrity and state consistency, are proved necessary and sufficient for distributed training to match single-device results.

Abstract

Training large language models requires distributing computation across many accelerators, yet practitioners select parallelism strategies (data, tensor, pipeline, ZeRO) through trial and error because no unified systematic framework predicts their behavior. We introduce placement semantics: each strategy is specified by how it places four training states (parameters, optimizer, gradients, activations) across devices using five modes (replicated, sharded, sharded-with-gather, materialized, offloaded). From placement alone, without implementation details, we derive memory consumption and communication volume. Our predictions match published results exactly: ZeRO-3 uses 8x less memory than data parallelism at 1.5x communication cost, as reported in the original paper. We prove two conditions (gradient integrity, state consistency) are necessary and sufficient for distributed training to…
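The ZeRO-3 figures quoted in the abstract can be reproduced with a few lines of arithmetic. The sketch below is not the paper's formalism. It assumes the mixed-precision Adam accounting from the original ZeRO paper (2 bytes per parameter for fp16 parameters, 2 for fp16 gradients, 12 for fp32 optimizer state), ignores activations, and assumes 8 devices, which is what makes the memory ratio come out to 8x. The function names, the dictionary layout, and the choice of "sharded-with-gather" for ZeRO-3 parameters are illustrative assumptions.

```python
# Bytes per parameter for each model state under mixed-precision Adam
# (assumed: fp16 params and grads, fp32 master weights plus two moments).
BYTES_PER_PARAM = {"parameters": 2, "gradients": 2, "optimizer": 12}

# Hypothetical placement tables: training state -> placement mode.
DATA_PARALLEL = {
    "parameters": "replicated",
    "gradients": "replicated",
    "optimizer": "replicated",
}
ZERO3 = {
    "parameters": "sharded-with-gather",
    "gradients": "sharded",
    "optimizer": "sharded",
}


def model_state_bytes(num_params, num_devices, placement):
    """Per-device bytes for model states; activations are left out."""
    total = 0.0
    for state, width in BYTES_PER_PARAM.items():
        mode = placement[state]
        if mode == "replicated":
            total += width * num_params
        elif mode in ("sharded", "sharded-with-gather"):
            total += width * num_params / num_devices
        else:
            raise ValueError(f"mode not covered by this sketch: {mode}")
    return total


def comm_volume(num_params, placement):
    """Approximate elements each device moves per step (large device count)."""
    if placement["parameters"] == "sharded-with-gather":
        # All-gather parameters for forward, again for backward,
        # then reduce-scatter the gradients: about 3x the parameter count.
        return 3 * num_params
    # Replicated parameters: a gradient all-reduce (reduce-scatter plus
    # all-gather) moves about 2x the parameter count.
    return 2 * num_params


psi, n = 7_500_000_000, 8  # 7.5B parameters on 8 devices (assumed values)
mem_ratio = model_state_bytes(psi, n, DATA_PARALLEL) / model_state_bytes(psi, n, ZERO3)
comm_ratio = comm_volume(psi, ZERO3) / comm_volume(psi, DATA_PARALLEL)
print(mem_ratio, comm_ratio)  # 8.0 1.5
```

Under these assumptions the memory ratio equals the device count, so the 8x figure is specific to 8 devices, while the 1.5x communication ratio does not depend on the number of devices.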

Taxonomy

Topics: Machine Learning in Materials Science · Advanced Neural Network Applications · Topic Modeling