Staleness-Centric Optimizations for Parallel Diffusion MoE Inference

Jiajun Luo; Lizhuo Luo; Jianru Xu; Jiajun Song; Rongwei Lu; Chen Tang; Zhi Wang

arXiv:2411.16786 · cs.DC · December 1, 2025

TL;DR

This paper introduces DICE, a staleness-centric optimization framework for parallel diffusion MoE inference that reduces communication bottlenecks and improves speed with minimal quality loss.

Contribution

DICE combines interweaved parallelism, selective synchronization, and conditional communication to effectively reduce staleness in expert-parallel diffusion models.

Findings

1. Achieves a 1.26x speedup in diffusion inference.
2. Reduces staleness-related quality degradation.
3. Provides scalable optimization for MoE-based diffusion models.

Abstract

Mixture-of-Experts-based (MoE-based) diffusion models demonstrate remarkable scalability in high-fidelity image generation, yet their reliance on expert parallelism introduces critical communication bottlenecks. State-of-the-art methods alleviate this overhead in parallel diffusion inference by overlapping computation with communication, a technique termed displaced parallelism. However, we identify that these techniques induce severe *staleness*: the use of outdated activations from previous timesteps, which significantly degrades quality, especially in expert-parallel scenarios. We tackle this fundamental tension and propose DICE, a staleness-centric optimization framework with a three-fold approach: (1) Interweaved Parallelism introduces staggered pipelines, effectively halving step-level staleness for free; (2) Selective Synchronization operates at the layer level and protects layers vulnerable from…
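
To make the notion of step-level staleness concrete, the following is a toy scalar sketch, not the DICE implementation and not a diffusion model. It assumes a made-up iterative update in which each step subtracts a correction computed from a state that is `staleness` steps old, standing in for activations that arrive late because communication overlaps with computation. The names `run_steps` and `max_drift`, the update rule, and the constants are all hypothetical and chosen only for illustration.

```python
def run_steps(num_steps, staleness, x0=1.0, rate=0.1):
    """Toy iteration: x[t+1] = x[t] - rate * x[t - staleness].

    staleness=0 is the fully synchronous baseline. A positive value means
    the correction is computed from an outdated state (hypothetical
    stand-in for stale activations). Early steps fall back to x[0].
    """
    history = [x0]
    for t in range(num_steps):
        stale_input = history[max(t - staleness, 0)]
        history.append(history[t] - rate * stale_input)
    return history


def max_drift(num_steps, staleness):
    """Largest absolute gap between a stale run and the synchronous run."""
    fresh = run_steps(num_steps, 0)
    stale = run_steps(num_steps, staleness)
    return max(abs(a - b) for a, b in zip(stale, fresh))


if __name__ == "__main__":
    for s in (0, 1, 2):
        print(f"staleness={s}: max drift = {max_drift(4, s):.4f}")
```

In this toy setting the drift from the synchronous trajectory shrinks when the staleness drops from two steps to one, which is the intuition behind reducing step-level staleness. It says nothing about the actual quality or speed numbers reported for DICE.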

Taxonomy

Topics: Neural Networks and Applications