TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

Zhiben Chen; Youpeng Zhao; Yang Sui; Jun Wang; Yuzhang Shang

arXiv:2605.20179·cs.CL·May 20, 2026

TL;DR

TIDE is a lossless inference system for MoE diffusion LLMs that optimizes expert offload and scheduling, improving throughput by up to 1.4× and 1.5× on LLaDA2.0 models on resource-constrained devices without any model training.

Contribution

TIDE introduces an I/O-aware expert refresh strategy and a mathematical scheduling optimization that together enable efficient, lossless diffusion LLM inference without model training.

Findings

01

Achieves up to 1.4× and 1.5× throughput improvements on LLaDA2.0 models.

02

Leverages temporal stability of expert activations for resource-efficient inference.

03

Provides a lossless, no-training-required acceleration method.

Abstract

Diffusion Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive (AR) models, offering better hardware utilization and bidirectional context through parallel block-level decoding. However, as dLLMs continue to scale up with mixture-of-experts (MoE) architectures, their deployment on resource-constrained devices remains an open challenge. Existing AR-based methods often incur either prohibitive I/O overhead or significant compute bottlenecks. In this work, we propose TIDE, a novel resource-efficient inference system that leverages the temporal stability of expert activations during the diffusion process within the block. Specifically, we introduce an interval-based expert refresh strategy that updates the expert placement in an I/O-aware fashion. To…
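The interval-based refresh idea can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the TIDE implementation: the function name `simulate_interval_refresh`, the activation trace, the fixed `refresh_interval`, and the most-recently-used ranking rule are all hypothetical choices made for illustration. It only counts how many experts would need to be loaded onto the device and how many activations would miss the resident set; how a real system serves a miss is outside the sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def simulate_interval_refresh(
    step_activations: Iterable[Set[int]],
    cache_size: int,
    refresh_interval: int,
) -> Dict[str, int]:
    """Count device loads and misses for one block of denoising steps.

    step_activations: hypothetical trace, one set of activated expert ids per step.
    cache_size: number of experts assumed to fit on the device.
    refresh_interval: placement is updated only every `refresh_interval` steps.
    """
    if refresh_interval < 1:
        raise ValueError("refresh_interval must be at least 1")

    resident: Set[int] = set()   # experts currently placed on the device
    recent: Counter = Counter()  # activation counts since the last refresh
    misses = 0                   # activated experts that were not resident
    loads = 0                    # experts transferred to the device

    for step, active in enumerate(step_activations):
        if step % refresh_interval == 0:
            # Rank by recent use; at the first step fall back to the
            # experts requested by the current step.
            counts = recent if recent else Counter(active)
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            target = {expert for expert, _ in ranked[:cache_size]}
            loads += len(target - resident)  # only new experts cost I/O
            resident = target
            recent.clear()
        misses += len(active - resident)
        recent.update(active)

    return {"misses": misses, "loads": loads}


# Hypothetical trace: activations are stable for three steps, then drift slightly.
trace = [{0, 1}, {0, 1}, {0, 1}, {0, 2}]
print(simulate_interval_refresh(trace, cache_size=2, refresh_interval=2))
```

When activations are stable across steps, as in this trace, the refresh at step 2 transfers nothing, which is the intuition behind refreshing on an interval rather than at every step.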

Code & Models

Repositories

ims-kdks/TIDE (GitHub)
