FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity

Jian Tang; Jiawei Fan; Qingbin Liu; Zheng Wei

arXiv:2605.11869 · cs.CV · May 13, 2026

TL;DR

FIS-DiT introduces a training-free, frame interleaved sparsity method to significantly accelerate video diffusion transformer inference, especially in few-step regimes, without substantial quality loss.

Contribution

The paper proposes a novel, training-free framework that exploits frame-wise sparsity and structural consistency to enhance inference speed in video diffusion transformers.

Findings

01. Achieves a 2.11× to 2.41× speedup on benchmark datasets.

02. Incurs negligible quality degradation across key metrics.

03. Provides a scalable approach for real-time high-definition video generation.
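To put the reported speedup range in more familiar terms, the short sketch below converts a speedup factor into the fraction of inference time saved. It assumes that "speedup" means baseline latency divided by accelerated latency, which this page does not state explicitly; the helper name `latency_reduction` is hypothetical and not taken from the paper.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of wall-clock time saved for a given speedup factor.

    Assumes speedup = baseline_latency / accelerated_latency
    (an assumption; the summary does not define the term).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# The two endpoints of the range reported in the findings.
for s in (2.11, 2.41):
    print(f"{s:.2f}x speedup -> {latency_reduction(s):.1%} less time")
# prints:
# 2.11x speedup -> 52.6% less time
# 2.41x speedup -> 58.5% less time
```

Under that assumption, the reported range corresponds to roughly 53% to 59% less inference time than the baseline.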

Abstract

While the overall inference latency of Video Diffusion Transformers (DiTs) can be substantially reduced through model distillation, per-step inference latency remains a critical bottleneck. Existing acceleration paradigms primarily exploit redundancy across the denoising trajectory; however, we identify a limitation where these step-wise strategies encounter diminishing returns in few-step regimes. In such scenarios, the scarcity of temporal states prevents effective feature reuse or predictive modeling, creating a formidable barrier to further acceleration. To overcome this, we propose Frame Interleaved Sparsity DiT (FIS-DiT), a training-free and operator-agnostic framework that shifts the optimization focus from the temporal trajectory to the latent frame dimension. Our approach is motivated by an intrinsic duality within this dimension: the existence of frame-wise sparsity that…
