SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing

Xuanyi Zhou; Qiuyang Mang; Shuo Yang; Haocheng Xi; Jintao Zhang; Huanzhi Mao; Joseph E. Gonzalez; Kurt Keutzer; Ion Stoica; and Alvin Cheung

arXiv:2603.08982 · cs.CV · March 11, 2026

Open Access

TL;DR

SVG-EAR adds a parameter-free, error-aware linear compensation branch to sparse attention for video generation. The branch recovers the contributions of attention blocks that sparse attention skips, reaching up to 1.93× speedup while keeping PSNR as high as 31.043 dB.

Contribution

The paper proposes SVG-EAR, a novel parameter-free approach that uses centroid-based approximation and error-aware routing to recover skipped attention contributions in sparse video diffusion models.

Findings

01

Achieves up to 1.93× speedup in video generation.

02

Maintains high fidelity, with PSNR up to 31.043 dB.

03

Establishes a Pareto frontier over prior sparse attention methods.

Abstract

Diffusion Transformers (DiTs) have become a leading backbone for video generation, yet their quadratic attention cost remains a major bottleneck. Sparse attention reduces this cost by computing only a subset of attention blocks. However, prior methods often either drop the remaining blocks, which incurs information loss, or rely on learned predictors to approximate them, which introduces training overhead and can shift the output distribution. In this paper, we show that the missing contributions can be recovered without training: after semantic clustering, keys and values within each block exhibit strong similarity and can be well summarized by a small set of cluster centroids. Based on this observation, we introduce SVG-EAR, a parameter-free linear compensation branch that uses the centroid to approximate skipped blocks and recover their contributions. While centroid compensation is…
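
The centroid idea in the abstract can be illustrated with a small NumPy sketch. This is one plausible reading, not the paper's implementation: the function names, the single-query setup, the precomputed cluster labels, and the fixed `keep` set are all assumptions made for illustration. Each skipped cluster is replaced by one (mean key, mean value) pair, weighted by the cluster size. The error-aware routing step is not shown, because the abstract is cut off before describing it.

```python
import numpy as np

def dense_attention(q, K, V):
    """Reference softmax attention for a single query vector q."""
    logits = K @ q / np.sqrt(q.shape[0])
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return (w[:, None] * V).sum(axis=0) / w.sum()

def sparse_attention_with_centroids(q, K, V, labels, keep, compensate=True):
    """Exact attention on clusters in `keep`; centroid stand-ins for the rest.

    labels[i] is the cluster id of key/value i (assumed given, e.g. by k-means).
    `keep` is a non-empty set of cluster ids that are computed exactly.
    """
    scale = np.sqrt(q.shape[0])
    logits, values = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if int(c) in keep:
            # Computed cluster: every key/value pair contributes exactly.
            logits.append(K[idx] @ q / scale)
            values.append(V[idx])
        elif compensate:
            # Skipped cluster: a single centroid pair stands in for all members.
            # Adding log(n) to the logit multiplies its softmax weight by n.
            k_bar, v_bar = K[idx].mean(axis=0), V[idx].mean(axis=0)
            logits.append(np.array([k_bar @ q / scale + np.log(len(idx))]))
            values.append(v_bar[None, :])
    logits = np.concatenate(logits)
    values = np.concatenate(values)
    w = np.exp(logits - logits.max())
    return (w[:, None] * values).sum(axis=0) / w.sum()

# Synthetic data (an assumption): keys sit tightly around 8 cluster centers.
rng = np.random.default_rng(0)
d, n_clusters, per_cluster = 16, 8, 32
centers = rng.normal(size=(n_clusters, d))
labels = np.repeat(np.arange(n_clusters), per_cluster)
K = centers[labels] + 1e-3 * rng.normal(size=(labels.size, d))
V = rng.normal(size=(labels.size, d))
q = rng.normal(size=d)
keep = {0, 1}  # clusters computed exactly; the other six are skipped

ref = dense_attention(q, K, V)
err_drop = np.linalg.norm(
    sparse_attention_with_centroids(q, K, V, labels, keep, compensate=False) - ref)
err_comp = np.linalg.norm(
    sparse_attention_with_centroids(q, K, V, labels, keep, compensate=True) - ref)
print(f"error when dropping skipped clusters: {err_drop:.4f}")
print(f"error with centroid compensation:     {err_comp:.4f}")
```

In this sketch the compensation costs one extra key/value pair per skipped cluster, so its overhead grows with the number of clusters and not with the number of tokens. When all keys in a cluster coincide with their centroid, the compensated output matches dense attention exactly; the approximation degrades as keys spread out within a cluster.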

Taxonomy

Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Generative Adversarial Networks and Image Synthesis