SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing
Xuanyi Zhou, Qiuyang Mang, Shuo Yang, Haocheng Xi, Jintao Zhang, Huanzhi Mao, Joseph E. Gonzalez, Kurt Keutzer, Ion Stoica, and Alvin Cheung

TL;DR
SVG-EAR introduces a parameter-free, error-aware linear compensation method for sparse video generation that recovers the contributions of attention blocks skipped by sparse attention, improving efficiency while maintaining high fidelity.
Contribution
The paper proposes SVG-EAR, a novel parameter-free approach that uses centroid-based approximation and error-aware routing to recover skipped attention contributions in sparse video diffusion models.
Findings
Achieves up to 1.93× speedup in video generation.
Maintains high fidelity with PSNR up to 31.043.
Establishes a Pareto frontier over prior sparse attention methods.
Abstract
Diffusion Transformers (DiTs) have become a leading backbone for video generation, yet their quadratic attention cost remains a major bottleneck. Sparse attention reduces this cost by computing only a subset of attention blocks. However, prior methods often either drop the remaining blocks, which incurs information loss, or rely on learned predictors to approximate them, introducing training overhead and potential output distribution shifting. In this paper, we show that the missing contributions can be recovered without training: after semantic clustering, keys and values within each block exhibit strong similarity and can be well summarized by a small set of cluster centroids. Based on this observation, we introduce SVG-EAR, a parameter-free linear compensation branch that uses the centroid to approximate skipped blocks and recover their contributions. While centroid compensation is…
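The centroid idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `compensated_attention`, the block layout, and the choice to weight each centroid by its block size are assumptions made for illustration, and the abstract is truncated before it describes error-aware routing, so that part is not sketched. Blocks listed in `exact_ids` are computed exactly; every other block is replaced by one mean key and one mean value.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    # Component-wise mean of a list of equal-length vectors (the block centroid).
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def compensated_attention(q, key_blocks, value_blocks, exact_ids):
    """Attention output for a single query vector q.

    Hypothetical helper: blocks whose index is in exact_ids use every key and
    value; each remaining block is approximated by its centroid, counted once
    per token it stands for (an assumed weighting, not taken from the paper).
    """
    scale = 1.0 / math.sqrt(len(q))
    terms = []  # (logit, multiplicity, value vector)
    for i, (ks, vs) in enumerate(zip(key_blocks, value_blocks)):
        if i in exact_ids:
            terms += [(dot(q, k) * scale, 1, v) for k, v in zip(ks, vs)]
        else:
            terms.append((dot(q, mean(ks)) * scale, len(ks), mean(vs)))

    # Numerically stable softmax over exact tokens and weighted centroids.
    m = max(s for s, _, _ in terms)
    denom = sum(n * math.exp(s - m) for s, n, _ in terms)
    dim = len(terms[0][2])
    return [
        sum(n * math.exp(s - m) * v[d] for s, n, v in terms) / denom
        for d in range(dim)
    ]


q = [1.0, 0.5]
key_blocks = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.0, 1.0]]]
value_blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [5.0, 6.0]]]

dense = compensated_attention(q, key_blocks, value_blocks, {0, 1})
approx = compensated_attention(q, key_blocks, value_blocks, {0})
```

In this toy input the second block holds identical keys and values, so its centroid reproduces the block exactly and `approx` matches `dense` up to rounding. When keys within a block differ, the centroid term is only an approximation, and its error depends on how tightly the block is clustered.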
Taxonomy
Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Generative Adversarial Networks and Image Synthesis
