TL;DR
This paper introduces In-context Sparse Attention (ISA), a sparse attention framework for in-context learning (ICL) video editing that reduces attention cost while maintaining high visual fidelity, and presents LIVEditor, an efficient video editing model built on ISA.
Contribution
The paper proposes ISA, a novel sparse attention method for ICL video editing, and develops LIVEditor, achieving substantial speedups with minimal quality loss.
Findings
LIVEditor reduces attention-module latency by approximately 60%.
ISA achieves near-lossless acceleration in video editing tasks.
The framework outperforms state-of-the-art methods on multiple benchmarks.
Abstract
Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that Query sharpness correlates with approximation error. Motivated by these findings, ISA implements an efficient pre-selection strategy to prune redundant context, followed by a dynamic query grouping mechanism that routes high-error queries to full attention and low-error ones to a computationally efficient 0-th order Taylor sparse attention. Furthermore, we build LIVEditor, a novel lightning…
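The two-stage pipeline described in the abstract (context pre-selection, then query routing) can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function name `isa_sketch`, the parameters `keep_ctx`, `full_frac`, and `block`, the mean-query saliency score, and the peak-probability sharpness proxy are all hypothetical choices made for illustration. The cheap path below replaces each block of keys by its mean, which is a zeroth-order Taylor expansion of the attention logits around the block centroid; it is one plausible reading of "0-th order Taylor sparse attention", and the sketch assumes that sharper queries are the high-error ones.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def full_attention(q, k, v):
    # Standard scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def isa_sketch(q, k_src, v_src, k_ctx, v_ctx,
               keep_ctx=0.25, full_frac=0.5, block=4):
    """Illustrative only; every heuristic here is an assumption."""
    d = q.shape[-1]

    # Step 1 (assumed saliency proxy): score each context key against
    # the mean query and keep only the top `keep_ctx` fraction.
    ctx_score = (q.mean(axis=0) @ k_ctx.T) / np.sqrt(d)
    n_keep = max(1, int(keep_ctx * len(k_ctx)))
    keep = np.argsort(ctx_score)[-n_keep:]
    k = np.concatenate([k_src, k_ctx[keep]])
    v = np.concatenate([v_src, v_ctx[keep]])

    # Step 2 (cheap path): zeroth-order approximation. Each block of keys
    # is replaced by its mean, so exp(q.k_j) ~ exp(q.k_bar) inside a block.
    starts = np.arange(0, len(k), block)
    counts = np.diff(np.append(starts, len(k)))        # tokens per block
    k_bar = np.add.reduceat(k, starts, axis=0) / counts[:, None]
    v_bar = np.add.reduceat(v, starts, axis=0) / counts[:, None]
    # Adding log(counts) weights each block by the number of tokens in it.
    block_prob = softmax(q @ k_bar.T / np.sqrt(d) + np.log(counts))
    out = block_prob @ v_bar

    # Step 3 (assumed sharpness proxy): peak block probability per query.
    # The sharpest `full_frac` of queries are recomputed with full attention.
    sharpness = block_prob.max(axis=-1)
    n_full = int(round(full_frac * len(q)))
    if n_full > 0:
        full_idx = np.argsort(sharpness)[len(q) - n_full:]
        out[full_idx] = full_attention(q[full_idx], k, v)
    return out
```

With `keep_ctx=1.0` and `full_frac=1.0` the sketch reduces to ordinary full attention over all source and context tokens, which gives a convenient baseline for measuring how much error the pruning and the block-mean path introduce at other settings.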
