AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection

Shuheng Zhang; Yuqi Liu; Hongbo Zhou; Jun Peng; Yiyi Zhou; Xiaoshuai Sun; Rongrong Ji

arXiv:2502.05433 · cs.CV · February 11, 2025

TL;DR

AdaFlow introduces an adaptive, training-free method for efficient long video editing by selectively slimming attention and choosing keyframes, enabling editing of videos over ten times longer than previous methods.

Contribution

The paper proposes AdaFlow, a novel approach that adaptively reduces attention complexity and selects keyframes, significantly improving long video editing efficiency and quality without additional training.

Findings

01

AdaFlow can edit over 1,000 frames in one inference on a single GPU.

02

It edits videos about ten times longer than previous methods such as TokenFlow can handle.

03

The approach maintains high-quality editing with adaptive attention and keyframe selection.

Abstract

Despite great progress, text-driven long video editing is still notoriously challenging mainly due to excessive memory overhead. Although recent efforts have simplified this task into a two-step process of keyframe translation and interpolation generation, the token-wise keyframe translation still plagues the upper limit of video length. In this paper, we propose a novel and training-free approach towards efficient and effective long video editing, termed AdaFlow. We first reveal that not all tokens of video frames hold equal importance for keyframe translation, based on which we propose an Adaptive Attention Slimming scheme for AdaFlow to squeeze the $KV$ sequence, thus increasing the number of keyframes for translations by an order of magnitude. In addition, an Adaptive Keyframe Selection scheme is also equipped to select the representative frames for joint editing, further improving…
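To make the idea of squeezing the $KV$ sequence concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how token importance is scored, so the sketch assumes a key/value token is important in proportion to the total attention it receives from the queries. The names `slim_kv`, `attend`, and `keep_ratio` are hypothetical. The sketch also computes the full attention map to obtain the scores, which a memory-constrained method would need to avoid or approximate.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k):
    """Scaled dot-product attention weights, shape (n_queries, n_keys)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        for qi in q
    ]


def attend(q, k, v):
    """Attention output: each query gets a weighted average of the rows of v."""
    weights = attention_weights(q, k)
    dim = len(v[0])
    return [
        [sum(w * vj[c] for w, vj in zip(wi, v)) for c in range(dim)]
        for wi in weights
    ]


def slim_kv(q, k, v, keep_ratio=0.25):
    """Keep only the highest-scoring KV tokens.

    Assumption (not from the paper): a token's importance is the total
    attention mass it receives, summed over all queries.
    """
    weights = attention_weights(q, k)
    importance = [sum(wi[j] for wi in weights) for j in range(len(k))]
    n_keep = max(1, round(keep_ratio * len(k)))
    ranked = sorted(range(len(k)), key=lambda j: importance[j], reverse=True)
    keep = sorted(ranked[:n_keep])  # restore the original token order
    return [k[j] for j in keep], [v[j] for j in keep], keep
```

With `keep_ratio=0.25`, the attention cost per query drops to a quarter of the original key count, which is the kind of saving that would let more keyframes share one joint attention pass. Calling `attend(q, *slim_kv(q, k, v)[:2])` gives the approximated output, and `keep_ratio=1.0` reproduces full attention exactly.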

Datasets

zhangsh2001/LongV-EVAL
dataset · 1.2k downloads

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies