Anchoring and Rescaling Attention for Semantically Coherent Inbetweening
Tae Eun Choi, Sumin Shim, Junhyeok Kim, Seong Jae Hwang

TL;DR
This paper introduces a novel attention mechanism and a new benchmark to improve and evaluate semantic coherence and consistency in generative inbetweening of video frames, especially for sparse sequences and large motions.
Contribution
The paper proposes Keyframe-anchored Attention Bias and Rescaled Temporal RoPE to enhance frame consistency and semantic alignment in generative inbetweening, along with a new benchmark TGI-Bench for evaluation.
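The page names Keyframe-anchored Attention Bias but does not give its formula. The sketch below is a hypothetical illustration, not the authors' method. It shows one way an additive attention bias could steer each intermediate frame toward the two keyframes. It assumes the first and last frames are the keyframes, and it assumes the bias on a key depends linearly on the query's normalized time t. `keyframe_anchor_bias` and `strength` are illustrative names invented here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def keyframe_anchor_bias(frame_ids, num_frames, strength=1.0):
    """HYPOTHETICAL keyframe-anchored bias (not the paper's exact formula).

    frame_ids: integer frame index of every token, shape (n_tokens,).
    Returns an additive logit bias of shape (n_tokens, n_tokens). A query at
    normalized time t gets +strength*(1 - t) on first-keyframe keys and
    +strength*t on last-keyframe keys. All other keys get 0.
    """
    frame_ids = np.asarray(frame_ids)
    t = frame_ids / (num_frames - 1)                       # normalized time per token
    is_first = (frame_ids == 0).astype(float)              # keys in the first keyframe
    is_last = (frame_ids == num_frames - 1).astype(float)  # keys in the last keyframe
    return strength * (np.outer(1.0 - t, is_first) + np.outer(t, is_last))


def biased_attention(q, k, v, bias):
    # Scaled dot-product attention with an additive bias on the logits.
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    return softmax(logits, axis=-1) @ v
```

With several tokens per frame, pass per-token frame indices such as `np.repeat(np.arange(num_frames), tokens_per_frame)`. A logit-level bias of this kind adds no learnable weights, so it can be applied at inference time. That property matches the page's claim that the method operates without additional training.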
Findings
Achieves state-of-the-art frame consistency and semantic fidelity.
Improves pace stability across diverse sequence lengths.
Operates effectively without additional training.
Abstract
Generative inbetweening (GI) seeks to synthesize realistic intermediate frames between the first and last keyframes, going beyond mere interpolation. As sequences become sparser and motions larger, previous GI models produce inconsistent frames with unstable pacing and semantic misalignment. GI has fixed endpoints but many plausible paths between them, so the task needs additional guidance from the keyframes and text to specify the intended path. We therefore inject semantic and temporal guidance from the keyframes and text into each intermediate frame through Keyframe-anchored Attention Bias. We further enforce frame consistency with Rescaled Temporal RoPE, which lets self-attention attend to the keyframes more faithfully. TGI-Bench, the first benchmark designed specifically for text-conditioned GI evaluation, enables challenge-targeted analysis of GI models. Without…
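The abstract says that Rescaled Temporal RoPE helps self-attention attend to the keyframes more faithfully, but it does not state the rescaling rule. The sketch below swaps in an assumed linear position-interpolation scheme and is not the paper's method. Temporal indices are mapped so that the first and last keyframes always sit a fixed distance `ref_span` apart, however many frames are generated. RoPE attention scores depend only on the position difference, so fixing this span keeps the keyframe-to-keyframe score independent of sequence length. The rotary part is standard RoPE with interleaved channel pairs. `rescaled_frame_positions` and `ref_span` are hypothetical names.

```python
import numpy as np


def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE: one rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape (len(positions), dim // 2)


def apply_rope(x, positions, base=10000.0):
    """Rotate interleaved channel pairs of x (shape [n, dim]) by position-dependent angles."""
    n, dim = x.shape
    ang = rope_angles(positions, dim, base)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


def rescaled_frame_positions(num_frames, ref_span):
    """HYPOTHETICAL rescaling: map frame indices 0..num_frames-1 linearly onto
    [0, ref_span], so the first and last keyframes are always ref_span apart."""
    return np.arange(num_frames) * (ref_span / (num_frames - 1))
```

Without rescaling, the positions are `np.arange(num_frames)`, and the distance between the two keyframes grows with the number of generated frames. That growth changes their rotary interaction. Under this assumed scheme, a longer sequence instead packs intermediate frames more densely between fixed endpoint positions.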
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
