When to Lock Attention: Training-Free KV Control in Video Diffusion

Tianyi Zeng; Jincheng Gao; Tianyi Wang; Zijie Meng; Miao Zhang; Jun Yin; Haoyuan Sun; Junfeng Jiao; Christian Claudel; Junbo Tan; Xueqian Wang

arXiv:2603.09657 · cs.CV · March 11, 2026

Open Access

TL;DR

KV-Lock is a training-free, adaptive framework for video diffusion models that dynamically balances background locking and foreground enhancement, reducing artifacts and improving video quality without additional training.

Contribution

We introduce KV-Lock, a novel training-free method that uses hallucination detection to adaptively control background locking and guidance in video diffusion models.

Findings

01 Outperforms existing methods in foreground quality and background fidelity

02 Effectively reduces artifacts in video editing tasks

03 Easily integrates with pre-trained DiT-based models

Abstract

Maintaining background consistency while enhancing foreground quality remains a core challenge in video editing. Injecting full-image information often leads to background artifacts, whereas rigid background locking severely constrains the model's capacity for foreground generation. To address this issue, we propose KV-Lock, a training-free framework tailored for DiT-based video diffusion models. Our core insight is that the hallucination metric (variance of denoising prediction) directly quantifies generation diversity, which is inherently linked to the classifier-free guidance (CFG) scale. Building upon this, KV-Lock leverages diffusion hallucination detection to dynamically schedule two key components: the fusion ratio between cached background key-values (KVs) and newly generated KVs, and the CFG scale. When hallucination risk is detected, KV-Lock strengthens background KV locking…
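
The scheduling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds `var_low` and `var_high`, and the linear mappings are all assumptions made for illustration. The excerpt states only that the hallucination metric is the variance of the denoising prediction, that it drives both the fusion ratio between cached and new KVs and the CFG scale, and that higher risk strengthens background KV locking. How the CFG scale should move with risk is not stated in the excerpt, so the sketch leaves both endpoints to the caller.

```python
from statistics import pvariance


def hallucination_risk(predictions, var_low, var_high):
    """Map the variance of several denoising predictions to a risk in [0, 1].

    predictions: list of equally sized lists of floats, one per prediction.
    var_low, var_high: hypothetical calibration thresholds (assumptions).
    """
    # Variance of each element across the predictions, averaged over elements.
    per_element = [pvariance(values) for values in zip(*predictions)]
    mean_var = sum(per_element) / len(per_element)
    # Normalise against the thresholds and clamp to [0, 1].
    risk = (mean_var - var_low) / (var_high - var_low)
    return min(1.0, max(0.0, risk))


def lock_ratio(risk, ratio_min=0.0, ratio_max=1.0):
    """Weight placed on cached background KVs; grows with risk (assumed linear)."""
    return ratio_min + (ratio_max - ratio_min) * risk


def fuse_kv(cached, fresh, ratio):
    """Convex blend of cached background KV entries with newly generated ones."""
    return [ratio * c + (1.0 - ratio) * f for c, f in zip(cached, fresh)]


def cfg_scale(risk, scale_at_zero, scale_at_one):
    """Interpolate the CFG scale between two caller-chosen endpoints.

    The direction of the schedule is deliberately left to the caller.
    """
    return scale_at_zero + (scale_at_one - scale_at_zero) * risk
```

In this sketch a ratio of 1.0 reuses the cached background KVs unchanged and a ratio of 0.0 uses only the newly generated KVs, so the two extremes correspond to rigid locking and no locking respectively. In a real DiT the blend would presumably apply only to background tokens inside each attention layer, which this flat-list version does not model.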

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Advanced Image Processing Techniques