Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models
Wooseok Jeon, Seungho Park, Seunghyun Shin, Sangeyl Lee, Hyeonho Jeong, Hae-Gon Jeon

TL;DR
This paper introduces DyMoS, a training-free method that rebalances attention in image-to-video models to enhance motion dynamics without sacrificing image fidelity.
Contribution
DyMoS is a novel, model-agnostic approach that adjusts attention pathways during denoising to improve motion in generated videos without retraining.
Findings
DyMoS improves motion dynamics across multiple models.
It maintains visual quality and fidelity to the reference image.
DyMoS requires only a single scalar parameter for control.
Abstract
Image-to-video (I2V) models often generate videos that remain overly static compared to those produced by text-to-video models. While prior approaches mitigate this issue by weakening or modifying the image-conditioning signal, they often require additional training or sacrifice fidelity to the reference image. In this work, we identify reference-frame dominance as a key mechanism behind motion suppression. We observe that non-reference frames in I2V models allocate excessive self-attention to reference-frame key tokens, causing reference information to be over-propagated across time and suppressing inter-frame dynamics. Based on this finding, we propose DyMoS (Dynamic Motion Slider), a training-free and model-agnostic method that rebalances the attention pathway from generated frames to the reference frame during initial denoising steps. DyMoS leaves both the input image and model weights unchanged and…
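The abstract does not give the exact rebalancing rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "rebalancing" means multiplying the unnormalized attention weight that generated-frame queries place on reference-frame keys by a single scalar, applied as an additive `log(ref_scale)` bias on the attention logits, and only during the first few denoising steps. The names `attention_weights`, `reference_mass`, `scale_for_step`, `ref_scale`, and `num_early_steps` are hypothetical, as is the convention that reference-frame tokens come first in the sequence.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, num_ref_tokens, ref_scale=1.0):
    """Self-attention weights with a scaled generated-to-reference pathway.

    Assumption (not from the paper): the first `num_ref_tokens` tokens belong
    to the reference frame, and `ref_scale` multiplies the unnormalized weight
    that every other (generated-frame) query assigns to those keys.
    """
    if ref_scale <= 0:
        raise ValueError("ref_scale must be positive")
    dim = len(queries[0])
    log_scale = math.log(ref_scale)
    weights = []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        if i >= num_ref_tokens:
            # Only generated-frame queries are modified; reference rows stay as-is.
            for j in range(num_ref_tokens):
                logits[j] += log_scale
        weights.append(softmax(logits))
    return weights


def reference_mass(weights, num_ref_tokens):
    """Mean attention mass that generated-frame queries put on reference keys."""
    rows = weights[num_ref_tokens:]
    return sum(sum(r[:num_ref_tokens]) for r in rows) / len(rows)


def scale_for_step(step, num_early_steps, ref_scale):
    """Apply the scalar only during the initial denoising steps (hypothetical schedule)."""
    return ref_scale if step < num_early_steps else 1.0


# Toy usage: 6 tokens of dimension 4, the first 2 standing in for the reference frame.
tokens = [[math.sin(3 * i + j) for j in range(4)] for i in range(6)]
baseline = attention_weights(tokens, tokens, num_ref_tokens=2, ref_scale=1.0)
damped = attention_weights(tokens, tokens, num_ref_tokens=2, ref_scale=0.5)
print(reference_mass(baseline, 2), reference_mass(damped, 2))
```

In this sketch a `ref_scale` below 1 lowers the share of attention that generated frames give to the reference frame, while the input tokens and the projection weights are left untouched, which mirrors the training-free, single-scalar control described above. How the actual method computes or schedules its scalar may differ.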
