Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
Wooseok Jeon, Seunghyun Shin, Dongmin Shin, Hae-Gon Jeon

TL;DR
This paper introduces Motion Prior Distillation (MPD), an inference-time technique that improves temporal coherence in image-to-video diffusion models for inbetweening by reducing path mismatch and artifacts.
Contribution
The paper proposes MPD, a novel distillation method that aligns forward and backward motion priors during inference, enhancing temporal consistency in generated video frames.
Findings
MPD significantly reduces temporal artifacts in generated videos.
Quantitative results show improved coherence on standard benchmarks.
User studies confirm higher perceived quality of inbetweened frames.
Abstract
Recent progress in image-to-video (I2V) diffusion models has significantly advanced the field of generative inbetweening, which aims to generate semantically plausible frames between two keyframes. In particular, inference-time sampling strategies, which leverage the generative priors of large-scale pre-trained I2V models without additional training, have become increasingly popular. However, existing inference-time sampling, either fusing forward and backward paths in parallel or alternating them sequentially, often suffers from temporal discontinuities and undesirable visual artifacts due to the misalignment between the two generated paths. This is because each path follows the motion prior induced by its own conditioning frame. In this work, we propose Motion Prior Distillation (MPD), a simple yet effective inference-time distillation technique that suppresses bidirectional mismatch…
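The parallel fusion that the abstract describes can be illustrated with a toy sketch. This is not the paper's implementation: each frame is reduced to a single float, and the function name `fuse_time_reversed` and the `weights` parameter are hypothetical. It only shows the general idea of flipping the backward path in time and blending it with the forward path, and how a disagreement between the two paths ends up in the fused result.

```python
def fuse_time_reversed(forward, backward, weights=None):
    """Blend a forward path with a time-flipped backward path.

    forward[i]  : toy estimate for frame i, generated from the start frame.
    backward[j] : toy estimate for the j-th frame counted from the end frame.
    weights[i]  : blend weight given to the backward estimate at frame i
                  (hypothetical parameter; defaults to a plain average).
    """
    n = len(forward)
    if len(backward) != n:
        raise ValueError("both paths must have the same number of frames")
    flipped = backward[::-1]  # re-index the backward path in forward time
    if weights is None:
        weights = [0.5] * n
    return [(1.0 - w) * f + w * b for f, b, w in zip(forward, flipped, weights)]


# Paths that agree: fusion returns the shared trajectory.
agree = fuse_time_reversed([0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0])

# Paths that disagree (the backward path moves at half the speed):
# the fused trajectory is a compromise that matches neither path.
mismatch = fuse_time_reversed([0.0, 1.0, 2.0, 3.0], [3.0, 2.5, 2.0, 1.5])
```

In the second call the fused values are an average of two different motions, which is a simplified picture of the path misalignment the abstract refers to.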
Peer Reviews
Decision: ICLR 2026 Poster
This paper tackles a well-defined problem: the mismatch of generated motion priors that occurs when conditioning the image-to-video denoiser on each end frame during the adaptation of image-to-video models for generative in-betweening. The novel aspect of the proposed approach lies in converting the forward motion noise estimate into a backward motion noise estimate using per-frame motion residuals. This design allows the backward motion to be learned without relying on denoiser estimates condit
When the end point of the forward motion prior path is far from the second input end frame, Eq. (15) would not be a good initialization. The paper should discuss this limitation and include examples to illustrate the scenarios in which the proposed method is most effective and those in which it is not.
- The problem definition is clear.
- The results of the proposed method seem to be strong, outperforming existing work.
1. As I understand it, the proposed method aims to model the noise residual between frames, especially along the forward path, and distill it to the backward path for alignment. However, I am unsure how well this can align the motions. Figure 1(c) seems to be a good description, and it also illustrates my concern. In Fig. 1(c), the position of the red car differs between the forward path and the backward path. To be more specific, in the backward path, starting from the end
1) Good motivation and novel framework: Motivated by the goal of resolving the bidirectional path misalignment problem, the authors propose a novel framework called Motion Prior Distillation. This approach injects motion information from the forward path into the end-frame latent $z_{end}$ to guide the generation of the backward path. As a result, the method constrains the backward path to terminate at the end-frame latent $z_{end}$ while utilizing only the forward-path motion prior, effectively mitigating bidire
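The idea summarized in this review, reusing forward-path motion for a backward path anchored at the end frame, can be sketched in a toy form. This is an assumption-laden illustration and not the paper's equations, which are not reproduced on this page: frames are single floats, "motion" is the difference between consecutive forward values, and the names `frame_residuals`, `backward_from_residuals`, and `end_anchor` are hypothetical.

```python
def frame_residuals(forward):
    """Per-frame motion residuals of a toy forward path (consecutive differences)."""
    return [forward[i + 1] - forward[i] for i in range(len(forward) - 1)]


def backward_from_residuals(end_anchor, residuals):
    """Build a backward path that starts at end_anchor and replays the
    forward residuals in reverse order with flipped sign.

    Assumption: this stands in for 'distilling' forward motion into the
    backward path; the real method operates on noise estimates and latents.
    """
    path = [end_anchor]
    for r in reversed(residuals):
        path.append(path[-1] - r)
    return path


forward = [0.0, 1.0, 3.0, 4.0]
residuals = frame_residuals(forward)                 # motion taken from the forward path only
backward = backward_from_residuals(5.0, residuals)   # anchored at the toy end frame
backward_in_forward_time = backward[::-1]
```

Read in forward time, the backward path here has exactly the same frame-to-frame motion as the forward path and is pinned to the end anchor. The two paths differ only by a constant offset equal to the gap between the forward path's last value (4.0) and the anchor (5.0).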
1) Limited explanation of early-stage application: After carefully reviewing the submission, it remains unclear why the proposed method should be applied only during the early stage of the sampling process. The authors briefly attribute this choice to the “coarse-to-fine property of diffusion sampling,” but a more detailed explanation is needed. In particular, the paper should clarify why the proposed method is especially effective for correcting global or low-frequency structures, thereby helpi
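Restricting a correction to the early stage of sampling, as this review describes, amounts to gating it by step index. The sketch below is only a generic illustration of such a gate: the function name `early_stage_mask` and the `early_fraction` parameter are hypothetical, and the values used are not taken from the paper.

```python
def early_stage_mask(num_steps, early_fraction):
    """Return one boolean per sampling step: True while the step falls in
    the early part of the schedule, False afterwards.

    early_fraction is a hypothetical knob in [0, 1], not a value from the paper.
    """
    if not 0.0 <= early_fraction <= 1.0:
        raise ValueError("early_fraction must lie in [0, 1]")
    cutoff = int(num_steps * early_fraction)
    return [step < cutoff for step in range(num_steps)]


# With 8 steps and a fraction of 0.25, only the first 2 steps are gated on.
mask = early_stage_mask(8, 0.25)
```

A sampler loop would consult `mask[step]` to decide whether to apply the correction at that step and fall back to ordinary sampling otherwise.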
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Face recognition and analysis
