Disentangled Motion Modeling for Video Frame Interpolation
Jaihyun Lew, Jooyoung Choi, Chaehun Shin, Dahuin Jung, Sungroh Yoon

TL;DR
This paper introduces MoMo, a diffusion-based video frame interpolation method that models intermediate motion through disentangled, low-frequency flow representations, achieving high perceptual quality with lower computational costs.
Contribution
The paper presents a novel two-stage training process and a specialized U-Net architecture for optical flow, improving VFI by focusing on motion modeling and reducing computational complexity.
Findings
MoMo outperforms state-of-the-art methods in perceptual metrics.
It achieves high-quality frame interpolation at lower computational cost.
The approach effectively models bi-directional flows using low-frequency motion representations.
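As a rough illustration of what a low-frequency flow representation can look like, the sketch below average-pools a dense flow field onto a coarser grid and then expands it back. This is a minimal example under stated assumptions, not the paper's method: the function names `downsample_flow` and `upsample_flow`, the pooling factor, and the nearest-neighbour upsampling are all hypothetical choices made for illustration.

```python
import numpy as np

def downsample_flow(flow, factor):
    """Average-pool an (H, W, 2) flow field by `factor` (hypothetical helper).

    Displacement vectors are divided by `factor` so they stay expressed
    in pixels of the coarser grid.
    """
    h, w, _ = flow.shape
    assert h % factor == 0 and w % factor == 0, "H and W must be divisible by factor"
    pooled = flow.reshape(h // factor, factor, w // factor, factor, 2).mean(axis=(1, 3))
    return pooled / factor

def upsample_flow(flow_lr, factor):
    """Nearest-neighbour upsampling of a coarse flow field (hypothetical helper).

    Vectors are multiplied by `factor` to return to full-resolution pixel units.
    """
    up = np.repeat(np.repeat(flow_lr, factor, axis=0), factor, axis=1)
    return up * factor

# Uniform 4-pixel horizontal motion on an 8x8 grid.
flow = np.zeros((8, 8, 2))
flow[..., 0] = 4.0

flow_lr = downsample_flow(flow, 4)   # coarse 2x2 representation
recon = upsample_flow(flow_lr, 4)    # back to 8x8
```

Smooth motion such as the uniform field here survives the round trip unchanged, while fine detail in a real flow field would be lost; a generative model operating on the coarse grid has far fewer values to predict than one operating in pixel space.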
Abstract
Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works have employed generative models for improved perceptual quality. However, they require complex training and large computational costs for pixel space modeling. In this paper, we introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling. We propose a disentangled two-stage training process. In the initial stage, frame synthesis and flow models are trained to generate accurate frames and flows optimal for synthesis. In the subsequent stage, we introduce a motion diffusion model, which incorporates our novel U-Net architecture specifically designed for optical flow, to generate…
Taxonomy
Topics: Digital Media Forensic Detection · Advanced Image Processing Techniques · Video Coding and Compression Technologies
Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net · Diffusion
