Disentangled Motion Modeling for Video Frame Interpolation

Jaihyun Lew; Jooyoung Choi; Chaehun Shin; Dahuin Jung; Sungroh Yoon

arXiv:2406.17256 · cs.CV · December 20, 2024

TL;DR

This paper introduces MoMo, a diffusion-based video frame interpolation method that models intermediate motion through disentangled, low-frequency flow representations, achieving high perceptual quality with lower computational costs.

Contribution

The paper presents a novel two-stage training process and a specialized U-Net architecture for optical flow, improving VFI by focusing on motion modeling and reducing computational complexity.

Findings

01

MoMo outperforms state-of-the-art methods in perceptual metrics.

02

It achieves high-quality frame interpolation with less computational cost.

03

The approach effectively models bi-directional flows using low-frequency motion representations.
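
As a rough illustration of what a low-frequency flow representation can look like, the sketch below average-pools a dense flow field to a coarse grid and expands it back. This is an assumption made for illustration, not the procedure from the paper; the function names `downsample_flow` and `upsample_flow` are hypothetical.

```python
def downsample_flow(flow, factor):
    """Average-pool a dense flow field by `factor` (hypothetical helper).

    `flow` is an H x W nested list of (dx, dy) tuples in full-resolution
    pixel units. The vectors are averaged but not rescaled, so they stay
    in full-resolution pixel units.
    """
    height, width = len(flow), len(flow[0])
    coarse = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            # Collect every vector in the factor x factor block (clipped at edges).
            block = [
                flow[yy][xx]
                for yy in range(y, min(y + factor, height))
                for xx in range(x, min(x + factor, width))
            ]
            n = len(block)
            row.append((sum(v[0] for v in block) / n, sum(v[1] for v in block) / n))
        coarse.append(row)
    return coarse


def upsample_flow(coarse, factor, height, width):
    """Nearest-neighbour upsampling back to a height x width grid."""
    return [
        [coarse[y // factor][x // factor] for x in range(width)]
        for y in range(height)
    ]


# A uniform translation survives the round trip unchanged.
flow = [[(1.0, -2.0)] * 4 for _ in range(4)]
coarse = downsample_flow(flow, 2)
restored = upsample_flow(coarse, 2, 4, 4)
```

The coarse grid has `factor * factor` times fewer vectors than the dense field, which suggests why modeling motion at this resolution can be cheaper than modeling pixels directly.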

Abstract

Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works have employed generative models for improved perceptual quality. However, they require complex training and large computational costs for pixel space modeling. In this paper, we introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling. We propose a disentangled two-stage training process. In the initial stage, frame synthesis and flow models are trained to generate accurate frames and flows optimal for synthesis. In the subsequent stage, we introduce a motion diffusion model, which incorporates our novel U-Net architecture specifically designed for optical flow, to generate…
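
To make the role of intermediate flows concrete, here is a minimal sketch of classical flow-based synthesis: each input frame is backward-warped toward the target time and the two results are blended. The synthesis model described in the abstract is learned, so this shows only the generic warping idea under simplifying assumptions (grayscale frames as nested lists, bilinear sampling, clamped borders). The names `backward_warp` and `synthesize_midframe` are hypothetical.

```python
import math


def backward_warp(frame, flow):
    """Sample `frame` at (x + dx, y + dy) for every target pixel.

    `frame` is an H x W nested list of intensities and `flow` an H x W
    nested list of (dx, dy) tuples. Uses bilinear interpolation with
    coordinates clamped to the image border.
    """
    height, width = len(frame), len(frame[0])

    def pixel(y, x):
        # Clamp out-of-range coordinates to the nearest valid pixel.
        return frame[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * pixel(y0, x0) + ax * pixel(y0, x0 + 1)
            bottom = (1 - ax) * pixel(y0 + 1, x0) + ax * pixel(y0 + 1, x0 + 1)
            row.append((1 - ay) * top + ay * bottom)
        warped.append(row)
    return warped


def synthesize_midframe(frame0, frame1, flow_t0, flow_t1, t=0.5):
    """Blend the two warped frames; `flow_t0` and `flow_t1` point from the
    target frame to frame 0 and frame 1 respectively (an assumed convention)."""
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    return [
        [(1 - t) * a + t * b for a, b in zip(row0, row1)]
        for row0, row1 in zip(warped0, warped1)
    ]


# A bright pixel moves from x=2 to x=4; the midpoint frame places it at x=3.
frame0 = [[0, 0, 1, 0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0, 1, 0, 0]]
flow_t0 = [[(-1, 0)] * 7]
flow_t1 = [[(1, 0)] * 7]
mid = synthesize_midframe(frame0, frame1, flow_t0, flow_t1)
```

In this toy case both warped frames agree, so the blend is exact. With real footage, errors in the two flows show up as ghosting or blur in the blended frame, which is why the quality of the intermediate flows matters for the synthesized result.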

Code & Models

Repositories

jhlew/momo
PyTorch · Official

Videos

Disentangled Motion Modeling for Video Frame Interpolation · Underline

Taxonomy

Topics: Digital Media Forensic Detection · Advanced Image Processing Techniques · Video Coding and Compression Technologies

Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net · Diffusion