RFDM: Residual Flow Diffusion Model for Efficient Causal Video Editing

Mohammadreza Salehi; Mehdi Noroozi; Luca Morreale; Ruchika Chavhan; Malcolm Chadwick; Alberto Gil Ramos; Abhinav Mehrotra

arXiv:2602.06871 · cs.CV · March 3, 2026

Open Access

TL;DR

RFDM introduces a residual flow diffusion approach for efficient, variable-length causal video editing. It exploits the temporal redundancy between consecutive frames, outperforms I2I-based methods, and is competitive with fully spatiotemporal models on style transfer and object removal.

Contribution

The paper presents RFDM, a diffusion-based video editing model that predicts the residual between each target frame and the model's own prediction for the previous frame. This allows it to edit variable-length videos frame by frame at a per-frame cost that does not grow with video length.
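The summary does not spell out RFDM's forward process, so the sketch below illustrates the residual idea under one plausible, flow-matching-style reading. This is an assumption, not the authors' formulation: the path starts at the previous prediction `y_prev` (optionally perturbed by Gaussian noise of scale `sigma`) and moves linearly to the target frame `y_target` as `s` goes from 0 to 1. With `sigma = 0`, the velocity the model would learn is exactly the residual `y_target - y_prev`, which stays small when consecutive frames are similar. The function names `residual_flow_pair` and `euler_sample` are hypothetical.

```python
import numpy as np


def residual_flow_pair(y_prev, y_target, s, sigma=0.0, rng=None):
    """Return (x_s, v) on a hypothetical residual flow path.

    Assumption (not from the paper): a linear path from the previous
    prediction (optionally noised) to the target frame.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Source point: the previous-frame prediction, optionally perturbed.
    x0 = y_prev + sigma * rng.standard_normal(y_prev.shape)
    # Linear interpolation between source and target at position s.
    x_s = (1.0 - s) * x0 + s * y_target
    # Velocity dx_s/ds; with sigma=0 this equals the residual y_target - y_prev.
    v = y_target - x0
    return x_s, v


def euler_sample(velocity_fn, x0, steps=4):
    """Integrate dx/ds = velocity_fn(x, s) from s=0 to s=1 with Euler steps."""
    x = x0.copy()
    ds = 1.0 / steps
    for i in range(steps):
        x = x + ds * velocity_fn(x, i * ds)
    return x
```

For example, with an oracle velocity equal to the constant residual, Euler sampling started from `y_prev` lands on `y_target`. In a real model, a network conditioned on the input frame and the prompt would replace that oracle.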

Findings

1. RFDM surpasses I2I-based methods in quality.
2. RFDM is competitive with fully spatiotemporal models.
3. RFDM's efficiency does not depend on input video length.

Abstract

Instructional video editing applies edits to an input video using only text prompts, enabling intuitive natural-language control. Despite rapid progress, most methods still require fixed-length inputs and substantial compute. Meanwhile, autoregressive video generation enables efficient variable-length synthesis, yet remains under-explored for video editing. We introduce a causal, efficient video editing model that edits variable-length videos frame by frame. For efficiency, we start from a 2D image-to-image (I2I) diffusion model and adapt it to video-to-video (V2V) editing by conditioning the edit at time step t on the model's prediction at t-1. To leverage videos' temporal redundancy, we propose a new I2I diffusion forward process formulation that encourages the model to predict the residual between the target output and the previous prediction. We call this Residual Flow Diffusion…

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Multimedia Communication and Technology