MiniMax-Remover: Taming Bad Noise Helps Video Object Removal

Bojia Zi; Weixuan Peng; Xianbiao Qi; Jianan Wang; Shihao Zhao; Rong Xiao; Kam-Fai Wong

arXiv:2505.24873·cs.CV·June 2, 2025

TL;DR

MiniMax-Remover is a two-stage video object removal method that drops text conditioning and classifier-free guidance and needs as few as 6 sampling steps, reaching state-of-the-art results with faster inference.

Contribution

The paper presents a lightweight, two-stage video object removal approach that needs neither textual input nor classifier-free guidance, which makes the model smaller and inference faster.

Findings

01 Achieves state-of-the-art removal results with as few as 6 sampling steps

02 Does not rely on classifier-free guidance, which speeds up inference

03 Demonstrates superior performance over existing methods through extensive experiments
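To put the first two findings in concrete terms, here is a minimal sketch of how the number of denoiser forward passes changes. It relies on the standard formulation of classifier-free guidance, which evaluates the network twice per sampling step (once unconditionally, once conditionally) and blends the two predictions. The function names and the 50-step baseline are hypothetical choices for illustration and do not come from the paper; only the 6-step, no-CFG setting is taken from the findings above.

```python
def cfg_combine(uncond, cond, scale):
    """Standard classifier-free guidance blend of two predictions.

    uncond and cond are equal-length lists of floats standing in for the
    unconditional and conditional denoiser outputs.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def denoiser_calls(num_steps, use_cfg):
    """Count network forward passes for one sampling run."""
    # CFG needs both a conditional and an unconditional pass at every step.
    passes_per_step = 2 if use_cfg else 1
    return num_steps * passes_per_step


# Hypothetical baseline: 50 steps with CFG (an assumed setting, not a reported one).
baseline_calls = denoiser_calls(num_steps=50, use_cfg=True)

# Setting from the findings: 6 steps, no CFG.
remover_calls = denoiser_calls(num_steps=6, use_cfg=False)

print(baseline_calls, remover_calls)
```

Under these assumptions the pass count is only a proxy for latency; the architectural simplification described in the abstract (removing cross-attention layers) would lower the cost of each pass as well, which this sketch does not model.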

Abstract

Recent advances in video diffusion models have driven rapid progress in video editing techniques. However, video object removal, a critical subtask of video editing, remains challenging due to issues such as hallucinated objects and visual artifacts. Furthermore, existing methods often rely on computationally expensive sampling procedures and classifier-free guidance (CFG), resulting in slow inference. To address these limitations, we propose MiniMax-Remover, a novel two-stage video object removal approach. Motivated by the observation that text conditioning is not well suited to this task, in the first stage we simplify the pretrained video generation model by removing textual input and cross-attention layers, resulting in a more lightweight and efficient model architecture. In the second stage, we distill our remover on successful videos produced by the stage-1 model and curated by…

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Digital Media Forensic Detection · Physical Unclonable Functions (PUFs) and Hardware Security

Methods: Diffusion