From Understanding to Erasing: Towards Complete and Stable Video Object Removal

Dingming Liu; Wenjing Wang; Chen Li; Jing Lyu

arXiv:2604.01693·cs.CV·April 3, 2026


TL;DR

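This paper introduces a video object removal method that combines external knowledge transfer (distillation from vision foundation models) with internal context grounding (framewise context cross-attention) to remove objects and their side effects more completely and coherently, and it reports new state-of-the-art results.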

Contribution

It proposes a dual-guidance framework that integrates distillation from vision foundation models with framewise context cross-attention to improve video object removal.

Findings

01. Achieves state-of-the-art performance on video object removal benchmarks.

02. Establishes the first real-world benchmark for this task.

03. Demonstrates effective removal of shadows, reflections, and illumination effects.

Abstract

Video object removal aims to eliminate target objects from videos while plausibly completing missing regions and preserving spatio-temporal consistency. Although diffusion models have recently advanced this task, it remains challenging to remove object-induced side effects (e.g., shadows, reflections, and illumination changes) without compromising overall coherence. This limitation stems from the insufficient physical and semantic understanding of the target object and its interactions with the scene. In this paper, we propose to introduce understanding into erasing from two complementary perspectives. Externally, we introduce a distillation scheme that transfers the relationships between objects and their induced effects from vision foundation models to video diffusion models. Internally, we propose a framewise context cross-attention mechanism that grounds each denoising block in…
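The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cosine-affinity matching loss, and the projection-free attention are all assumptions chosen for illustration, and `context_tokens` is a placeholder because the abstract is truncated before it says what each denoising block is grounded in.

```python
import numpy as np

def affinity(feats):
    """Cosine-similarity matrix between all token pairs; feats is (N, D)."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return normed @ normed.T

def relation_distillation_loss(student_feats, teacher_feats):
    """Assumed loss: mean squared gap between student and teacher affinities.

    Only the (N, N) affinity matrices are compared, so the two feature
    widths may differ as long as the token count N matches.
    """
    diff = affinity(student_feats) - affinity(teacher_feats)
    return float(np.mean(diff ** 2))

def framewise_cross_attention(video_tokens, context_tokens):
    """Assumed form: each frame attends only to its own context tokens.

    video_tokens: (T, N, D), context_tokens: (T, M, D). Learned query, key,
    and value projections are omitted to keep the sketch short.
    """
    d = video_tokens.shape[-1]
    scores = np.einsum("tnd,tmd->tnm", video_tokens, context_tokens) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over context
    return np.einsum("tnm,tmd->tnd", weights, context_tokens)
```

In this sketch the distillation term depends only on pairwise token relationships, which is one plausible way to transfer object-to-effect relations (for example, an object and its shadow) from a teacher to a student. The attention function keeps the frame axis `t` aligned between queries and context, so no frame reads context from another frame.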


Code & Models

Repositories

WeChatCV/UnderEraser (GitHub)
