From Ideal to Real: Stable Video Object Removal under Imperfect Conditions

Jiagao Hu; Yuxuan Chen; Fuhao Li; Zepeng Wang; Fei Wang; Daiguo Zhou; Jian Luan

arXiv:2603.09283 · cs.CV · April 23, 2026

TL;DR

This paper introduces SVOR (Stable Video Object Removal), a video object removal framework designed to tolerate shadows, abrupt motion, and defective masks. It reports state-of-the-art results in real-world scenarios.

Contribution

The paper proposes three techniques (MUSE, DA-Seg, and a two-stage training process) that improve the stability and robustness of video object removal under imperfect conditions.

Findings

01. SVOR outperforms existing methods on multiple datasets.

02. The framework effectively removes shadows and reflections.

03. It maintains temporal stability and visual consistency in challenging scenarios.

Abstract

Removing objects from videos remains difficult in the presence of real-world imperfections such as shadows, abrupt motion, and defective masks. Existing diffusion-based video inpainting models often struggle to maintain temporal stability and visual consistency under these challenges. We propose Stable Video Object Removal (SVOR), a robust framework that achieves shadow-free, flicker-free, and mask-defect-tolerant removal through three key designs: (1) Mask Union for Stable Erasure (MUSE), a windowed union strategy applied during temporal mask downsampling to preserve all target regions observed within each window, effectively handling abrupt motion and reducing missed removals; (2) Denoising-Aware Segmentation (DA-Seg), a lightweight segmentation head on a decoupled side branch equipped with Denoising-Aware AdaLN and trained with mask degradation to provide an internal diffusion-aware…
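The windowed union that the abstract attributes to MUSE can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `windowed_union_downsample`, the non-overlapping window layout, the zero padding of the last window, and the window size are all assumptions made for illustration. The only idea taken from the abstract is that each temporally downsampled mask should cover every target region seen within its window.

```python
import numpy as np


def windowed_union_downsample(masks: np.ndarray, window: int) -> np.ndarray:
    """Temporally downsample binary masks by taking the union over each window.

    Hypothetical helper, not from the SVOR codebase.
    masks:  array of shape (T, H, W); nonzero marks the object to remove.
    window: number of consecutive frames merged into one output mask
            (assumed to match the temporal downsampling factor).
    Returns a boolean array of shape (ceil(T / window), H, W).
    """
    if window < 1:
        raise ValueError("window must be a positive integer")

    masks = np.asarray(masks).astype(bool)
    num_frames = masks.shape[0]

    # Pad the tail with empty masks so T divides evenly into windows
    # (an assumption; padding does not change the union).
    pad = (-num_frames) % window
    if pad:
        empty = np.zeros((pad,) + masks.shape[1:], dtype=bool)
        masks = np.concatenate([masks, empty], axis=0)

    # Group frames into windows, then OR them together along the window axis.
    grouped = masks.reshape(-1, window, *masks.shape[1:])
    return grouped.any(axis=1)


# Toy example: a one-pixel object moves one column per frame.
frames = np.zeros((4, 1, 4), dtype=bool)
for t in range(4):
    frames[t, 0, t] = True

union_mask = windowed_union_downsample(frames, window=4)
strided_mask = frames[::4]  # naive alternative: keep only the first frame
```

In this toy case the union mask covers all four columns the object passed through, while the strided mask covers only the first. That is the missed-removal failure under abrupt motion that the abstract says the windowed union is meant to reduce.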

Code & Models

Repositories

https://xiaomi-research.github.io/svor

Models

HigherHu/SVOR (Hugging Face model, 260 downloads, 1 like)

Datasets

HigherHu/RORD-50 (dataset, 261 downloads)
