Physics-Aware Video Instance Removal Benchmark

Zirui Li; Xinghao Chen; Lingyu Jiang; Dengzhe Hou; Fangzhou Lin; Kazunori Yamada; Xiangbo Gao; Zhengzhong Tu

arXiv:2604.05898 · cs.CV · April 8, 2026

TL;DR

The paper introduces PVIR, a video instance removal benchmark that emphasizes physical realism. It provides 95 annotated videos and evaluates existing methods on complex physical interactions.

Contribution

It presents a new benchmark with annotated videos and a comprehensive evaluation protocol that focuses on physical and semantic consistency in video instance removal (VIR).

Findings

1. PISCO-Removal and UniVideo achieve state-of-the-art results.
2. DiffuEraser often introduces blurring artifacts.
3. Performance drops on the Hard subset highlight the difficulty of complex physical interactions.

Abstract

Video Instance Removal (VIR) requires removing target objects while maintaining background integrity and physical consistency, such as specular reflections and illumination interactions. Despite advancements in text-guided editing, current benchmarks primarily assess visual plausibility, often overlooking the physical causalities, such as lingering shadows, triggered by object removal. We introduce the Physics-Aware Video Instance Removal (PVIR) benchmark, featuring 95 high-quality videos annotated with instance-accurate masks and removal prompts. PVIR is partitioned into Simple and Hard subsets, the latter explicitly targeting complex physical interactions. We evaluate four representative methods (PISCO-Removal, UniVideo, DiffuEraser, and CoCoCo) using a decoupled human evaluation protocol across three dimensions to isolate semantic, visual, and spatial failures: instruction following,…
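The decoupled protocol keeps each evaluation dimension separate instead of merging them into one score, and the Simple/Hard split lets the Hard-subset drop be measured per dimension. The sketch below illustrates that bookkeeping under stated assumptions: ratings are stored as hypothetical (method, subset, dimension, score) tuples, scores are averaged with a plain mean, and the helper names `aggregate` and `hard_drop` are illustrative only. Only "instruction_following" is named in the abstract (which is truncated before the other two dimensions), and this is not the authors' evaluation code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: (method, subset, dimension, score).
# "subset" is "Simple" or "Hard"; dimension names other than
# "instruction_following" are placeholders, not taken from the paper.
Rating = tuple[str, str, str, float]


def aggregate(ratings: list[Rating]) -> dict[tuple[str, str, str], float]:
    """Mean score per (method, subset, dimension).

    Dimensions are never combined, so a semantic failure cannot be
    masked by a high visual score (the point of a decoupled protocol).
    """
    buckets: dict[tuple[str, str, str], list[float]] = defaultdict(list)
    for method, subset, dim, score in ratings:
        buckets[(method, subset, dim)].append(score)
    return {key: mean(vals) for key, vals in buckets.items()}


def hard_drop(scores: dict[tuple[str, str, str], float], method: str, dim: str) -> float:
    """Simple-minus-Hard gap for one method on one dimension (positive = drop)."""
    return scores[(method, "Simple", dim)] - scores[(method, "Hard", dim)]


# Toy ratings for a placeholder method (illustrative values, not paper results).
toy = [
    ("method_A", "Simple", "instruction_following", 4.0),
    ("method_A", "Simple", "instruction_following", 5.0),
    ("method_A", "Hard", "instruction_following", 3.0),
]
scores = aggregate(toy)
print(hard_drop(scores, "method_A", "instruction_following"))  # 1.5
```

Averaging per dimension and per subset before differencing keeps the Hard-subset comparison interpretable even when the two subsets contain different numbers of videos.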
