VOID: Video Object and Interaction Deletion

Saman Motamed; William Harvey; Benjamin Klein; Luc Van Gool; Zhuoning Yuan; Ta-Ying Cheng

arXiv:2604.02296 · cs.CV · April 3, 2026

1 repo · 10 models · 1 dataset

TL;DR

VOID is a video object removal framework that produces physically plausible inpainting when the removed object interacts with the rest of the scene (for example, through collisions). A vision-language model identifies the affected regions, and a video diffusion model regenerates them.

Contribution

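The paper introduces a paired dataset of counterfactual object removals, generated with Kubric and HUMOTO, and a method that uses a vision-language model to guide a video diffusion model so that the scene stays physically consistent after an object is removed.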

Findings

01 Outperforms prior methods in preserving scene dynamics.

02 Generates more physically consistent inpainted videos.

03 Effective on both synthetic and real data.

Abstract

Existing video object removal methods excel at inpainting content "behind" the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions with other objects, current models fail to correct them and produce implausible results. We present VOID, a video object removal framework designed to perform physically-plausible inpainting in these complex scenarios. To train the model, we generate a new paired dataset of counterfactual object removals using Kubric and HUMOTO, where removing an object requires altering downstream physical interactions. During inference, a vision-language model identifies regions of the scene affected by the removed object. These regions are then used to guide a video diffusion model that generates physically consistent counterfactual outcomes. Experiments on…
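
The inference flow in the abstract (a vision-language model marks the regions affected by the removed object, and those regions guide the diffusion model) can be pictured as building a per-pixel label map for each frame. The sketch below is illustrative only: the label values `KEEP`, `REMOVE`, and `AFFECTED`, the function names, and the boolean-mask inputs are all assumptions made for this example, not the paper's or the repository's actual mask encoding or API.

```python
from typing import List

Mask = List[List[bool]]      # one frame, row-major boolean mask
LabelMap = List[List[int]]   # one frame, row-major integer labels

# Hypothetical label values, chosen only for this illustration.
KEEP, REMOVE, AFFECTED = 0, 1, 2


def build_label_map(object_mask: Mask, affected_mask: Mask) -> LabelMap:
    """Merge an object mask and an affected-region mask into one label map.

    Pixels of the removed object take precedence over affected pixels;
    everything else is left untouched.
    """
    if len(object_mask) != len(affected_mask):
        raise ValueError("masks must have the same height")
    labels: LabelMap = []
    for obj_row, aff_row in zip(object_mask, affected_mask):
        if len(obj_row) != len(aff_row):
            raise ValueError("masks must have the same width")
        row = []
        for is_obj, is_aff in zip(obj_row, aff_row):
            if is_obj:
                row.append(REMOVE)
            elif is_aff:
                row.append(AFFECTED)
            else:
                row.append(KEEP)
        labels.append(row)
    return labels


def build_video_label_maps(
    object_masks: List[Mask], affected_masks: List[Mask]
) -> List[LabelMap]:
    """Apply build_label_map to every frame of a clip."""
    if len(object_masks) != len(affected_masks):
        raise ValueError("mask sequences must have the same number of frames")
    return [build_label_map(o, a) for o, a in zip(object_masks, affected_masks)]
```

In this sketch the object mask would come from a segmentation step and the affected mask from the vision-language model's output; how VOID actually obtains and encodes these regions is not specified in the text above.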

Code & Models

Repositories

netflix/void-model (GitHub)

Models

Datasets

ErenAta00/VOID-Quadmask-Dataset (dataset, 249 downloads)
