Object-WIPER: Training-Free Object and Associated Effect Removal in Videos

Saksham Singh Kushwaha; Sayan Nag; Yapeng Tian; Kuldeep Kulkarni

arXiv:2601.06391 · cs.CV · February 24, 2026

Open Access

TL;DR

Object-WIPER is a training-free video editing framework that removes objects and their effects while maintaining temporal and semantic consistency, using a pre-trained diffusion model and a new evaluation metric.

Contribution

It introduces a training-free method for removing objects and their associated effects from videos that leverages a pre-trained text-to-video diffusion transformer, along with a new benchmark for evaluation.

Findings

01

Outperforms existing methods in object removal quality and temporal stability.

02

Achieves effective removal without retraining or fine-tuning.

03

Introduces a new benchmark for evaluating object removal in videos.

Abstract

In this paper, we introduce Object-WIPER, a training-free framework for removing dynamic objects and their associated visual effects from videos, and inpainting them with semantically consistent and temporally coherent content. Our approach leverages a pre-trained text-to-video diffusion transformer (DiT). Given an input video, a user-provided object mask, and query tokens describing the target object and its effects, we localize relevant visual tokens via visual-text cross-attention and visual self-attention. This produces an intermediate effect mask that we fuse with the user mask to obtain a final foreground token mask to replace. We first invert the video through the DiT to obtain structured noise, then reinitialize the masked tokens with Gaussian noise while preserving background tokens. During denoising, we copy values for the background tokens saved during inversion to maintain…
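The pipeline described in the abstract can be sketched in NumPy on flattened token arrays. This is a minimal illustration under stated assumptions, not the authors' implementation. The names fuse_masks, reinit_foreground and denoise_with_background_copy are hypothetical. The way attention maps become an effect mask is also assumed: mean attention over the query tokens, min-max normalization, fixed thresholds, one hop of self-attention spreading from the seed tokens, and then a union with the user mask. denoise_step stands in for one reverse step of the DiT. The sketch copies saved background latents at every step, which is a latent-blending simplification. The abstract's "values" may instead refer to attention value features inside the DiT, and this sketch does not model those.

```python
import numpy as np


def fuse_masks(cross_attn, self_attn, user_mask, seed_thresh=0.5, spread_thresh=0.5):
    """Hypothetical localization of object + effect tokens (assumed rule, not the paper's).

    cross_attn: (N, Q) attention from N visual tokens to Q query text tokens.
    self_attn:  (N, N) visual self-attention.
    user_mask:  (N,) bool mask derived from the user-provided object mask.
    """
    def minmax(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    # Seed tokens: visual tokens that attend strongly to the query tokens (assumed threshold).
    seed = minmax(cross_attn.mean(axis=1)) >= seed_thresh
    effect = seed.copy()
    if seed.any():
        # Spread: tokens the seed tokens attend to strongly via self-attention (assumed one hop).
        effect |= minmax(self_attn[seed].mean(axis=0)) >= spread_thresh
    # Final foreground token mask = user mask OR attention-derived effect mask.
    return user_mask | effect


def reinit_foreground(z_T, mask, rng):
    """Replace the masked rows of the inverted latent z_T (N, C) with fresh Gaussian noise."""
    z = z_T.copy()
    z[mask] = rng.standard_normal((int(mask.sum()), z.shape[1]))
    return z


def denoise_with_background_copy(inversion_traj, mask, denoise_step, rng):
    """inversion_traj[t] holds latents saved at inversion step t (0 = clean, T = noisiest).

    denoise_step(z, t) is a hypothetical stand-in for one reverse DiT step from t to t - 1.
    """
    T = len(inversion_traj) - 1
    z = reinit_foreground(inversion_traj[T], mask, rng)
    for t in range(T, 0, -1):
        z = np.array(denoise_step(z, t), copy=True)
        # Background tokens are overwritten with the latents saved during inversion.
        z[~mask] = inversion_traj[t - 1][~mask]
    return z
```

For example, with a toy denoise_step such as lambda z, t: 0.5 * z, the background rows of the result equal inversion_traj[0] exactly, while the foreground rows are derived only from the fresh noise. Copying at every step, rather than only at the end, keeps the background anchored throughout the trajectory so that the regenerated foreground is denoised in the context of the original scene.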

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Domain Adaptation and Few-Shot Learning