Object-WIPER : Training-Free Object and Associated Effect Removal in Videos
Saksham Singh Kushwaha, Sayan Nag, Yapeng Tian, Kuldeep Kulkarni

TL;DR
Object-WIPER is a training-free video editing framework that removes objects and their associated visual effects while maintaining temporal and semantic consistency. It builds on a pre-trained text-to-video diffusion model, and the paper also introduces a new evaluation metric.
Contribution
The paper introduces a training-free method for removing objects and their effects from videos, built on a pre-trained text-to-video diffusion transformer, together with a new benchmark for evaluation.
Findings
Outperforms existing methods in object removal quality and temporal stability.
Achieves effective removal without retraining or fine-tuning.
Introduces a new benchmark for evaluating object removal in videos.
Abstract
In this paper, we introduce Object-WIPER, a training-free framework for removing dynamic objects and their associated visual effects from videos, and inpainting them with semantically consistent and temporally coherent content. Our approach leverages a pre-trained text-to-video diffusion transformer (DiT). Given an input video, a user-provided object mask, and query tokens describing the target object and its effects, we localize relevant visual tokens via visual-text cross-attention and visual self-attention. This produces an intermediate effect mask that we fuse with the user mask to obtain a final foreground token mask to replace. We first invert the video through the DiT to obtain structured noise, then reinitialize the masked tokens with Gaussian noise while preserving background tokens. During denoising, we copy values for the background tokens saved during inversion to maintain…
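The mask fusion, foreground re-initialization, and background copying described in the abstract can be sketched on a toy token grid. This is a minimal illustration, not the authors' implementation: the function names, the `threshold` parameter, the use of a logical union to fuse the masks, and the `(num_tokens, dim)` latent layout are all assumptions, and the DiT inversion and denoising steps themselves are not modeled.

```python
import numpy as np

def fuse_masks(user_mask, effect_scores, threshold=0.5):
    """Fuse the user mask with an effect map (assumed here: threshold, then union)."""
    effect_mask = effect_scores >= threshold  # hypothetical thresholding rule
    return np.logical_or(user_mask.astype(bool), effect_mask)

def reinit_foreground(inverted_latents, fg_mask, rng):
    """Replace foreground tokens with Gaussian noise; keep background tokens."""
    noise = rng.standard_normal(inverted_latents.shape)
    # fg_mask has shape (num_tokens,); broadcast it over the feature axis.
    return np.where(fg_mask[:, None], noise, inverted_latents)

def copy_background(current_latents, saved_latents, fg_mask):
    """Overwrite background tokens with the values saved during inversion."""
    return np.where(fg_mask[:, None], current_latents, saved_latents)

# Toy example: 4 tokens with 2 features each.
user_mask = np.array([1, 0, 0, 0])
effect_scores = np.array([0.1, 0.9, 0.2, 0.0])  # e.g. attention-derived scores
fg_mask = fuse_masks(user_mask, effect_scores)

inverted = np.arange(8, dtype=float).reshape(4, 2)  # stand-in for inverted noise
start = reinit_foreground(inverted, fg_mask, np.random.default_rng(0))
step = copy_background(np.zeros((4, 2)), inverted, fg_mask)
```

In this sketch only the tokens flagged by the fused mask are resampled or left to evolve, while every background token is pinned to its saved value, which is the property the abstract relies on to keep the background unchanged.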
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Domain Adaptation and Few-Shot Learning
