TL;DR
PANDORA is a zero-shot object removal method that uses pixel-wise attention dissolution and latent guidance to remove objects from images without fine-tuning, prompts, or optimization, while preserving the background and scaling to multi-object removal.
Contribution
It introduces a novel zero-shot framework operating on pre-trained diffusion models, with pixel-wise attention dissolution and localized disentanglement guidance for effective multi-object removal.
Findings
Outperforms state-of-the-art methods in visual fidelity and semantic plausibility.
Enables precise, non-rigid, multi-object removal in a single pass.
Requires no fine-tuning, prompts, or optimization.
Abstract
Removing objects from natural images is challenging due to the difficulty of synthesizing semantically coherent content while preserving background integrity. Existing methods often rely on fine-tuning, prompt engineering, or inference-time optimization, yet still suffer from texture inconsistency, rigid artifacts, weak foreground-background disentanglement, and poor scalability for multi-object removal. We propose PANDORA, a novel zero-shot object removal framework that operates directly on pre-trained text-to-image diffusion models and requires no fine-tuning, prompts, or optimization. We propose Pixel-wise Attention Dissolution, which removes objects by nullifying the most correlated attention keys for masked pixels, effectively eliminating the object from the self-attention flow and allowing background context to dominate reconstruction. We further introduce Localized Attentional…
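The key-nullification idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single attention head over flattened pixels, and the function name `dissolve_attention`, the per-pixel top-k selection rule, and the `top_k` parameter are all hypothetical choices made for illustration. The abstract only states that the most correlated keys of masked pixels are nullified.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive zero weight.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def dissolve_attention(q, k, v, mask, top_k):
    """Self-attention where each masked query drops its top_k strongest keys.

    q, k, v: arrays of shape (n, d), one row per pixel (token).
    mask:    boolean array of shape (n,), True where the pixel is on the object.
    top_k:   hypothetical hyperparameter (an assumption, not from the paper):
             number of keys nullified per masked pixel.
    """
    n, d = q.shape
    if not 0 <= top_k < n:
        raise ValueError("top_k must satisfy 0 <= top_k < n")
    scores = q @ k.T / np.sqrt(d)            # (n, n) query-key similarity
    if top_k > 0:
        rows = np.flatnonzero(mask)          # indices of masked queries
        # Column indices of the top_k largest scores in each masked row.
        top = np.argpartition(scores[rows], -top_k, axis=1)[:, -top_k:]
        scores[rows[:, None], top] = -np.inf  # nullify those keys
    weights = softmax(scores, axis=-1)       # unmasked rows are untouched
    return weights @ v, weights
```

Under these assumptions, a masked pixel can no longer attend to the keys it matched most strongly, so its output is assembled from the remaining keys, while unmasked pixels attend exactly as in ordinary self-attention.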
