Towards Source-Aware Object Swapping with Initial Noise Perturbation
Jiahui Zhan, Xianbing Sun, Xiangnan Zhu, Yikun Ji, Ruitong Liu, Liqing Zhang, Jianfu Zhang

TL;DR
This paper introduces SourceSwap, a self-supervised framework for object swapping that learns cross-object alignment using frequency-separated perturbations, enabling high-quality, zero-shot editing without per-object finetuning.
Contribution
It proposes a novel frequency-based pseudo pair synthesis method and a dual U-Net architecture for source-aware object swapping without requiring paired data or finetuning.
Findings
Outperforms existing methods in fidelity and scene preservation.
Enables zero-shot inference and diverse editing tasks.
Introduces SourceBench, a new high-quality benchmark.
Abstract
Object swapping aims to replace a source object in a scene with a reference object while preserving object fidelity, scene fidelity, and object-scene harmony. Existing methods either require per-object finetuning and slow inference or rely on extra paired data that mostly depict the same object across contexts, forcing models to rely on background cues rather than learning cross-object alignment. We propose SourceSwap, a self-supervised and source-aware framework that learns cross-object alignment. Our key insight is to synthesize high-quality pseudo pairs from any image via a frequency-separated perturbation in the initial-noise space, which alters appearance while preserving pose, coarse shape, and scene layout, requiring no videos, multi-view data, or additional images. We then train a dual U-Net with full-source conditioning and a noise-free reference encoder, enabling direct…
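The abstract does not give the exact form of the frequency-separated perturbation, so what follows is only a minimal sketch of one plausible reading. It assumes that the "initial noise" is a latent tensor for the source image (for example, one obtained by DDIM inversion). It also assumes that the low-frequency band of that noise carries layout and coarse shape, while the high-frequency band carries appearance. Under those assumptions, the sketch keeps the low band of the source noise and resamples the high band from fresh Gaussian noise. The function names and the `cutoff` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def lowpass_mask(shape, cutoff):
    """Hypothetical radial low-pass mask over the last two (spatial) axes.

    `cutoff` is a radius in normalized frequency (cycles per pixel, 0..~0.707).
    The mask depends only on |frequency|, so it is symmetric and keeps the
    inverse FFT of a real signal real.
    """
    h, w = shape[-2], shape[-1]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    return (radius <= cutoff).astype(np.float64)


def frequency_separated_perturbation(z_src, cutoff=0.1, seed=None):
    """Hypothetical sketch: keep the low-frequency band of the source noise
    (assumed to encode pose/layout) and replace the high-frequency band with
    fresh Gaussian noise (assumed to encode appearance)."""
    rng = np.random.default_rng(seed)
    z_rand = rng.standard_normal(z_src.shape)
    mask = lowpass_mask(z_src.shape, cutoff)

    # 2D FFT over the spatial axes; leading axes (channels) are untouched.
    f_src = np.fft.fft2(z_src, axes=(-2, -1))
    f_rand = np.fft.fft2(z_rand, axes=(-2, -1))

    # Each frequency coefficient comes from exactly one of the two sources.
    f_mix = f_src * mask + f_rand * (1.0 - mask)
    return np.fft.ifft2(f_mix, axes=(-2, -1)).real


# Usage: perturb a 4-channel 64x64 latent (standing in for inverted noise).
z = np.random.default_rng(0).standard_normal((4, 64, 64))
z_perturbed = frequency_separated_perturbation(z, cutoff=0.1, seed=1)
```

One design point follows from this construction. If the source noise is itself white Gaussian, then splicing disjoint frequency bands from two independent white-noise tensors yields another white-noise tensor with the same variance. The perturbed latent therefore stays a valid starting point for sampling. Under these assumptions, regenerating from it would give an image whose layout follows the source but whose appearance differs. Such an image can then serve as the other half of a pseudo pair.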
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
