Zero-Shot Image Harmonization with Generative Model Prior
Jianqi Chen, Yilan Zhang, Zhengxia Zou, Keyan Chen, Zhenwei Shi

TL;DR
This paper introduces a zero-shot image harmonization method that uses foundation models for description and guidance, eliminating the need for large training datasets and improving generalization to unseen images.
Contribution
It presents a modular framework leveraging vision-language models and generative models for zero-shot image harmonization, inspired by human reasoning.
Findings
Achieves harmonious image results without extensive training.
Outperforms existing methods in generalization to unseen images.
Validated by visual results and user study.
Abstract
We propose a zero-shot approach to image harmonization, aiming to overcome the reliance on large amounts of synthetic composite images in existing methods. These methods, while showing promising results, involve significant training expenses and often struggle with generalization to unseen images. To this end, we introduce a fully modularized framework inspired by human behavior. Leveraging the reasoning capabilities of recent foundation models in language and vision, our approach comprises three main stages. Initially, we employ a pretrained vision-language model (VLM) to generate descriptions for the composite image. Subsequently, these descriptions guide the foreground harmonization direction of a text-to-image generative model (T2I). We refine text embeddings for enhanced representation of imaging conditions and employ self-attention and edge maps for structure preservation.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methodsfail
