Zero-Shot Image Harmonization with Generative Model Prior

Jianqi Chen; Yilan Zhang; Zhengxia Zou; Keyan Chen; Zhenwei Shi

arXiv:2307.08182·cs.CV·March 12, 2024·2 cites

TL;DR

This paper introduces a zero-shot image harmonization method that uses foundation models for description and guidance, eliminating the need for large training datasets and improving generalization to unseen images.

Contribution

It presents a modular framework leveraging vision-language models and generative models for zero-shot image harmonization, inspired by human reasoning.

Findings

1. Achieves harmonious image results without extensive training.
2. Outperforms existing methods in generalization to unseen images.
3. Validated by visual results and a user study.

Abstract

We propose a zero-shot approach to image harmonization, aiming to overcome the reliance on large amounts of synthetic composite images in existing methods. These methods, while showing promising results, involve significant training expenses and often struggle with generalization to unseen images. To this end, we introduce a fully modularized framework inspired by human behavior. Leveraging the reasoning capabilities of recent foundation models in language and vision, our approach comprises three main stages. Initially, we employ a pretrained vision-language model (VLM) to generate descriptions for the composite image. Subsequently, these descriptions guide the foreground harmonization direction of a text-to-image generative model (T2I). We refine text embeddings for enhanced representation of imaging conditions and employ self-attention and edge maps for structure preservation.…

Code & Models

Repositories

windvchen/diff-harmonization (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
