Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
Jinghao Hu, Yuhe Zhang, GuoHua Geng, Liuyuxin Yang, JiaRui Yan, Jingtao Cheng, YaDong Zhang, Kang Li

TL;DR
This paper introduces a zero-shot method for generating style-specific image variations by converting images to text descriptions, elaborating style details, and then synthesizing new images, enabling style transfer with semantic consistency.
Contribution
The study presents a novel zero-shot framework that combines vision-language models, ChatGPT, and diffusion models with a fine-tuning strategy for style and semantic control.
Findings
High-fidelity stylized image generation in zero-shot setting
Effective preservation of semantics across styles
Introduction of a new benchmark and evaluation metrics
Abstract
Traditionally, style has been primarily considered in terms of artistic elements such as colors, brushstrokes, and lighting. However, identical semantic subjects, like people, boats, and houses, can vary significantly across different artistic traditions, indicating that style also encompasses the underlying semantics. Therefore, in this study, we propose a zero-shot scheme for image variation with coordinated semantics. Specifically, our scheme transforms the image-to-image problem into an image-to-text-to-image problem. The image-to-text operation employs vision-language models (e.g., BLIP) to generate text describing the content of the input image, including the objects and their positions. Subsequently, the input style keyword is elaborated into a detailed description of this style and then merged with the content text using the reasoning capabilities of ChatGPT. Finally, the…
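The image-to-text-to-image decomposition described in the abstract can be sketched as a small prompt-building pipeline. This is a minimal illustration only, not the authors' implementation: the class name `StyleVariationPrompter` and the three callables (`caption_image`, `elaborate_style`, `merge_texts`) are hypothetical placeholders standing in for a vision-language captioner (such as BLIP), a style-elaboration step, and an LLM-based merge step. The final synthesis stage is truncated in the abstract, so the sketch stops at the merged prompt.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StyleVariationPrompter:
    """Hypothetical sketch of the image-to-text half of the scheme.

    Each field is an injected callable, so no specific model API is assumed.
    """

    # Assumed stand-in for a vision-language model: image -> content description.
    caption_image: Callable[[Any], str]
    # Assumed stand-in for style elaboration: style keyword -> detailed style description.
    elaborate_style: Callable[[str], str]
    # Assumed stand-in for the LLM merge step: (content text, style text) -> prompt.
    merge_texts: Callable[[str, str], str]

    def build_prompt(self, image: Any, style_keyword: str) -> str:
        # Step 1: describe the input image (objects and their positions) as text.
        content_text = self.caption_image(image)
        # Step 2: expand the short style keyword into a detailed description.
        style_text = self.elaborate_style(style_keyword)
        # Step 3: merge content and style into one text prompt.
        return self.merge_texts(content_text, style_text)
```

In such a design, the returned prompt would then be handed to a text-to-image model; keeping the three steps as injected callables makes it possible to swap captioners or language models without changing the pipeline itself.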
Taxonomy
Topics: Image Retrieval and Classification Techniques · Aesthetic Perception and Analysis
Methods: Diffusion
