StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding

Junseo Park; Beomseok Ko; Hyeryung Jang

arXiv:2404.05256 · cs.CV · July 18, 2024 · 1 citation

Open Access

TL;DR

StyleForge introduces a dual binding approach for personalized text-to-image synthesis, enabling the generation of images in diverse artistic styles with improved quality and style fidelity.

Contribution

The paper presents Single-StyleForge and Multi-StyleForge, novel methods that enhance style representation and image quality in style-specific text-to-image synthesis.

Findings

01 Significant improvements in FID, KID, and CLIP scores across six styles.

02 Effective binding of style attributes from a small set of target-style images (approximately 15 to 20).

03 Enhanced consistency and perceptual fidelity in generated images.
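
Of the metrics named above, KID (Kernel Inception Distance) has a compact closed form: an unbiased squared-MMD estimate under a cubic polynomial kernel. The sketch below shows that computation on toy feature vectors. It is an illustration only, not the paper's evaluation code; in practice the features would come from an Inception network, and the function names `poly_kernel` and `kid` are made up for this example.

```python
from typing import Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3, with d the feature size."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(real: Sequence[Vector], fake: Sequence[Vector]) -> float:
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("each set needs at least two feature vectors")

    # Within-set kernel sums exclude the diagonal (i == j) to stay unbiased.
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j)
    # Cross-set kernel sum uses every (real, fake) pair.
    k_rf = sum(poly_kernel(r, f) for r in real for f in fake)

    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


if __name__ == "__main__":
    real_features = [[0.0], [0.0]]   # toy stand-ins for Inception features
    fake_features = [[1.0], [1.0]]
    print(kid(real_features, fake_features))
```

Lower values indicate that the generated features sit closer to the reference features; two identical all-zero sets give exactly 0 under this estimator.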

Abstract

Recent advancements in text-to-image models, such as Stable Diffusion, have showcased their ability to create visual images from natural language prompts. However, existing methods like DreamBooth struggle with capturing arbitrary art styles due to the abstract and multifaceted nature of stylistic attributes. We introduce Single-StyleForge, a novel approach for personalized text-to-image synthesis across diverse artistic styles. Using approximately 15 to 20 images of the target style, Single-StyleForge establishes a foundational binding of a unique token identifier with a broad range of attributes of the target style. Additionally, auxiliary images are incorporated for dual binding that guides the consistent representation of crucial elements such as people within the target style. Furthermore, we present Multi-StyleForge, which enhances image quality and text alignment by binding…
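
The dual binding idea described in the abstract can be pictured as two sets of (prompt, image) pairs trained together: target-style images tied to a unique token, and auxiliary images tied to a second prompt. The sketch below assumes a DreamBooth-like objective in which each set contributes a noise-prediction (mean squared error) term. The prompt strings, the `aux_weight` parameter, and all function names are hypothetical; this summary does not state the paper's actual prompts or loss weighting.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TrainingPair:
    prompt: str
    image_path: str


def build_pairs(style_images: Sequence[str],
                aux_images: Sequence[str],
                token: str = "[V]") -> List[TrainingPair]:
    """Pair each image with a prompt (both prompt templates are assumptions)."""
    style_prompt = f"an image in {token} style"   # hypothetical target prompt
    aux_prompt = "an image in a style"            # hypothetical auxiliary prompt
    pairs = [TrainingPair(style_prompt, p) for p in style_images]
    pairs += [TrainingPair(aux_prompt, p) for p in aux_images]
    return pairs


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and true noise values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_binding_loss(style_pred: Sequence[float], style_noise: Sequence[float],
                      aux_pred: Sequence[float], aux_noise: Sequence[float],
                      aux_weight: float = 1.0) -> float:
    """Style term plus a weighted auxiliary term (the weighting is an assumption)."""
    return mse(style_pred, style_noise) + aux_weight * mse(aux_pred, aux_noise)


if __name__ == "__main__":
    pairs = build_pairs(["style_01.png"], ["aux_01.png"])
    print([p.prompt for p in pairs])
    print(dual_binding_loss([1.0, 2.0], [0.0, 2.0], [0.0, 0.0], [2.0, 0.0], aux_weight=0.5))
```

In a real training loop the noise predictions would come from the diffusion model's denoiser conditioned on each prompt; the flat lists here only stand in for those tensors.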

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Image Retrieval and Classification Techniques

Methods: Contrastive Language-Image Pre-training · Diffusion