StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding
Junseo Park, Beomseok Ko, Hyeryung Jang

TL;DR
StyleForge introduces a dual binding approach for personalized text-to-image synthesis, enabling the generation of images in diverse artistic styles with improved quality and style fidelity.
Contribution
The paper presents Single-StyleForge and Multi-StyleForge, novel methods that enhance style representation and image quality in style-specific text-to-image synthesis.
Findings
Significant improvements in FID, KID, and CLIP scores across six styles.
Effective binding of style attributes from only about 15 to 20 images of the target style.
Enhanced consistency and perceptual fidelity in generated images.
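For readers unfamiliar with KID, one of the metrics listed above, the following is a minimal sketch of how it is commonly computed: an unbiased estimate of the squared maximum mean discrepancy between two sets of feature vectors under a cubic polynomial kernel. It is not the paper's evaluation code. The function names and the toy two-dimensional features are illustrative assumptions; in practice the features would typically come from an Inception network, and lower values indicate that the generated images are closer to the reference set.

```python
from itertools import combinations


def poly_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel k(x, y) = (x . y / d + coef0) ** degree, with d = len(x)."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + coef0) ** degree


def kid_mmd2(real_feats, gen_feats):
    """Unbiased squared-MMD estimate between two lists of feature vectors.

    Hypothetical helper for illustration: each argument is a list of
    equal-length lists of floats, with at least two vectors per list.
    """
    m, n = len(real_feats), len(gen_feats)
    if m < 2 or n < 2:
        raise ValueError("need at least two feature vectors per set")

    # Within-set terms: average kernel value over all ordered pairs i != j.
    k_rr = 2.0 * sum(poly_kernel(a, b) for a, b in combinations(real_feats, 2)) / (m * (m - 1))
    k_gg = 2.0 * sum(poly_kernel(a, b) for a, b in combinations(gen_feats, 2)) / (n * (n - 1))

    # Cross term: average kernel value over all real/generated pairs.
    k_rg = sum(poly_kernel(a, b) for a in real_feats for b in gen_feats) / (m * n)

    return k_rr + k_gg - 2.0 * k_rg


# Toy features (assumed values, not data from the paper).
real = [[0.0, 0.0], [0.0, 0.0]]
shifted = [[2.0, 2.0], [2.0, 2.0]]
print(kid_mmd2(real, real))     # identical toy sets
print(kid_mmd2(real, shifted))  # clearly different toy sets
```

Because the estimator is unbiased, it can return small negative values when the two sets are very similar; reported KID scores are usually averaged over several random subsets.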
Abstract
Recent advancements in text-to-image models, such as Stable Diffusion, have showcased their ability to create visual images from natural language prompts. However, existing methods like DreamBooth struggle with capturing arbitrary art styles due to the abstract and multifaceted nature of stylistic attributes. We introduce Single-StyleForge, a novel approach for personalized text-to-image synthesis across diverse artistic styles. Using approximately 15 to 20 images of the target style, Single-StyleForge establishes a foundational binding of a unique token identifier with a broad range of attributes of the target style. Additionally, auxiliary images are incorporated for dual binding that guides the consistent representation of crucial elements such as people within the target style. Furthermore, we present Multi-StyleForge, which enhances image quality and text alignment by binding…
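The dual binding idea described in the abstract can be pictured as two groups of image/prompt pairs used together during fine-tuning: target-style images paired with a prompt that contains the unique token identifier, and auxiliary images paired with a prompt that omits it. The sketch below only assembles such pairs. The prompt wording, the "[V]" token, the file names, and all function and class names are assumptions made for illustration, not details confirmed by the text above, and no training loop is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    image_path: str
    prompt: str
    role: str  # "style_ref" for target-style images, "aux" for auxiliary images


def build_dual_binding_pairs(style_images, aux_images, token="[V]", style_word="style"):
    """Pair each image with the prompt for its role.

    Hypothetical helper: the prompt templates below are assumed, not
    taken from the paper.
    """
    # Assumed prompt that ties the unique token to the target style.
    style_prompt = f"a photo of {token} {style_word}"
    # Assumed prompt for auxiliary images, without the unique token.
    aux_prompt = f"a photo of {style_word}"

    pairs = [TrainingPair(path, style_prompt, "style_ref") for path in style_images]
    pairs += [TrainingPair(path, aux_prompt, "aux") for path in aux_images]
    return pairs


# Placeholder file names for illustration only.
pairs = build_dual_binding_pairs(
    style_images=["style_01.png", "style_02.png"],
    aux_images=["aux_01.png"],
)
for pair in pairs:
    print(pair.role, pair.image_path, "->", pair.prompt)
```

Keeping the two groups in one list with an explicit role field makes it easy to weight or sample them differently later, which is a design choice of this sketch rather than something the abstract specifies.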
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Image Retrieval and Classification Techniques
Methods: Contrastive Language-Image Pre-training · Diffusion
