Fine-Tuning Stable Diffusion XL for Stylistic Icon Generation: A Comparison of Caption Size

Youssef Sultan; Jiangqin Ma; Yu-Ying Liao

arXiv:2407.08513 · cs.CV · July 16, 2024 · 1 citation

TL;DR

This paper explores fine-tuning methods for Stable Diffusion XL to generate stylistic icons. It emphasizes the importance of proper evaluation metrics beyond FID scores and highlights the limitations of CLIP scores in icon quality assessment.

Contribution

It introduces tailored fine-tuning techniques and critiques existing evaluation metrics for icon generation, proposing more effective approaches for commercial applications.

Findings

01

FID scores may not reflect icon quality accurately

02

CLIP scores can misjudge icon similarity and quality

03

Proper evaluation metrics are crucial for commercial icon generation

Abstract

In this paper, we show different fine-tuning methods for Stable Diffusion XL, including inference steps and caption customization for each image, to align generation with the style of a commercial 2D icon training set. We also show how important it is to properly define what "high-quality" really means, especially in a commercial-use environment. As generative AI models continue to gain widespread acceptance and usage, many different ways emerge to optimize and evaluate them for various applications. Specifically, text-to-image models such as Stable Diffusion XL and DALL-E 3 require distinct evaluation practices to effectively generate high-quality icons in a specific style. Although some images generated in a certain style may have a lower (better) FID score, we show that this is not conclusive in and of itself, even for rasterized icons. While FID…
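For readers unfamiliar with the metric, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to image features: the squared distance between the feature means plus a trace term comparing the covariances, with lower values meaning the two sets are statistically closer. The following is a minimal sketch of that formula only, not the authors' evaluation code. It assumes the features have already been extracted (the standard metric uses pooled Inception-v3 activations; the random arrays below are placeholders), and the function name `frechet_distance` is chosen here for illustration.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Placeholder "features"; real usage would pass embeddings of the
# reference icons and of the generated icons.
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))
shifted = reference + 3.0  # same covariance, mean moved by 3.0 in each of 4 dims

print(frechet_distance(reference, reference))  # close to 0
print(frechet_distance(reference, shifted))    # close to 36 (4 * 3.0**2)
```

Because the score depends only on the mean and covariance of the features, two image sets can score close together without matching in the stylistic details that matter for an icon set, which is consistent with the caution expressed in the abstract.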

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training · ALIGN · Diffusion