Style Ambiguity Loss Using CLIP
James Baker

TL;DR
This paper introduces a style ambiguity loss for diffusion models that classifies images by their distance to centroids in CLIP embedding space, removing the need for a labeled dataset or a separately trained style classifier when training for creative image generation.
Contribution
It proposes new forms of style ambiguity loss that use CLIP in place of a trained classifier and a labeled dataset, so the objective can be applied when training a diffusion model.
Findings
Centroids found by K-means clustering of an unlabeled dataset in CLIP embedding space can stand in for a trained style classifier.
CLIP embeddings generated from text labels can also serve as centroids, which gives control over the styles used.
The method enhances creative image synthesis without labeled data.
Abstract
In this work, we explore using the style ambiguity training objective, originally used to approximate creativity, on a diffusion model. However, this objective requires a pretrained classifier and a labeled dataset. We introduce new forms of style ambiguity loss that require neither training a new classifier nor a labeled dataset. Instead of using a classifier, we generate centroids in the CLIP embedding space and classify images by their relative distance to those centroids. We obtain the centroids in two ways: by K-means clustering of an unlabeled dataset, and by generating CLIP embeddings from text labels. Code is available at https://github.com/jamesBaker361/clipcreate
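The centroid-based classification described in the abstract can be sketched as follows. This is a minimal illustration, not the code from the linked repository: it assumes style probabilities come from a softmax over negative Euclidean distances to the centroids, and that the loss is the cross-entropy against a uniform distribution, as in the original style ambiguity objective. The function names, the `temperature` parameter, and the toy 2-D vectors standing in for CLIP embeddings are all hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def style_probs(embedding, centroids, temperature=1.0):
    # Assumption: a closer centroid gets a higher probability,
    # via a softmax over negative distances.
    return softmax([-euclidean(embedding, c) / temperature for c in centroids])

def style_ambiguity_loss(embedding, centroids, temperature=1.0):
    # Cross-entropy between a uniform target (1/k per style) and the
    # predicted style distribution; smallest when the prediction is uniform.
    probs = style_probs(embedding, centroids, temperature)
    k = len(centroids)
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(p + eps) for p in probs) / k

# Toy 2-D stand-ins for CLIP embeddings (real ones have hundreds of dimensions).
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ambiguous = [0.0, 0.0]   # equidistant from every centroid
confident = [1.0, 0.0]   # sits exactly on the first centroid

loss_ambiguous = style_ambiguity_loss(ambiguous, centroids)
loss_confident = style_ambiguity_loss(confident, centroids)
```

Under these assumptions, an image equidistant from all centroids attains the minimum loss of log(k), while an image that sits on a single centroid is penalized. In the paper's setting the centroids would come from K-means over CLIP embeddings of an unlabeled dataset or from CLIP embeddings of text labels.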
Taxonomy
Topics: Aesthetic Perception and Analysis
Methods: Diffusion
