Style Ambiguity Loss Using CLIP

James Baker

arXiv:2410.02055 · cs.CV · August 19, 2025

TL;DR

This paper introduces a novel style ambiguity loss for diffusion models. It uses CLIP embeddings and clustering in place of a trained classifier, which removes the need for labeled data or classifier training while still encouraging more creative image generation.

Contribution

It proposes new forms of style ambiguity loss, computed in CLIP embedding space, that require neither a trained classifier nor a labeled dataset, for use in training diffusion models.

Findings

01

Centroids generated via K-means in CLIP space improve style ambiguity.

02

Using text labels to generate CLIP embeddings is effective for style control.

03

The method enhances creative image synthesis without labeled data.
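The K-means finding can be illustrated with a minimal NumPy sketch. It assumes the CLIP image embeddings of an unlabeled dataset have already been computed and stacked into an (n, d) array. The function name kmeans_centroids, the farthest-first initialization, and the row normalization are illustrative choices, not necessarily what the clipcreate repository does.

```python
import numpy as np


def kmeans_centroids(embeddings: np.ndarray, k: int, n_iters: int = 50, seed: int = 0) -> np.ndarray:
    """Lloyd's K-means over precomputed (assumed) CLIP image embeddings; returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    # Assumption: L2-normalize rows, since CLIP embeddings are usually compared by direction.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Farthest-first initialization (illustrative choice): start from one random point,
    # then repeatedly add the point farthest from all centroids chosen so far.
    first = rng.integers(len(x))
    centroids = [x[first]]
    min_d2 = ((x - x[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_d2.argmax())
        centroids.append(x[nxt])
        min_d2 = np.minimum(min_d2, ((x - x[nxt]) ** 2).sum(axis=1))
    centroids = np.stack(centroids)

    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the previous centroid if a cluster empties out
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return centroids
```

In practice a library routine such as scikit-learn's KMeans (via its cluster_centers_ attribute) would serve the same purpose. For the text-label variant, the centroids would instead be CLIP text embeddings of the style names, for example obtained with Hugging Face transformers' CLIPModel.get_text_features.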

Abstract

In this work, we explore applying the style ambiguity training objective, originally used to approximate creativity, to a diffusion model. However, this objective requires a pretrained classifier and a labeled dataset. We introduce new forms of style ambiguity loss that require neither training a new classifier nor a labeled dataset. Instead of using a classifier, we generate centroids in the CLIP embedding space and classify images by their relative distances to those centroids. We obtain the centroids in two ways: by K-means clustering of an unlabeled dataset, and by using text labels to generate CLIP embeddings that serve as centroids. Code is available at https://github.com/jamesBaker361/clipcreate
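The centroid-based loss described above can be sketched as follows. In the original style ambiguity objective, the loss is the cross-entropy between a style classifier's predicted distribution and the uniform distribution over styles, so it is smallest when an image cannot be confidently assigned to any one style. Here the classifier is replaced by distances to centroids. Converting distances into probabilities with a softmax over negative Euclidean distances, and the default temperature, are assumptions of this sketch; the paper may use a different similarity, such as cosine similarity. NumPy is used for readability; to train a diffusion model, the same computation would have to run in an autodiff framework such as PyTorch so that gradients flow back through the CLIP image encoder.

```python
import numpy as np


def style_probabilities(embedding: np.ndarray, centroids: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Soft 'style classification' of one embedding from its distances to K centroids."""
    e = embedding / np.linalg.norm(embedding)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    dists = np.linalg.norm(c - e, axis=1)   # (K,) distance to each centroid
    # Assumption: softmax over negative distances; a closer centroid gets a larger logit.
    logits = -dists / temperature
    logits -= logits.max()                   # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def style_ambiguity_loss(embedding: np.ndarray, centroids: np.ndarray, temperature: float = 0.1) -> float:
    """Cross-entropy between the uniform distribution over K styles and the predicted one.

    loss = -(1/K) * sum_k log p_k, which reaches its minimum, log K, when p is uniform.
    """
    p = style_probabilities(embedding, centroids, temperature)
    return float(-np.mean(np.log(p + 1e-12)))  # small epsilon guards against log(0)
```

With three orthogonal centroids, an embedding equidistant from all of them receives a uniform distribution and the minimum loss, log 3. An embedding lying on one centroid is confidently "classified" into that style and incurs a much larger loss.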

Code & Models

Repositories

jamesbaker361/clipcreate
PyTorch · Official

Taxonomy

Topics: Aesthetic Perception and Analysis

Methods: Diffusion