CLUE: Controllable Latent space of Unprompted Embeddings for Diversity Management in Text-to-Image Synthesis
Keunwoo Park, Jihye Chae, Joong Ho Ahn, Jihoon Kweon

TL;DR
CLUE is a generative framework that produces diverse yet stable images from limited data using fixed-format prompts, which supports data augmentation in specialized fields such as medicine.
Contribution
Introduces CLUE, a style-based generative model that achieves diversity and stability without extra data, utilizing a new attention layer and Gaussian latent space.
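One way to picture how a fixed prompt and a Gaussian latent space can coexist is the sketch below: the prompt stays constant and variation comes only from the sampled latent vector. This is an illustrative assumption, not the authors' implementation. The function name `sample_style_latents`, the prompt string, and the latent dimension are all hypothetical, and the step that would feed each latent into the attention layer is omitted because the summary does not describe it.

```python
import random


def sample_style_latents(num_samples, dim, seed=None):
    """Draw `num_samples` vectors of length `dim` from a standard Gaussian.

    Hypothetical helper: the summary only says the latent space is Gaussian,
    so the zero mean and unit variance are assumptions.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_samples)]


# Hypothetical fixed-format prompt; the paper's actual prompt template is not given here.
FIXED_PROMPT = "a medical image"

# Every condition shares the same prompt; only the latent vector differs.
conditions = [(FIXED_PROMPT, z) for z in sample_style_latents(3, 4, seed=0)]

for prompt, z in conditions:
    print(prompt, [round(v, 3) for v in z])
```

In this sketch, diversity would be controlled by how the latents are drawn (for example, by scaling them), while the prompt contributes nothing to the variation.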
Findings
Significantly reduces FID in medical image synthesis.
Improves recall and F1 scores with synthetic data augmentation.
Effective in domain-specific applications with limited datasets.
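For readers unfamiliar with the classification metrics named above, the following minimal sketch computes precision, recall, and F1 from confusion-matrix counts using their standard definitions. The counts in the usage example are made up for illustration and are not results from the paper.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only (not from the paper): 8 true positives,
# 2 false positives, 2 false negatives.
print(recall(8, 2), f1_score(8, 2, 2))
```

Comparing these values for a classifier trained with and without synthetic images is the usual way an augmentation benefit of this kind is reported.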
Abstract
Text-to-image synthesis models require the ability to generate diverse images while maintaining stability. To overcome this challenge, a number of methods have been proposed, including the collection of prompt-image datasets and the integration of additional data modalities during training. Although these methods have shown promising results in general domains, they face limitations when applied to specialized fields such as medicine, where only limited types and insufficient amounts of data are available. We present CLUE (Controllable Latent space of Unprompted Embeddings), a generative model framework that achieves diverse generation while maintaining stability through fixed-format prompts without requiring any additional data. Based on the Stable Diffusion architecture, CLUE employs a Style Encoder that processes images and prompts to generate style embeddings, which are subsequently…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Face recognition and analysis
