A Creative Agent is Worth a 64-Token Template
Ruixiao Shi, Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng

TL;DR
This paper introduces CAT, a creative tokenization framework that encodes creative understanding into reusable tokens, significantly improving the efficiency and quality of text-to-image generation with creative prompts.
Contribution
The paper presents a novel Creative Tokenizer trained via semantic disentanglement, enabling reusable tokens that enhance creativity in T2I models without costly prompt augmentation.
Findings
Achieves 3.7x speedup in image generation
Reduces computational cost by 4.8x
Produces images with higher human preference and better text-image alignment
Abstract
Text-to-image (T2I) models have substantially improved image fidelity and prompt adherence, yet their creativity remains constrained by reliance on discrete natural language prompts. When presented with fuzzy prompts such as ``a creative vinyl record-inspired skyscraper'', these models often fail to infer the underlying creative intent, leaving creative ideation and prompt design largely to human users. Recent reasoning- or agent-driven approaches iteratively augment prompts but incur high computational and monetary costs, as their instance-specific generation makes ``creativity'' costly and non-reusable, requiring repeated queries or reasoning for subsequent generations. To address this, we introduce \textbf{CAT}, a framework for \textbf{C}reative \textbf{A}gent \textbf{T}okenization that encapsulates agents' intrinsic understanding of ``creativity'' through a \textit{Creative…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games
