A Creative Agent is Worth a 64-Token Template

Ruixiao Shi; Fu Feng; Yucheng Xie; Xu Yang; Jing Wang; Xin Geng

arXiv:2603.17895·cs.CV·March 19, 2026

TL;DR

This paper introduces CAT, a creative tokenization framework that encodes creative understanding into reusable tokens, significantly improving the efficiency and quality of text-to-image generation with creative prompts.

Contribution

The paper presents a novel Creative Tokenizer trained via semantic disentanglement, enabling reusable tokens that enhance creativity in T2I models without costly prompt augmentation.

Findings

01. Achieves 3.7x speedup in image generation

02. Reduces computational cost by 4.8x

03. Produces images with higher human preference and better text-image alignment

Abstract

Text-to-image (T2I) models have substantially improved image fidelity and prompt adherence, yet their creativity remains constrained by reliance on discrete natural language prompts. When presented with fuzzy prompts such as "a creative vinyl record-inspired skyscraper", these models often fail to infer the underlying creative intent, leaving creative ideation and prompt design largely to human users. Recent reasoning- or agent-driven approaches iteratively augment prompts but incur high computational and monetary costs, as their instance-specific generation makes "creativity" costly and non-reusable, requiring repeated queries or reasoning for subsequent generations. To address this, we introduce CAT, a framework for Creative Agent Tokenization that encapsulates agents' intrinsic understanding of "creativity" through a Creative…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games