Local Representative Token Guided Merging for Text-to-Image Generation
Min-Jeong Lee, Hee-Dong Kim, and Seong-Whan Lee

TL;DR
This paper introduces ReToM, a local token merging method for text-to-image models that improves efficiency and image quality by selecting representative tokens within attention windows, balancing speed and fidelity.
Contribution
ReToM is a novel token merging strategy that adaptively defines local boundaries and selects representative tokens based on similarity, enhancing efficiency without sacrificing image quality.
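The idea of per-window representative tokens can be illustrated with a small pure-Python sketch. This is not the authors' implementation, and the abstract below is truncated, so several details are assumptions:

- The representative of each `window` x `window` block is taken to be the token with the highest mean cosine similarity to the other tokens in that block.
- The `r` non-representative tokens that best match some representative are averaged into it, in the style of ToMe bipartite matching.
- The window size is a fixed, hypothetical parameter rather than ReToM's adaptive choice.
- The timestep at which similarity is computed is not modeled.

All function names are hypothetical.

```python
import math
from typing import List

Token = List[float]


def cosine(a: Token, b: Token) -> float:
    """Cosine similarity between two token vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def window_blocks(height: int, width: int, window: int) -> List[List[int]]:
    """Split a row-major H x W token grid into window x window blocks of indices."""
    blocks = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            blocks.append([row * width + col
                           for row in range(top, min(top + window, height))
                           for col in range(left, min(left + window, width))])
    return blocks


def mean_similarity(tokens: List[Token], i: int, block: List[int]) -> float:
    """Mean cosine similarity of token i to the other tokens in its block."""
    others = [j for j in block if j != i]
    if not others:
        return 0.0
    return sum(cosine(tokens[i], tokens[j]) for j in others) / len(others)


def representative_indices(tokens: List[Token], height: int, width: int,
                           window: int) -> List[int]:
    """Assumed rule: per block, pick the token most similar to the rest of the block."""
    return [max(block, key=lambda i: mean_similarity(tokens, i, block))
            for block in window_blocks(height, width, window)]


def merge_tokens(tokens: List[Token], height: int, width: int,
                 window: int, r: int) -> List[Token]:
    """Average the r best-matching non-representative tokens into representatives."""
    reps = representative_indices(tokens, height, width, window)
    rep_set = set(reps)
    # Score each non-representative token by its best-matching representative.
    candidates = []
    for i in range(len(tokens)):
        if i in rep_set:
            continue
        best = max(reps, key=lambda d: cosine(tokens[i], tokens[d]))
        candidates.append((cosine(tokens[i], tokens[best]), i, best))
    candidates.sort(reverse=True)  # most similar pairs first
    sums = {d: list(tokens[d]) for d in reps}
    counts = {d: 1 for d in reps}
    merged = set()
    for _, i, d in candidates[:r]:
        sums[d] = [s + x for s, x in zip(sums[d], tokens[i])]
        counts[d] += 1
        merged.add(i)
    # Keep original order: averaged representatives plus unmerged tokens.
    result = []
    for i, tok in enumerate(tokens):
        if i in rep_set:
            result.append([s / counts[i] for s in sums[i]])
        elif i not in merged:
            result.append(list(tok))
    return result
```

For example, on a 4 x 4 grid with `window=2`, `merge_tokens(tokens, 4, 4, 2, 6)` keeps one representative per 2 x 2 window and reduces 16 tokens to 10.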
Findings
ReToM improves FID by 6.2% over the baseline.
ReToM maintains inference time comparable to existing methods.
ReToM effectively balances visual quality and computational efficiency.
Abstract
Stable Diffusion is an outstanding text-to-image generation model, but its generation process remains time-consuming due to the quadratic complexity of attention operations. Recent token merging methods improve efficiency by reducing the number of tokens during attention operations, but they often overlook the characteristics of attention-based image generation models, which limits their effectiveness. In this paper, we propose local representative token guided merging (ReToM), a novel token merging strategy applicable to any attention mechanism in image generation. To merge tokens based on varied contextual information, ReToM defines local boundaries as windows within the attention inputs and adjusts the window sizes. Furthermore, we introduce a representative token, the most representative token in each window, selected by computing similarity at a specific timestep and…
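To make the quadratic cost concrete, the sketch below counts the entries of the N x N attention score matrix before and after merging. The 64 x 64 token grid and the 50% merge ratio are illustrative assumptions, not settings reported for ReToM. The sketch also assumes the merged sequence is used for both queries and keys. The function names are hypothetical.

```python
def attention_score_entries(num_tokens: int) -> int:
    """Entries in the N x N attention score matrix QK^T (grows quadratically in N)."""
    return num_tokens * num_tokens


def cost_after_merging(num_tokens: int, merge_ratio: float) -> float:
    """Fraction of the original score-matrix size left after merging tokens."""
    kept = num_tokens - int(num_tokens * merge_ratio)
    return attention_score_entries(kept) / attention_score_entries(num_tokens)


# Illustrative numbers: a 64 x 64 grid of tokens.
n = 64 * 64
print(attention_score_entries(n))   # 16777216
print(cost_after_merging(n, 0.5))   # 0.25
```

Because the cost is quadratic, halving the number of tokens leaves only a quarter of the score-matrix entries. This is why reducing tokens before attention can pay off even when merging itself adds some overhead.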
Taxonomy
Topics: Web Data Mining and Analysis · Digital Humanities and Scholarship · Multimedia Communication and Technology
