Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Michael Toker, Ido Galil, Hadas Orgad, Rinon Gal, Yoad Tewel, Gal Chechik, Yonatan Belinkov

TL;DR
This paper investigates the role of padding tokens in text-to-image (T2I) diffusion models. It shows that they can affect the output during text encoding or during diffusion, or be ignored entirely, and it links these behaviors to model architecture and training.
Contribution
It provides the first mechanistic analysis of padding tokens in T2I models and introduces two causal techniques for tracing how padding-token representations influence image generation.
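This page does not spell out the two causal techniques. As a rough, hypothetical illustration of the general idea of a causal intervention (activation patching), the toy sketch below replaces the representations at padding positions with those from a donor run and measures how much a downstream consumer's output changes. The helpers `patch_padding`, `effect`, `mean_all`, and `make_mean_masked` are invented for this example. They are not the authors' method, and the "downstream" functions are simple averages, not a diffusion model.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def patch_padding(states: Sequence[Vector], donor: Sequence[Vector],
                  mask: Sequence[int]) -> List[Vector]:
    """Copy donor vectors into padding positions (mask == 0); keep prompt positions."""
    return [list(d) if m == 0 else list(s) for s, d, m in zip(states, donor, mask)]

def effect(downstream: Callable[[List[Vector]], Vector], states: Sequence[Vector],
           donor: Sequence[Vector], mask: Sequence[int]) -> float:
    """L2 distance between downstream outputs before and after patching padding positions."""
    base = downstream(list(states))
    patched = downstream(patch_padding(states, donor, mask))
    return sum((a - b) ** 2 for a, b in zip(base, patched)) ** 0.5

def mean_all(states: List[Vector]) -> Vector:
    """Toy consumer that averages every position, padding included."""
    n = len(states)
    return [sum(col) / n for col in zip(*states)]

def make_mean_masked(mask: Sequence[int]) -> Callable[[List[Vector]], Vector]:
    """Toy consumer that averages only the prompt positions (mask == 1)."""
    def f(states: List[Vector]) -> Vector:
        kept = [s for s, m in zip(states, mask) if m == 1]
        return [sum(col) / len(kept) for col in zip(*kept)]
    return f

# Two prompt positions followed by two padding positions (made-up numbers).
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
donor = [[9.0, 9.0]] * 4
mask = [1, 1, 0, 0]

print(round(effect(mean_all, states, donor, mask), 2))                # 6.01
print(effect(make_mean_masked(mask), states, donor, mask))            # 0.0
```

A consumer that reads padding positions shows a nonzero effect, while one that masks them out shows none. This is the toy analogue of padding tokens being "effectively ignored".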
Findings
Padding tokens can influence the output during text encoding or during diffusion, or be ignored entirely.
The impact depends on model architecture and training process.
Insights may inform future T2I model design and training practices.
Abstract
Text-to-image (T2I) diffusion models rely on encoded prompts to guide the image generation process. Typically, these prompts are extended to a fixed length by adding padding tokens before text encoding. Despite being a default practice, the influence of padding tokens on the image generation process has not been investigated. In this work, we conduct the first in-depth analysis of the role padding tokens play in T2I models. We develop two causal techniques to analyze how information is encoded in the representation of tokens across different components of the T2I pipeline. Using these techniques, we investigate when and how padding tokens impact the image generation process. Our findings reveal three distinct scenarios: padding tokens may affect the model's output during text encoding, during the diffusion process, or be effectively ignored. Moreover, we identify key relationships…
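A minimal sketch of the fixed-length padding step described in the abstract follows. It assumes a CLIP-style 77-token context. The token ids and the padding id are made up for illustration, because real tokenizers define their own. The function also returns a mask that marks padding positions, but whether a given text encoder or denoiser actually applies such a mask is model-specific.

```python
from typing import List, Tuple

MAX_LEN = 77  # assumption: a CLIP-style text encoder with a 77-token context
PAD_ID = 0    # hypothetical padding token id; real tokenizers define their own

def pad_prompt(token_ids: List[int], max_len: int = MAX_LEN,
               pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad token ids to max_len.

    Returns the padded ids and a mask where 1 marks a prompt token
    and 0 marks a padding token.
    """
    ids = token_ids[:max_len]
    n_pad = max_len - len(ids)
    padded = ids + [pad_id] * n_pad
    mask = [1] * len(ids) + [0] * n_pad
    return padded, mask

ids, mask = pad_prompt([101, 7, 8, 102])  # made-up ids for a short prompt
print(len(ids), sum(mask))  # 77 4
```

With a short prompt, 73 of the 77 positions are padding. This is why it matters whether those positions carry information into the image.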
Taxonomy
Topics: Tactile and Sensory Interactions · Architecture and Computational Design · Modular Robots and Swarm Intelligence
Methods: Diffusion
