The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models
Alfio Ferrara, Sergio Picascia, Elisabetta Rocchetti

TL;DR
This paper investigates how text-to-image diffusion models internally represent content and style in artworks, revealing an emergent understanding of artistic concepts through attention analysis.
Contribution
It introduces a method to analyze content and style encoding in diffusion models using cross-attention heatmaps, providing new insights into their internal representations.
Findings
Content tokens influence object regions; style tokens affect background and textures.
Models show varying degrees of content-style separation depending on prompts.
The study offers a new perspective on how large-scale generative models understand artistic concepts.
Abstract
Text-to-image diffusion models have demonstrated remarkable capabilities in generating artistic content by learning from billions of images, including popular artworks. However, the fundamental question of how these models internally represent concepts, such as content and style in paintings, remains unexplored. Traditional computer vision assumes content and style are orthogonal, but diffusion models receive no explicit guidance about this distinction during training. In this work, we investigate how transformer-based text-to-image diffusion models encode content and style concepts when generating artworks. We leverage cross-attention heatmaps to attribute pixels in generated images to specific prompt tokens, enabling us to isolate image regions influenced by content-describing versus style-describing tokens. Our findings reveal that diffusion models demonstrate varying degrees of…
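The attribution idea described in the abstract can be illustrated with a small sketch. It assumes the cross-attention weights have already been extracted from the model and averaged over heads, layers, and denoising steps into one map per prompt token. That aggregation step, the 0.5 threshold, the use of IoU as an overlap measure, and the function names `token_heatmap`, `binarize`, and `iou` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def token_heatmap(attn, token_ids):
    """Sum the attention maps of the chosen tokens and rescale to [0, 1].

    attn: array of shape (num_tokens, H, W), assumed to be pre-averaged
          cross-attention weights (hypothetical input format).
    token_ids: indices of the tokens forming one group (content or style).
    """
    heat = attn[token_ids].sum(axis=0)
    value_range = heat.max() - heat.min()
    if value_range == 0:
        return np.zeros_like(heat, dtype=float)
    return (heat - heat.min()) / value_range

def binarize(heat, thresh=0.5):
    """Turn a normalized heatmap into a boolean region mask (threshold is an assumption)."""
    return heat >= thresh

def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks; 0.0 if both are empty."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)

# Synthetic example: token 0 plays the role of a content word,
# tokens 1 and 2 play the role of a style phrase.
attn = np.zeros((3, 4, 4))
attn[0, :2, :2] = 1.0   # content token attends to the top-left block
attn[1, 2:, :] = 1.0    # first style token attends to the bottom half
attn[2, :2, 2:] = 1.0   # second style token attends to the top-right block

content_mask = binarize(token_heatmap(attn, [0]))
style_mask = binarize(token_heatmap(attn, [1, 2]))
print("content/style overlap (IoU):", iou(content_mask, style_mask))
```

A low overlap between the two masks would indicate that content and style tokens drive different image regions, while a high overlap would indicate entangled influence.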
Taxonomy
Topics: Aesthetic Perception and Analysis · Visual Culture and Art Theory · Architecture and Art History Studies
