The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models
Maria-Teresa De Rosa Palmini, Eva Cetinic

TL;DR
This paper introduces the Cultural Reference Transformation (CRT) metric to evaluate diffusion models' ability to recognize and reinterpret culturally shared visual references without mere replication.
Contribution
The paper proposes a novel evaluation framework and metric for assessing how diffusion models handle multimodal iconicity, distinguishing recognition from realization.
Findings
Models vary both in how well they recognize cultural references and in how heavily they rely on replicating them.
Models often reproduce a reference's visual structure even when the textual cues are altered.
Recognition correlates with data frequency, textual uniqueness, and reference popularity.
Abstract
The ambiguity between generalization and memorization in text-to-image (TTI) diffusion models becomes pronounced when prompts invoke culturally shared visual references, a phenomenon we term multimodal iconicity. These are instances in which images and texts reflect established cultural associations, such as when a title recalls a familiar artwork or film scene. Such cases challenge existing approaches to evaluating memorization, as they define a setting in which instance-level memorization and culturally grounded generalization are structurally intertwined. To address this challenge, we propose an evaluation framework to assess a model's ability to remain culturally grounded without relying on visual replication. Specifically, we introduce the Cultural Reference Transformation (CRT) metric, which separates two dimensions of model behavior: Recognition, whether a model evokes a reference, from…
