Does AI See like Art Historians? Interpreting How Vision Language Models Recognize Artistic Style

Marvin Limpijankit; Milad Alshomary; Yassin Oulad Daoud; Amith Ananthram; Tim Trombley; Emily L. Spratt; Anna Filonenko; Hannah Pivo; Elias Stengel-Eskin; Mohit Bansal; Noam M. Elcott; Kathleen McKeown

arXiv:2603.11024·cs.CV·May 20, 2026

TL;DR

This paper investigates how vision language models recognize artistic style, comparing their mechanisms with art historians' criteria through quantitative and causal analyses.

Contribution

It introduces a latent-space decomposition method to interpret VLMs' style prediction and evaluates their alignment with art historical reasoning.
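As a rough illustration of what a latent-space decomposition can look like, the sketch below factors a matrix of nonnegative image activations into per-image concept strengths and concept directions using non-negative matrix factorization (NMF) with multiplicative updates. This is an assumption made for illustration only: the summary above does not say which decomposition the authors use, and the function names (`decompose_activations`, `top_images_per_concept`), the synthetic data, and all sizes are hypothetical.

```python
import numpy as np

def decompose_activations(acts, n_concepts, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative activations (n_images x d) as acts ~ U @ W.

    U holds per-image concept strengths, W holds concept directions.
    NMF with multiplicative updates is an illustrative stand-in here,
    not necessarily the decomposition used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_images, dim = acts.shape
    U = rng.random((n_images, n_concepts)) + eps
    W = rng.random((n_concepts, dim)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        U *= (acts @ W.T) / (U @ W @ W.T + eps)
        W *= (U.T @ acts) / (U.T @ U @ W + eps)
    return U, W

def top_images_per_concept(U, k=3):
    """Return, for each concept, the indices of the k images that activate it most."""
    return np.argsort(-U, axis=0)[:k].T

# Synthetic stand-in for VLM activations: 60 "images", 32 dimensions, 4 hidden concepts.
rng = np.random.default_rng(42)
acts = rng.random((60, 4)) @ rng.random((4, 32))

U, W = decompose_activations(acts, n_concepts=4)
rel_err = np.linalg.norm(acts - U @ W) / np.linalg.norm(acts)
top = top_images_per_concept(U, k=3)
```

In a setup like this, the top-activating images for each concept are what a human reviewer would inspect to decide whether the concept corresponds to a coherent visual feature.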

Findings

1. 73% of concepts identified are semantically meaningful according to art historians.

2. 90% of style-predicting concepts are deemed relevant by art historians.

3. Models sometimes use formal features like contrast to predict style, aligning with art historical reasoning.

Abstract

Vision language models (VLMs) have become increasingly proficient at a range of computer vision tasks, such as visual question answering and object detection. This proficiency extends to the domain of art, from analyzing artworks to generating them. In an interdisciplinary collaboration between computer scientists and art historians, we characterize the mechanisms underlying VLMs' ability to predict artistic style and assess the extent to which they align with the criteria art historians use to reason about artistic style. We employ a latent-space decomposition approach to identify concepts that drive art style prediction and conduct quantitative evaluations, causal analysis, and assessment by art historians. Our findings indicate that 73% of the extracted concepts are judged by art historians to exhibit a coherent and semantically meaningful visual feature and 90% of concepts used to…
