Cross-Cultural Expert-Level Art Critique Evaluation with Vision-Language Models
Haorui Yu, Xuehang Wen, Fengrui Zhang, Qiufeng Yi

TL;DR
This paper introduces a three-tier evaluation framework for assessing how well vision-language models (VLMs) understand culture in art critique, addressing the failure of existing metrics and benchmarks to capture cultural interpretation.
Contribution
A three-tier evaluation framework grounded in art theory, designed to measure cross-cultural understanding in the art critiques that VLMs generate.
Findings
Automated metrics and judge scoring measure different constructs.
Single-judge calibration is more reliable than automated metrics.
Cultural understanding declines as the task moves from visual description to interpretation.
Abstract
Vision-Language Models (VLMs) excel at visual description yet remain under-validated for cultural interpretation. Existing benchmarks assess perception without interpretation, and common evaluation proxies, such as automated metrics and LLM-judge averaging, are unreliable for culturally sensitive generative tasks. We address this measurement gap with a tri-tier evaluation framework grounded in art-theoretical constructs (Section 2). The framework operationalises cultural understanding through five levels (L1–L5) and 165 culture-specific dimensions across six traditions: Tier I computes automated quality indicators, Tier II applies rubric-based single-judge scoring, and Tier III calibrates the aggregate score to human expert ratings via sigmoid calibration. Applied to 15 VLMs across 294 evaluation pairs, the validated instrument reveals that (i) automated metrics and judge scoring…
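The abstract states only that Tier III calibrates the aggregate score to human expert ratings via sigmoid calibration. The following is a minimal sketch of what such a step could look like. It assumes the calibrated score has the form y_min + (y_max - y_min) · σ(a·x + b), where x is the aggregate judge score and [y_min, y_max] is the human rating scale. The functional form, the rating bounds, and the helper names `fit_sigmoid_calibration` and `calibrate` are hypothetical and do not come from the authors' implementation. To fit the parameters, the sketch uses logit linearisation: each human rating is mapped into (0, 1), its logit is taken, and ordinary least squares is solved in x. This is a simple fitting method chosen for illustration.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helpers: the sigmoid form below is an assumption, not the paper's code.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid_calibration(
    judge_scores: Sequence[float],
    human_ratings: Sequence[float],
    y_min: float,
    y_max: float,
    eps: float = 1e-3,
) -> Tuple[float, float]:
    """Fit human ~ y_min + (y_max - y_min) * sigmoid(a * x + b); return (a, b).

    Each human rating is rescaled into (0, 1), converted to a logit,
    and a straight line in x is fitted to the logits by least squares.
    """
    if len(judge_scores) != len(human_ratings) or len(judge_scores) < 2:
        raise ValueError("need at least two paired scores")
    xs = list(judge_scores)
    zs = []
    for y in human_ratings:
        p = (y - y_min) / (y_max - y_min)
        p = min(max(p, eps), 1.0 - eps)  # keep the logit finite at the scale ends
        zs.append(math.log(p / (1.0 - p)))
    n = len(xs)
    mx = sum(xs) / n
    mz = sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("judge scores must not all be equal")
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    a = sxz / sxx  # OLS slope in logit space
    b = mz - a * mx  # OLS intercept in logit space
    return a, b


def calibrate(x: float, a: float, b: float, y_min: float, y_max: float) -> float:
    """Map a raw aggregate judge score onto the human rating scale."""
    return y_min + (y_max - y_min) * sigmoid(a * x + b)


if __name__ == "__main__":
    # Illustrative synthetic data only (not from the paper): a 1-5 rating scale.
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ys = [1 + 4 * sigmoid(4 * x - 2) for x in xs]
    a, b = fit_sigmoid_calibration(xs, ys, 1.0, 5.0)
    print(round(a, 3), round(b, 3), round(calibrate(0.5, a, b, 1.0, 5.0), 3))
```

A common alternative is a direct nonlinear least-squares fit on the original rating scale, for example with `scipy.optimize.curve_fit`. The linearised fit above instead measures error in logit space. That gives extra weight to ratings near the ends of the scale, and the `eps` clip keeps ratings that sit exactly at `y_min` or `y_max` from producing infinite logits.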
Taxonomy
Topics: Aesthetic Perception and Analysis · Generative Adversarial Networks and Image Synthesis · Art Education and Development
