AesTest: Measuring Aesthetic Intelligence from Perception to Production
Guolong Wang, Heng Huang, Zhiqiang Zhang, Wentian Li, Feilong Ma, Xin Jin

TL;DR
AesTest is a comprehensive benchmark that evaluates how well multimodal large language models (MLLMs) perceive and produce aesthetic judgments across diverse tasks, data sources, and aesthetic query types, addressing the narrow scope and limited diversity of existing benchmarks.
Contribution
The paper introduces AesTest, a novel benchmark with diverse tasks and data sources to systematically evaluate aesthetic intelligence in multimodal models.
Findings
Instruction-tuned MLLMs for image aesthetic assessment (IAA) face significant challenges on AesTest.
The benchmark covers perception, appreciation, creation, and photography tasks.
AesTest will be publicly released for future research.
Abstract
Perceiving and producing aesthetic judgments is a fundamental yet underexplored capability for multimodal large language models (MLLMs). However, existing benchmarks for image aesthetic assessment (IAA) are narrow in perception scope or lack the diversity needed to evaluate systematic aesthetic production. To address this gap, we introduce AesTest, a comprehensive benchmark for multimodal aesthetic perception and production, distinguished by the following features: 1) It consists of curated multiple-choice questions spanning ten tasks, covering perception, appreciation, creation, and photography. These tasks are grounded in psychological theories of generative learning. 2) It integrates data from diverse sources, including professional editing workflows, photographic composition tutorials, and crowdsourced preferences. It ensures coverage of both expert-level principles and real-world…
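Because the benchmark is described as a set of multiple-choice questions grouped under perception, appreciation, creation, and photography, a per-category accuracy is one natural way to summarize a model's results. The following is a minimal sketch of that idea, not the authors' evaluation code: the record fields (`id`, `category`, `answer`), the function name `accuracy_by_category`, and the sample items are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_category(items, predictions):
    """Return {category: fraction of items answered correctly}.

    `items` is a list of dicts with hypothetical keys "id", "category",
    and "answer"; `predictions` maps an item id to the chosen option.
    An item with no prediction counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        category = item["category"]
        total[category] += 1
        if predictions.get(item["id"]) == item["answer"]:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Illustrative data only; these are not items from AesTest.
items = [
    {"id": "q1", "category": "perception", "answer": "B"},
    {"id": "q2", "category": "perception", "answer": "A"},
    {"id": "q3", "category": "creation", "answer": "D"},
]
predictions = {"q1": "B", "q2": "C", "q3": "D"}

scores = accuracy_by_category(items, predictions)
for category, score in scores.items():
    print(f"{category}: {score:.2f}")
```

Reporting accuracy per category rather than as a single number keeps a weakness in one area, such as creation, from being hidden by strength in another.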
Taxonomy
Topics: Visual Attention and Saliency Detection · Aesthetic Perception and Analysis · Multimodal Machine Learning Applications
