Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Ruixiang Jiang; Changwen Chen

arXiv:2501.09012·cs.CV·September 3, 2025

TL;DR

This paper explores how multimodal large language models can be prompted to perform aesthetic judgments in art, revealing their reasoning process and addressing hallucinations to better align with human aesthetic understanding.

Contribution

The paper introduces ArtCoT, an evidence-based prompting method that strengthens MLLMs' aesthetic reasoning by reducing hallucinations and improving alignment with human judgments.
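As a rough illustration of what "evidence-based" prompting could look like in practice, the sketch below builds a two-stage prompt chain: the first stage asks only for observable visual evidence, and the second asks for a judgment that must cite that evidence. The stage names, the prompt wording, and the helpers `build_evidence_first_prompts` and `fill_judgment_prompt` are all hypothetical and written for this example; they are not the prompts or code used in the paper.

```python
from dataclasses import dataclass


@dataclass
class PromptStage:
    """One step in a prompt chain (hypothetical structure for illustration)."""
    name: str
    text: str


def build_evidence_first_prompts(task: str) -> list[PromptStage]:
    """Build a two-stage chain: observe first, then judge.

    Assumption: this only illustrates the general idea of grounding a
    judgment in stated evidence. It is not the paper's ArtCoT prompt set.
    """
    observe = PromptStage(
        name="observe",
        text=(
            "Describe only what is visibly present in the image(s): "
            "composition, color, line, texture, and subject matter. "
            "Do not give opinions or interpret meaning."
        ),
    )
    judge = PromptStage(
        name="judge",
        text=(
            f"Task: {task}\n"
            "Using only the observations listed below, give a judgment. "
            "Cite the specific observation that supports each claim.\n"
            # Plain (non-f) string, so the placeholder stays literal.
            "Observations:\n{observations}"
        ),
    )
    return [observe, judge]


def fill_judgment_prompt(stage: PromptStage, observations: list[str]) -> str:
    """Insert the first-stage observations into the judgment prompt."""
    bullet_list = "\n".join(f"- {item}" for item in observations)
    return stage.text.replace("{observations}", bullet_list)


if __name__ == "__main__":
    stages = build_evidence_first_prompts(
        "Which image is more aesthetically pleasing?"
    )
    print(stages[0].text)
    print(fill_judgment_prompt(stages[1], ["warm palette", "diagonal composition"]))
```

Separating observation from judgment is one plausible way to discourage the unsubstantiated interpretations the abstract describes, since the second prompt can only draw on evidence already written down.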

Findings

01 MLLMs can perform aesthetic reasoning with proper prompting.

02 Hallucinations in MLLMs can be mitigated through evidence-based prompts.

03 Enhanced reasoning aligns better with human aesthetic judgments.

Abstract

The rapid technical progress of generative art (GenArt) has democratized the creation of visually appealing imagery. However, achieving genuine artistic impact - the kind that resonates with viewers on a deeper, more meaningful level - remains formidable as it requires a sophisticated aesthetic sensibility. This sensibility involves a multifaceted cognitive process extending beyond mere visual appeal, which is often overlooked by current computational methods. This paper pioneers an approach to capture this complex process by investigating how the reasoning capabilities of Multimodal LLMs (MLLMs) can be effectively elicited to perform aesthetic judgment. Our analysis reveals a critical challenge: MLLMs exhibit a tendency towards hallucinations during aesthetic reasoning, characterized by subjective opinions and unsubstantiated artistic interpretations. We further demonstrate that these…

Code & Models

Repositories

songrise/mllm4art (official)

Datasets

Ruixiang/FineArtBench (dataset, 3.0k downloads)

Taxonomy

Methods: ALIGN