Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment

Henglin Liu; Nisha Huang; Chang Liu; Jiangpeng Yan; Huijuan Huang; Jixuan Ying; Tong-Yee Lee; Pengfei Wan; Xiangyang Ji

arXiv:2512.23413·cs.CV·January 6, 2026

TL;DR

This paper introduces a large-scale, multi-dimensional aesthetic description dataset and an assessment framework that combines joint description generation with large language models to improve artistic image aesthetics assessment while requiring fewer training epochs.

Contribution

The paper presents RAD, a scalable multi-dimensional aesthetic dataset, and ArtQuant, a new assessment framework integrating description generation and LLMs, advancing aesthetic evaluation methods.

Findings

01. Achieved state-of-the-art performance on multiple datasets.

02. Reduced training epochs by 67%, enhancing efficiency.

03. Provided a scalable, multi-dimensional aesthetic dataset.

Abstract

The aesthetic quality assessment task is crucial for developing a human-aligned quantitative evaluation system for AI-generated content (AIGC). However, its inherently complex nature, spanning visual perception, cognition, and emotion, poses fundamental challenges. Although aesthetic descriptions offer a viable representation of this complexity, two critical challenges persist: (1) data scarcity and imbalance: existing datasets focus too heavily on visual perception and neglect deeper dimensions because manual annotation is expensive; and (2) model fragmentation: current visual networks isolate aesthetic attributes with multi-branch encoders, while multimodal methods, represented by contrastive learning, struggle to process long-form textual descriptions effectively. To resolve challenge (1), we first present the Refined Aesthetic Description (RAD) dataset, a large-scale (70k), multi-dimensional structured dataset,…

Taxonomy

Topics: Aesthetic Perception and Analysis · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis