UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark
Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

TL;DR
This paper introduces UNIAA, a unified multi-modal framework for image aesthetic assessment. It consists of UNIAA-LLaVA, a multi-modal large language model trained on visual instruction tuning data converted from existing datasets, and UNIAA-Bench, a benchmark that evaluates aesthetic understanding at the perception, description, and assessment levels.
Contribution
The work presents a novel unified multi-modal approach for image aesthetic assessment and establishes a comprehensive benchmark to evaluate multi-level aesthetic understanding.
Findings
UNIAA-LLaVA outperforms existing MLLMs in aesthetic perception.
UNIAA-LLaVA approaches junior-level human performance.
The benchmark covers perception, description, and assessment levels.
Abstract
As an alternative to expensive expert evaluation, Image Aesthetic Assessment (IAA) stands out as a crucial task in computer vision. However, traditional IAA methods are typically constrained to a single data source or task, restricting the universality and broader application. In this work, to better align with human aesthetics, we propose a Unified Multi-modal Image Aesthetic Assessment (UNIAA) framework, including a Multi-modal Large Language Model (MLLM) named UNIAA-LLaVA and a comprehensive benchmark named UNIAA-Bench. We choose MLLMs with both visual perception and language ability for IAA and establish a low-cost paradigm for transforming the existing datasets into unified and high-quality visual instruction tuning data, from which the UNIAA-LLaVA is trained. To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels:…
Taxonomy
Topics: Aesthetic Perception and Analysis
Methods: ALIGN
