What Do Vision-Language Models Encode for Personalized Image Aesthetics Assessment?

Koki Ryu; Hitomi Yanaka

arXiv:2604.11374·cs.CV·April 14, 2026

TL;DR

This paper investigates how vision-language models encode aesthetic attributes and demonstrates their potential for personalized image aesthetics assessment without fine-tuning.

Contribution

The paper shows that VLMs encode diverse aesthetic attributes useful for personalization, and that simple linear models built on these representations can perform personalized image aesthetics assessment (PIAA) effectively.

Findings

01 VLMs encode diverse aesthetic attributes across layers.

02 Simple linear models can perform PIAA effectively.

03 Aesthetic information transfer varies across architectures and domains.

Abstract

Personalized image aesthetics assessment (PIAA) is an important research problem with practical real-world applications. While methods based on vision-language models (VLMs) are promising candidates for PIAA, it remains unclear whether they internally encode rich, multi-level aesthetic attributes required for effective personalization. In this paper, we first analyze the internal representations of VLMs to examine the presence and distribution of such aesthetic attributes, and then leverage them for lightweight, individual-level personalization without model fine-tuning. Our analysis reveals that VLMs encode diverse aesthetic attributes that propagate into the language decoder layers. Building on these representations, we demonstrate that simple linear models can perform PIAA effectively. We further analyze how aesthetic information is transferred across layers in different VLM…
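
To make the idea of "simple linear models" on frozen representations concrete, here is a minimal sketch of a per-user linear probe. It is not the authors' implementation: the feature matrix is random synthetic data standing in for hidden states extracted from a VLM, the ratings are simulated, and the names `fit_ridge`, `predict`, the ridge penalty `alpha`, and the choice of ridge regression itself are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(features, scores, alpha=1.0):
    """Closed-form ridge regression with an unpenalized bias term."""
    mean_x = features.mean(axis=0)
    mean_y = scores.mean()
    xc = features - mean_x            # center inputs so the bias is not penalized
    yc = scores - mean_y
    dim = features.shape[1]
    # Solve (X^T X + alpha * I) w = X^T y for the weight vector w.
    weights = np.linalg.solve(xc.T @ xc + alpha * np.eye(dim), xc.T @ yc)
    bias = mean_y - mean_x @ weights
    return weights, bias


def predict(features, weights, bias):
    """Linear prediction of one user's aesthetic scores."""
    return features @ weights + bias


rng = np.random.default_rng(0)
n_images, dim = 200, 16

# Assumption: synthetic stand-in for frozen VLM hidden states (one row per image).
features = rng.normal(size=(n_images, dim))

# Assumption: simulated ratings from a single user, linear in the features plus noise.
true_w = rng.normal(size=dim)
user_scores = features @ true_w + 0.1 * rng.normal(size=n_images)

# Fit on this user's rated images, evaluate on held-out images.
train, test = slice(0, 150), slice(150, None)
weights, bias = fit_ridge(features[train], user_scores[train], alpha=1.0)
preds = predict(features[test], weights, bias)
corr = np.corrcoef(preds, user_scores[test])[0, 1]
print(f"held-out Pearson correlation: {corr:.3f}")
```

In this setup the VLM itself is never updated: only one small weight vector is fitted per user, which is what makes the personalization lightweight. Swapping the synthetic matrix for features taken from a particular layer would, under the same assumptions, let one compare how much user-relevant aesthetic information each layer carries.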

Code & Models

Repositories

ynklab/vlm-latent-piaa
github
