Aligning VLM Assistants with Personalized Situated Cognition

Yongqi Li; Shen Zhou; Xiaohu Li; Xin Miao; Jintao Wen; Mayi Xu; Jianhao Chen; Birong Pan; Hankun Kang; Yuanyuan Zhu; Ming Zhong; Tieyun Qian

arXiv:2506.00930·cs.AI·June 3, 2025

TL;DR

This paper introduces a new framework and benchmark for aligning vision-language model assistants with personalized situated cognition, accounting for individual differences in expectations and actions.

Contribution

The paper proposes PCogAlign, a personalized alignment framework built on a cognition-aware reward model, and constructs PCogAlignBench, a benchmark of 18k instances covering 20 individuals with different Role-Sets.

Findings

01 The PCogAlignBench benchmark effectively captures personalized cognition.

02 The PCogAlign framework improves alignment with individual expectations.

03 Experimental results show enhanced personalized assistance performance.

Abstract

Vision-language models (VLMs) aligned with general human objectives, such as being harmless and hallucination-free, have become valuable assistants for humans in managing visual tasks. However, people with diverse backgrounds perceive and interpret even the same situation differently. Consequently, they may have personalized expectations for VLM assistants. This highlights the urgent need to align VLM assistants with personalized situated cognition for real-world assistance. To study this problem, we first simplify it by characterizing individuals based on the sociological concept of Role-Set. Then, we propose to evaluate the individuals' actions to examine whether personalized alignment is achieved. Further, we construct a benchmark named PCogAlignBench, which includes 18k instances and 20 individuals with different Role-Sets. Finally, we present a framework called PCogAlign, which…
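As a rough illustration of the setup the abstract describes, the sketch below models an individual as a Role-Set and pairs the same visual situation with two different individuals. Every name here (`Individual`, `Instance`, `group_by_role_set`, the role labels, and the image path) is a hypothetical chosen for illustration; none of it is taken from PCogAlignBench's actual schema, which this page does not describe.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Individual:
    # Hypothetical representation: an individual is characterized by a Role-Set.
    name: str
    role_set: FrozenSet[str]


@dataclass(frozen=True)
class Instance:
    # Hypothetical instance: a visual situation, a query, and who is asking.
    image_path: str
    query: str
    individual: Individual


def group_by_role_set(instances: List[Instance]) -> Dict[FrozenSet[str], List[Instance]]:
    """Bucket instances by the Role-Set of the individual who issued them."""
    groups: Dict[FrozenSet[str], List[Instance]] = {}
    for inst in instances:
        groups.setdefault(inst.individual.role_set, []).append(inst)
    return groups


# Two illustrative individuals facing the same image and the same query.
alice = Individual("A", frozenset({"teacher", "parent"}))
bob = Individual("B", frozenset({"student", "child"}))
instances = [
    Instance("kitchen.jpg", "What should I do here?", alice),
    Instance("kitchen.jpg", "What should I do here?", bob),
]
groups = group_by_role_set(instances)
```

Keeping the individual inside each instance makes the same image and query count as two distinct cases, which mirrors the abstract's point that expectations differ by person even in an identical situation.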

Code & Models

Datasets

YongqiLi/PCogAlignBench
dataset · 334 downloads

Videos

Aligning VLM Assistants with Personalized Situated Cognition

Taxonomy

Topics: Tactile and Sensory Interactions · Constraint Satisfaction and Optimization · Transportation Planning and Optimization