USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions

Hamed Rahimi; Adil Bahaj; Mouad Abrini; Mahdi Khoramshahi; Mounir Ghogho; Mohamed Chetouani

arXiv:2502.10636 · cs.AI · March 3, 2025

TL;DR

This paper introduces User-VLM 360, a personalized vision-language framework for social robots that adapts interactions to individual users while mitigating biases, achieving state-of-the-art results and real-time deployment.

Contribution

The paper presents a novel user-aware tuning and bias mitigation framework for vision-language models in social robotics, enabling personalized and ethical human-robot interactions.

Findings

01

35.3% improvement in personalized VQA accuracy

02

47.5% enhancement in facial feature understanding

03

15% reduction in bias

Abstract

The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360°…

Code & Models

Repositories

hamedR96/User-VLM
pytorch

Taxonomy

Topics: Robotics and Automated Systems · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques