Demographic User Modeling for Social Robotics with Multimodal Pre-trained Models

Hamed Rahimi, Mouad Abrini, Mahdi Khoramshahi, and Mohamed Chetouani

arXiv:2502.10642·cs.AI·February 18, 2025

Open Access

TL;DR

This paper evaluates the use of multimodal pre-trained models, specifically CLIP, for demographic user profiling in social robotics, introduces new datasets, and proposes a masked image modeling strategy to improve demographic attribute recognition.

Contribution

It introduces two new datasets for demographic profiling and proposes a masked image modeling approach to enhance generalization in multimodal user modeling.

Findings

01. CLIP performs poorly on demographic tasks without fine-tuning.

02. Fine-tuning improves CLIP's performance, but limitations remain.

03. Masked image modeling can potentially enhance demographic attribute recognition.
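
To make the third finding more concrete, here is a minimal sketch of the masking step common to generic masked image modeling: a fraction of image patches is hidden so a model must rely on the remaining context. This is not the authors' implementation. The function name `mask_patches`, the patch size, the mask ratio, and the toy 8x8 image are all assumptions chosen for illustration; this page does not describe the paper's actual masking strategy.

```python
import random

def mask_patches(image, patch_size, mask_ratio, seed=0, fill=0.0):
    """Hide a random subset of square patches (hypothetical helper).

    image: 2D list of floats (a toy single-channel image).
    Returns a masked copy and the set of masked (row, col) patch indices.
    """
    height, width = len(image), len(image[0])
    rows, cols = height // patch_size, width // patch_size
    patches = [(r, c) for r in range(rows) for c in range(cols)]

    # Choose which patches to hide; the seed keeps the sketch reproducible.
    rng = random.Random(seed)
    n_masked = round(mask_ratio * len(patches))
    masked = set(rng.sample(patches, n_masked))

    # Copy the image so the input is left untouched, then blank the patches.
    out = [row[:] for row in image]
    for r, c in masked:
        for y in range(r * patch_size, (r + 1) * patch_size):
            for x in range(c * patch_size, (c + 1) * patch_size):
                out[y][x] = fill
    return out, masked

# Toy usage: an 8x8 image of ones, 2x2 patches, half of them masked.
image = [[1.0] * 8 for _ in range(8)]
masked_image, masked = mask_patches(image, patch_size=2, mask_ratio=0.5)
print(len(masked), "of 16 patches masked")
```

In a training setup, the masked image would be passed to the encoder in place of (or alongside) the original; how the paper combines this with CLIP fine-tuning is not stated here.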

Abstract

This paper investigates the performance of multimodal pre-trained models in user profiling tasks based on visual-linguistic demographic data. These models are critical for adapting to the needs and preferences of human users in social robotics, thereby providing personalized responses and enhancing interaction quality. First, we introduce two datasets specifically curated to represent demographic characteristics derived from user facial images. Next, we evaluate the performance of a prominent contrastive multimodal pre-trained model, CLIP, on these datasets, both in its out-of-the-box state and after fine-tuning. Initial results indicate that CLIP performs suboptimally in matching images to demographic descriptions without fine-tuning. Although fine-tuning significantly enhances its predictive capacity, the model continues to exhibit limitations in effectively generalizing subtle…
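
The image-to-description matching evaluated in the abstract can be illustrated with a small sketch of contrastive zero-shot scoring: embeddings are L2-normalized, compared by cosine similarity, and turned into probabilities with a temperature-scaled softmax. This is a sketch under stated assumptions, not the paper's evaluation code. The 3-dimensional embeddings, the description strings, the `temperature` value, and the function names are all hypothetical; a real pipeline would obtain the embeddings from CLIP's image and text encoders.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_match(image_emb, text_embs, temperature=0.01):
    """Return a probability for each candidate description (hypothetical helper)."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)

    # Numerically stable softmax over the candidate descriptions.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for encoder outputs; these are NOT real CLIP embeddings.
descriptions = [
    "a photo of a young adult",
    "a photo of a middle-aged adult",
    "a photo of an older adult",
]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
image_emb = [0.2, 0.8, 0.3]

probs = zero_shot_match(image_emb, text_embs)
best = descriptions[probs.index(max(probs))]
print(best)
```

Fine-tuning, in this picture, adjusts the encoders so that the embedding of a face image lands closer to the embedding of its correct demographic description than to the alternatives.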

Taxonomy

Topics: Human Mobility and Location-Based Analysis · Social Robot Interaction and HRI · Context-Aware Activity Recognition Systems

Methods: Contrastive Language-Image Pre-training