HARMONI: Multimodal Personalization of Multi-User Human-Robot Interactions with LLMs
Jeanne Malécot, Hamed Rahimi, Jeanne Cattoni, Marie Samson, Mouad Abrini, Mahdi Khoramshahi, Maribel Pino, Mohamed Chetouani

TL;DR
HARMONI is a multimodal framework that uses large language models to enable personalized, ethically aware, multi-user human-robot interactions, improving long-term engagement and user satisfaction in real-world settings.
Contribution
The paper introduces HARMONI, a novel multimodal personalization framework that integrates perception, world modeling, user profiling, and response generation for improved multi-user human-robot interaction.
Findings
Supports robust speaker identification and online memory updating
Outperforms baseline approaches in personalization accuracy
Enhances user satisfaction in real-world scenarios
Abstract
Existing human-robot interaction systems often lack mechanisms for sustained personalization and dynamic adaptation in multi-user environments, limiting their effectiveness in real-world deployments. We present HARMONI, a multimodal personalization framework that leverages large language models to enable socially assistive robots to manage long-term multi-user interactions. The framework integrates four key modules: (i) a perception module that identifies active speakers and extracts multimodal input; (ii) a world modeling module that maintains representations of the environment and short-term conversational context; (iii) a user modeling module that updates long-term speaker-specific profiles; and (iv) a generation module that produces contextually grounded and ethically informed responses. Through extensive evaluation and ablation studies on four datasets, as well as a real-world…
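As a rough illustration of how the four modules described in the abstract could hand data to one another, the following is a minimal sketch and not the authors' implementation. Every name in it (`Observation`, `WorldModel`, `UserModel`, `build_prompt`, `step`) is hypothetical. Perception is reduced to an already-identified speaker and a text utterance, a toy "I like ..." rule stands in for LLM-based profile updating, and the generation module stops at assembling a prompt rather than calling a model. The ethical safeguards mentioned in the abstract are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Stand-in for perception output: who spoke and what they said."""
    speaker_id: str
    utterance: str


@dataclass
class WorldModel:
    """Short-term conversational context as a bounded window of turns."""
    max_turns: int = 5
    context: deque = field(default_factory=deque)

    def update(self, obs: Observation) -> None:
        self.context.append((obs.speaker_id, obs.utterance))
        # Drop the oldest turns once the window is full.
        while len(self.context) > self.max_turns:
            self.context.popleft()


@dataclass
class UserModel:
    """Long-term, speaker-specific profiles keyed by speaker id."""
    profiles: dict = field(default_factory=dict)

    def update(self, obs: Observation) -> None:
        # Toy rule (assumption): a real system would extract facts with an LLM.
        marker = "i like "
        text = obs.utterance.lower()
        if marker in text:
            preference = text.split(marker, 1)[1].strip(" .!")
            self.profiles.setdefault(obs.speaker_id, []).append(preference)


def build_prompt(obs: Observation, world: WorldModel, users: UserModel) -> str:
    """Assemble the text a generation module could pass to an LLM."""
    history = "\n".join(f"{sid}: {utt}" for sid, utt in world.context)
    known = ", ".join(users.profiles.get(obs.speaker_id, [])) or "nothing yet"
    return (
        f"Known about {obs.speaker_id}: {known}\n"
        f"Recent turns:\n{history}\n"
        f"Reply to {obs.speaker_id}."
    )


def step(obs: Observation, world: WorldModel, users: UserModel) -> str:
    """One interaction turn: update both memories, then build the prompt."""
    world.update(obs)
    users.update(obs)
    return build_prompt(obs, world, users)
```

In this sketch, calling `step` once per utterance keeps each speaker's profile separate from the shared short-term window, so a prompt built for one speaker draws on that speaker's stored preferences plus whatever recent turns remain in context.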
Taxonomy
Topics: Social Robot Interaction and HRI · Speech and dialogue systems · Multimodal Machine Learning Applications
