Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents
Sabit Hassan, Hye-Young Chung, Xiang Zhi Tan, Malihe Alikhani

TL;DR
This paper introduces M-CoDAL, a multimodal dialogue system for embodied robots that uses discourse coherence and active learning with LLMs to improve safety-critical communication and understanding in real-world scenarios.
Contribution
The paper presents a novel coherence-driven multimodal dialogue system with a clustering-based active learning mechanism utilizing LLMs, evaluated on a new safety violation dataset and real-world robot deployment.
Findings
Improved safety situation resolution and user sentiment.
Enhanced safety of conversations in multimodal interactions.
System outperforms baseline in real-world user study.
Abstract
When assisting people in daily tasks, robots need to accurately interpret visual cues and respond effectively in diverse safety-critical situations, such as sharp objects on the floor. In this context, we present M-CoDAL, a multimodal-dialogue system specifically designed for embodied agents to better understand and communicate in safety-critical situations. The system leverages discourse coherence relations to enhance its contextual understanding and communication abilities. To train this system, we introduce a novel clustering-based active learning mechanism that utilizes an external Large Language Model (LLM) to identify informative instances. Our approach is evaluated using a newly created multimodal dataset comprising 1K safety violations extracted from 2K Reddit images. These violations are annotated using a Large Multimodal Model (LMM) and verified by human annotators. Results…
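The abstract names a clustering-based active learning mechanism that uses an external LLM to identify informative instances, but does not spell out the selection rule. The sketch below is one plausible reading, not the authors' implementation: it clusters instance embeddings, then within each cluster picks instances where the current model's prediction disagrees with the external LLM's judgment. The names `kmeans`, `select_for_annotation`, `llm_judge`, and `per_cluster`, the disagreement criterion, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Tiny deterministic k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]

    def assign() -> List[int]:
        return [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        assignment = assign()
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign()


def select_for_annotation(
    points: List[Vector],
    model_preds: List[str],
    llm_judge: Callable[[int], str],  # hypothetical stand-in for an LLM call
    k: int,
    per_cluster: int,
) -> List[int]:
    """Pick up to `per_cluster` indices per cluster where model and LLM disagree."""
    assignment = kmeans(points, k)
    selected: List[int] = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        disagreements = [i for i in members if llm_judge(i) != model_preds[i]]
        selected.extend(disagreements[:per_cluster])
    return selected


# Toy data (assumed): 2-D "embeddings" forming two well-separated groups.
points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]
model_preds = ["safe", "unsafe", "safe", "safe"]
llm_labels = ["safe", "unsafe", "unsafe", "unsafe"]  # canned LLM answers

picked = select_for_annotation(
    points, model_preds, lambda i: llm_labels[i], k=2, per_cluster=1
)
print(picked)
```

Sampling per cluster rather than globally spreads the selected instances across different regions of the data, which is the usual motivation for combining clustering with active learning. In a real system `llm_judge` would wrap a call to the external model, and the embeddings would come from a multimodal encoder.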
Taxonomy
Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Social Robot Interaction and HRI
