Safe Multimodal Communication in Human-Robot Collaboration
Davide Ferrari, Andrea Pupa, Alberto Signoretti, Cristian Secchi

TL;DR
This paper presents a framework for safe, multimodal human-robot communication using voice and gesture fusion, ensuring safety compliance and improving collaboration efficiency in industrial settings.
Contribution
It introduces a novel multimodal communication framework for human-robot collaboration that combines voice and gesture inputs while adhering to safety regulations.
Findings
Multimodal communication improves information extraction for robot tasks.
The safety layer allows robots to adjust speed for operator safety.
Experimental validation shows enhanced collaboration efficiency.
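The summary does not say how the safety layer computes the robot's speed. The sketch below is one simple way a distance-based speed adjustment could work. It assumes a linear policy with hypothetical thresholds `d_stop` and `d_full`: the robot stops when the operator is closer than `d_stop`, runs at full speed beyond `d_full`, and interpolates linearly in between. The function name and both thresholds are illustrative and do not come from the paper.

```python
def scale_speed(distance_m: float, v_max: float,
                d_stop: float = 0.5, d_full: float = 1.5) -> float:
    """Scale the commanded speed by human-robot separation (hypothetical policy).

    - distance_m <= d_stop: stop (speed 0)
    - distance_m >= d_full: full speed v_max
    - otherwise: linear interpolation between 0 and v_max
    Threshold values are illustrative assumptions, not taken from the paper.
    """
    if d_full <= d_stop:
        raise ValueError("d_full must be greater than d_stop")
    if distance_m <= d_stop:
        return 0.0
    if distance_m >= d_full:
        return v_max
    return v_max * (distance_m - d_stop) / (d_full - d_stop)


if __name__ == "__main__":
    for d in (0.3, 1.0, 2.0):
        print(d, scale_speed(d, v_max=0.25))
```

A real safety layer would derive such thresholds from the applicable regulations and from the robot's stopping performance, not from fixed constants.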
Abstract
New industrial settings are characterized by the presence of humans and robots that work in close proximity, cooperating to perform the required job. Such collaboration, however, requires attention to many aspects. First, it is crucial to enable natural and efficient communication between these two actors. Second, the robot's behavior must always comply with safety regulations, ensuring safe collaboration at all times. In this paper, we propose a framework that enables multi-channel communication between humans and robots by leveraging multimodal fusion of voice and gesture commands while always respecting safety regulations. The framework is validated through a comparative experiment, demonstrating that, thanks to multimodal communication, the robot can extract valuable information for performing the required task and additionally, with the safety layer,…
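The abstract does not detail the fusion strategy. As a rough illustration of why combining the two channels helps the robot extract task information, here is a minimal late-fusion sketch under stated assumptions: speech supplies the action and, optionally, the target, while a pointing gesture fills in the target when the operator uses a deictic phrase such as "pick that one". The classes `VoiceCommand` and `Gesture`, the function `fuse`, and the parameters `window_s` and `min_conf` are all hypothetical names introduced here, not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceCommand:
    action: str               # e.g. "pick" (hypothetical vocabulary)
    target: Optional[str]     # None when the speaker says "that one"
    t: float                  # timestamp in seconds


@dataclass
class Gesture:
    target: str               # object the operator is pointing at
    confidence: float         # recognizer confidence in [0, 1]
    t: float                  # timestamp in seconds


def fuse(voice: VoiceCommand, gesture: Optional[Gesture],
         window_s: float = 2.0, min_conf: float = 0.6) -> Optional[dict]:
    """Combine a spoken action with a pointing gesture (hypothetical late fusion).

    An explicit spoken target takes precedence over the gesture. Otherwise the
    gesture must be close in time to the speech and confident enough; if not,
    None is returned so the robot can ask the operator for clarification.
    """
    target = voice.target
    if target is None:
        if (gesture is None
                or abs(gesture.t - voice.t) > window_s
                or gesture.confidence < min_conf):
            return None
        target = gesture.target
    return {"action": voice.action, "target": target}


if __name__ == "__main__":
    speech = VoiceCommand(action="pick", target=None, t=10.0)
    point = Gesture(target="red_box", confidence=0.9, t=10.5)
    print(fuse(speech, point))
```

Neither channel alone is sufficient in the deictic case: the speech lacks the target and the gesture lacks the action, so only the fused command is executable.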
Taxonomy
Topics: Speech and dialogue systems · Social Robot Interaction and HRI · Robotics and Automated Systems
