Multimodal Anomaly Detection for Human-Robot Interaction

Guilherme Ribeiro; Iordanis Antypas; Leonardo Bizzaro; João Bimbo; Nuno Cruz Garcia

arXiv:2604.09326 · cs.RO · April 13, 2026

TL;DR

This paper introduces MADRI, a multimodal anomaly detection framework for human-robot interaction that performs reconstruction-based anomaly detection on feature vectors derived from video, robot sensor readings, and scene graphs to improve safety and reliability.

Contribution

MADRI is the first approach to perform reconstruction-based anomaly detection directly on feature vectors from multiple modalities in HRI.

Findings

1. Vision-based feature vector reconstruction effectively detects anomalies.
2. Adding sensor data and scene graphs enhances detection accuracy.
3. Multimodal reconstruction improves robustness in anomaly detection.

Abstract

Ensuring safety and reliability in human-robot interaction (HRI) requires the timely detection of unexpected events that could lead to system failures or unsafe behaviours. Anomaly detection thus plays a critical role in enabling robots to recognize and respond to deviations from normal operation during collaborative tasks. While reconstruction models have been actively explored in HRI, approaches that operate directly on feature vectors remain largely unexplored. In this work, we propose MADRI, a framework that first transforms video streams into semantically meaningful feature vectors before performing reconstruction-based anomaly detection. Additionally, we augment these visual feature vectors with readings from the robot's internal sensors and a Scene Graph, enabling the model to capture both external anomalies in the visual environment and internal failures within the robot itself. To…
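The abstract says the visual feature vectors are augmented with internal sensor readings and a Scene Graph, but this excerpt does not say how the modalities are combined. The sketch below assumes one simple scheme: z-score each modality using statistics from normal data (so no modality dominates merely because of its scale), concatenate the results, and record each modality's slice. It then breaks a reconstruction error down per modality. This is one hypothetical way a large error could be attributed to the visual scene (external anomaly) or to the robot's sensors (internal failure). The modality names, the normalization, and the per-modality breakdown are all illustrative assumptions, not the paper's method.

```python
import math

# Hypothetical modality names; the excerpt does not specify MADRI's fusion scheme.
MODALITIES = ("vision", "sensors", "scene_graph")


def fit_normalizer(samples):
    """Per-dimension mean and std of ONE modality, estimated on normal data."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    # A constant dimension gets std 1.0 to avoid division by zero.
    std = [math.sqrt(sum((s[j] - mean[j]) ** 2 for s in samples) / n) or 1.0 for j in range(d)]
    return mean, std


def fuse(sample, normalizers):
    """Z-score each modality, then concatenate; also return each modality's slice."""
    fused, slices = [], {}
    for name in MODALITIES:
        mean, std = normalizers[name]
        z = [(x - m) / s for x, m, s in zip(sample[name], mean, std)]
        slices[name] = (len(fused), len(fused) + len(z))
        fused.extend(z)
    return fused, slices


def per_modality_error(fused, reconstructed, slices):
    """Mean squared reconstruction error inside each modality's slice."""
    return {
        name: sum((fused[i] - reconstructed[i]) ** 2 for i in range(a, b)) / (b - a)
        for name, (a, b) in slices.items()
    }


if __name__ == "__main__":
    normal = {
        "vision": [[0.0, 1.0], [2.0, 3.0]],
        "sensors": [[10.0], [20.0]],
        "scene_graph": [[5.0, 5.0, 5.0], [5.0, 5.0, 5.0]],
    }
    norms = {name: fit_normalizer(normal[name]) for name in MODALITIES}
    # The sensor reading is far outside its normal range; vision and scene graph look normal.
    sample = {"vision": [1.0, 2.0], "sensors": [40.0], "scene_graph": [5.0, 5.0, 5.0]}
    fused, slices = fuse(sample, norms)
    # Stand-in reconstruction: a model that always outputs the normal mean (zeros after z-scoring).
    errors = per_modality_error(fused, [0.0] * len(fused), slices)
    print(errors)  # {'vision': 0.0, 'sensors': 25.0, 'scene_graph': 0.0}
```

In practice the all-zeros stand-in would be replaced by the output of a reconstruction model trained on fused normal vectors. The total error would then drive detection, while the per-modality split offers a rough hint about where the anomaly originates.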
