FINO-Net: A Deep Multimodal Sensor Fusion Framework for Manipulation Failure Detection
Arda Inceoglu, Eren Erdal Aksoy, Abdullah Cihan Ak, Sanem Sariel

TL;DR
FINO-Net is a deep neural network framework that fuses RGB, depth, and audio data to accurately detect and classify manipulation failures in robotic tasks, enhancing safety in unstructured environments.
Contribution
The paper introduces FINO-Net, a novel multimodal sensor fusion neural network, and provides a new dataset for manipulation failure detection in robotics.
Findings
Achieves 98.60% detection accuracy
Achieves 87.31% classification accuracy
Multimodal fusion improves performance significantly
Abstract
Safe manipulation in unstructured environments for service robots is a challenging problem. A failure detection system is needed to monitor and detect unintended outcomes. We propose FINO-Net, a novel deep neural network based on multimodal sensor fusion, to detect and identify manipulation failures. We also introduce a multimodal dataset containing 229 real-world manipulation recordings collected with a Baxter robot. Our network combines RGB, depth, and audio readings to detect and classify failures. Results indicate that fusing RGB with the depth and audio modalities significantly improves performance. FINO-Net achieves 98.60% detection accuracy and 87.31% classification accuracy on our novel dataset. Code and data are publicly available at https://github.com/ardai/fino-net.
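To make the idea of fusing RGB, depth, and audio concrete, here is a minimal sketch of one common fusion scheme: concatenate a feature vector per modality, then apply a single linear softmax head. This is an illustration only, not FINO-Net's actual architecture, which the abstract does not specify. The function name `fuse_and_classify`, the feature sizes, the random weights, and the two-class output (success vs. failure) are all assumptions chosen for the example.

```python
import numpy as np

def fuse_and_classify(rgb_feat, depth_feat, audio_feat, weights, bias):
    """Concatenate per-modality features and apply a softmax linear head.

    Hypothetical helper for illustration; not taken from the FINO-Net code.
    """
    # Fusion step: join the modality features along the feature axis.
    fused = np.concatenate([rgb_feat, depth_feat, audio_feat], axis=-1)
    logits = fused @ weights + bias
    # Subtract the row-wise max for a numerically stable softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

# Assumed shapes: 4 samples, 8-dim RGB, 8-dim depth, 4-dim audio features.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 8))
depth = rng.normal(size=(4, 8))
audio = rng.normal(size=(4, 4))

# Untrained placeholder weights: 20 fused features -> 2 classes.
W = rng.normal(size=(20, 2))
b = np.zeros(2)

probs = fuse_and_classify(rgb, depth, audio, W, b)
print(probs.shape)
```

In a real system the per-modality features would come from learned encoders and the head would be trained on labeled executions; the sketch only shows where the modalities are combined.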
Taxonomy
Topics: Robot Manipulation and Learning · Anomaly Detection Techniques and Applications · Software Testing and Debugging Techniques
