Enhancing Situational Awareness in Underwater Robotics with Multi-modal Spatial Perception
Pushyami Kaveti, Ambjorn Grimsrud Waldum, Hanumant Singh, and Martin Ludvigsen

TL;DR
This paper introduces a multi-modal sensing approach combining cameras, IMUs, and acoustic sensors to improve underwater SLAM, achieving robust real-time 3D mapping in challenging underwater environments.
Contribution
It presents a novel multi-sensor fusion framework for underwater SLAM that integrates geometric, learning-based, and semantic techniques, validated through real-world field deployments.
Findings
Real-time state estimation achieved in challenging conditions
High-quality 3D reconstructions demonstrated
Multi-modal fusion improves robustness over single-sensor methods
Abstract
Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) demand robust spatial perception capabilities, including Simultaneous Localization and Mapping (SLAM), to support both remote and autonomous tasks. Vision-based systems have been integral to these advancements, capturing rich color and texture at low cost while enabling semantic scene understanding. However, underwater conditions such as light attenuation, backscatter, and low contrast often degrade image quality to the point where traditional vision-based SLAM pipelines fail. Moreover, these pipelines typically rely on monocular or stereo inputs, limiting their scalability to the multi-camera configurations common on many vehicles. To address these issues, we propose to leverage multi-modal sensing that fuses data from multiple sensors, including cameras, inertial measurement units (IMUs), and acoustic…
Taxonomy
Topics: Robotics and Sensor-Based Localization · Underwater Vehicles and Communication Systems · Advanced Vision and Imaging
