Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation

Dimitrios Dagdilelis; Panagiotis Grigoriadis; Roberto Galeazzi

arXiv:2505.01615 · cs.CV · May 6, 2025

TL;DR

This paper introduces a cross-attention transformer model that fuses multimodal sensor data (RGB, infrared, LiDAR, radar, and electronic charts) to build a detailed bird's-eye view of the vessel's surroundings for safer autonomous marine navigation, validated through real sea trials.

Contribution

It presents a novel deep fusion approach that uses cross-attention transformers to integrate diverse maritime sensors, improving scene understanding for autonomous vessels.

Findings

01

Improved navigational accuracy in complex maritime environments.

02

Robust scene representation under adverse weather conditions.

03

Validated effectiveness through real-world sea trials.

Abstract

We propose a cross-attention transformer-based method for multimodal sensor fusion that builds a bird's-eye view of a vessel's surroundings, supporting safer autonomous marine navigation. The model deeply fuses multiview RGB and long-wave infrared images with sparse LiDAR point clouds. Training also integrates X-band radar and electronic chart data to inform predictions. The resulting view provides a detailed, reliable scene representation, improving navigational accuracy and robustness. Real-world sea trials confirm the method's effectiveness even in adverse weather and complex maritime settings.
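The abstract does not spell out the fusion architecture, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes single-head scaled dot-product cross-attention (as in "Attention Is All You Need"), where flattened bird's-eye-view grid cells act as queries and tokens from every camera view and modality are concatenated into one key/value set. The function names (`softmax`, `cross_attention`, `fuse_modalities`), the grid size, the token counts, and the use of random projection matrices in place of learned weights are all hypothetical choices for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (Nq, d_model), e.g. hypothetical BEV cell embeddings
    keys, values: (Nk, d_model), e.g. hypothetical sensor tokens
    Returns the attended features (Nq, d_k) and attention weights (Nq, Nk).
    """
    q = queries @ w_q
    k = keys @ w_k
    v = values @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (Nq, Nk) similarity of each BEV cell to each token
    weights = softmax(scores, axis=-1)  # each row sums to 1 over all sensor tokens
    return weights @ v, weights


def fuse_modalities(bev_queries, modality_tokens, rng, d_k=16):
    # Concatenate tokens from all views/modalities into one shared key/value set,
    # so every BEV cell can attend to any sensor (hypothetical design).
    tokens = np.concatenate(list(modality_tokens.values()), axis=0)
    d_model = bev_queries.shape[-1]
    # Random projections stand in for learned weights (assumption for illustration).
    w_q = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    w_k = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    w_v = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    return cross_attention(bev_queries, tokens, tokens, w_q, w_k, w_v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 32
    bev = rng.standard_normal((8 * 8, d_model))  # hypothetical 8x8 BEV grid, flattened
    tokens = {
        "rgb": rng.standard_normal((40, d_model)),    # hypothetical multiview RGB tokens
        "lwir": rng.standard_normal((40, d_model)),   # hypothetical long-wave infrared tokens
        "lidar": rng.standard_normal((25, d_model)),  # hypothetical sparse LiDAR tokens
    }
    fused, attn = fuse_modalities(bev, tokens, rng)
    print(fused.shape)  # (64, 16)
    print(attn.shape)   # (64, 105)
```

Concatenating all tokens into a single key/value set lets the softmax weigh evidence across modalities per BEV cell, so a cell could, for instance, lean on infrared tokens when RGB is degraded. A real model would learn the projections, use multiple heads, and add positional encodings that tie image and LiDAR tokens to BEV geometry.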

Taxonomy

Topics: Maritime Navigation and Safety

Methods: Softmax · Attention Is All You Need