A Multimodal Data Fusion Attention-Empowered Generative Adversarial Network for Real Time 3D Underwater Sound Speed Field Construction

Wei Huang; Yuqiang Huang; Jixuan Zhou; Hao Zhang; Tianhe Xu; Qian Sun; Fang Ji

arXiv:2507.11812 · cs.SD · May 5, 2026

TL;DR

This paper introduces a multimodal data-fusion GAN with attention mechanisms that reconstructs 3D underwater sound speed fields in real time without requiring on-site underwater data collection.

Contribution

It proposes MDF-RAGAN, a multimodal data-fusion generative adversarial network with residual attention blocks, which combines attention and residual modules to improve the accuracy of sound speed profiles estimated from multimodal data.
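
To make the idea of combining attention with a residual connection concrete, here is a minimal NumPy sketch of a generic residual self-attention step. It is not the MDF-RAGAN architecture, whose layers are not described in this summary: the function names, the single-head scaled dot-product form, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def residual_self_attention(x, w_q, w_k, w_v):
    """Generic single-head self-attention with a skip connection.

    x: array of shape (n_positions, d), one feature vector per position.
    Returns (x + attention(x), attention weights).
    Hypothetical helper, not taken from the paper.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every position scores every other position, so correlations are global.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    # Residual (skip) connection: the block learns a correction to x.
    return x + weights @ v, weights

rng = np.random.default_rng(0)
n_positions, d = 6, 4  # illustrative sizes only
x = rng.normal(size=(n_positions, d))
w_q, w_k, w_v = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

out, weights = residual_self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

The output keeps the input's shape, and each row of `weights` sums to 1. With a zero value projection the block reduces to the identity, which is the property that makes residual blocks easy to stack.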

Findings

01. Achieves less than 0.3 m/s estimation error on real-world data.

02. Reduces RMSE by nearly 50% compared to CNN and SITP methods.

03. Attains 65.8% RMSE reduction relative to mean profile method.
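
The percentages above are relative reductions in RMSE (root mean square error). The short sketch below shows how such a figure is computed. The sound speed values are invented for illustration and are not the paper's data, and the helper names `rmse` and `rmse_reduction_percent` are hypothetical.

```python
import numpy as np

def rmse(pred, truth):
    """Root mean square error between two profiles, in the input units (m/s)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def rmse_reduction_percent(rmse_baseline, rmse_model):
    """Relative RMSE reduction of a model with respect to a baseline, in percent."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

# Made-up sound speeds (m/s) at four depths, for illustration only.
truth = np.array([1500.0, 1498.0, 1495.0, 1492.0])
baseline_pred = truth + np.array([1.0, -1.0, 1.0, -1.0])  # errors of 1 m/s
model_pred = truth + np.array([0.5, -0.5, 0.5, -0.5])     # errors of 0.5 m/s

baseline_rmse = rmse(baseline_pred, truth)
model_rmse = rmse(model_pred, truth)
print(baseline_rmse, model_rmse, rmse_reduction_percent(baseline_rmse, model_rmse))
```

In this toy case the baseline RMSE is 1.0 m/s and the model RMSE is 0.5 m/s, a 50% reduction. By the same formula, a 65.8% reduction means the model's RMSE is 34.2% of the baseline's.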

Abstract

Sound speed profiles (SSPs) are crucial underwater parameters that determine the propagation patterns of acoustic signals, directly influencing the energy efficiency of underwater communication and the accuracy of positioning systems. Conventional techniques for obtaining SSPs, such as matched field processing (MFP), compressive sensing (CS), and deep learning (DL), typically depend on on-site sonar measurements, which impose stringent requirements on the deployment of underwater observation systems. To overcome this limitation and enable high-precision sound speed field reconstruction without the need for on-site underwater data collection, we propose a novel multimodal data-fusion generative adversarial network enhanced with residual attention blocks (MDF-RAGAN). This architecture integrates attention mechanisms to capture global spatial feature correlations effectively, while…
