From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
Mengcheng Huang, Xue Zhou, Chen Xu, and Dapeng Man

TL;DR
This paper demonstrates that large speech models can be effectively transferred to underwater acoustic target recognition, achieving high accuracy and robustness despite limited labeled data.
Contribution
It introduces UATR-SLM, a simple framework that reuses the speech feature pipeline, adapts a speech large model as the acoustic encoder, and adds a lightweight classifier for underwater acoustic target recognition.
Findings
Achieves over 99% in-domain accuracy
Maintains robustness across variable signal lengths
Reaches up to 96.67% accuracy in cross-domain evaluation
Abstract
Underwater acoustic target recognition (UATR) plays a vital role in marine applications but remains challenging due to limited labeled data and the complexity of ocean environments. This paper explores a central question: can speech large models (SLMs), trained on massive human speech corpora, be effectively transferred to underwater acoustics? To investigate this, we propose UATR-SLM, a simple framework that reuses the speech feature pipeline, adapts the SLM as an acoustic encoder, and adds a lightweight classifier. Experiments on the DeepShip and ShipsEar benchmarks show that UATR-SLM achieves over 99% in-domain accuracy, maintains strong robustness across variable signal lengths, and reaches up to 96.67% accuracy in cross-domain evaluation. These results highlight the strong transferability of SLMs to UATR, establishing a promising paradigm for leveraging speech foundation models in…
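The three stages named in the abstract (speech feature pipeline, SLM encoder, lightweight classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the feature type, the encoder, the pooling step, or any hyperparameters. Here a log-magnitude spectrogram stands in for the speech features, `PlaceholderEncoder` is a hypothetical fixed random projection standing in for a pretrained SLM, mean pooling over time is an assumed way to handle variable signal lengths, and the sample rate, frame sizes, and class count are illustrative values.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed; common for speech models
FRAME_LEN = 400       # assumed 25 ms analysis window
HOP_LEN = 160         # assumed 10 ms hop


def speech_features(waveform: np.ndarray) -> np.ndarray:
    """Stand-in speech front end: log-magnitude spectrogram frames."""
    if len(waveform) < FRAME_LEN:
        raise ValueError("waveform is shorter than one analysis frame")
    n_frames = 1 + (len(waveform) - FRAME_LEN) // HOP_LEN
    frames = np.stack(
        [waveform[i * HOP_LEN : i * HOP_LEN + FRAME_LEN] for i in range(n_frames)]
    )
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(FRAME_LEN), axis=1))
    return np.log(spectrum + 1e-6)  # shape: (n_frames, FRAME_LEN // 2 + 1)


class PlaceholderEncoder:
    """Hypothetical stand-in for a pretrained SLM encoder (fixed random projection)."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim)

    def __call__(self, features: np.ndarray) -> np.ndarray:
        return np.tanh(features @ self.weight)  # shape: (n_frames, hidden_dim)


class LinearClassifier:
    """Lightweight head: one linear layer followed by a softmax."""

    def __init__(self, hidden_dim: int, n_classes: int, seed: int = 1):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((hidden_dim, n_classes)) * 0.01
        self.bias = np.zeros(n_classes)

    def __call__(self, embedding: np.ndarray) -> np.ndarray:
        logits = embedding @ self.weight + self.bias
        exp = np.exp(logits - logits.max())  # subtract max for numerical stability
        return exp / exp.sum()


def classify(waveform, encoder, classifier) -> np.ndarray:
    """Features -> encoder -> mean pooling over time (assumed) -> class probabilities."""
    hidden = encoder(speech_features(waveform))
    embedding = hidden.mean(axis=0)  # fixed-size vector for any signal length
    return classifier(embedding)


if __name__ == "__main__":
    encoder = PlaceholderEncoder(FRAME_LEN // 2 + 1, hidden_dim=64)
    classifier = LinearClassifier(hidden_dim=64, n_classes=4)  # illustrative class count
    rng = np.random.default_rng(42)
    for seconds in (1, 3):  # signals of different lengths share one model
        probs = classify(rng.standard_normal(seconds * SAMPLE_RATE), encoder, classifier)
        print(seconds, probs.shape)
```

In a real setup the placeholder encoder would be replaced by an actual pretrained speech model and its matching feature extractor, and the classifier (and possibly the encoder) would be trained on labeled underwater recordings. The pooling step is what lets one classifier accept clips of different durations in this sketch.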
Taxonomy
Topics: Underwater Acoustics Research · Speech Recognition and Synthesis · Underwater Vehicles and Communication Systems
