AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models

Beitong Tian; Lingzhi Zhao; Bo Chen; Haozhen Zheng; Jingcheng Yang; Mingyuan Wu; Deepak Vasisht; Klara Nahrstedt

arXiv:2510.21722 · cs.HC · October 28, 2025

TL;DR

AquaVLM introduces a mobile vision-language system for underwater communication that generates context-aware messages, enhancing diver safety and communication efficiency using smartphones.

Contribution

The paper presents AquaVLM, a novel underwater communication system with a fine-tuned vision-language model and hierarchical message generation for improved context awareness.

Findings

01. System effectively generates context-specific messages.

02. Robustness to transmission errors is improved.

03. User evaluations show increased communication clarity.

Abstract

Underwater activities like scuba diving enable millions annually to explore marine environments for recreation and scientific research. Maintaining situational awareness and effective communication are essential for diver safety. Traditional underwater communication systems are often bulky and expensive, limiting their accessibility to divers of all levels. While recent systems leverage lightweight smartphones and support text messaging, the messages are predefined and thus restrict context-specific communication. In this paper, we present AquaVLM, a tap-and-send underwater communication system that automatically generates context-aware messages and transmits them using ubiquitous smartphones. Our system features a mobile vision-language model (VLM) fine-tuned on an auto-generated underwater conversation dataset and employs a hierarchical message generation pipeline. We co-design the…
