AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models
Beitong Tian, Lingzhi Zhao, Bo Chen, Haozhen Zheng, Jingcheng Yang, Mingyuan Wu, Deepak Vasisht, Klara Nahrstedt

TL;DR
AquaVLM is a smartphone-based vision-language system for underwater communication that generates context-aware messages, with the goal of improving diver safety and communication efficiency.
Contribution
The paper presents AquaVLM, a tap-and-send underwater communication system that combines a mobile vision-language model, fine-tuned on an auto-generated underwater conversation dataset, with a hierarchical message generation pipeline to produce context-aware messages.
Findings
The system effectively generates context-specific messages.
Robustness to transmission errors is improved.
User evaluations show increased communication clarity.
Abstract
Underwater activities like scuba diving enable millions annually to explore marine environments for recreation and scientific research. Maintaining situational awareness and effective communication are essential for diver safety. Traditional underwater communication systems are often bulky and expensive, limiting their accessibility to divers of all levels. While recent systems leverage lightweight smartphones and support text messaging, the messages are predefined and thus restrict context-specific communication. In this paper, we present AquaVLM, a tap-and-send underwater communication system that automatically generates context-aware messages and transmits them using ubiquitous smartphones. Our system features a mobile vision-language model (VLM) fine-tuned on an auto-generated underwater conversation dataset and employs a hierarchical message generation pipeline. We co-design the…
