Assessing Vision-Language Models for Perception in Autonomous Underwater Robotic Software

Muhammad Yousaf; Aitor Arrieta; Shaukat Ali; Paolo Arcaini; Shuai Wang

arXiv:2602.10655 · cs.SE · March 31, 2026

TL;DR

This paper empirically evaluates vision-language models (VLMs) for perception in autonomous underwater robot software, focusing on how well they detect underwater trash under low visibility and harsh water conditions.

Contribution

The paper assesses both the performance and the uncertainty of VLMs in underwater environments, helping software engineers select suitable models.

Findings

01. VLMs show potential for underwater perception tasks.

02. Uncertainty correlates with detection performance.

03. Performance varies across different VLMs and conditions.

Abstract

Autonomous Underwater Robots (AURs) operate in challenging underwater environments with low visibility and harsh water conditions. Such conditions present challenges for software engineers developing perception modules for the AUR software. Deep learning has been incorporated into the AUR software to support these perception tasks. However, the unique challenges of underwater environments pose difficulties for deep learning models, which often rely on labeled data that is scarce and noisy. This may undermine the trustworthiness of AUR software that relies on perception modules. Vision-Language Models (VLMs) offer promising solutions for AUR software as they generalize to unseen objects and remain robust in noisy conditions by inferring information from contextual cues. Despite this potential, their performance and uncertainty in underwater…
