Composing Open-domain Vision with RAG for Ocean Monitoring and Conservation
Sepand Dyanatkar, Angran Li, Alexander Dungate

TL;DR
This paper introduces a novel approach combining pretrained vision-language models with retrieval-augmented generation to improve marine species identification in ocean monitoring, addressing challenges of domain diversity and rare species detection.
Contribution
It proposes a scalable, open-domain learning framework using RAG and VLMs for marine image analysis, enabling effective classification without domain-specific training.
Findings
Effective fish classification from vessel videos
Emergent retrieval capabilities demonstrated
No domain-specific training required
Abstract
Climate change's destruction of marine biodiversity is threatening communities and economies around the world that rely on healthy oceans for their livelihoods. The challenge of applying computer vision to niche, real-world domains such as ocean conservation lies in the dynamic and diverse environments where traditional top-down learning struggles with long-tailed distributions, generalization, and domain transfer. Scalable species identification for ocean monitoring is particularly difficult due to the need to adapt models to new environments and identify rare or unseen species. To overcome these limitations, we propose leveraging bottom-up, open-domain learning frameworks as a resilient, scalable solution for image and video analysis in marine applications. Our preliminary demonstration uses pretrained vision-language models (VLMs) combined with retrieval-augmented generation (RAG) as…
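Illustrative sketch
The abstract is truncated here, so the paper's actual pipeline is not reproduced. As a rough illustration of the general idea of pairing retrieval with a vision-language model, the Python sketch below ranks a small labeled reference set by cosine similarity to a query embedding and turns the top matches into a text prompt. Everything in it is an assumption made for illustration: the species names, the three-dimensional toy vectors (stand-ins for embeddings that a pretrained image encoder would produce), and the helper names `retrieve_top_k` and `build_prompt`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, bank, k=2):
    """Return the k (score, label) pairs most similar to the query."""
    scored = [(cosine_similarity(query, emb), label) for label, emb in bank]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(retrieved):
    """Format retrieved candidates as context for a downstream VLM query."""
    lines = ["Candidate species retrieved from the reference set:"]
    for score, label in retrieved:
        lines.append(f"- {label} (similarity {score:.2f})")
    lines.append("Which candidate best matches the fish in the image?")
    return "\n".join(lines)


# Hypothetical reference set: labels and vectors are made up for this sketch.
REFERENCE_BANK = [
    ("Pacific halibut", [0.9, 0.1, 0.0]),
    ("sablefish", [0.1, 0.9, 0.1]),
    ("lingcod", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of one video frame.
query_embedding = [0.8, 0.2, 0.1]

top = retrieve_top_k(query_embedding, REFERENCE_BANK, k=2)
prompt = build_prompt(top)
print(prompt)
```

In this arrangement, new or rare species can be covered by adding labeled examples to the reference set rather than retraining a model, which is one way to read the "no domain-specific training" claim; whether the paper does exactly this is not stated in the text available here.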
Taxonomy
Topics: Underwater Vehicles and Communication Systems · Remote-Sensing Image Classification · Water Quality Monitoring Technologies
