Empowering Semantic-Sensitive Underwater Image Enhancement with VLM
Guodong Fan, Shengning Zhou, Genji Yuan, Huiyu Li, Jingchun Zhou, Jinjiang Li

TL;DR
This paper introduces a semantic-sensitive underwater image enhancement method that uses Vision-Language Models to focus the enhancement on key objects, improving both perceptual quality and downstream task performance.
Contribution
The paper proposes a new learning mechanism in which Vision-Language Models (VLMs) generate semantic guidance for underwater image enhancement (UIE) models, improving how accurately those models restore the features of key objects.
Findings
Significantly improves perceptual quality metrics.
Enhances detection and segmentation task performance.
Demonstrates adaptability across different UIE baselines.
Abstract
In recent years, learning-based underwater image enhancement (UIE) techniques have evolved rapidly. However, distribution shifts between high-quality enhanced outputs and natural images can hinder semantic cue extraction for downstream vision tasks, thereby limiting the adaptability of existing enhancement models. To address this challenge, this work proposes a new learning mechanism that leverages Vision-Language Models (VLMs) to empower UIE models with semantic-sensitive capabilities. Concretely, our strategy first generates textual descriptions of key objects from a degraded image via VLMs. Subsequently, a text-image alignment model remaps these relevant descriptions back onto the image to produce a spatial semantic guidance map. This map then steers the UIE network through a dual-guidance mechanism, which combines cross-attention and an explicit alignment loss. This forces the…
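The pipeline described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the text-image alignment model yields one embedding per image patch and one embedding for the VLM description (a CLIP-style setup, which the abstract does not specify), scores patches by cosine similarity, and uses a mean-squared-error alignment loss as a stand-in for the paper's unspecified loss. The names guidance_map and alignment_loss are hypothetical, and the cross-attention branch is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def guidance_map(patch_embeddings, text_embedding):
    """Hypothetical spatial guidance map.

    patch_embeddings: H x W grid (nested lists) of patch embedding vectors.
    text_embedding:   embedding of the VLM-generated object description.
    Returns an H x W grid of scores min-max rescaled to [0, 1].
    """
    sims = [[cosine(p, text_embedding) for p in row] for row in patch_embeddings]
    flat = [s for row in sims for s in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(s - lo) / span for s in row] for row in sims]


def alignment_loss(feature_saliency, guidance):
    """Assumed alignment loss: mean squared error between the UIE network's
    spatial saliency map and the guidance map (both H x W)."""
    diffs = [
        (f - g) ** 2
        for f_row, g_row in zip(feature_saliency, guidance)
        for f, g in zip(f_row, g_row)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Toy 2 x 2 grid of 2-D patch embeddings and a toy text embedding.
    patches = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [-1.0, 0.0]]]
    text = [1.0, 0.0]
    g = guidance_map(patches, text)
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print("guidance map:", g)
    print("loss vs. uniform saliency:", alignment_loss(uniform, g))
```

In this sketch, patches whose embeddings align with the description receive scores near 1, so a loss of this form would penalize a network whose internal saliency ignores those regions. In practice the embeddings would come from a pretrained text-image model rather than hand-written vectors.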
Taxonomy
Topics: Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
