Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation

Ruturaj Reddy; Hrishav Bakul Barua; Junn Yong Loo; Thanh Thi Nguyen; Ganesh Krishnasamy

arXiv:2602.07343·cs.CV·February 10, 2026

TL;DR

CLARITY is a dynamic, scene-adaptive RGB-T segmentation framework guided by vision-language priors, significantly improving robustness and accuracy in challenging driving conditions.

Contribution

The paper introduces a scene-aware fusion strategy guided by vision-language models, which improves segmentation performance over static fusion methods.

Findings

01 Achieves new state-of-the-art performance on the MFNet dataset.

02 Effectively preserves dark-object semantics in adverse conditions.

03 Enforces structural consistency to sharpen object boundaries.

Abstract

Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remains a core challenge for autonomous driving applications. RGB-Thermal fusion is a standard approach, yet existing methods apply static fusion strategies uniformly across all conditions, allowing modality-specific noise to propagate throughout the network. Hence, we propose CLARITY, which dynamically adapts its fusion strategy to the detected scene condition. Guided by vision-language model (VLM) priors, the network learns to modulate each modality's contribution based on the illumination state while leveraging object embeddings for segmentation, rather than applying a fixed fusion policy. We further introduce two mechanisms: one that preserves valid dark-object semantics that prior noise-suppression methods incorrectly discard, and a hierarchical decoder that enforces…
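The idea of modulating each modality's contribution by scene condition can be illustrated with a minimal sketch. This is not the CLARITY architecture, which the abstract does not specify in detail. It assumes a hypothetical scene embedding (for example, one produced by a VLM encoder), a hypothetical linear gate (`gate_w`, `gate_b`), and a simple softmax-weighted sum of RGB and thermal feature maps. All names and shapes here are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scene_gated_fusion(rgb_feat, thermal_feat, scene_emb, gate_w, gate_b):
    """Fuse two feature maps with weights predicted from a scene embedding.

    rgb_feat, thermal_feat: arrays of shape (C, H, W)
    scene_emb: array of shape (D,), assumed to encode the illumination state
    gate_w: array of shape (D, 2); gate_b: array of shape (2,)
    Returns the fused (C, H, W) map and the two modality weights.
    """
    logits = scene_emb @ gate_w + gate_b   # one logit per modality
    weights = softmax(logits)              # non-negative, sums to 1
    fused = weights[0] * rgb_feat + weights[1] * thermal_feat
    return fused, weights


# Toy inputs; random values stand in for real encoder outputs.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 4, 4))
thermal_feat = rng.standard_normal((8, 4, 4))
scene_emb = rng.standard_normal(16)
gate_w = rng.standard_normal((16, 2)) * 0.1
gate_b = np.zeros(2)

fused, weights = scene_gated_fusion(rgb_feat, thermal_feat, scene_emb, gate_w, gate_b)
print("modality weights (rgb, thermal):", weights)
print("fused shape:", fused.shape)
```

A static fusion policy corresponds to fixing `weights` ahead of time. In this sketch the weights instead depend on `scene_emb`, so a gate trained on night scenes could learn to favor the thermal branch, while one trained on well-lit scenes could favor RGB.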

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety