TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models

Meng Yu; Te Cui; Qitong Chu; Wenjie Song; Yi Yang; Yufeng Yue

arXiv:2506.21975·cs.CV·June 30, 2025

TL;DR

TASeg introduces a novel framework that combines fine-tuned vision foundation models with text embeddings to improve RGB-T semantic segmentation, especially in challenging environments.

Contribution

The paper presents a new text-aware RGB-T segmentation method using LoRA fine-tuning, a dynamic feature fusion module, and CLIP text embeddings to enhance semantic accuracy.

Findings

01. Outperforms existing methods on multiple datasets.

02. Requires fewer trainable parameters.

03. Effectively integrates textual and visual information.

Abstract

Reliable semantic segmentation of open environments is essential for intelligent systems, yet significant problems remain: 1) Existing RGB-T semantic segmentation models rely mainly on low-level visual features and lack high-level textual information, so they struggle to segment accurately when categories share similar visual characteristics. 2) While SAM excels at instance-level segmentation, integrating it with thermal images and text is hindered by modality heterogeneity and computational inefficiency. To address these problems, we propose TASeg, a text-aware RGB-T segmentation framework that uses Low-Rank Adaptation (LoRA) fine-tuning to adapt vision foundation models. Specifically, we propose a Dynamic Feature Fusion Module (DFFM) in the image encoder, which effectively merges features from multiple visual modalities while freezing SAM's original transformer blocks.…
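To make the LoRA idea in the abstract concrete, here is a minimal pure-Python sketch of the generic LoRA formulation: a frozen weight matrix W plus a trainable low-rank update B·A. It is not the TASeg implementation. The function names, the 768-dimensional layer, and rank 4 are illustrative assumptions, not values taken from the paper.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out           # full fine-tuning updates every entry of W
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x); W stays frozen."""
    base = matvec(W, x)               # output of the frozen pretrained layer
    update = matvec(B, matvec(A, x))  # low-rank correction, the trainable part
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes only (assumed, not from the paper).
full, lora = lora_param_counts(768, 768, 4)
print(full, lora)  # 589824 6144
```

In standard LoRA, B is initialized to zero, so the adapted layer initially reproduces the frozen layer exactly and only the small A and B matrices receive gradients. This is the general mechanism by which LoRA-style fine-tuning keeps the trainable-parameter count low.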

Taxonomy

Methods: Segment Anything Model