UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning
Maoxun Yuan, Bo Cui, Tianyi Zhao, Jiayi Wang, Shan Fu, Xue Yang, Xingxing Wei

TL;DR
UniRGB-IR introduces a scalable adapter-based framework that enhances pre-trained RGB foundation models for diverse RGB-IR semantic tasks, achieving state-of-the-art results with improved generalization.
Contribution
The paper proposes a novel adapter mechanism, built from a Multi-modal Feature Pool (MFP) module and a Supplementary Feature Injector (SFI) module, that incorporates multi-modal features into frozen pre-trained RGB foundation models for RGB-IR tasks.
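The general idea of adapter tuning on a frozen backbone can be illustrated with a minimal sketch. This is not the authors' MFP or SFI implementation: `FrozenBlock`, `InjectionAdapter`, the zero-initialized gate `gamma`, and the toy target are all hypothetical names and assumptions chosen for illustration, and a linear map stands in for a ViT block. The sketch only shows that the backbone weights stay fixed while a small trainable branch injects infrared features.

```python
import copy
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class FrozenBlock:
    """Toy stand-in for a pre-trained RGB backbone block; never updated."""

    def __init__(self, dim, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def forward(self, x):
        # Residual linear map (assumption: replaces a real transformer block).
        return [xi + hi for xi, hi in zip(x, matvec(self.W, x))]


class InjectionAdapter:
    """Hypothetical trainable adapter that adds projected IR features."""

    def __init__(self, dim, rng):
        self.P = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        self.gamma = 0.0  # zero gate: the adapter is a no-op before training

    def forward(self, backbone_out, ir_feat):
        proj = matvec(self.P, ir_feat)
        return [b + self.gamma * p for b, p in zip(backbone_out, proj)]


def loss_fn(out, target):
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


def train_step(block, adapter, rgb, ir, target, lr=0.05):
    """One gradient-descent step that updates adapter parameters only."""
    out = adapter.forward(block.forward(rgb), ir)
    err = [o - t for o, t in zip(out, target)]
    proj = matvec(adapter.P, ir)
    grad_gamma = sum(e * p for e, p in zip(err, proj))
    grad_P = [[e * adapter.gamma * v for v in ir] for e in err]
    adapter.gamma -= lr * grad_gamma
    adapter.P = [[w - lr * g for w, g in zip(row, grow)]
                 for row, grow in zip(adapter.P, grad_P)]
    return loss_fn(out, target)  # loss measured before this update


rng = random.Random(0)
dim = 4
block, adapter = FrozenBlock(dim, rng), InjectionAdapter(dim, rng)
rgb = [rng.uniform(-1, 1) for _ in range(dim)]
ir = [rng.uniform(-1, 1) for _ in range(dim)]
# Toy target that the RGB-only output cannot match without the IR branch.
target = [b + 0.7 * p
          for b, p in zip(block.forward(rgb), matvec(adapter.P, ir))]
frozen_before = copy.deepcopy(block.W)
losses = [train_step(block, adapter, rgb, ir, target) for _ in range(20)]
```

Because the gate starts at zero, the combined model initially reproduces the frozen backbone's output exactly, which is one common way to keep pre-trained behavior intact at the start of adapter training. Whether UniRGB-IR uses this particular gating scheme is not stated in the text above.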
Findings
Achieves state-of-the-art performance on various RGB-IR semantic tasks.
Effectively incorporates multi-scale features via adapter modules.
Maintains high scalability and generalization across tasks.
Abstract
Semantic analysis of visible (RGB) and infrared (IR) images has gained significant attention due to its enhanced accuracy and robustness under challenging conditions, including low illumination and adverse weather. However, because there are no foundation models pre-trained on large-scale infrared image datasets, existing methods tend to design task-specific frameworks and directly fine-tune them with pre-trained foundation models on their RGB-IR semantic relevance datasets, which results in poor scalability and limited generalization. To address these limitations, we propose UniRGB-IR, a scalable and efficient framework for RGB-IR semantic tasks that introduces a novel adapter mechanism to effectively incorporate rich multi-modal features into pre-trained RGB-based foundation models. Our framework comprises three key components: a vision transformer (ViT) foundation model, a…
Taxonomy
Topics: Infrared Target Detection Methodologies · CCD and CMOS Imaging Sensors
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Multi-Head Attention · Residual Connection · Softmax · Adapter · Vision Transformer
