SpectraDINO: Bridging the Spectral Gap in Vision Foundation Models via Lightweight Adapters

Yagiz Nalcakan; Hyeongjin Ju; Incheol Park; Sanghyeop Yeo; Youngwan Jin; Shiho Kim

arXiv:2605.02258 · cs.CV · May 5, 2026

TL;DR

SpectraDINO is a multispectral vision foundation model that extends RGB-pretrained DINOv2 backbones to NIR, SWIR, and LWIR imagery using lightweight per-modality adapters and a multi-stage training protocol, and reports state-of-the-art results on multispectral benchmarks.

Contribution

It introduces lightweight, modality-specific bottleneck adapters combined with a multi-stage teacher-student distillation protocol to adapt frozen RGB foundation models to multispectral vision tasks.
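To make the adapter idea concrete, here is a minimal sketch of a residual bottleneck adapter with one instance per spectral modality. The abstract calls these "per-modality bottleneck adapters", but the exact architecture is an assumption: the class name `BottleneckAdapter`, the GELU activation, the zero-initialized up-projection, the omission of bias terms, and the sizes used are all hypothetical and are not taken from the SpectraDINO repository. The sketch uses plain Python lists so it runs without dependencies. A practical version would use a tensor library and insert the adapters into the frozen ViT blocks.

```python
import math
import random


def gelu(x: float) -> float:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(w, x):
    # multiply matrix w (list of rows) by vector x
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: y = x + W_up(gelu(W_down x))."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        # down-projection: dim -> bottleneck, small random init
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        # up-projection: bottleneck -> dim, zero init (assumed) so the
        # adapter starts as an identity map and leaves the frozen RGB features intact
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        h = [gelu(v) for v in matvec(self.w_down, x)]
        delta = matvec(self.w_up, h)
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self) -> int:
        # only these weights would be trained; the backbone stays frozen
        return len(self.w_down) * len(self.w_down[0]) + len(self.w_up) * len(self.w_up[0])


# one adapter per beyond-visible modality (toy sizes for illustration)
adapters = {
    m: BottleneckAdapter(dim=8, bottleneck=2, seed=i)
    for i, m in enumerate(["nir", "swir", "lwir"])
}
```

With the zero-initialized up-projection, `adapters["lwir"]([0.5] * 8)` returns the input unchanged, so training starts from the RGB backbone's behavior. The "lightweight" claim follows from the bottleneck: ignoring biases, an adapter of width 64 on a 768-dimensional token (the ViT-B width) has 2 × 768 × 64 = 98,304 weights. The width of 64 is an illustrative choice, not the paper's setting.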

Findings

01

SpectraDINO outperforms existing methods on multispectral object detection and segmentation benchmarks.

02

The model effectively bridges the spectral gap while preserving RGB priors.

03

SpectraDINO achieves state-of-the-art performance across multiple multispectral benchmarks.

Abstract

Vision Foundation Models (VFMs) pretrained on large-scale RGB data have demonstrated remarkable representation quality, yet their applicability to multispectral imaging spanning Near-Infrared (NIR), Short-Wave Infrared (SWIR), and Long-Wave Infrared (LWIR) remains largely unexplored. These spectral modalities offer complementary sensing capabilities critical for robust perception in adverse conditions, but present a fundamental domain gap relative to RGB-centric pretrained models. We present SpectraDINO, a multispectral VFM that bridges this spectral gap by extending DINOv2 ViT backbones to beyond-visible modalities through lightweight, per-modality bottleneck adapters, while preserving the rich representations of the frozen RGB backbone. We introduce a multi-stage teacher-student training protocol in which a frozen DINOv2 teacher guides a spectral student via cosine distillation,…
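The cosine distillation objective mentioned in the abstract can be sketched as follows. The frozen teacher's features act as fixed targets, and the student is penalized by one minus the cosine similarity, averaged over tokens. The token-wise averaging, the assumption of spatially paired RGB and spectral inputs, and the function name `cosine_distillation_loss` are hypothetical. The abstract is truncated before these details, so this is a generic cosine-distillation sketch rather than the authors' exact loss.

```python
import math


def cosine_similarity(a, b, eps: float = 1e-8) -> float:
    # cos(a, b) = <a, b> / (|a| |b|), with eps guarding against zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)


def cosine_distillation_loss(student_tokens, teacher_tokens) -> float:
    """Mean of (1 - cos) over tokens; teacher features are treated as constants.

    student_tokens: features from the adapted backbone on the spectral image (assumed)
    teacher_tokens: features from the frozen DINOv2 teacher on the paired RGB image (assumed)
    """
    if len(student_tokens) != len(teacher_tokens) or not student_tokens:
        raise ValueError("expected equal, non-empty token lists")
    losses = [1.0 - cosine_similarity(s, t) for s, t in zip(student_tokens, teacher_tokens)]
    return sum(losses) / len(losses)
```

The loss ranges from 0 (student aligned with teacher) to 2 (opposite direction). Because cosine similarity ignores feature magnitude, the student is pushed to match the direction of the teacher's representation rather than its scale. For example, a student token `[1.0, 2.0]` against a teacher token `[2.0, 4.0]` gives a loss of approximately 0.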

Code & Models

Repositories

Yonsei-STL/SpectraDINO (GitHub)