Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception

Rusiru Thushara; Yasiru Ranasinghe; Jay Paranjape; Vishal M. Patel

arXiv:2605.21882 · cs.CV · May 22, 2026

TL;DR

Thermo-VL extends a frozen Molmo-7B vision-language model with a trainable thermal encoder. This lets it combine thermal infrared imagery with RGB to improve low-light scene understanding and cross-spectrum reasoning.

Contribution

The paper introduces a wavelength-aware fusion module, a new RGB-thermal dataset, and a benchmark for low-light and thermal reasoning tasks.

Findings

01. Significant improvements on thermal-only and RGB+thermal reasoning tasks.

02. Thermal and RGB features are fused without disrupting the pretrained RGB-language interface of the frozen backbone.

03. A new dataset and benchmark for RGB-thermal visual question answering are made available.

Abstract

Vision-language models (VLMs) often fail under low illumination because their visual grounding is learned predominantly from RGB imagery, whereas thermal infrared preserves complementary scene structure when visible cues degrade. We present Thermo-VL, a wavelength-aware VLM that augments a frozen Molmo-7B backbone with a trainable thermal encoder and a text-guided dual-attention fusion module. Given aligned RGB tokens, thermal tokens, and prompt embeddings, the fusion module conditions thermal features on both language and RGB context, then injects a gated residual into the frozen RGB stream so thermal evidence can be incorporated without disrupting Molmo's pretrained RGB-language interface. We train the model with the standard language-modeling objective together with auxiliary alignment and regularization losses that improve cross-modal grounding and reduce over-reliance on RGB. We…
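The abstract describes the fusion step only at a high level. Below is a minimal NumPy sketch of one way a text-guided dual-attention fusion with a gated residual could work. It assumes single-head attention, RGB, thermal, and prompt tokens already projected to a shared width d, and a Flamingo-style tanh gate initialized at zero. The class name `DualAttentionFusion`, the attention order (text first, then RGB), and the RGB-to-thermal read-out are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, kv, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries from q, keys/values from kv."""
    Q, K, V = q @ w_q, kv @ w_k, kv @ w_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V


class DualAttentionFusion:
    """Hypothetical text-guided dual-attention fusion with a zero-initialized tanh gate."""

    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)

        def init():
            return rng.normal(0.0, d ** -0.5, size=(d, d))

        self.text_proj = [init() for _ in range(3)]    # thermal queries attend to prompt tokens
        self.rgb_proj = [init() for _ in range(3)]     # thermal queries attend to RGB tokens
        self.inject_proj = [init() for _ in range(3)]  # RGB queries read conditioned thermal tokens
        self.alpha = 0.0  # gate parameter; tanh(0) = 0, so the module starts as the identity on RGB

    def __call__(self, rgb, thermal, text):
        # 1) Condition thermal tokens on the language prompt.
        t = thermal + cross_attention(thermal, text, *self.text_proj)
        # 2) Condition the result on RGB context.
        t = t + cross_attention(t, rgb, *self.rgb_proj)
        # 3) RGB tokens gather thermal evidence; this becomes a residual update.
        delta = cross_attention(rgb, t, *self.inject_proj)
        # 4) Gated residual into the (frozen) RGB stream; output keeps the RGB token shape.
        return rgb + np.tanh(self.alpha) * delta


# Usage with random stand-in tokens (shapes are illustrative only).
rng = np.random.default_rng(1)
d = 16
rgb = rng.normal(size=(9, d))
thermal = rng.normal(size=(9, d))
text = rng.normal(size=(5, d))

fusion = DualAttentionFusion(d)
out0 = fusion(rgb, thermal, text)
assert np.allclose(out0, rgb)  # closed gate: RGB tokens pass through unchanged

fusion.alpha = 0.5  # an opened gate, as training might produce
out1 = fusion(rgb, thermal, text)
print(out1.shape)  # (9, 16)
```

Because tanh(0) = 0, the module initially returns the RGB tokens unchanged. This is one common way to add a new modality without perturbing a frozen backbone: training can then open the gate gradually as the thermal pathway learns useful features.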

Code & Models

Repositories

https://thusharakart.github.io/Thermo-VL