Unlocking Multi-Spectral Data for Multi-Modal Models with Guided Inputs and Chain-of-Thought Reasoning

Dahun Kim; Ganesh Satish Mallya; Anelia Angelova

arXiv:2604.21032 · cs.CV · April 24, 2026

TL;DR

This paper presents a training-free method for incorporating multi-spectral data into RGB-only large multi-modal models (LMMs). It reports strong zero-shot gains on remote sensing tasks using only inference-time input adaptations and reasoning instructions.

Contribution

It introduces an inference-time pipeline that integrates multi-spectral data into existing RGB-only models without retraining, extending their applicability to remote sensing.
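This contribution depends on converting non-RGB bands into images that an RGB-only model can read. This summary does not say which transform the authors use. The sketch below therefore uses two common remote-sensing renderings as stand-ins: a color-infrared false-color composite and a grayscale NDVI map. The band keys (`nir`, `red`, `green`), the per-band min-max stretch, and the toy reflectance values are all hypothetical. This is an illustration, not the paper's pipeline.

```python
from typing import Dict, List, Tuple

# A band is a 2-D grid of surface-reflectance values (floats); an image is a
# 2-D grid of 8-bit (r, g, b) tuples that an RGB-only model can consume.
Band = List[List[float]]
Image = List[List[Tuple[int, int, int]]]


def minmax_stretch(band: Band) -> List[List[int]]:
    """Linearly rescale one band to 0..255 using its own min and max."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant band
    return [[round(255 * (v - lo) / span) for v in row] for row in band]


def false_color_composite(bands: Dict[str, Band]) -> Image:
    """Color-infrared composite: NIR -> red, red -> green, green -> blue."""
    r = minmax_stretch(bands["nir"])
    g = minmax_stretch(bands["red"])
    b = minmax_stretch(bands["green"])
    return [list(zip(rr, gr, br)) for rr, gr, br in zip(r, g, b)]


def ndvi_image(bands: Dict[str, Band]) -> Image:
    """Grayscale NDVI map: NDVI in [-1, 1] is mapped to 0..255 on all channels."""
    out: Image = []
    for nir_row, red_row in zip(bands["nir"], bands["red"]):
        row = []
        for nir, red in zip(nir_row, red_row):
            total = nir + red
            ndvi = (nir - red) / total if total != 0 else 0.0
            level = round(255 * (ndvi + 1) / 2)
            row.append((level, level, level))
        out.append(row)
    return out


# Toy 2x2 scene with hypothetical values; the key names are assumptions,
# not the band identifiers of any real sensor product.
bands = {
    "green": [[0.10, 0.12], [0.08, 0.20]],
    "red":   [[0.05, 0.30], [0.06, 0.20]],
    "nir":   [[0.50, 0.32], [0.45, 0.25]],
}
composite = false_color_composite(bands)
ndvi = ndvi_image(bands)
print(composite[0][0], ndvi[0][0])
```

Both outputs are ordinary 8-bit RGB grids, so they can be passed to an RGB-only model next to the natural-color image. The top-left pixel has high NIR and low red reflectance, so it shows up as saturated red in the composite and bright in the NDVI map, which is the cue a model would need to recognize vegetation.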

Findings

1. Achieved strong zero-shot performance gains on remote sensing benchmarks.
2. Demonstrated the effectiveness of guided inputs and Chain-of-Thought reasoning.
3. Enabled geospatial professionals to leverage generalist models for specialized sensor data.

Abstract

Multi-spectral imagery is a valuable input signal for Remote Sensing applications, such as land-use and land-cover classification and environmental monitoring. However, generalist Large Multi-modal Models (LMMs) are typically trained on RGB images, limiting their applicability to the RGB domain. At the same time, training multi-spectral multi-modal models is expensive and produces uniquely specialized models. To address this, we propose a novel training-free approach that introduces multi-spectral data within the inference pipeline of standard RGB-only LMMs, allowing large gains in performance. Our approach leverages the LMMs' understanding of the visual space by adapting non-RGB inputs to that space and injecting domain-specific information and Chain-of-Thought reasoning as instructions. We demonstrate this with the Gemini 2.5 model and observe strong Zero-Shot performance gains on…
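The abstract also describes injecting domain-specific information and Chain-of-Thought reasoning as instructions. The sketch below shows one way such a guided prompt might be assembled. The guide texts, the reasoning steps, the class names, and `build_guided_prompt` are hypothetical and not taken from the paper. The actual call to Gemini 2.5 (or any other LMM) is left out on purpose.

```python
from typing import List

# Hypothetical per-input guides: domain knowledge telling the model how to
# read each adapted (non-natural-color) image it receives.
INPUT_GUIDES = {
    "rgb": "A natural-color image built from the red, green and blue bands.",
    "false_color": "A color-infrared composite (NIR shown as red, red as green, "
                   "green as blue); healthy vegetation appears bright red.",
    "ndvi": "A grayscale NDVI map; brighter pixels indicate denser vegetation, "
            "darker pixels indicate water, bare soil or built-up areas.",
}

# Hypothetical Chain-of-Thought steps appended as explicit instructions.
COT_STEPS = [
    "Describe what each image shows, using the guide for that image.",
    "Note where the images agree or disagree.",
    "Compare the evidence against each candidate class.",
    "State the single best class on the final line as 'Answer: <class>'.",
]


def build_guided_prompt(input_names: List[str], classes: List[str]) -> str:
    """Combine per-image guides, the label set, and reasoning steps into one prompt."""
    lines = ["You are given several views of the same satellite scene."]
    for i, name in enumerate(input_names, start=1):
        lines.append(f"Image {i}: {INPUT_GUIDES[name]}")
    lines.append("Candidate classes: " + ", ".join(classes) + ".")
    lines.append("Think step by step:")
    for i, step in enumerate(COT_STEPS, start=1):
        lines.append(f"{i}. {step}")
    return "\n".join(lines)


prompt = build_guided_prompt(["rgb", "false_color", "ndvi"],
                             ["Forest", "River", "Residential"])
print(prompt)
```

The resulting text would be sent in the same request as the adapted images, in the order listed. Storing the guides in a dictionary keyed by input name makes it easy to drop a view when a band is unavailable, without rewriting the rest of the prompt.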
