Unlocking Multi-Spectral Data for Multi-Modal Models with Guided Inputs and Chain-of-Thought Reasoning
Dahun Kim, Ganesh Satish Mallya, Anelia Angelova

TL;DR
This paper presents a training-free method to incorporate multi-spectral data into RGB-only large multimodal models, significantly improving remote sensing task performance through inference-time adaptations and reasoning instructions.
Contribution
It introduces an inference-time pipeline that integrates multi-spectral data into existing RGB-only models without retraining, extending their applicability to remote sensing.
Findings
The method achieves strong zero-shot performance gains on remote sensing benchmarks with Gemini 2.5.
Guided inputs combined with Chain-of-Thought reasoning instructions prove effective for adapting non-RGB data to an RGB-only model.
Geospatial professionals can use generalist multimodal models on specialized sensor data without training a dedicated model.
Abstract
Multi-spectral imagery is a valuable input signal for Remote Sensing applications, such as land-use and land-cover classification and environmental monitoring. However, generalist Large Multi-modal Models (LMMs) are typically trained on RGB images, limiting their applicability to the RGB domain. At the same time, training multi-spectral multi-modal models is expensive and produces uniquely specialized models. To address this, we propose a novel training-free approach that introduces multi-spectral data within the inference pipeline of standard RGB-only LMMs, allowing large gains in performance. Our approach leverages the LMMs' understanding of the visual space by adapting non-RGB inputs to that space and injecting domain-specific information and Chain-of-Thought reasoning as instructions. We demonstrate this with the Gemini 2.5 model and observe strong Zero-Shot performance gains on…
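The two ingredients named in the abstract, adapting non-RGB inputs to the visual space and supplying domain information plus a reasoning instruction, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the choice of a NIR/red/green false-color composite and an NDVI map, the percentile stretch, the function names, and the prompt wording are all hypothetical. The sketch only prepares images and a text prompt; it does not call any model API.

```python
import numpy as np


def normalize(band, lo=2.0, hi=98.0):
    """Percentile-stretch a single band to the range [0, 1]."""
    band = band.astype(np.float64)
    p_lo, p_hi = np.percentile(band, [lo, hi])
    if p_hi <= p_lo:  # constant band: nothing to stretch
        return np.zeros_like(band)
    return np.clip((band - p_lo) / (p_hi - p_lo), 0.0, 1.0)


def false_color(nir, red, green):
    """Place NIR, red, green in the R, G, B channels (hypothetical composite)."""
    stacked = np.stack(
        [normalize(nir), normalize(red), normalize(green)], axis=-1
    )
    return (stacked * 255).round().astype(np.uint8)


def ndvi_image(nir, red, eps=1e-6):
    """Render NDVI = (NIR - red) / (NIR + red) as a 3-channel grayscale image."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    ndvi = np.clip((nir - red) / (nir + red + eps), -1.0, 1.0)
    gray = ((ndvi + 1.0) / 2.0 * 255).round().astype(np.uint8)
    return np.repeat(gray[..., None], 3, axis=-1)


def build_prompt(image_descriptions, classes):
    """Assemble domain hints and a step-by-step instruction (wording is illustrative)."""
    lines = ["You are given several views of the same satellite scene."]
    for i, description in enumerate(image_descriptions, start=1):
        lines.append(f"Image {i}: {description}")
    lines.append(
        "Reason step by step about what each image shows, "
        "then answer with exactly one class from: " + ", ".join(classes) + "."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Synthetic reflectance values stand in for real sensor bands.
    rng = np.random.default_rng(0)
    nir, red, green = (rng.integers(0, 10000, size=(64, 64)) for _ in range(3))
    images = [false_color(nir, red, green), ndvi_image(nir, red)]
    prompt = build_prompt(
        [
            "False-color composite (NIR, red, green mapped to R, G, B); "
            "vegetation appears red.",
            "NDVI map; brighter pixels indicate denser vegetation.",
        ],
        ["Forest", "River", "Residential"],
    )
    print(prompt)
```

The images and prompt would then be passed together to an RGB-only multimodal model. Describing how each pseudo-image was constructed is what lets the model interpret colors that do not correspond to natural appearance.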
