HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

Xinyu Zhang; Zurong Mai; Qingmei Li; Zjin Liao; Yibin Wen; Yuhang Chen; Xiaoya Fan; Chan Tsz Ho; Bi Tianyuan; Haoyuan Liang; Ruifeng Su; Zihao Qian; Juepeng Zheng; Jianxi Huang; Yutong Lu; and Haohuan Fu

arXiv:2604.08884·cs.CV·April 13, 2026

TL;DR

HM-Bench is a benchmark for evaluating how well multimodal large language models understand hyperspectral images in remote sensing, a modality that remains underexplored for these models.

Contribution

The paper introduces a large-scale hyperspectral dataset of 19,337 question-answer pairs across 13 task categories, a dual-modality evaluation framework, and an assessment of 18 models on spectral reasoning tasks.

Findings

01. Visual inputs outperform textual inputs in HSI understanding.

02. Existing MLLMs face significant challenges in complex spectral-spatial reasoning.

03. Grounding in spectral-spatial evidence is crucial for effective hyperspectral image analysis.

Abstract

While multimodal large language models (MLLMs) have made significant strides in natural image understanding, their ability to perceive and reason over hyperspectral images (HSI), a vital modality in remote sensing, remains underexplored. The high dimensionality and intricate spectral-spatial properties of HSI pose unique challenges for models primarily trained on RGB data. To address this gap, we introduce the Hyperspectral Multimodal Benchmark (HM-Bench), the first benchmark designed specifically to evaluate MLLMs in HSI understanding. We curate a large-scale dataset of 19,337 question-answer pairs across 13 task categories, ranging from basic perception to spectral reasoning. Because existing MLLMs are not equipped to process raw hyperspectral cubes natively, we propose a dual-modality evaluation framework that transforms HSI data into two complementary representations: PCA-based…
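The abstract is truncated at the PCA-based representation, so the authors' exact procedure is not shown here. As a rough illustration of the general idea only, the sketch below projects a hyperspectral cube onto its first three principal components and scales the result into an 8-bit three-channel image that an RGB-trained model could accept. The function name `pca_false_color`, the min-max scaling, and the synthetic cube are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def pca_false_color(cube: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project an (H, W, B) hyperspectral cube onto its top principal components.

    Hypothetical helper for illustration; not the HM-Bench implementation.
    Returns an (H, W, n_components) uint8 image with each channel
    min-max scaled to the range 0..255.
    """
    h, w, bands = cube.shape
    # Treat every pixel as one sample with `bands` spectral features.
    pixels = cube.reshape(-1, bands).astype(np.float64)
    pixels -= pixels.mean(axis=0)  # center each band

    # SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    scores = pixels @ vt[:n_components].T

    # Min-max scale each component independently (an assumed choice).
    lo = scores.min(axis=0)
    hi = scores.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    scaled = (scores - lo) / span
    return (scaled * 255).round().astype(np.uint8).reshape(h, w, n_components)


# Synthetic stand-in for a real scene: 32 x 32 pixels, 100 spectral bands.
rng = np.random.default_rng(0)
cube = rng.random((32, 32, 100))
image = pca_false_color(cube)
print(image.shape, image.dtype)
```

Keeping only three components discards most of the spectral detail, which is one reason a framework might pair such an image with a second, complementary representation.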

Code & Models

Repositories

HuoRiLi-Yu/HM-Bench (GitHub)
