IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

Tao Zhang; Yuyang Hong; Yang Xia; Kun Ding; Zeyu Zhang; Ying Wang; Shiming Xiang; Chunhong Pan

arXiv:2512.09663 · cs.CV · December 11, 2025

TL;DR

This paper introduces IF-Bench, a comprehensive benchmark for evaluating multimodal large language models on infrared images, and proposes GenViP, a training-free visual prompting method that improves model performance by translating infrared images into RGB counterparts.
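As a rough illustration of the idea in the summary above, the sketch below shows one way a training-free prompt could be assembled from an infrared image and its generated RGB counterpart. It is not the authors' implementation: `build_genvip_prompt`, `VisualPrompt`, the `translate_ir_to_rgb` callable, the `keep_infrared` flag, and the wording of the note are all hypothetical, and whether the RGB rendering accompanies or replaces the infrared image is an assumption not stated in this summary.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class VisualPrompt:
    """Images and text to hand to a multimodal model (hypothetical container)."""
    images: List[Any]
    text: str


def build_genvip_prompt(
    ir_image: Any,
    question: str,
    translate_ir_to_rgb: Callable[[Any], Any],
    keep_infrared: bool = True,
) -> VisualPrompt:
    """Assemble a prompt that includes a generated RGB rendering of an infrared image.

    translate_ir_to_rgb is a placeholder for any image-to-image generator;
    no model weights are updated, which is what "training-free" means here.
    """
    rgb_image = translate_ir_to_rgb(ir_image)
    if keep_infrared:
        # Assumption: the model sees both the original and the translation.
        images = [ir_image, rgb_image]
        note = ("The first image is infrared; the second is a generated RGB "
                "rendering of the same scene. ")
    else:
        images = [rgb_image]
        note = "The image is a generated RGB rendering of an infrared scene. "
    return VisualPrompt(images=images, text=note + question)
```

The translator is passed in as a callable so the same prompt-building step can be tried with different generators, or with a stub during testing.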

Contribution

The paper presents the first high-quality infrared image understanding benchmark and a novel training-free visual prompting technique to enhance MLLMs' infrared comprehension.

Findings

01. Model scale and architecture significantly influence infrared image understanding.

02. GenViP consistently improves MLLMs' performance on infrared tasks.

03. IF-Bench provides a reliable platform for evaluating multimodal models on infrared data.

Abstract

Recent advances in multimodal large language models (MLLMs) have led to impressive progress across various benchmarks. However, their capability in understanding infrared images remains unexplored. To address this gap, we introduce IF-Bench, the first high-quality benchmark designed for evaluating multimodal understanding of infrared images. IF-Bench consists of 499 images sourced from 23 infrared datasets and 680 carefully curated visual question-answer pairs, covering 10 essential dimensions of image understanding. Based on this benchmark, we systematically evaluate over 40 open-source and closed-source MLLMs, employing cyclic evaluation, bilingual assessment, and hybrid judgment strategies to enhance the reliability of the results. Our analysis reveals how model scale, architecture, and inference paradigms affect infrared image comprehension, providing valuable insights for this…
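The abstract names cyclic evaluation without defining it. One plausible reading, similar to the circular-style scoring used by some multiple-choice benchmarks, is that each question is asked once per rotation of its answer options and counts as correct only if the model is right every time. The sketch below illustrates that reading only; the function names and the `predict` callable (which returns the index of the chosen option) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# predict(question, options) -> index of the option the model selects
Predictor = Callable[[str, List[str]], int]


def rotations(options: Sequence[str]) -> List[List[str]]:
    """Return every cyclic shift of the option list."""
    return [list(options[i:]) + list(options[:i]) for i in range(len(options))]


def cyclic_correct(question: str, options: Sequence[str], answer: str,
                   predict: Predictor) -> bool:
    """True only if the model picks the right option under every rotation."""
    for rotated in rotations(options):
        choice_index = predict(question, rotated)
        if rotated[choice_index] != answer:
            return False
    return True


def cyclic_accuracy(items: Sequence[Tuple[str, Sequence[str], str]],
                    predict: Predictor) -> float:
    """Fraction of (question, options, answer) items that survive all rotations."""
    hits = sum(cyclic_correct(q, opts, ans, predict) for q, opts, ans in items)
    return hits / len(items)
```

Scoring this way penalizes a model that favors a fixed option position: a predictor that always answers with the first option is right for only one rotation and therefore scores zero on that question.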

Code & Models

Models

casiatao/Qwen-Edit-2509-FT
model · 10 downloads · 6 likes

Datasets

casiatao/IF-Bench
dataset · 189 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis