IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
Tao Zhang, Yuyang Hong, Yang Xia, Kun Ding, Zeyu Zhang, Ying Wang, Shiming Xiang, Chunhong Pan

TL;DR
This paper introduces IF-Bench, a comprehensive benchmark for evaluating multimodal large language models on infrared images, and proposes GenViP, a training-free visual prompting method that improves model performance by translating infrared images into RGB counterparts.
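The idea behind GenViP can be illustrated with a minimal sketch. It assumes the translated RGB image is passed to the model alongside the original infrared image; the summary above only states that infrared images are translated into RGB counterparts, so that pairing, the prompt wording, and the names `VisualPrompt`, `build_genvip_prompt`, and `translate_to_rgb` are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class VisualPrompt:
    """Images and text that would be handed to an MLLM (hypothetical container)."""
    images: List[Any]
    text: str


def build_genvip_prompt(
    infrared_image: Any,
    question: str,
    translate_to_rgb: Callable[[Any], Any],
) -> VisualPrompt:
    """Build a training-free visual prompt from an infrared image.

    `translate_to_rgb` stands in for any infrared-to-RGB image translation
    model; it is an assumption, not a specific API from the paper.
    """
    rgb_image = translate_to_rgb(infrared_image)
    # Assumed prompt wording: tell the model which image is which.
    text = (
        "Image 1 is an infrared image. "
        "Image 2 is a generated RGB rendering of the same scene. "
        "Answer the question about the scene.\n"
        f"Question: {question}"
    )
    return VisualPrompt(images=[infrared_image, rgb_image], text=text)
```

No model weights are updated at any point in this sketch, which is what makes the approach training-free: only the input to the MLLM changes.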
Contribution
The paper presents the first high-quality infrared image understanding benchmark and a novel training-free visual prompting technique to enhance MLLMs' infrared comprehension.
Findings
Model scale, architecture, and inference paradigm all affect infrared image understanding.
GenViP consistently improves MLLMs' performance on infrared tasks.
IF-Bench provides a reliable platform for evaluating multimodal models on infrared data.
Abstract
Recent advances in multimodal large language models (MLLMs) have led to impressive progress across various benchmarks. However, their capability in understanding infrared images remains unexplored. To address this gap, we introduce IF-Bench, the first high-quality benchmark designed for evaluating multimodal understanding of infrared images. IF-Bench consists of 499 images sourced from 23 infrared datasets and 680 carefully curated visual question-answer pairs, covering 10 essential dimensions of image understanding. Based on this benchmark, we systematically evaluate over 40 open-source and closed-source MLLMs, employing cyclic evaluation, bilingual assessment, and hybrid judgment strategies to enhance the reliability of the results. Our analysis reveals how model scale, architecture, and inference paradigms affect infrared image comprehension, providing valuable insights for this…
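The abstract names cyclic evaluation without detailing it here. One common reading, assumed in the sketch below, is that each multiple-choice question is asked once per cyclic shift of its options and counts as correct only if the model picks the right option every time, which filters out position bias. The helper names `rotations`, `cyclic_correct`, and `ask_model` are hypothetical, and the paper's actual protocol may differ.

```python
from typing import Callable, List, Sequence

LETTERS = "ABCDEFGH"


def rotations(options: Sequence[str]) -> List[List[str]]:
    """Return every cyclic shift of the option list."""
    n = len(options)
    return [list(options[k:]) + list(options[:k]) for k in range(n)]


def cyclic_correct(
    question: str,
    options: Sequence[str],
    answer: str,
    ask_model: Callable[[str, List[str]], str],
) -> bool:
    """True only if the model is right under every cyclic shift.

    `ask_model` is a placeholder for an MLLM call that returns an
    option letter such as "A" for the given question and options.
    """
    for shifted in rotations(options):
        letter = ask_model(question, shifted).strip().upper()[:1]
        valid_letters = LETTERS[: len(shifted)]
        # Reject empty or out-of-range replies instead of guessing.
        if not letter or letter not in valid_letters:
            return False
        if shifted[valid_letters.index(letter)] != answer:
            return False
    return True
```

Under this scheme a model that always answers "A" passes only the shift where the correct option happens to sit first, so it is scored as incorrect overall.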
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
