OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models

Tengjin Weng; Wenhao Jiang; Jingyi Wang; Ming Li; Lin Ma; Zhong Ming

arXiv:2603.09326·cs.CV·March 31, 2026

TL;DR

This paper introduces OddGridBench, a benchmark for evaluating fine-grained visual discrepancy detection in multimodal large language models, revealing current models' limitations and proposing a reinforcement learning framework to improve their perceptual sensitivity.

Contribution

The work presents a new benchmark dataset and a reinforcement learning method to enhance the visual discrepancy sensitivity of multimodal large language models.

Findings

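01. All evaluated MLLMs, open-source and proprietary alike, perform far below human levels in visual discrepancy detection.

02. OddGrid-GRPO, the proposed reinforcement learning framework, significantly improves models' fine-grained visual discrimination.

03. Code and dataset are publicly available via the project page listed under Repositories.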

Abstract

Multimodal large language models (MLLMs) have achieved remarkable performance across a wide range of vision-language tasks. However, their ability in low-level visual perception, particularly in detecting fine-grained visual discrepancies, remains underexplored and lacks systematic analysis. In this work, we introduce OddGridBench, a controllable benchmark for evaluating the visual discrepancy sensitivity of MLLMs. OddGridBench comprises over 1,400 grid-based images, in each of which a single element differs from all others in one or more visual attributes such as color, size, rotation, or position. Experiments reveal that all evaluated MLLMs, including open-source families such as Qwen3-VL and InternVL3.5, and proprietary systems like Gemini-2.5-Pro and GPT-5, perform far below human levels in visual discrepancy detection. We further propose OddGrid-GRPO, a reinforcement learning framework…
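To make the task format concrete, the following is a minimal sketch of how an odd-one-out grid of the kind described above might be specified and scored. It is not the authors' generator: the `Cell` fields, the perturbation magnitudes, and the function names `perturb`, `make_grid`, and `accuracy` are all hypothetical, and the sketch only builds an abstract description of the grid rather than rendering an image.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cell:
    """Visual attributes of one grid element (hypothetical schema)."""
    color: tuple      # RGB triple
    size: float       # relative scale
    rotation: float   # degrees
    offset: tuple     # (dx, dy) shift inside the cell


def perturb(cell, attribute):
    """Return a copy of `cell` changed in one attribute.

    The magnitudes are illustrative assumptions, not values from the paper.
    """
    if attribute == "color":
        return replace(cell, color=tuple(min(255, v + 20) for v in cell.color))
    if attribute == "size":
        return replace(cell, size=cell.size * 1.1)
    if attribute == "rotation":
        return replace(cell, rotation=cell.rotation + 15.0)
    if attribute == "position":
        dx, dy = cell.offset
        return replace(cell, offset=(dx + 0.1, dy))
    raise ValueError(f"unknown attribute: {attribute}")


def make_grid(rows, cols, attributes, seed=0):
    """Build a rows x cols grid in which exactly one cell is the odd one.

    Returns the flat list of cells and the (row, col) of the odd cell.
    """
    if not attributes:
        raise ValueError("at least one attribute must differ")
    rng = random.Random(seed)
    base = Cell(color=(200, 60, 60), size=1.0, rotation=0.0, offset=(0.0, 0.0))
    odd = base
    for attribute in attributes:
        odd = perturb(odd, attribute)
    odd_index = rng.randrange(rows * cols)
    cells = [odd if i == odd_index else base for i in range(rows * cols)]
    return cells, divmod(odd_index, cols)


def accuracy(predictions, answers):
    """Fraction of predicted (row, col) positions that match the answers."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and aligned")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)
```

Under these assumptions, `make_grid(4, 5, ["color", "rotation"])` yields 20 cells with one element differing in two attributes at once, and a model's answers can be scored by comparing its predicted positions against the returned `(row, col)` pairs with `accuracy`.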

Code & Models

Repositories

https://wwwtttjjj.github.io/OddGridBench