OrdinalBench: A Benchmark Dataset for Diagnosing Generalization Limits in Ordinal Number Understanding of Vision-Language Models

Yusuke Tozaki; Hisashi Miyamori

arXiv:2603.07786 · cs.CV · March 10, 2026


TL;DR

OrdinalBench introduces a comprehensive diagnostic benchmark to evaluate and improve vision-language models' ability to understand and generalize ordinal numbers, especially for large indices and complex arrangements.

Contribution

The paper presents OrdinalBench, a new benchmark dataset with structured evaluation for ordinal number understanding in VLMs, emphasizing reasoning over large and complex ordinal tasks.

Findings

01. Zero-shot GPT-5 performance degrades on large-ordinal tasks

02. Models struggle with complex path reasoning in ordinal tasks

03. Benchmark enables targeted diagnostics for ordinal understanding

Abstract

Vision-Language Models (VLMs) have advanced across multimodal benchmarks but still show clear gaps in ordinal number understanding, i.e., the ability to track relative positions and generalize to large indices. We present OrdinalBench, a diagnostic benchmark that standardizes ordinal number understanding as an evaluation task for VLMs. The core task is N-th object identification, defined by a starting reference and traversal rule. Task difficulty is controlled along three axes: (i) ordinal magnitude, from small numbers to extreme cases up to 300; (ii) arrangement complexity, from single loops to maze-like paths; and (iii) object count. The benchmark provides 39,000 question-answer pairs, each annotated with a ground-truth reasoning trajectory and balanced across difficulty levels for controlled large-scale testing. Beyond answer-only evaluation, our framework requires models to generate…
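
The N-th object identification task described in the abstract can be illustrated with a minimal sketch. This is not the authors' generator or evaluation code. The names `OrdinalQuery` and `nth_object` are hypothetical, and the sketch assumes the simplest arrangement (a single closed loop), that the starting reference counts as the 1st object, and that counting wraps around the loop when N exceeds the object count. It returns both the answer and the sequence of visited objects, loosely mirroring the idea of pairing each answer with a reasoning trajectory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OrdinalQuery:
    """Hypothetical container for one N-th object question on a single loop."""
    objects: tuple[str, ...]  # object labels listed in clockwise order
    start: int                # index of the starting reference object
    n: int                    # 1-based ordinal to locate
    clockwise: bool = True    # traversal rule: direction of counting


def nth_object(query: OrdinalQuery) -> tuple[str, list[str]]:
    """Return the N-th object and the list of objects visited while counting.

    Assumption: the starting object is counted as the 1st, and counting
    wraps around the loop when n is larger than the number of objects.
    """
    if query.n < 1:
        raise ValueError("n must be a positive ordinal")
    if not query.objects:
        raise ValueError("at least one object is required")

    step = 1 if query.clockwise else -1
    position = query.start % len(query.objects)
    trajectory = []
    for _ in range(query.n):
        trajectory.append(query.objects[position])
        position = (position + step) % len(query.objects)
    return trajectory[-1], trajectory


if __name__ == "__main__":
    loop = ("A", "B", "C", "D", "E")
    answer, path = nth_object(OrdinalQuery(objects=loop, start=1, n=3))
    print(answer, path)  # D ['B', 'C', 'D']
```

Under these assumptions, with five objects A to E and counting clockwise from B, the 3rd object is D. Raising `n` while holding the loop fixed corresponds roughly to the ordinal-magnitude axis, and lengthening `objects` to the object-count axis; maze-like paths would need an explicit path representation that this sketch does not attempt.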

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications