Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs

Somraj Gautam; Abhirama Subramanyam Penamakuri; Abhishek Bhandari; Gaurav Harit

arXiv:2508.17334 · cs.CV · August 27, 2025


TL;DR

This paper introduces MMCRICBENCH-3K, a benchmark for evaluating large vision-language models on complex numerical and cross-lingual reasoning tasks involving cricket scorecard images, revealing significant limitations of current models.

Contribution

The paper presents a new benchmark dataset, MMCRICBENCH-3K, for assessing LVLMs' abilities in structured numerical and cross-lingual reasoning over semi-structured images.

Findings

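01. Even state-of-the-art LVLMs, such as GPT-4o and Qwen2.5VL, struggle on the benchmark, including its English subset.

02. Performance decreases on the Hindi scorecards (the cross-lingual setting), even though all questions and answers remain in English.

03. The benchmark exposes limitations of current LVLMs in structure-aware visual text understanding.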

Abstract

We introduce MMCRICBENCH-3K, a benchmark for Visual Question Answering (VQA) on cricket scorecards, designed to evaluate large vision-language models (LVLMs) on complex numerical and cross-lingual reasoning over semi-structured tabular images. MMCRICBENCH-3K comprises 1,463 synthetically generated scorecard images from ODI, T20, and Test formats, accompanied by 1,500 English QA pairs. It includes two subsets: MMCRICBENCH-E-1.5K, featuring English scorecards, and MMCRICBENCH-H-1.5K, containing visually similar Hindi scorecards, with all questions and answers kept in English to enable controlled cross-script evaluation. The task demands reasoning over structured numerical data, multi-image context, and implicit domain knowledge. Empirical results show that even state-of-the-art LVLMs, such as GPT-4o and Qwen2.5VL, struggle on the English subset despite it being their primary training…
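As an illustration of the controlled cross-script comparison described above, the following minimal sketch scores a model separately on the English-scorecard and Hindi-scorecard subsets and reports the difference. It is not the authors' evaluation code. The record fields (`subset`, `prediction`, `answer`), the subset labels `"E"` and `"H"`, the exact-match metric with light normalization, and the toy records are all assumptions made for this example; the paper's actual scoring protocol may differ.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase, trim, and collapse whitespace so ' 45 ' matches '45'."""
    return " ".join(answer.strip().lower().split())


def accuracy_by_subset(records):
    """Return {subset label: exact-match accuracy} for a list of dict records.

    Each record is assumed (hypothetical schema) to carry:
      'subset'     -> 'E' (English scorecards) or 'H' (Hindi scorecards)
      'prediction' -> the model's answer, in English
      'answer'     -> the reference answer, in English
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["subset"]] += 1
        if normalize(record["prediction"]) == normalize(record["answer"]):
            correct[record["subset"]] += 1
    return {subset: correct[subset] / total[subset] for subset in total}


def cross_script_gap(accuracy, english="E", hindi="H"):
    """Accuracy on English scorecards minus accuracy on Hindi scorecards."""
    return accuracy[english] - accuracy[hindi]


# Toy, made-up records for illustration only; these are not results from the paper.
records = [
    {"subset": "E", "prediction": "45", "answer": "45"},
    {"subset": "E", "prediction": " Player A ", "answer": "player a"},
    {"subset": "E", "prediction": "3", "answer": "4"},
    {"subset": "H", "prediction": "45", "answer": "45"},
    {"subset": "H", "prediction": "12", "answer": "21"},
    {"subset": "H", "prediction": "unknown", "answer": "player a"},
]

accuracy = accuracy_by_subset(records)
gap = cross_script_gap(accuracy)
print(accuracy, gap)
```

Because the questions and answers are in English for both subsets, a positive gap in a setup like this would point to the script of the scorecard image, rather than the language of the question, as the source of the difference.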


Code & Models

Datasets

DIALab/MMCricBench (dataset, 239 downloads)

Videos

Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs · Underline