Lost in Translation and Noise: A Deep Dive into the Failure Modes of VLMs on Real-World Tables

Anshul Singh; Rohan Chaudhary; Gagneet Singh; Abhay Kumary

arXiv:2511.17238 · cs.CL · November 24, 2025

TL;DR

This paper introduces MirageTVQA, a multilingual, noisy table question-answering benchmark revealing significant performance drops of vision-language models in real-world, imperfect, multilingual scenarios, highlighting key failure modes.

Contribution

The paper presents MirageTVQA, a new benchmark with nearly 60,000 question-answer (QA) pairs across 24 languages, designed to evaluate VLMs on noisy, real-world tables and to expose their limitations.

Findings

01

Over 35% performance drop for the best models on visually noisy tables

02

Models exhibit a consistent English-first bias

03

Performance degrades significantly with visual noise

Abstract

The impressive performance of VLMs is largely measured on benchmarks that fail to capture the complexities of real-world scenarios. Existing datasets for tabular QA, such as WikiTableQuestions and FinQA, are overwhelmingly monolingual (English) and present tables in a digitally perfect, clean format. This creates a significant gap between research and practice. To address this, we present MirageTVQA, a new benchmark designed to evaluate VLMs on these exact dimensions. Featuring nearly 60,000 QA pairs across 24 languages, MirageTVQA challenges models with tables that are not only multilingual but also visually imperfect, incorporating realistic noise to mimic scanned documents. Our evaluation of the leading VLMs reveals two primary failure points: a severe degradation in performance (over 35% drop for the best models) when faced with visual noise and a consistent English-first…
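The abstract says MirageTVQA adds "realistic noise to mimic scanned documents" but does not name the noise types here. As a rough illustration only, the sketch below degrades a grayscale table image with two generic scanner-like corruptions: Gaussian sensor noise and salt-and-pepper specks. The function name `add_scan_noise`, its parameters, and the choice of noise model are all hypothetical assumptions, not the paper's pipeline; the image is represented as rows of 0 to 255 integers so the example needs only the standard library.

```python
import random


def add_scan_noise(image, gaussian_sigma=12.0, salt_pepper_prob=0.02, seed=0):
    """Return a noisy copy of a grayscale image given as rows of 0-255 ints.

    Hypothetical helper: Gaussian noise plus salt-and-pepper specks as a
    generic stand-in for scanner artifacts (not MirageTVQA's actual method).
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            r = rng.random()
            if r < salt_pepper_prob / 2:
                value = 0  # pepper: isolated dark speck
            elif r < salt_pepper_prob:
                value = 255  # salt: isolated bright speck
            else:
                # additive Gaussian noise around the original intensity
                value = pixel + rng.gauss(0.0, gaussian_sigma)
            # round and clamp back into the valid 8-bit range
            new_row.append(int(min(255, max(0, round(value)))))
        noisy.append(new_row)
    return noisy


# Usage: a tiny white "table" image with one dark grid line in the middle row.
clean = [[255] * 8 for _ in range(4)]
clean[2] = [0] * 8
noisy = add_scan_noise(clean, gaussian_sigma=12.0, salt_pepper_prob=0.05, seed=42)
```

Setting `gaussian_sigma=0.0` and `salt_pepper_prob=0.0` returns an unchanged copy, which makes it easy to sweep noise strength and compare model accuracy on clean versus degraded renderings of the same table.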

Taxonomy

Topics: Topic Modeling · Data Quality and Management · Handwritten Text Recognition Techniques