INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents

Somraj Gautam; Anathapindika Dravichi; Gaurav Harit

arXiv:2604.11970·cs.CV·April 15, 2026

TL;DR

INDOTABVQA is a new benchmark dataset for evaluating cross-lingual table understanding in Bahasa Indonesia documents, highlighting performance gaps in current models and the benefits of targeted fine-tuning.

Contribution

The paper introduces INDOTABVQA, a dataset for cross-lingual table VQA on Bahasa Indonesia documents, and benchmarks open-source VLMs and GPT-4o on it to show where current models fall short and how much fine-tuning helps.

Findings

01

Current VLMs show substantial performance gaps on structurally complex tables and in low-resource languages.

02

Fine-tuning on INDOTABVQA improves accuracy by 11.6% for a compact 3B model and by 17.8% for a LoRA-fine-tuned 7B model.

03

Adding explicit table region coordinates enhances model performance by 4-7%.

Abstract

We introduce INDOTABVQA, a benchmark for evaluating cross-lingual Table Visual Question Answering (VQA) on real-world document images in Bahasa Indonesia. The dataset comprises 1,593 document images across three visual styles (bordered, borderless, and colorful), each containing one or more tables, and 1,593 question-answer sets in four languages: Bahasa Indonesia, English, Hindi, and Arabic. This enables evaluation of Vision-Language Models (VLMs) in both monolingual settings (Bahasa documents with Bahasa questions) and cross-lingual settings (Bahasa documents with questions in other languages). We benchmark leading open-source VLMs (Qwen2.5-VL, Gemma-3, LLaMA-3.2) and GPT-4o and reveal substantial performance gaps, particularly on structurally complex tables and in low-resource languages. Fine-tuning a compact 3B model and a LoRA-fine-tuned 7B model on our dataset yields 11.6% and 17.8% improvements in…
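
As a rough illustration of the monolingual versus cross-lingual split described in the abstract, the sketch below groups model predictions by setting and scores each group. It is not the authors' evaluation code: the record fields (`question_lang`, `prediction`, `answer`), the language code `"id"`, and the use of case-insensitive exact match as the metric are all assumptions made for the example, and the toy records are invented.

```python
from collections import defaultdict

# Assumption: documents are in Bahasa Indonesia, tagged here with the code "id".
DOCUMENT_LANGUAGE = "id"


def setting(question_lang: str) -> str:
    """Monolingual if the question language matches the document language."""
    return "monolingual" if question_lang == DOCUMENT_LANGUAGE else "cross-lingual"


def accuracy_by_setting(records):
    """Case-insensitive exact-match accuracy per evaluation setting.

    Each record is a dict with hypothetical keys
    'question_lang', 'prediction', and 'answer'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        key = setting(record["question_lang"])
        total[key] += 1
        predicted = record["prediction"].strip().lower()
        expected = record["answer"].strip().lower()
        if predicted == expected:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Invented toy records, one per question language.
records = [
    {"question_lang": "id", "prediction": "12", "answer": "12"},
    {"question_lang": "en", "prediction": "Jakarta", "answer": "jakarta"},
    {"question_lang": "hi", "prediction": "5", "answer": "7"},
    {"question_lang": "ar", "prediction": "2020", "answer": "2020 "},
]

scores = accuracy_by_setting(records)
```

Reporting the two settings separately keeps a model that reads Bahasa tables well but struggles with Hindi or Arabic questions from being hidden behind a single aggregate number.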

Code & Models

Repositories

https://huggingface.co/datasets/NusaBharat/INDOTABVQA