WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Basel Shbita; Pengyuan Li; Anna Lisa Gentile

arXiv:2605.21479 · cs.CV · May 21, 2026

TL;DR

WikiVQABench is a new knowledge-grounded VQA benchmark combining Wikipedia images, captions, and Wikidata, designed to evaluate models' ability to use external knowledge for visual question answering.

Contribution

It introduces a systematically constructed, human-curated benchmark that emphasizes external knowledge integration in visual question answering tasks.

Findings

01

Evaluation of 15 vision-language models (VLMs) shows accuracy ranging from 24.7% to 75.6%.

02

The benchmark effectively discriminates between models' capabilities on knowledge-intensive reasoning.

03

The dataset and code are publicly available.
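
Finding 01 reports accuracies on multiple-choice items. As a hedged sketch, multiple-choice accuracy can be computed as the share of items whose predicted option letter matches the gold letter. The answer-extraction rule, the helper names `extract_letter` and `accuracy`, and the default of four options are all assumptions for illustration, not the authors' evaluation code.

```python
import re
from typing import List, Optional

OPTION_LETTERS = "ABCDEFGH"


def extract_letter(reply: str, n_options: int = 4) -> Optional[str]:
    """Return the first standalone capital option letter in a reply, else None.

    Assumed parsing rule for illustration; the paper's extraction method may differ.
    """
    valid = OPTION_LETTERS[:n_options]
    # \b on both sides keeps words like "Answer" from matching on their first letter.
    match = re.search(rf"\b([{valid}])\b", reply)
    return match.group(1) if match else None


def accuracy(replies: List[str], gold: List[str], n_options: int = 4) -> float:
    """Fraction of items whose extracted letter equals the gold letter.

    Replies with no recognizable letter are counted as incorrect.
    """
    if len(replies) != len(gold):
        raise ValueError("replies and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(extract_letter(r, n_options) == g for r, g in zip(replies, gold))
    return correct / len(gold)


if __name__ == "__main__":
    replies = ["B", "Answer: C", "I am not sure", "D"]
    gold = ["B", "A", "C", "D"]
    print(f"{accuracy(replies, gold):.1%}")  # 2 of 4 correct -> 50.0%
```

Counting unparseable replies as wrong is one common convention; a stricter or more lenient parser can shift reported accuracy, which matters when comparing models across a range as wide as 24.7% to 75.6%.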

Abstract

Visual Question Answering (VQA) benchmarks have largely emphasized perception-based tasks that can be solved from visual content alone. In contrast, many real-world scenarios require external knowledge that is not directly observable in the image to answer correctly. We introduce WikiVQABench, a human-curated knowledge-grounded VQA benchmark constructed by systematically combining Wikipedia images, their associated article captions, and structured knowledge from Wikidata. Our pipeline uses large language models (LLMs) to generate candidate multiple-choice image-question-answer sets. All generated instances are subsequently reviewed and curated by human annotators to ensure factual correctness, visual-text consistency, and that each question requires external knowledge in addition to visual evidence for correct resolution. WikiVQABench comprises a substantial collection of Wikipedia…
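
The abstract describes each instance as a multiple-choice image-question-answer set built from a Wikipedia image, its article caption, and Wikidata facts. The sketch below shows one way such an item could be represented and turned into a text prompt. It assumes a hypothetical record layout: the class `VQAInstance`, its field names, and `to_prompt` are illustrative, not the dataset's actual schema, and the example item is made up. Whether the caption is shown to models at evaluation time is not stated here, so the prompt includes only the question and options.

```python
from dataclasses import dataclass
from typing import List

OPTION_LETTERS = "ABCDEFGH"


@dataclass
class VQAInstance:
    """Hypothetical layout for one multiple-choice item (not the official schema)."""
    image_url: str             # Wikipedia image the question is about
    caption: str               # caption from the associated article
    wikidata_facts: List[str]  # structured facts used when generating the item
    question: str
    options: List[str]
    answer_index: int          # position of the curated correct option

    def __post_init__(self) -> None:
        # Structural checks only; factual correctness and the need for external
        # knowledge are judged by human annotators in the described pipeline.
        if not 2 <= len(self.options) <= len(OPTION_LETTERS):
            raise ValueError("unsupported number of options")
        if len(set(self.options)) != len(self.options):
            raise ValueError("options must be distinct")
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index out of range")

    @property
    def answer_letter(self) -> str:
        return OPTION_LETTERS[self.answer_index]


def to_prompt(inst: VQAInstance) -> str:
    """Render question and lettered options; the image is sent to the VLM separately."""
    lines = [inst.question]
    lines += [f"{letter}. {opt}" for letter, opt in zip(OPTION_LETTERS, inst.options)]
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


if __name__ == "__main__":
    item = VQAInstance(  # made-up example item
        image_url="https://example.org/tower.jpg",  # placeholder URL
        caption="View of the tower from the Champ de Mars.",
        wikidata_facts=["completed: 1889"],
        question="In which year was the structure shown in the image completed?",
        options=["1869", "1889", "1901", "1925"],
        answer_index=1,
    )
    print(to_prompt(item))
    print("Gold:", item.answer_letter)
```

Keeping the structural checks in the record and leaving factual review to annotators mirrors the division of labor the abstract describes: LLMs propose candidates, humans curate them.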

Code & Models

Datasets

ibm-research/WikiVQABench