WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models

Runjie Zhou; Youbo Shao; Haoyu Lu; Bowei Xing; Tongtong Bai; Yujie Chen; Jie Zhao; Lin Sui; Haotian Yao; Zijia Zhao; Hao Yang; Haoning Wu; Zaida Zhou; Jinguo Zhu; Zhiqi Huang; Yiping Bao; Yangyang Liu; Y.Charles; Xinyu Zhou

arXiv:2602.02537 · cs.CV · February 4, 2026

TL;DR

WorldVQA is a new benchmark that rigorously evaluates the visual world knowledge of multimodal large language models by focusing on their memorization and grounding abilities across a wide range of visual entities.

Contribution

The paper introduces WorldVQA, a benchmark that decouples visual knowledge retrieval from reasoning to accurately measure what models memorize about the visual world.

Findings

01. WorldVQA effectively measures visual factuality and memorization.

02. It distinguishes between visual knowledge retrieval and reasoning capabilities.

03. The benchmark covers a broad spectrum of visual entities from common to rare.

Abstract

We introduce WorldVQA, a benchmark designed to evaluate the atomic visual world knowledge of Multimodal Large Language Models (MLLMs). Unlike current evaluations, which often conflate visual knowledge retrieval with reasoning, WorldVQA decouples these capabilities to strictly measure "what the model memorizes." The benchmark assesses the atomic capability of grounding and naming visual entities across a stratified taxonomy, spanning from common head-class objects to long-tail rarities. We expect WorldVQA to serve as a rigorous test for visual factuality, thereby establishing a standard for assessing the encyclopedic breadth and hallucination rates of current and next-generation frontier models.
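The abstract does not spell out the scoring protocol, so the following is only a minimal sketch of how per-stratum accuracy and hallucination rate could be computed for an entity-naming benchmark of this kind. The record keys (`stratum`, `gold`, `pred`), the exact-match comparison, and the convention that an abstention is counted as neither correct nor hallucinated are all assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase and collapse whitespace so trivial spelling variants match."""
    return " ".join(name.lower().split())


def score_by_stratum(records):
    """Compute accuracy and hallucination rate for each stratum.

    `records` is an iterable of dicts with hypothetical keys:
      'stratum' - taxonomy bucket, e.g. "head" or "tail"
      'gold'    - reference entity name
      'pred'    - model answer, or None if the model abstained
    """
    counts = defaultdict(lambda: {"total": 0, "correct": 0, "wrong": 0})
    for r in records:
        c = counts[r["stratum"]]
        c["total"] += 1
        if r["pred"] is None:
            continue  # assumption: abstention is neither correct nor hallucinated
        if normalize(r["pred"]) == normalize(r["gold"]):
            c["correct"] += 1
        else:
            c["wrong"] += 1  # a confident but wrong name counts as a hallucination
    return {
        stratum: {
            "accuracy": c["correct"] / c["total"],
            "hallucination_rate": c["wrong"] / c["total"],
        }
        for stratum, c in counts.items()
    }


# Toy records, invented purely to exercise the function.
example = [
    {"stratum": "head", "gold": "Golden Retriever", "pred": "golden  retriever"},
    {"stratum": "head", "gold": "Eiffel Tower", "pred": "Tokyo Tower"},
    {"stratum": "tail", "gold": "Kakapo", "pred": None},
    {"stratum": "tail", "gold": "Saola", "pred": "Okapi"},
]
scores = score_by_stratum(example)
```

Reporting the two rates separately per stratum keeps a model that abstains on rare entities distinguishable from one that answers them incorrectly, which a single accuracy number would hide.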

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks