KnowledgeBerg: Evaluating Systematic Knowledge Coverage and Compositional Reasoning in Large Language Models

Xiao Zhang; Qianru Meng; Yongjian Chen; Yumeng Wang; Johan Bos

arXiv:2604.17621 · cs.AI · April 21, 2026

TL;DR

KnowledgeBerg is a benchmark that tests whether large language models can systematically cover a bounded knowledge universe and perform compositional set-based reasoning over it. It reveals severe limitations in representative open-source models.

Contribution

The paper introduces KnowledgeBerg, a benchmark of 4,800 multiple-choice questions derived from 1,183 enumeration seeds spanning 10 domains and 17 languages. It evaluates LLMs on knowledge coverage and compositional reasoning.

Findings

01

Representative open-source LLMs perform poorly, scoring only 5.26-36.88 F1 on universe enumeration and 16.00-44.19 accuracy on knowledge-grounded reasoning.

02

Test-time augmentation improves model performance by up to 4.35 points.

03

Diagnostic analyses trace failures to three stages: missing knowledge, lack of awareness, and incorrect execution of reasoning.
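The abstract reports universe enumeration as F1, which compares the set of items a model lists against the gold universe. The sketch below shows one way to compute it. It assumes exact matching after light normalization (case-folding and whitespace collapsing), because the benchmark's actual matching rules are not described here. The helpers `normalize` and `enumeration_f1` and the example lists are hypothetical.

```python
def normalize(item: str) -> str:
    # Hypothetical normalization: case-fold and collapse internal whitespace.
    return " ".join(item.casefold().split())


def enumeration_f1(predicted: list[str], gold: list[str]) -> float:
    """Set-based F1 between an enumerated list and the gold universe.

    Duplicates are ignored because both sides are converted to sets.
    Returns 0.0 when there is no overlap, including empty inputs.
    """
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Illustrative example: gold universe of Nordic countries vs. a partial,
# partly wrong model enumeration (3 correct, 1 spurious, 2 missing).
gold = ["Denmark", "Finland", "Iceland", "Norway", "Sweden"]
pred = ["Sweden", "norway", "Denmark", "Estonia"]
print(round(enumeration_f1(pred, gold) * 100, 2))  # prints 66.67
```

Precision here is 3/4 and recall is 3/5, so the score is scaled to 0-100 to match the range used in the findings. A model is penalized both for omitting universe members and for adding items that do not belong.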

Abstract

Many real-world questions appear deceptively simple yet implicitly demand two capabilities: (i) systematic coverage of a bounded knowledge universe and (ii) compositional set-based reasoning over that universe, a phenomenon we term "the tip of the iceberg." We formalize this challenge through two orthogonal dimensions: knowledge width, the cardinality of the required universe, and reasoning depth, the number of compositional set operations. We introduce KnowledgeBerg, a benchmark of 4,800 multiple-choice questions derived from 1,183 enumeration seeds spanning 10 domains and 17 languages, with universes grounded in authoritative sources to ensure reproducibility. Representative open-source LLMs demonstrate severe limitations, achieving only 5.26-36.88 F1 on universe enumeration and 16.00-44.19 accuracy on knowledge-grounded reasoning. Diagnostic analyses reveal three stages of failure:…
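The two dimensions in the abstract, knowledge width and reasoning depth, can be illustrated with a toy set expression. The sketch below assumes a question can be modeled as a binary tree of union, intersection, and difference operations over named subsets of a universe. Under that assumption, width is the size of the universe and depth is the number of operation nodes. The planet universe, the subset names, and the helpers `evaluate` and `reasoning_depth` are hypothetical and do not reflect the benchmark's data format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    name: str  # label of a primitive subset of the universe


@dataclass(frozen=True)
class Op:
    kind: str  # "union", "intersection", or "difference"
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Op]


def evaluate(expr: Expr, subsets: dict[str, frozenset[str]]) -> frozenset[str]:
    """Recursively apply the set operations in the expression tree."""
    if isinstance(expr, Leaf):
        return subsets[expr.name]
    left = evaluate(expr.left, subsets)
    right = evaluate(expr.right, subsets)
    if expr.kind == "union":
        return left | right
    if expr.kind == "intersection":
        return left & right
    if expr.kind == "difference":
        return left - right
    raise ValueError(f"unknown operation: {expr.kind}")


def reasoning_depth(expr: Expr) -> int:
    """Assumed definition: count the set-operation nodes in the tree."""
    if isinstance(expr, Leaf):
        return 0
    return 1 + reasoning_depth(expr.left) + reasoning_depth(expr.right)


# Hypothetical universe: the eight planets of the Solar System.
subsets = {
    "universe": frozenset({"Mercury", "Venus", "Earth", "Mars",
                           "Jupiter", "Saturn", "Uranus", "Neptune"}),
    "rings": frozenset({"Jupiter", "Saturn", "Uranus", "Neptune"}),
    "no_moons": frozenset({"Mercury", "Venus"}),
}

# "Which planets have no rings but at least one moon?"
question = Op("difference",
              Op("difference", Leaf("universe"), Leaf("rings")),
              Leaf("no_moons"))

width = len(subsets["universe"])
print(width, reasoning_depth(question), sorted(evaluate(question, subsets)))
# prints: 8 2 ['Earth', 'Mars']
```

In this toy case, answering requires recalling all eight planets (width 8) and then applying two differences (depth 2). If the enumeration omits a planet, the final set can be wrong even when both operations are executed correctly.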

Code & Models

Repositories

https://huggingface.co/datasets/2npc/KnowledgeBerg
github