Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Seiji Maekawa, Hayate Iso, Nikita Bhutani

TL;DR
This paper introduces HoloBench, a framework for evaluating how long-context language models perform holistic reasoning over large textual data, revealing key factors affecting their capabilities and limitations.
Contribution
The work presents HoloBench, a systematic benchmark for assessing holistic reasoning in long-context LLMs across database-like operations on massive text collections.
Findings
Information density impacts LLM performance more than context length.
Query complexity influences accuracy more than the amount of information.
Finding maximum/minimum values is easier for LLMs and less affected by context length.
Abstract
The rapid increase in textual information means we need more efficient methods to sift through, organize, and understand it all. While retrieval-augmented generation (RAG) models excel in accessing information from large document collections, they struggle with complex tasks that require aggregation and reasoning over information spanning across multiple documents--what we call holistic reasoning. Long-context language models (LCLMs) have great potential for managing large-scale documents, but their holistic reasoning capabilities remain unclear. In this work, we introduce HoloBench, a novel framework that brings database reasoning operations into text-based contexts, making it easier to systematically evaluate how LCLMs handle holistic reasoning across large documents. Our approach adjusts key factors such as context length, information density, distribution of information, and query…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsSemantic Web and Ontologies · Natural Language Processing Techniques · Rough Sets and Fuzzy Logic
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Multi-Head Attention · WordPiece · Dropout · Layer Normalization · Adam
