Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data

Seiji Maekawa; Hayate Iso; Nikita Bhutani

arXiv:2410.11996·cs.CL·March 14, 2025

TL;DR

This paper introduces HoloBench, a framework for evaluating how long-context language models perform holistic reasoning over large textual data, revealing key factors affecting their capabilities and limitations.

Contribution

The work presents HoloBench, a systematic benchmark for assessing holistic reasoning in long-context LLMs across database-like operations on massive text collections.
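To make "database-like operations on text" concrete, here is a minimal sketch of the general idea, not HoloBench's actual pipeline. It assumes a toy table whose rows are rewritten as sentences, with the reference answer obtained by running SQL on the original table. The schema, rows, and the helper names `verbalize` and `ground_truth` are hypothetical.

```python
import sqlite3

# Hypothetical toy table; the schema and rows are made up for illustration.
ROWS = [
    ("Alice", "Engineering", 120),
    ("Bob", "Engineering", 95),
    ("Carol", "Sales", 80),
    ("Dan", "Sales", 110),
]

def verbalize(row):
    """Turn one table row into a standalone sentence for the text context."""
    name, dept, salary = row
    return f"{name} works in {dept} and earns {salary} thousand dollars."

def ground_truth(sql):
    """Run the query on an in-memory database to get the reference answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

# The model would see only `context` and `question`, never the table itself.
context = "\n".join(verbalize(r) for r in ROWS)
question = "What is the highest salary in each department?"
answer = ground_truth(
    "SELECT dept, MAX(salary) FROM staff GROUP BY dept ORDER BY dept"
)
```

Answering the question from `context` alone requires reading every sentence and aggregating across them, which is the kind of task the page describes as holistic reasoning.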

Findings

01 Information density impacts LLM performance more than context length.

02 Query complexity influences accuracy more than the amount of information.

03 Finding maximum/minimum values is easier for LLMs and less affected by context length.
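The first finding separates context length from information density. The sketch below shows one way the two could be varied independently. It assumes density means the fraction of context sentences relevant to the query, which is an interpretation rather than the paper's stated definition. The function `build_context` and its parameters are hypothetical.

```python
import random

def build_context(relevant, distractors, total, density, seed=0):
    """Mix relevant and distractor sentences into one context.

    total   -- number of sentences in the context (a proxy for context length)
    density -- fraction of those sentences that are relevant to the query
    """
    n_relevant = round(total * density)
    n_distract = total - n_relevant
    if n_relevant > len(relevant) or n_distract > len(distractors):
        raise ValueError("not enough sentences for the requested settings")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    chosen = rng.sample(relevant, n_relevant) + rng.sample(distractors, n_distract)
    rng.shuffle(chosen)  # spread relevant sentences throughout the context
    return chosen
```

Holding `total` fixed while changing `density`, or the reverse, lets the two factors be compared separately under this assumed setup.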

Abstract

The rapid increase in textual information means we need more efficient methods to sift through, organize, and understand it all. While retrieval-augmented generation (RAG) models excel in accessing information from large document collections, they struggle with complex tasks that require aggregation and reasoning over information spanning across multiple documents--what we call holistic reasoning. Long-context language models (LCLMs) have great potential for managing large-scale documents, but their holistic reasoning capabilities remain unclear. In this work, we introduce HoloBench, a novel framework that brings database reasoning operations into text-based contexts, making it easier to systematically evaluate how LCLMs handle holistic reasoning across large documents. Our approach adjusts key factors such as context length, information density, distribution of information, and query…

Code & Models

Datasets

megagonlabs/holobench
dataset · 72 dl

Videos

Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data · slideslive

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Rough Sets and Fuzzy Logic

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Multi-Head Attention · WordPiece · Dropout · Layer Normalization · Adam