Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks

Zi Yang

arXiv:2409.06338·cs.CL·September 11, 2024

Open Access

TL;DR

This paper introduces the Dolce framework to categorize and measure the difficulty of long context understanding tasks in language models, distinguishing between retrieval and holistic understanding capabilities.

Contribution

The paper proposes a novel parameterization and sampling method to automatically identify focus categories and difficulty levels in long context tasks.

Findings

01  0% to 67% of problems are retrieval focused

02  0% to 90% of problems are holistic understanding focused

03  Effective categorization across 44 benchmark tasks

Abstract

We argue that there are two major distinct capabilities in long context understanding: retrieval and holistic understanding. Understanding and further improving LLMs' long context capabilities would not be possible without knowing the tasks' focus categories. We aim to automatically identify retrieval focused and holistic understanding focused problems from suites of benchmarks and to quantitatively measure the difficulty within each focus. In this paper, we present the Dolce framework, which parameterizes each problem by $λ$ (complexity) and $k$ (redundancy) and assigns it to one of five predefined focus categories. We propose to sample short contexts from the full context and estimate the probability that an LLM solves the problem using the sampled spans. To find the $λ$ and $k$ for each problem, we further propose a mixture model of a non-parametric background noise component and a…
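
The sampling step described in the abstract can be illustrated with a minimal sketch. It assumes spans are contiguous and drawn uniformly at random, and it stands in for the LLM with a caller-supplied function; the names `sample_span`, `estimate_solve_probability`, and `solves` are hypothetical and do not come from the paper. Fitting $λ$ and $k$ with the mixture model is not shown.

```python
import random
from typing import Callable, Sequence


def sample_span(tokens: Sequence[str], span_len: int, rng: random.Random) -> Sequence[str]:
    """Return one contiguous span of span_len tokens with a uniformly chosen start.

    Assumption: uniform contiguous sampling; the paper may sample differently.
    """
    if span_len >= len(tokens):
        return tokens
    start = rng.randrange(len(tokens) - span_len + 1)
    return tokens[start:start + span_len]


def estimate_solve_probability(
    tokens: Sequence[str],
    span_len: int,
    solves: Callable[[Sequence[str]], bool],
    n_samples: int = 200,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(problem is solved | a random span of span_len tokens).

    `solves` is a hypothetical stand-in for "prompt the LLM with this span and
    check its answer"; here it is any function returning True or False.
    """
    rng = random.Random(seed)
    hits = sum(bool(solves(sample_span(tokens, span_len, rng))) for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    # Toy retrieval-style problem: it counts as solved only if the span contains the needle.
    context = [f"t{i}" for i in range(1000)]
    contains_needle = lambda span: "t500" in span
    for span_len in (10, 100, 500, 1000):
        p = estimate_solve_probability(context, span_len, contains_needle)
        print(span_len, p)
```

In this toy setup the estimated probability rises as the span grows, because longer spans are more likely to cover the single relevant token. How that curve behaves across span lengths is the kind of signal that could then be summarized by per-problem parameters.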

Taxonomy

Topics: Evaluation and Performance Assessment

Methods: Focus