Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models

Boheng Sheng; Jiacheng Yao; Meicong Zhang; Guoxiu He

arXiv:2506.00773 · cs.CL · June 4, 2025

TL;DR

This paper introduces a dynamic chunking and selection method that helps large language models comprehend ultra-long texts. It adaptively divides the context into variable-length chunks and selects the chunks relevant to the question, outperforming fixed-length truncation approaches.

Contribution

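The paper presents an adaptive chunking technique, which splits long texts into variable-length chunks where adjacent sentences are semantically dissimilar, combined with question-aware chunk selection. Together they improve LLM understanding of very long texts and remain robust as input length grows.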

Findings

01 Outperforms strong baselines on question-answering benchmarks.
02 Remains robust on input sequences of up to 256k tokens.
03 Effective on both single-hop and multi-hop QA tasks.

Abstract

Large language models (LLMs) often struggle to accurately read and comprehend extremely long texts. Current methods for improvement typically rely on splitting long contexts into fixed-length chunks. However, fixed truncation risks separating semantically relevant content, leading to ambiguity and compromising accurate understanding. To overcome this limitation, we propose a straightforward approach for dynamically separating and selecting chunks of long context, facilitating a more streamlined input for LLMs. In particular, we compute semantic similarities between adjacent sentences, using lower similarities to adaptively divide long contexts into variable-length chunks. We further train a question-aware classifier to select sensitive chunks that are critical for answering specific questions. Experimental results on both single-hop and multi-hop question-answering benchmarks show that…
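The chunking and selection steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the official code is in the ecnu-text-computing/dcs repository): a toy bag-of-words vector stands in for a real sentence encoder, and the fixed `threshold` that decides what counts as a "lower" adjacent-sentence similarity is a hypothetical parameter. Because the paper's question-aware classifier is trained, the selection step here is swapped for plain question-to-chunk cosine similarity with a hypothetical top-`k` cutoff.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector. Assumption: stands in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def dynamic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk wherever adjacent sentences are less similar than `threshold`.

    `threshold` is a hypothetical fixed cutoff chosen for illustration.
    """
    if not sentences:
        return []
    vecs = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(vecs[i - 1], vecs[i]) < threshold:
            chunks.append([])  # low similarity: treat as a semantic boundary
        chunks[-1].append(sentences[i])
    return chunks


def select_chunks(chunks: list[list[str]], question: str, k: int = 1) -> list[list[str]]:
    """Keep the k chunks most similar to the question, in their original order.

    Stand-in only: the paper trains a question-aware classifier for this step.
    """
    q = embed(question)
    scores = [cosine(q, embed(" ".join(c))) for c in chunks]
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
    return [chunks[i] for i in sorted(top)]


sentences = [
    "The river flooded the valley in spring.",
    "The flooded valley stayed under water for weeks.",
    "Stock prices rose sharply on Monday.",
    "Investors bought stock after the prices rose.",
]
chunks = dynamic_chunks(sentences, threshold=0.2)
print(len(chunks))  # → 2 (flood sentences, then stock sentences)
print(select_chunks(chunks, "Why did stock prices rise?"))  # → the two stock sentences
```

The split falls at the similarity dip between the second and third sentences, so each topic stays in one chunk regardless of its length; a fixed-length split that does not line up with the topic change would cut across it, which is the failure mode the abstract attributes to fixed truncation.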


Code & Models

Repositories

ecnu-text-computing/dcs (PyTorch, official)

Videos

Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models (Underline)

Taxonomy

Topics: Topic Modeling