When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Zhen Xu, Shang Zhu, Jue Wang, Junlin Wang, Ben Athiwaratkun, Chi Wang, James Zou, Ce Zhang

TL;DR
This paper introduces a theoretical noise decomposition framework to analyze when divide-and-conquer strategies are effective for long-context tasks in Large Language Models, and supports it with experiments on retrieval, question answering, and summarization.
Contribution
It presents a novel noise decomposition framework that explains the failure modes of long context processing and guides effective chunking strategies for LLMs.
Findings
Multi-agent chunking is effective under certain noise conditions.
For large inputs, a weaker model using chunk-based processing can outperform single-shot models such as GPT-4.
The framework clarifies when dividing long texts improves LLM performance.
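Chunk-based (multi-agent) processing splits a long input into chunks, lets a worker handle each chunk, and merges the partial results. The Python sketch below illustrates that flow under stated assumptions. `llm` is any hypothetical prompt-to-text callable. The fixed-size character split and the prompt wording are placeholders, not the paper's scaffolding. The planner step mirrors the planner/manager/worker design that the reviews mention. `split_into_chunks` and `chunked_answer` are illustrative names.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters.

    A real system would more likely split on token counts or sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunked_answer(task: str, text: str, llm: LLM, chunk_size: int = 2000) -> str:
    # Planner: turns the user task into shared instructions (illustrative prompt only).
    plan = llm(f"PLANNER\nTask: {task}\nWrite instructions for chunk workers and the aggregator.")
    # Workers: each sees only its own chunk plus the shared plan, so any
    # dependence that crosses a chunk boundary (task noise) is invisible to it.
    partials = [
        llm(f"WORKER\nInstructions: {plan}\nChunk:\n{chunk}")
        for chunk in split_into_chunks(text, chunk_size)
    ]
    # Manager: merges the partial results; imperfect merging is aggregator noise.
    merged = "\n---\n".join(partials)
    return llm(f"MANAGER\nTask: {task}\nInstructions: {plan}\nPartial results:\n{merged}")


if __name__ == "__main__":
    calls: List[str] = []

    def recording_llm(prompt: str) -> str:
        # Stub model for demonstration: records which role was called and echoes it.
        role = prompt.split("\n", 1)[0]
        calls.append(role)
        return f"<{role} output>"

    print(chunked_answer("Find the passkey.", "x" * 5000, recording_llm))
    print(calls)  # one planner call, one worker call per chunk, one manager call
```

With a 5,000-character input and `chunk_size=2000`, the stub sees one planner call, three worker calls, and one manager call. Swapping the stub for a real model client is left to the reader.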
Abstract
We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that divides the failure modes of long-context tasks into three categories: cross-chunk dependence (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise). Under this view, we analyze when it is effective to use multi-agent chunking, i.e., dividing a lengthy sequence into smaller chunks and aggregating the processed results of each chunk. Our experiments on tasks such as retrieval, question answering, and summarization confirm both the theoretical analysis and the conditions that favor multi-agent chunking. By exploring the accelerated decay of model fidelity with input length, we also explain why, for large inputs, a weaker model configured with chunk-based processing can surpass a…
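Accelerated decay can favor chunking, and a deliberately simplified toy model in Python shows why. Everything in it is an assumption for illustration, not the paper's formulation. Single-shot error is assumed to grow as a·n^α with α > 1. Splitting into k equal chunks replaces that with k·a·(n/k)^α. Each extra chunk adds a fixed task-noise and aggregator-noise cost, and all terms simply add. The function names and constants are hypothetical.

```python
def single_shot_error(n: float, a: float = 1e-8, alpha: float = 1.5) -> float:
    """Toy model noise for one pass over n tokens; alpha > 1 makes it superlinear."""
    return a * n ** alpha


def chunked_error(n: float, k: int, a: float = 1e-8, alpha: float = 1.5,
                  task: float = 0.02, agg: float = 0.01) -> float:
    """Toy total error when n tokens are split into k equal chunks (terms assumed additive)."""
    model = k * single_shot_error(n / k, a, alpha)  # k workers, each reading n/k tokens
    task_noise = task * (k - 1)  # each extra boundary may cut a cross-chunk dependence
    agg_noise = agg * (k - 1)    # each extra partial result is one more thing to merge
    return model + task_noise + agg_noise


def best_chunk_count(n: float, max_k: int = 64, **params: float) -> int:
    """Number of chunks (1 = single shot) that minimizes the toy error."""
    return min(range(1, max_k + 1), key=lambda k: chunked_error(n, k, **params))


print(best_chunk_count(1_000))               # → 1 (short input: single shot wins)
print(best_chunk_count(100_000))             # → 3 (long input: chunking wins)
print(best_chunk_count(100_000, alpha=1.0))  # → 1 (linear decay: chunking never pays off)
```

Because k·(n/k)^α = n^α·k^(1-α), splitting only lowers model noise when α > 1. It pays off only once that saving exceeds the added task and aggregator noise. This matches the qualitative claim in the abstract, but the numbers themselves carry no empirical meaning.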
Peer Reviews
Decision: ICLR 2026 Poster
* **Unifying perspective.** The three-noise decomposition offers a clear lens for deciding when to chunk, how to chunk, and how to aggregate.
* **Actionable empirical takeaways.** The paper documents superlinear performance decay with length and shows cases where chunking beats single-shot, provided aggregator noise is controlled.
* **Concrete implementation.** The planner/manager/worker design and prompt scaffolding make the approach reproducible in spirit and show how aggregator strength mat…
* **Superlinearity is argued largely by phenomenon, not direct estimation.** The paper builds the thesis from diagnostic curves and counter-hypotheses, but does not directly fit an exponent for error growth vs. length with uncertainty quantification. Stronger statistical evidence would help.
* **Metric/assumption clarity.** Parts of the theoretical setup and noise interactions would benefit from clearer units/assumptions and closer alignment with common additive error analyses (currently the ex…
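The first weakness asks for a direct estimate of how error scales with input length. One standard way to get one is to fit error ≈ c·n^α by least squares on log-log axes and attach a percentile-bootstrap confidence interval to α. A lower bound above 1 would support superlinear decay. The sketch below uses only the Python standard library. It illustrates this generic technique, not the paper's analysis. `fit_exponent` and `bootstrap_ci` are hypothetical names, errors must be positive, and the example data are synthetic.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_exponent(lengths: Sequence[float], errors: Sequence[float]) -> float:
    """Least-squares slope alpha of log(error) vs. log(length), i.e. error ~ c * length**alpha."""
    if any(e <= 0 for e in errors):
        raise ValueError("errors must be positive to take logarithms")
    xs = [math.log(n) for n in lengths]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_ci(lengths: Sequence[float], errors: Sequence[float], n_boot: int = 2000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for alpha, resampling (length, error) pairs."""
    rng = random.Random(seed)
    pairs = list(zip(lengths, errors))
    estimates: List[float] = []
    while len(estimates) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        if len({n for n, _ in sample}) < 2:
            continue  # slope is undefined if every resampled length is identical
        estimates.append(fit_exponent(*zip(*sample)))
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * n_boot)]
    hi = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


lengths = [1_000, 2_000, 4_000, 8_000, 16_000, 32_000]
# Synthetic error rates with multiplicative jitter; replace with measured values.
jitter = [1.05, 0.95, 1.02, 0.98, 1.03, 0.97]
errors = [0.001 * (n / 1_000) ** 1.5 * j for n, j in zip(lengths, jitter)]
lo, hi = bootstrap_ci(lengths, errors)
print(f"alpha = {fit_exponent(lengths, errors):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only a handful of lengths the interval can be wide. Repeated runs per length, or a model that accounts for per-task variation, would give a more reliable estimate.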
1. Long-context tasks are important, as many real-world questions are based on long contexts.
2. The theoretical analysis gives unique evidence that chunking is better than single-agent processing. This conclusion is non-trivial and might change how users understand long-context LLMs.
3. Experiments further echo the theory and give a more fine-grained analysis of the problem.
I think some settings are over-simplified. For instance, multi-agent systems come in many structures, such as chains, trees, and graphs. However, the theoretical analysis and experiments assume that the agents (chunks) are independent, so the failure of one agent does not propagate to its siblings, whereas such propagation is common in chains. Next, the work in a multi-agent system is different from that of a one-shot agent, as they have different tasks to finish (summary vs. generat…
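A toy calculation in Python makes the reviewer's independence concern concrete. Assume, hypothetically, that each of k agents fails with probability p. If agents are independent, a failure corrupts only that agent's output. In a chain where each agent consumes its predecessor's output, a failure also corrupts everything downstream. The function names are illustrative and the failure model is an assumption, not taken from the paper.

```python
def expected_corrupted_independent(k: int, p: float) -> float:
    """Expected number of corrupted outputs when k agents fail independently with prob. p."""
    return k * p


def expected_corrupted_chain(k: int, p: float) -> float:
    """Expected corrupted outputs in a chain where any upstream failure corrupts later outputs.

    Output i (1-based) is clean only if agents 1..i all succeed, with probability (1 - p) ** i.
    """
    return sum(1 - (1 - p) ** i for i in range(1, k + 1))


for k in (1, 5, 10):
    print(k, round(expected_corrupted_independent(k, 0.05), 3),
          round(expected_corrupted_chain(k, 0.05), 3))
```

With p = 0.05 and k = 10, independence gives 0.5 expected corrupted outputs while the chain gives about 2.4. This is the kind of gap an independence assumption would hide for sequential topologies.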
1. The paper not only defines aggregator noise but also demonstrates in practice how to reduce this type of noise by introducing a planner.
2. The noise framework proposed in the paper possesses diagnostic capabilities. By analyzing the relative dominance of the three types of noise, it divides long-context tasks into three distinct regions. This aids in determining whether a specific task is more suitable for the divide-and-conquer approach.
3. The paper offers a possible explanation for the ph…
1. It is commendable that the paper attempts to provide a scientific explanation from a theoretical perspective. However, when constructing its core theoretical framework (Sections 3.1 and 3.2), the paper lacks rigor in the use of mathematical operators. There is no explanation as to why multiplication and addition can be directly performed in the output space. For instance, what does the product of the results of two functions imply? This is puzzling and weakens the persuasiveness of the entire…
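One review describes a diagnostic reading of the framework: whichever of the three noise types dominates places a task in one of three regions. That reading can be sketched as a small Python helper. The noise magnitudes would have to be estimated separately, for example by comparing single-shot and chunked runs. The mapping from dominant noise to advice is an interpretation of the abstract and reviews, not a rule stated by the paper. `NoiseProfile`, `dominant_noise`, and the `ADVICE` strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NoiseProfile:
    task: float        # cross-chunk dependence that chunking would break
    model: float       # confusion that grows with context size
    aggregator: float  # loss from imperfectly merging partial results


# Assumed advice per region (an interpretation, not taken from the paper).
ADVICE: Dict[str, str] = {
    "task": "avoid aggressive chunking; the answer depends on context spread across chunks",
    "model": "chunking is promising; shorter inputs reduce model noise",
    "aggregator": "strengthen aggregation (e.g., add a planner) before adding more chunks",
}


def dominant_noise(profile: NoiseProfile) -> str:
    """Return the name of the largest noise term (ties go to the first listed)."""
    scores = {"task": profile.task, "model": profile.model, "aggregator": profile.aggregator}
    return max(scores, key=scores.get)


region = dominant_noise(NoiseProfile(task=0.05, model=0.30, aggregator=0.10))
print(region, "->", ADVICE[region])
```

Taking the maximum is the simplest possible decision rule. A real diagnostic would likely compare how each term changes with chunk count rather than their raw sizes at one setting.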
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education
