Beyond Isolated Capabilities: Bridging Long CoT Reasoning and Long-Context Understanding
Yifei Wang

TL;DR
This paper investigates how large-scale reasoning distillation enhances long-context understanding in language models, demonstrating significant improvements in multi-document question answering and addressing the 'lost in the middle' challenge.
Contribution
The paper provides a comprehensive analysis of how reasoning distillation affects long-context comprehension, highlighting its benefits for retrieval-augmented systems.
Findings
Distilled reasoning improves long-context information extraction.
Enhanced reasoning fosters better multi-document question answering.
Reasoning distillation mitigates the 'lost in the middle' problem in long-context models.
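The 'lost in the middle' problem is commonly probed by moving the single relevant document through each position in a multi-document prompt and comparing answer accuracy across positions. The sketch below shows only the prompt-construction step. It is a minimal illustration, not the paper's evaluation code: the function names `build_prompt` and `position_sweep` and the prompt template are hypothetical.

```python
from typing import List


def build_prompt(question: str, gold_doc: str,
                 distractors: List[str], gold_position: int) -> str:
    """Insert the gold document at index gold_position among the distractors.

    The "Document [i]: ..." template is an assumption made for illustration.
    """
    if not 0 <= gold_position <= len(distractors):
        raise ValueError("gold_position out of range")
    docs = distractors[:gold_position] + [gold_doc] + distractors[gold_position:]
    lines = [f"Document [{i + 1}]: {doc}" for i, doc in enumerate(docs)]
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


def position_sweep(question: str, gold_doc: str,
                   distractors: List[str]) -> List[str]:
    """Build one prompt per possible gold-document position, first to last."""
    return [
        build_prompt(question, gold_doc, distractors, pos)
        for pos in range(len(distractors) + 1)
    ]
```

Scoring a model on each prompt in the sweep and plotting accuracy against gold position would show whether accuracy dips when the relevant document sits in the middle of the context.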
Abstract
Reasoning distillation has emerged as an effective approach to enhance the reasoning capabilities of smaller language models. However, the impact of large-scale reasoning distillation on other critical abilities, particularly in-context retrieval and reasoning, remains unexplored. This gap in understanding is particularly significant given the increasing importance of Retrieval-Augmented Generation (RAG) systems, where efficient acquisition and utilization of contextual information are paramount for generating reliable responses. Motivated by the need to understand how the extended long-CoT process influences long-context comprehension, we conduct a comprehensive investigation using a series of open-source models distilled from DeepSeek-R1, renowned for its exceptional reasoning capabilities. Our study focuses on evaluating these models' performance in extracting and integrating…
