Two are better than one: Context window extension with multi-grained self-injection
Wei Han, Pan Zhou, Soujanya Poria, Shuicheng Yan

TL;DR
This paper introduces SharedLLM, a multi-grained context extension method for large language models. It pairs two short-context models, a compressor and a decoder, with a tree-structured retrieval scheme so that longer context can be incorporated without costly continual pre-training on long-context data.
Contribution
SharedLLM presents a new architecture that combines a compressor LLM and a decoder LLM with a tree-based data structure for efficient multi-grained context extension.
Findings
Enables broader context understanding without retraining the entire model.
Reduces computational costs compared to continual pre-training on long data.
Retrieves multi-grained context information efficiently.
Abstract
The limited context window of contemporary large language models (LLMs) remains a huge barrier to their broader application across various domains. While continual pre-training on long-context data is a straightforward and effective solution, it incurs substantial costs in terms of data acquisition and computational resources. To alleviate this issue, we propose SharedLLM, a novel approach grounded in the design philosophy of multi-grained context compression and query-aware information retrieval. SharedLLM is composed of two short-context LLMs such as LLaMA-2, termed upper model and lower model. The lower model functions as a compressor while the upper model acts as a decoder. The upper model receives compressed, multi-grained context information from the lower model and performs context-aware modeling on the running text. Information transfer between the compressor and decoder occurs…
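The idea of multi-grained, query-aware compression described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the names `Node`, `build_tree`, `overlap`, `compress`, and `retrieve` are hypothetical, word overlap stands in for whatever query-aware scoring the method uses, and strided subsampling stands in for the learned compressor (the lower model). It only shows how a tree over the context can keep query-relevant spans at fine granularity while summarizing the rest coarsely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    tokens: List[str]                 # span of the context covered by this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(tokens: List[str], min_len: int) -> Node:
    """Recursively halve the span until it holds at most min_len tokens."""
    node = Node(tokens)
    if len(tokens) > min_len:
        mid = len(tokens) // 2
        node.left = build_tree(tokens[:mid], min_len)
        node.right = build_tree(tokens[mid:], min_len)
    return node


def overlap(tokens: List[str], query: List[str]) -> int:
    """Toy relevance score (assumption): distinct query words found in the span."""
    return len(set(tokens) & set(query))


def compress(tokens: List[str], budget: int) -> List[str]:
    """Toy stand-in for a learned compressor: keep evenly strided tokens."""
    if len(tokens) <= budget:
        return list(tokens)
    step = len(tokens) / budget
    return [tokens[int(i * step)] for i in range(budget)]


def retrieve(node: Node, query: List[str], budget: int) -> List[List[str]]:
    """Descend toward the more relevant child; compress its sibling coarsely.

    Pieces are returned in their original left-to-right order.
    """
    if node.left is None or node.right is None:
        return [list(node.tokens)]    # leaf: keep at full resolution
    if overlap(node.left.tokens, query) >= overlap(node.right.tokens, query):
        return retrieve(node.left, query, budget) + [compress(node.right.tokens, budget)]
    return [compress(node.left.tokens, budget)] + retrieve(node.right, query, budget)


context = "a b c d e f g h".split()
tree = build_tree(context, min_len=2)
pieces = retrieve(tree, query=["g"], budget=1)
print(pieces)
```

In this sketch the span containing the query word is kept whole, while spans farther from it are reduced to a single token each, so the decoder would see fewer tokens than the original context. A real system would pass compressed model states rather than raw tokens.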
Taxonomy
Topics: Data Visualization and Analytics · Advanced Database Systems and Queries · Data Stream Mining Techniques
