Towards robust long-context understanding of large language model via active recap learning
Chenyu Hui

TL;DR
This paper introduces Active Recap Learning (ARL), a framework that improves large language models' understanding of long contexts by enabling them to revisit, summarize, and utilize previous content during training and inference.
Contribution
ARL is a novel framework that enhances long-context understanding in LLMs through targeted sequence construction and recursive memory mechanisms during pretraining and inference.
Findings
26.8% improvement on RULER benchmark
9.44% improvement on LongBench
Effective long-context understanding enhancement
Abstract
In this paper, we propose active recap learning (ARL), a framework for enhancing the long-context understanding of large language models (LLMs). ARL enables models to revisit and summarize earlier content through targeted sequence construction during continued pretraining and retrospective summarization at inference. First, we identify key tokens in a prepared long context based on loss gaps between long and short forward contexts, find the most relevant preceding paragraphs, and then summarize them using an LLM. Second, ARL equips models with the ability to autonomously generate and utilize these retrospective summaries during inference, thereby establishing a recursive memory mechanism across paragraphs. Experimental results show substantial gains, with ARL achieving a 26.8% improvement on RULER and a 9.44% improvement on LongBench. Overall, ARL offers a simple yet effective continued…
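The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_gap` threshold, the direction of the loss gap (short-context loss minus long-context loss), and the `summarize` callable that stands in for the model are all assumptions made for illustration. The abstract does not specify how the loss gap is thresholded or how the recap is fed back to the model.

```python
from typing import Callable, List, Sequence


def select_key_tokens(
    short_ctx_losses: Sequence[float],
    long_ctx_losses: Sequence[float],
    min_gap: float = 1.0,  # assumed threshold, not taken from the paper
) -> List[int]:
    """Return indices of tokens whose loss drops by at least `min_gap`
    when the model sees the long context instead of the short one."""
    if len(short_ctx_losses) != len(long_ctx_losses):
        raise ValueError("loss sequences must have the same length")
    return [
        i
        for i, (short, long_) in enumerate(zip(short_ctx_losses, long_ctx_losses))
        if short - long_ >= min_gap
    ]


def recursive_recap(
    paragraphs: Sequence[str],
    summarize: Callable[[str, str], str],
) -> List[str]:
    """Carry a running recap across paragraphs.

    `summarize(previous_recap, paragraph)` is a hypothetical stand-in for
    the model call; its output is passed along with the next paragraph.
    """
    recap = ""
    recaps = []
    for paragraph in paragraphs:
        recap = summarize(recap, paragraph)
        recaps.append(recap)
    return recaps
```

Under these assumptions, a token counts as "key" when the long context makes it much easier to predict, which suggests it depends on earlier content. Passing each recap into the next call is what makes the memory recursive: every paragraph is read together with a summary of everything before it.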
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Mental Health via Writing
