Large Language Models Can Self-Improve in Long-context Reasoning
Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam

TL;DR
This paper introduces SeaLong, a method that lets large language models self-improve in long-context reasoning by sampling multiple outputs per question, scoring them with Minimum Bayes Risk, and fine-tuning on the results, yielding performance gains without relying on external annotations.
Contribution
The paper presents a self-improvement approach for LLMs in long-context reasoning that does not depend on annotations from human experts or advanced models, and reports an absolute improvement of 4.2 points for Llama-3.1-8B-Instruct.
Findings
SeaLong improves Llama-3.1-8B-Instruct by 4.2 points (absolute).
The approach outperforms prior methods relying on external data.
Self-improvement can be effectively achieved through output sampling, scoring, and fine-tuning.
Abstract
Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically involve fine-tuning LLMs with synthetic data, which depends on annotations from human experts or advanced models like GPT-4, thus restricting further advancements. To address this issue, we investigate the potential for LLMs to self-improve in long-context reasoning and propose SeaLong, an approach specifically designed for this purpose. This approach is straightforward: we sample multiple outputs for each question, score them with Minimum Bayes Risk, and then apply supervised fine-tuning or preference optimization based on these outputs. Extensive experiments on several leading LLMs demonstrate the effectiveness of SeaLong, with an absolute improvement of 4.2 points for Llama-3.1-8B-Instruct. Furthermore, SeaLong achieves…
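The sample-score-select loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap similarity (`jaccard`), the function names, and the rule of taking the highest- and lowest-scoring outputs as the preference pair are all assumptions made for illustration. The idea of Minimum Bayes Risk scoring is that an output which agrees with most other samples receives a high score.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. An illustrative stand-in; a real system
    would more likely compare outputs with an embedding model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mbr_scores(outputs: List[str],
               sim: Callable[[str, str], float] = jaccard) -> List[float]:
    """Score each sampled output by its mean similarity to the other samples."""
    if len(outputs) < 2:
        raise ValueError("MBR scoring needs at least two sampled outputs")
    scores = []
    for i, candidate in enumerate(outputs):
        others = [sim(candidate, o) for j, o in enumerate(outputs) if j != i]
        scores.append(sum(others) / len(others))
    return scores


def build_training_example(outputs: List[str]) -> Dict[str, str]:
    """Pick fine-tuning data from the scored samples (hypothetical selection rule):
    the top-scoring output is the SFT target and the 'chosen' response,
    the bottom-scoring output is the 'rejected' response."""
    scores = mbr_scores(outputs)
    best = max(range(len(outputs)), key=scores.__getitem__)
    worst = min(range(len(outputs)), key=scores.__getitem__)
    return {
        "sft_target": outputs[best],
        "chosen": outputs[best],
        "rejected": outputs[worst],
    }


samples = [
    "the answer is paris",
    "the answer is paris",
    "the answer is rome",
    "bananas are yellow",
]
example = build_training_example(samples)
print(example["chosen"], "|", example["rejected"])
```

In this toy run the two agreeing samples score highest and the off-topic sample scores lowest, so the former become the fine-tuning target and the latter the rejected response in a preference pair.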
