Large Language Models Can Self-Improve in Long-context Reasoning

Siheng Li; Cheng Yang; Zesen Cheng; Lemao Liu; Mo Yu; Yujiu Yang; Wai Lam

arXiv:2411.08147 · cs.CL · November 14, 2024

TL;DR

This paper introduces SeaLong, a method that enables large language models to self-improve in long-context reasoning by sampling multiple outputs per question, scoring them, and fine-tuning on the results. It yields significant performance gains without relying on external annotations.

Contribution

The paper presents a novel self-improvement approach for LLMs in long-context reasoning that does not depend on annotations from human experts or advanced models, and it shows notable performance improvements.

Findings

01

SeaLong improves Llama-3.1-8B-Instruct by an absolute 4.2 points.

02

The approach outperforms prior methods relying on external data.

03

Self-improvement can be effectively achieved through output sampling, scoring, and fine-tuning.
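The paper scores sampled outputs with Minimum Bayes Risk (MBR). The sketch below shows sample-based MBR in its common form: each output is scored by its average similarity to the other samples, which act as pseudo-references, so an answer that agrees with most samples ranks highest. It assumes a token-overlap (Jaccard) similarity purely as a stand-in; the similarity measure the authors actually use is not stated here. The names `jaccard` and `mbr_scores` are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a stand-in for the paper's similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty outputs are treated as identical
    return len(ta & tb) / len(ta | tb)


def mbr_scores(outputs: list[str], similarity=jaccard) -> list[float]:
    """Sample-based MBR: score each output by its mean similarity to the other samples.

    The other samples serve as pseudo-references, so an output that agrees
    with most of them gets a higher score (lower expected risk).
    """
    n = len(outputs)
    if n < 2:
        raise ValueError("MBR scoring needs at least two sampled outputs")
    return [
        sum(similarity(y, outputs[j]) for j in range(n) if j != i) / (n - 1)
        for i, y in enumerate(outputs)
    ]


if __name__ == "__main__":
    outputs = ["the answer is paris", "the answer is paris france", "it is london"]
    scores = mbr_scores(outputs)
    best = outputs[scores.index(max(scores))]
    print(best)  # the answer is paris
```

Swapping in a stronger similarity function (for example, embedding cosine similarity) only requires passing a different `similarity` callable.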

Abstract

Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically fine-tune LLMs on synthetic data, which depends on annotations from human experts or advanced models such as GPT-4, thus restricting further advancements. To address this issue, we investigate the potential for LLMs to self-improve in long-context reasoning and propose SeaLong, an approach specifically designed for this purpose. The approach is straightforward: we sample multiple outputs for each question, score them with Minimum Bayes Risk, and then apply supervised fine-tuning or preference optimization based on these outputs. Extensive experiments on several leading LLMs demonstrate the effectiveness of SeaLong, with an absolute improvement of 4.2 points for Llama-3.1-8B-Instruct. Furthermore, SeaLong achieves…
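The abstract's last step applies supervised fine-tuning or preference optimization based on the scored outputs. A minimal sketch of turning one question's scored samples into training records, assuming that the top-scored output serves as the SFT target and the "chosen" response while the bottom-scored output serves as the "rejected" response. `ScoredOutput`, `build_training_records`, and the record field names are hypothetical and illustrative, not a specific library's schema.

```python
from dataclasses import dataclass


@dataclass
class ScoredOutput:
    text: str
    score: float  # e.g. an MBR score; higher means closer to the consensus


def build_training_records(question: str, outputs: list[ScoredOutput]):
    """Hypothetical helper: build one SFT record and one preference pair per question.

    Assumption: the highest-scored output is both the SFT target and the
    "chosen" response; the lowest-scored output is the "rejected" response.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two scored outputs to form a pair")
    ranked = sorted(outputs, key=lambda o: o.score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    sft_record = {"prompt": question, "completion": best.text}
    preference_record = {"prompt": question, "chosen": best.text, "rejected": worst.text}
    return sft_record, preference_record


if __name__ == "__main__":
    samples = [ScoredOutput("a", 0.2), ScoredOutput("b", 0.9), ScoredOutput("c", 0.5)]
    sft, pref = build_training_records("Which city is mentioned most?", samples)
    print(sft["completion"], pref["rejected"])  # b a
```

Either record type could then feed an SFT or preference-optimization trainer; which outputs the authors actually pair is an assumption here.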

Code & Models

Repositories

sihengli99/sealong (PyTorch, official)

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Residual Connection