LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information

Bowen Ping; Jiali Zeng; Fandong Meng; Shuo Wang; Jie Zhou; Shanghang Zhang

arXiv:2502.02095·cs.CL·May 21, 2025

TL;DR

This paper introduces LongDPO, a method that improves long-form generation in large language models through critique-augmented, stepwise preference learning: Monte Carlo Tree Search collects step-level preference pairs, external critiques refine them, and the resulting outputs improve in both length and quality.

Contribution

It proposes a novel process supervision approach with Monte Carlo Tree Search and external critiques for better long-form generation in LLMs, outperforming existing methods.

Findings

01. Improved length and quality in long-form generation benchmarks.

02. Almost lossless performance on general benchmarks across various models.

03. Effective use of stepwise preference pairs and external critiques.

Abstract

Long-form generation is crucial for tasks such as writing academic papers and repo-level code generation. Despite this, current models, including GPT-4o, still exhibit unsatisfactory performance. Existing methods that utilize preference learning with outcome supervision often fail to provide detailed feedback for extended contexts. This shortcoming can lead to content that does not fully satisfy query requirements, resulting in issues like length deviations and diminished quality. In this paper, we propose enhancing long-form generation by incorporating process supervision. We employ Monte Carlo Tree Search to gather stepwise preference pairs, utilizing a global memory pool to maintain consistency. To address the issue of suboptimal candidate selection, we integrate external critiques to refine and improve the quality of the preference pairs. Finally, we apply step-level DPO using the collected…
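The final step named in the abstract, step-level DPO, can be illustrated with the standard DPO objective applied to a single stepwise preference pair. The following is a minimal sketch, not the authors' implementation: the function name `step_dpo_loss`, its parameters, and the default `beta` are hypothetical, and it assumes each pair consists of a chosen and a rejected candidate step that share the same prefix, with summed token log-probabilities already computed under the policy and a frozen reference model.

```python
import math


def step_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) step pair.

    All arguments are hypothetical inputs: summed log-probabilities of each
    candidate step given the shared prefix, under the policy and under a
    frozen reference model. `beta` scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # step over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(beta * margin)), written as a numerically
    # stable softplus of z = -beta * margin.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Illustrative values only: the policy prefers the chosen step more than
# the reference does, so the loss falls below log(2).
loss = step_dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2 (about 0.693); the loss decreases as the policy shifts probability toward the chosen step. How the paper batches pairs or weights individual steps is not stated in this excerpt.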

Code & Models

Repositories

pingbowen23/longdpo (PyTorch, official)

Models

Bowen232/LongDPO (model)

Datasets

Bowen232/LongDPO (dataset · 9 dl)

Taxonomy

Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems

Methods: Direct Preference Optimization