Consensus Group Relative Policy Optimization for Text Generation

Yuki Ichihara; Yuu Jinnai; Kaito Ariu; Eiji Uchibe

arXiv:2602.03102 · cs.LG · February 4, 2026

Open Access

TL;DR

Consensus Group Relative Policy Optimization (C-GRPO) is a training method that distills Minimum Bayes Risk (MBR) decoding into policy optimization, so that text generation avoids the cost of repeated sampling and scoring at inference time and does not depend on gold references.

Contribution

The paper introduces a group-relative policy optimization approach that aligns training with MBR decoding, reducing inference cost and removing the need for curated preference data.

Findings

01 Achieves MBR-level performance without inference overhead

02 Outperforms reference-free baselines in translation and summarization

03 Converges under ideal conditions to the expected-utility objective

Abstract

Many strong decoding methods for text generation follow a sample-and-rerank paradigm: they draw multiple candidates, score each under a utility (reward) function using consensus across samples, and return the best one. Although effective, these methods incur high computational costs during inference due to repeated sampling and scoring. Prior attempts to amortize inference-time computation typically rely on gold references, teacher labels, or curated preference data, increasing dataset construction effort and the demand for high-fidelity reward models. We propose Consensus Group Relative Policy Optimization (C-GRPO), which distills Minimum Bayes Risk (MBR) decoding into training by formulating the consensus utility as a group-relative objective within GRPO. C-GRPO requires only a utility function and policy samples, without gold references or explicit preference labels. Under ideal…
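The sample-and-rerank idea and its group-relative reformulation can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy unigram-F1 utility, and the choice of a leave-one-out mean as the consensus score are all assumptions made for illustration, and the mean/standard-deviation normalization is the one commonly associated with GRPO rather than something stated in the abstract.

```python
from statistics import mean, pstdev
from typing import Callable, List

Utility = Callable[[str, str], float]


def token_f1(hyp: str, ref: str) -> float:
    """Toy utility (assumption): unigram F1 overlap between two strings."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    overlap = sum(min(h.count(t), r.count(t)) for t in set(h))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def consensus_utilities(samples: List[str], utility: Utility) -> List[float]:
    """Mean utility of each sample against the other samples in its group."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to form a consensus")
    return [
        mean(utility(y, other) for j, other in enumerate(samples) if j != i)
        for i, y in enumerate(samples)
    ]


def mbr_select(samples: List[str], utility: Utility) -> str:
    """Inference-time rerank: return the sample with the highest consensus."""
    scores = consensus_utilities(samples, utility)
    best = max(range(len(samples)), key=lambda i: scores[i])
    return samples[best]


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group (assumed GRPO-style: mean and std)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical policy samples for a single prompt.
    group = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a cat sat on the mat",
        "dogs bark loudly",
    ]
    rewards = consensus_utilities(group, token_f1)
    print(mbr_select(group, token_f1))
    print(group_relative_advantages(rewards))
```

In this sketch, `mbr_select` pays the pairwise scoring cost at every inference call, whereas `group_relative_advantages` turns the same consensus scores into a per-sample training signal that needs only the utility function and the policy's own samples. How the paper actually weights or clips these advantages is not specified in the abstract.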

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications