Group-Aware Reinforcement Learning for Output Diversity in Large Language Models

Oron Anschel; Alon Shoshan; Adam Botach; Shunit Haviv Hakimi; Asaf Gendler; Emanuel Ben Baruch; Nadav Bhonker; Igor Kviatkovsky; Manoj Aggarwal; Gerard Medioni

arXiv:2511.12596·cs.CL·November 18, 2025

TL;DR

This paper introduces GAPO, a reinforcement learning method that enhances output diversity in large language models by optimizing group-level rewards, leading to more varied and valid responses without sacrificing accuracy.

Contribution

GAPO extends existing policy optimization techniques to incorporate group-level rewards, improving diversity in LLM outputs while maintaining performance on benchmarks.

Findings

01 GAPO increases response diversity in LLMs.

02 GAPO maintains accuracy on standard benchmarks.

03 GAPO generalizes to open-ended prompts.

Abstract

Large Language Models (LLMs) often suffer from mode collapse, repeatedly generating the same few completions even when many valid answers exist, which limits their diversity across a wide range of tasks. We introduce Group-Aware Policy Optimization (GAPO), a simple extension of the recent and popular Group Relative Policy Optimization (GRPO) that computes rewards over the group as a whole. GAPO enables learning from group-level properties such as diversity and coverage. We demonstrate GAPO using a frequency-aware reward function that encourages uniform sampling over valid LLM completions, and show that GAPO-trained models produce valid and more diverse responses. Beyond this setup, GAPO generalizes to open-ended prompts and improves response diversity without compromising accuracy on standard LLM benchmarks (GSM8K, MATH, HumanEval, MMLU-Pro). Our code will be made publicly…
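
To make the idea of a group-level, frequency-aware reward concrete, here is a minimal Python sketch. It is not the paper's reward function, which the abstract does not specify; the function names, the penalty of -1 for invalid completions, and the "uniform share minus observed share" formula are all assumptions chosen for illustration. The point it shows is that each sample's reward depends on how often its answer appears in the whole sampled group, after which rewards are normalized within the group in the GRPO style.

```python
from collections import Counter
from statistics import mean, pstdev


def group_frequency_rewards(completions, valid_answers):
    """Hypothetical group-level reward (illustrative assumption, not the paper's).

    Each completion is scored using counts over the whole group, so the
    reward for one sample changes when the other samples change.
    """
    n = len(completions)
    counts = Counter(c for c in completions if c in valid_answers)
    target = 1.0 / len(valid_answers)  # share each valid answer would get if uniform
    rewards = []
    for c in completions:
        if c not in valid_answers:
            rewards.append(-1.0)  # assumed fixed penalty for invalid outputs
        else:
            share = counts[c] / n  # observed share of this answer in the group
            # Over-represented answers score below under-represented ones.
            rewards.append(target - share)
    return rewards


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within the group (mean 0, unit scale), GRPO-style."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: a group of five samples for a prompt with four valid answers.
group = ["cat", "cat", "cat", "dog", "xyz"]
valid = {"cat", "dog", "bird", "fish"}
rewards = group_frequency_rewards(group, valid)
advantages = group_relative_advantages(rewards)
```

In this toy group, the repeated answer receives a lower reward than the rarer valid answer, and the invalid output receives the lowest reward, so a policy update driven by these advantages would push probability mass away from the over-sampled completion. A real setup would need a validity check suited to the task rather than exact string membership.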

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education