Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models

Khiem Le; Phuc Nguyen; Youssef Mroueh; Chi-Heng Lin; Shangqian Gao; Ting Hua; Nitesh V. Chawla

arXiv:2601.22478·cs.LG·May 20, 2026

TL;DR

TA-GRPO enhances exploration in large language models by using question rephrasing to generate diverse responses, addressing gradient vanishing and diversity collapse issues in reinforcement learning.

Contribution

The paper introduces a simple method that automatically generates question rephrasings to improve exploration and response diversity in reinforcement learning for large language models.

Findings

01 TA-GRPO improves pass@$k$ on multiple benchmarks.

02 It raises average pass@32 by 4.97 and 4.34 points, respectively, for two models.

03 It matches the exploration quality of baselines trained on more data.
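
For readers unfamiliar with the pass@$k$ metric cited in the findings, the sketch below computes the widely used unbiased estimator: the probability that at least one of $k$ responses drawn without replacement from $n$ samples is correct, given that $c$ of the $n$ are correct. This page does not say which estimator the authors use, so treating it as this one is an assumption, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled responses, c of which are correct.

    Assumption: this is the standard combinatorial estimator
    1 - C(n - c, k) / C(n, k); the page does not confirm that the
    paper computes pass@k this way.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k responses are wrong, every size-k draw contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only (not results from the paper).
no_correct = pass_at_k(n=64, c=0, k=32)   # no correct samples at all
one_of_four = pass_at_k(n=4, c=1, k=2)    # one correct sample out of four
```

Under this estimator, a question with even a single correct sample contributes a non-zero pass@$k$, which is why the metric is often read as a proxy for exploration.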

Abstract

Group Relative Policy Optimization (GRPO) has become the dominant method for reinforcement learning with verifiable rewards in large language models, but it suffers from two critical limitations: gradient vanishing and diversity collapse. When training questions are too easy or too hard, all sampled responses receive identical rewards, yielding zero gradients. Meanwhile, the model tends to collapse its responses toward a single reasoning pattern rather than exploring diverse strategies. We propose Transformation-Augmented GRPO (TA-GRPO), a simple but effective method that addresses both issues via question rephrasing. For each training question, we automatically generate multiple problem-equivalent rephrasings that alter wording, format, and information order while preserving the underlying meaning. Because these rephrasings shift the model's perceived difficulty, pooling responses…
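
The gradient-vanishing issue described in the abstract can be illustrated with a minimal sketch of group-relative advantage normalization, where each reward is standardized against the mean and standard deviation of its group. The abstract is truncated before it explains how pooled responses are used, so the pooling step below is only an assumption (all responses to a question and its rephrasings share one normalization group). The reward values and the names `group_advantages`, `original`, `rephrasing_a`, and `rephrasing_b` are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary rewards: the original question is too hard,
# so every sampled response fails and receives the same reward.
original = [0.0, 0.0, 0.0, 0.0]
# Hypothetical rewards for two rephrasings; one response succeeds.
rephrasing_a = [0.0, 1.0, 0.0, 0.0]
rephrasing_b = [0.0, 0.0, 0.0, 0.0]

# Identical rewards give all-zero advantages, hence no gradient signal.
per_group = group_advantages(original)

# Assumed pooling: normalize over responses from all phrasings together.
pooled = group_advantages(original + rephrasing_a + rephrasing_b)

print(per_group)
print(pooled)
```

In this toy setting the single successful response under one rephrasing is enough to give every pooled response a non-zero advantage, which matches the abstract's motivation but should not be read as the authors' exact formulation.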
