Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement

Yongsheng Lian

arXiv:2512.07611 · cs.AI · December 9, 2025

TL;DR

This paper systematically compares three RL algorithms, PPO, GRPO, and DAPO, for enhancing reasoning in large language models, providing insights into their performance, stability, and sensitivity to key parameters through transfer learning and benchmark evaluations.

Contribution

The paper offers the first controlled transfer-learning evaluation of these RL algorithms on LLM reasoning, along with practical parametric guidance for training.

Findings

1. RL-trained models outperform base models on reasoning tasks
2. Increasing group size improves training stability and accuracy
3. Dynamic Sampling in DAPO does not enhance performance
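
Findings 2 and 3 concern how GRPO and DAPO turn a group of sampled completions into a learning signal. The following is a minimal sketch under stated assumptions, not the paper's implementation: it standardizes each reward against its own group (GRPO-style, here using the population standard deviation; implementations differ on this detail), drops groups whose rewards are all identical (only the filtering half of DAPO's Dynamic Sampling, without the resampling that refills the batch), and, for a hypothetical binary reward with success probability `p`, computes the chance that a group of size `G` is degenerate. The helper names `group_advantages`, `dynamic_sampling_filter`, and `degenerate_group_prob` are hypothetical.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its own group.

    Uses the population standard deviation (an assumption; real
    implementations may use the sample standard deviation instead).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def dynamic_sampling_filter(groups: list[list[float]]) -> list[list[float]]:
    """Keep only groups whose rewards are not all identical.

    An all-correct or all-wrong group gets zero advantage for every sample,
    so it adds nothing to the advantage-weighted policy-gradient term.
    DAPO's Dynamic Sampling also keeps sampling until the batch is refilled;
    that part is omitted in this sketch.
    """
    return [g for g in groups if max(g) != min(g)]


def degenerate_group_prob(p: float, group_size: int) -> float:
    """Probability that all `group_size` i.i.d. binary rewards (success prob `p`) agree."""
    return p ** group_size + (1 - p) ** group_size


# Hypothetical binary rewards for three groups of four completions each.
groups = [[1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
kept = dynamic_sampling_filter(groups)  # only [1, 0, 0, 1] survives
advantages = [group_advantages(g) for g in kept]
```

For example, the group `[1, 1, 1, 1]` receives all-zero advantages and is discarded by the filter. With `p = 0.5`, the degenerate-group probability falls from 0.125 at `G = 4` to about 0.0078 at `G = 8`; this is one possible intuition (ours, not the paper's) for why larger groups can yield a more informative training signal.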

Abstract

This study presents a systematic comparison of three Reinforcement Learning (RL) algorithms (PPO, GRPO, and DAPO) for improving complex reasoning in large language models (LLMs). Our main contribution is a controlled transfer-learning evaluation: models are first fine-tuned on the specialized Countdown Game and then assessed on a suite of general-purpose reasoning benchmarks. Across all tasks, RL-trained models outperform their corresponding base models, although the degree of improvement differs by benchmark. Our parametric analysis offers practical guidance for RL-based LLM training. Increasing the group size in GRPO and DAPO leads to more stable training dynamics and higher accuracy, while the impact of the KL-penalty coefficient is non-monotonic. Additionally, we find that the Dynamic Sampling (DS) component in DAPO does not improve performance; in fact, the best overall results…
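
The abstract does not specify the reward used for the Countdown Game, in which a model must combine a given set of numbers with arithmetic operators to reach a target value. Below is a minimal sketch of a verifiable binary reward for such a task, assuming (hypothetically) that every provided number must be used exactly once, that only `+`, `-`, `*`, and `/` are allowed, and that the model's final answer has already been extracted as an expression string. The function `countdown_reward` and these rules are illustrative assumptions, not the paper's reward.

```python
import ast
import operator
from collections import Counter

# Allowed binary operators (assumed rule set for this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST) -> float:
    """Evaluate an arithmetic AST, rejecting anything but + - * / on integer literals."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    raise ValueError("disallowed syntax in expression")


def countdown_reward(expression: str, numbers: list[int], target: int) -> float:
    """Hypothetical binary reward: 1.0 if the expression uses every given number
    exactly once and evaluates to the target, otherwise 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # After a successful _evaluate, the only constants are the integer literals used.
    used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0
```

For example, `countdown_reward("(25 - 5) * 3 + 1", [3, 5, 25, 1], 61)` returns `1.0`, while `"25 * 3"` scores `0.0` for numbers `[3, 5, 25]` and target 75 because 5 is never used. Parsing with `ast` instead of calling `eval` keeps model-generated text from executing arbitrary code.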

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Language and cultural evolution