Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Xiaoying Zhang; Yipeng Zhang; Hao Sun; Kaituo Feng; Chaochao Lu; Chao Yang; Helen Meng

arXiv:2506.03106 · cs.CL · February 23, 2026

Open Access

TL;DR

Critique-GRPO is a reinforcement learning framework that combines natural language critiques with numerical rewards to improve large language models' reasoning and self-refinement.

Contribution

The paper introduces Critique-GRPO, an online RL method that integrates natural language critiques with numerical rewards, improving LLM reasoning and self-improvement over existing methods.

Findings

01. Achieves 15-21.6% Pass@1 improvements on Qwen models.

02. Attains 7.3% Pass@1 improvement on Llama-3.2-3B-Instruct.

03. Enables effective self-critiquing and substantial performance gains.

Abstract

Recent advances in reinforcement learning (RL) using numerical rewards have significantly enhanced the complex reasoning capabilities of large language models (LLMs). However, we identify three fundamental limitations of purely numerical feedback: performance plateaus, ineffective spontaneous self-reflection, and persistent failures. We show that plateaued RL models can successfully refine failed solutions when given natural language critiques. Motivated by this, we propose Critique-GRPO, an online RL framework that integrates both natural language and numerical feedback for policy optimization. This approach enables LLMs to learn simultaneously from initial responses and critique-guided refinements, effectively internalizing the exploration benefits of both stages. Extensive experiments show that Critique-GRPO outperforms all compared supervised and RL-based fine-tuning methods,…
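To make the idea of learning "simultaneously from initial responses and critique-guided refinements" more concrete, here is a minimal sketch. It assumes the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation) and simply pools initial samples and refinements into one group. The function names, the pooling scheme, and the binary rewards are illustrative assumptions; the paper's actual objective may differ in its details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: normalize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_mixed_group(initial, refined):
    """Pool initial responses and critique-guided refinements into one group.

    `initial` and `refined` are lists of (response_text, reward) pairs.
    Pooling both sources in a single group is an assumption made for
    illustration, not a confirmed detail of Critique-GRPO.
    """
    group = [(text, reward, "initial") for text, reward in initial]
    group += [(text, reward, "refined") for text, reward in refined]
    advantages = group_relative_advantages([reward for _, reward, _ in group])
    return [
        (text, source, adv)
        for (text, _, source), adv in zip(group, advantages)
    ]


if __name__ == "__main__":
    # Hypothetical data: three failed initial attempts, one correct refinement.
    initial = [("attempt A", 0.0), ("attempt B", 0.0), ("attempt C", 0.0)]
    refined = [("refined A", 1.0)]
    for text, source, adv in build_mixed_group(initial, refined):
        print(f"{source:8s} {text:10s} advantage={adv:+.3f}")
```

In this toy setup, a group made up only of failed initial attempts would give every response zero advantage and therefore no learning signal. Adding a single successful refinement gives the group non-zero variance, which is one plausible reading of how critiques could help with the persistent failures the abstract describes.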

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications