Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models

Fei Ding; Baiqiao Wang; Zijian Zeng; Youwei Wang

arXiv:2506.04746·cs.LG·June 6, 2025

TL;DR

This paper introduces MGRPO, a multi-layer approach to improve reasoning and self-correction in large language models, leading to better performance on mathematical reasoning tasks.

Contribution

MGRPO adds a second GRPO layer that is trained to identify and correct errors in the first layer's response. This provides implicit supervision and strengthens both reasoning and self-correction in LLMs.
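The following is a minimal sketch of the two-layer rollout structure described above. It assumes that one policy is sampled in both layers and that a binary answer verifier supplies the rewards. The prompt template in `build_correction_prompt`, the `Policy`/`Verifier` signatures, and the toy stubs are all hypothetical and are not the authors' implementation. The rewards collected for each group would then feed a GRPO-style update in each layer.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: a policy maps (prompt, n) to n sampled completions,
# and a verifier scores an answer for a query (1.0 correct, 0.0 wrong).
Policy = Callable[[str, int], List[str]]
Verifier = Callable[[str, str], float]


@dataclass
class Rollout:
    query: str
    first_response: str
    first_reward: float
    corrections: List[str]
    correction_rewards: List[float]


def build_correction_prompt(query: str, response: str) -> str:
    """Hypothetical second-layer prompt: original query plus the first-layer answer."""
    return (
        f"Question:\n{query}\n\n"
        f"Previous answer:\n{response}\n\n"
        "Check the previous answer, fix any errors, and give a final answer."
    )


def two_layer_rollout(query: str, policy: Policy, verifier: Verifier,
                      group_size: int = 4) -> List[Rollout]:
    """Layer 1 samples a group of answers; layer 2 samples a correction group per answer."""
    rollouts = []
    for response in policy(query, group_size):          # layer 1: initial answers
        prompt2 = build_correction_prompt(query, response)
        corrections = policy(prompt2, group_size)       # layer 2: corrections
        rollouts.append(Rollout(
            query=query,
            first_response=response,
            first_reward=verifier(query, response),
            corrections=corrections,
            correction_rewards=[verifier(query, c) for c in corrections],
        ))
    return rollouts


def toy_policy(prompt: str, n: int) -> List[str]:
    # Toy stand-in for an LLM: answers wrongly first, correctly when asked to correct.
    return ["42" if "Previous answer" in prompt else "41"] * n


def toy_verifier(query: str, answer: str) -> float:
    # Toy reward: exact match against the known answer to "6 * 7".
    return 1.0 if answer.strip() == "42" else 0.0


rollouts = two_layer_rollout("What is 6 * 7?", toy_policy, toy_verifier, group_size=3)
```

With these stubs, every first-layer answer earns reward 0 and every second-layer correction earns reward 1. This is the situation in which the second layer supplies a training signal that the first layer alone would not.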

Findings

1. MGRPO outperforms standard GRPO on mathematical benchmarks.
2. The two-layer structure improves reasoning accuracy.
3. Self-correction significantly boosts training stability.

Abstract

The Group Relative Policy Optimization (GRPO) algorithm has demonstrated considerable success in enhancing the reasoning capabilities of large language models (LLMs), as evidenced by DeepSeek-R1. However, the absence of intermediate supervision in GRPO frequently leads to inefficient exploration dynamics. A single error in a complex reasoning chain can invalidate the entire solution, resulting in abrupt reward vanishing and compromising training stability. To address these challenges, we propose MGRPO (Multi-layer GRPO). MGRPO operates in two layers: the first layer employs standard GRPO to generate an initial response. This response, along with the original query, is then fed into a second-layer GRPO process. This second layer is specifically trained to identify and correct errors in the initial response, effectively creating a self-correction loop. This mechanism provides implicit…
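GRPO scores each sampled response relative to the other responses in its group, commonly as A_i = (r_i - mean(r)) / (std(r) + eps). The sketch below illustrates the reward-vanishing problem described in the abstract. It assumes binary correctness rewards and uses the population standard deviation with a small `eps`; both choices are assumptions, since implementations differ on them. When a single error makes every response in a group wrong, all rewards are equal and every advantage is zero, so the group contributes no reward-driven policy-gradient signal.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward against its own sampling group.

    Assumption: population std (pstdev) and an additive eps; implementations vary.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group where one reasoning error sinks every sample: all rewards are 0,
# so every advantage is 0 and the group yields no policy-gradient signal.
stalled = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# A mixed group (e.g. some corrected answers succeed) yields nonzero,
# zero-mean advantages that push the policy toward the successful samples.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Normalizing within the group removes the need for a learned value baseline. The drawback is that a group with uniform rewards carries no information, and a self-correction layer is one way to break that uniformity.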
