Group Sequence Policy Optimization
Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, Jingren Zhou, Junyang Lin

TL;DR
This paper presents GSPO, a reinforcement learning algorithm for large language models that uses sequence-level importance ratios to improve training stability, efficiency, and performance, and that has the potential to simplify RL infrastructure design.
Contribution
GSPO introduces sequence-level importance ratios and sequence-level clipping for RL training of large language models, outperforming GRPO and stabilizing MoE RL training.
Findings
GSPO achieves superior training efficiency and performance compared to GRPO.
GSPO stabilizes Mixture-of-Experts RL training.
GSPO contributes to improvements in Qwen3 models.
Abstract
This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. We demonstrate that GSPO achieves superior training efficiency and performance compared to the GRPO algorithm, notably stabilizes Mixture-of-Experts (MoE) RL training, and has the potential for simplifying the design of RL infrastructure. These merits of GSPO have contributed to the remarkable improvements in the latest Qwen3 models.
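The sequence-level ratio and clipping described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract does not give formulas, so the length-normalized ratio, the group-normalized advantage, and the clipped objective below are assumptions based on a reading of the paper, and the function names, array shapes, and the `clip_eps` value are hypothetical placeholders.

```python
import numpy as np


def sequence_importance_ratios(logp_new, logp_old, mask):
    """One importance ratio per response instead of one per token.

    logp_new, logp_old: (G, T) per-token log-probabilities under the
        current and old policies for G sampled responses (assumed inputs).
    mask: (G, T) array, 1.0 for real response tokens and 0.0 for padding.
    """
    lengths = mask.sum(axis=1)
    # Assumption: the sequence likelihood ratio is length-normalized,
    # i.e. a geometric mean of the per-token ratios.
    mean_log_ratio = ((logp_new - logp_old) * mask).sum(axis=1) / lengths
    return np.exp(mean_log_ratio)


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within the group of responses to the same query."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def gspo_objective(logp_new, logp_old, mask, rewards, clip_eps=0.2):
    """Clipped surrogate objective applied at the sequence level.

    clip_eps is a placeholder value, not a recommended setting.
    """
    ratios = sequence_importance_ratios(logp_new, logp_old, mask)
    adv = group_advantages(rewards)
    unclipped = ratios * adv
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Each response is clipped (or not) as a whole, never token by token.
    return np.minimum(unclipped, clipped).mean()
```

Because the ratio is computed once per response, a response is either inside the clipping range or outside it as a whole, which is the contrast with token-level ratios that the abstract draws.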
Code & Models
- 🤗 ServiceNow-AI/Apriel-1.6-15b-Thinker · model · 1.7k dl · ♡ 296
- 🤗 driaforall/mem-agent · model · 57 dl · ♡ 95
- 🤗 hanchaow/QTuneVL1_5-3B · model · 8 dl · ♡ 1
- 🤗 vivekvar/GSPO-DeepSeek-R1-Distill-Qwen-1.5B · model · 3 dl · ♡ 2
- 🤗 gabriellarson/Qwen2.5-GSPO-experiment · model
- 🤗 driaforall/mem-agent-mlx-bf16 · model · 4 dl · ♡ 1
- 🤗 driaforall/mem-agent-mlx-4bit · model · 76 dl · ♡ 5
- 🤗 driaforall/mem-agent-mlx-8bit · model · 6 dl · ♡ 2
- 🤗 QuantFactory/mem-agent-GGUF · model · 410 dl · ♡ 2
- 🤗 Mungert/mem-agent-GGUF · model · 15 dl
