DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

Gang Li; Ming Lin; Tomer Galanti; Zhengzhong Tu; Tianbao Yang

arXiv:2505.12366·cs.LG·January 7, 2026

TL;DR

DisCO introduces a discriminative constrained optimization framework to improve large reasoning models by eliminating question difficulty bias and stabilizing training, leading to significant performance gains over existing methods.

Contribution

The paper proposes DisCO, a reinforcement learning method based on discriminative learning that addresses the question-level difficulty bias of GRPO and improves the training stability and performance of reasoning models.

Findings

1. DisCO outperforms GRPO and DAPO by 6-7% on average across six benchmarks.

2. DisCO eliminates the question-level difficulty bias found in GRPO.

3. DisCO stabilizes training dynamics by using non-clipping scoring functions and constrained optimization.

Abstract

The recent success and openness of DeepSeek-R1 have brought widespread attention to Group Relative Policy Optimization (GRPO) as a reinforcement learning method for large reasoning models (LRMs). In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias. We also identify a connection between GRPO and traditional discriminative methods in supervised learning. Motivated by these insights, we introduce a new Discriminative Constrained Optimization (DisCO) framework for reinforcing LRMs, grounded in the principle of discriminative learning. The main differences between DisCO and GRPO and its recent variants are: (1) it replaces the group relative objective with a discriminative objective defined by a scoring function; (2) it abandons clipping-based surrogates in favor of non-clipping RL surrogate objectives…
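The difficulty bias mentioned in the abstract can be illustrated with a small numerical sketch. It assumes the standard GRPO advantage, (reward - group mean) / group standard deviation, applied to binary rewards, and it ignores clipping and KL terms. With a fraction p of correct answers in a group, the advantage-weighted average of any per-answer score reduces to sqrt(p(1-p)) times the gap between the mean score of correct and incorrect answers, so very easy and very hard questions get a small weight. The function names and the use of the population standard deviation are illustrative assumptions, not taken from the paper's code.

```python
import math

def grpo_advantages(rewards):
    """Group-normalized advantages (r - mean) / std.

    Assumption: population standard deviation, no epsilon term.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    if std == 0.0:
        # All answers correct or all incorrect: no learning signal.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

def question_weight(rewards):
    """Implied per-question weight sqrt(p * (1 - p)) for binary rewards."""
    p = sum(rewards) / len(rewards)
    return math.sqrt(p * (1.0 - p))

def weighted_score_mean(rewards, scores):
    """Average of advantage * score over the group (hypothetical scores)."""
    advantages = grpo_advantages(rewards)
    return sum(a * s for a, s in zip(advantages, scores)) / len(rewards)

def score_gap(rewards, scores):
    """Mean score of correct answers minus mean score of incorrect ones.

    Requires at least one correct and one incorrect answer in the group.
    """
    pos = [s for r, s in zip(rewards, scores) if r == 1]
    neg = [s for r, s in zip(rewards, scores) if r == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

if __name__ == "__main__":
    # Groups of 8 sampled answers with 1, 4, and 7 correct.
    for num_correct in (1, 4, 7):
        group = [1] * num_correct + [0] * (8 - num_correct)
        print(num_correct, round(question_weight(group), 4))
```

In this sketch the weight peaks at p = 0.5 and shrinks symmetrically toward p = 0 and p = 1, which is one way to read the "question-level difficulty bias" the abstract refers to.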

Code & Models

Repositories

optimization-ai/disco (PyTorch, official)

Videos

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · Dialogue-Adaptive Pre-training Objective