ReNCE: Learning to Reason by Noise Contrastive Estimation

Wenzheng Zhang; Karl Stratos

arXiv:2601.22432·cs.LG·February 2, 2026

TL;DR

This paper introduces ReNCE, a contrastive learning method for enhancing reasoning in pretrained language models, offering a more straightforward alternative to advantage estimation techniques like GRPO.

Contribution

ReNCE presents a novel explicit contrastive learning framework for LLM reasoning, simplifying the training process compared to advantage-based methods.

Findings

01. ReNCE achieves competitive results on math benchmarks.

02. It outperforms some existing methods like DAPO and online DPO.

03. The approach simplifies training by avoiding advantage estimation.

Abstract

GRPO is a standard approach to endowing pretrained LLMs with reasoning capabilities. It estimates the advantage of an outcome from a group of $K$ outcomes, and promotes those with positive advantages inside a trust region. Since GRPO discriminates between good and bad outcomes softly, it benefits from additional refinements such as asymmetric clipping and zero-variance data filtering. While effective, these refinements require significant empirical insight and can be challenging to identify. We instead propose an explicit contrastive learning approach. Instead of estimating advantages, we bifurcate $K$ outcomes into positive and negative sets, then maximize the likelihood of positive outcomes. Our approach can be viewed as an online instantiation of (multi-label) noise contrastive estimation for LLM reasoning. We validate our method by demonstrating competitive performance on a suite of…
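The contrast drawn in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the reward threshold used to split outcomes, the choice to skip groups with no contrast, and the exact form of the loss are all assumptions made for illustration. The first function standardizes rewards within a group of $K$ outcomes, as in group-relative advantage estimation. The second splits the same $K$ outcomes into positive and negative sets and scores the positives against the whole group with one common multi-label contrastive objective, $-\log\big(\sum_{i \in P} e^{s_i} / \sum_{j=1}^{K} e^{s_j}\big)$, where $s_i$ is a hypothetical per-outcome score such as a sequence log-likelihood.

```python
import math


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards within one group of K outcomes."""
    k = len(rewards)
    mean = sum(rewards) / k
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / k)
    return [(r - mean) / (std + eps) for r in rewards]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def multilabel_contrastive_loss(scores, rewards, threshold=0.5):
    """Illustrative multi-label contrastive loss over K outcomes.

    scores:    hypothetical per-outcome scores (e.g. sequence log-likelihoods).
    rewards:   per-outcome rewards used only to split the group.
    threshold: assumed cutoff; outcomes with reward above it are positives.

    Returns None when the group has no contrast (all positive or all negative);
    skipping such groups is an assumption of this sketch.
    """
    positives = [s for s, r in zip(scores, rewards) if r > threshold]
    if not positives or len(positives) == len(scores):
        return None
    # -log( sum_{i in P} exp(s_i) / sum_{j} exp(s_j) )
    return logsumexp(scores) - logsumexp(positives)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]
    scores = [0.0, 0.0, 0.0, 0.0]
    print(group_advantages(rewards))
    print(multilabel_contrastive_loss(scores, rewards))
```

With equal scores and two positives out of four outcomes, the loss equals $\log 2$; it shrinks toward zero as the positives take a larger share of the total probability mass, which is the sense in which minimizing it raises the likelihood of positive outcomes relative to negative ones.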

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Advanced Graph Neural Networks