Coupled Variational Reinforcement Learning for Language Model General Reasoning

Xueru Wen; Jie Lou; Yanjiang Liu; Hongyu Lin; Ben He; Xianpei Han; Le Sun; Yaojie Lu; Debing Zhang

arXiv:2512.12576 · cs.CL · January 28, 2026

Open Access

TL;DR

This paper introduces CoVRL, a reinforcement learning framework that couples variational inference with RL to improve reasoning in language models, reporting a 12.4% gain over base models and 2.3% higher accuracy than state-of-the-art verifier-free RL methods.

Contribution

It proposes a coupled variational RL method that enhances exploration and coherence in reasoning traces, advancing verifier-free RL for language models.

Findings

01 Improves reasoning performance by 12.4% over base models

02 Achieves 2.3% higher accuracy than state-of-the-art verifier-free RL methods

03 Provides a new principled framework for reasoning in language models

Abstract

While reinforcement learning has achieved impressive progress in language model reasoning, it is constrained by the requirement for verifiable rewards. Recent verifier-free RL methods address this limitation by utilizing the probabilities that LLMs generate reference answers as reward signals. However, these approaches typically sample reasoning traces conditioned only on the question. This design decouples reasoning-trace sampling from answer information, leading to inefficient exploration and incoherence between traces and final answers. In this paper, we propose Coupled Variational Reinforcement Learning (CoVRL), which bridges variational inference and reinforcement learning by coupling prior and posterior distributions through a hybrid sampling strategy. By constructing and optimizing a composite distribution that integrates these two distributions, CoVRL…
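The abstract is cut off before the training objective is stated, so the following is only a rough sketch of the two ideas it names, not the paper's method. It assumes the verifier-free reward is the summed log-probability of the reference-answer tokens, and that hybrid sampling means choosing per trace between a question-only (prior) prompt and a question-plus-answer (posterior) prompt. The function names, the prompt templates, and the `posterior_ratio` parameter are all hypothetical.

```python
import math
import random


def answer_logprob_reward(answer_token_logprobs):
    """Assumed verifier-free reward: sum of the log-probabilities the model
    assigns to each reference-answer token after a reasoning trace."""
    return sum(answer_token_logprobs)


def choose_sampling_mode(rng, posterior_ratio):
    """Hypothetical hybrid sampling rule: draw the trace from the posterior
    (answer visible) with probability posterior_ratio, else from the prior."""
    return "posterior" if rng.random() < posterior_ratio else "prior"


def build_prompt(question, reference_answer, mode):
    """Illustrative prompt templates; the paper's actual templates are not
    given in the abstract."""
    if mode == "posterior":
        return (
            f"Question: {question}\n"
            f"Reference answer: {reference_answer}\n"
            "Reasoning:"
        )
    return f"Question: {question}\nReasoning:"


if __name__ == "__main__":
    rng = random.Random(0)
    mode = choose_sampling_mode(rng, posterior_ratio=0.5)
    print(mode)
    print(build_prompt("What is 2 + 3?", "5", mode))
    # Placeholder per-token log-probabilities for the reference answer.
    reward = answer_logprob_reward([-0.1, -0.2])
    print(reward, math.exp(reward))
```

In a real setup the per-token log-probabilities would come from the language model being trained; here they are placeholder numbers so the sketch runs without any model.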

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques