BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

Han Zhong; Yutong Yin; Shenao Zhang; Xiaojun Xu; Yuanxin Liu; Yifei Zuo; Zhihan Liu; Boyi Liu; Sirui Zheng; Hongyi Guo; Liwei Wang; Mingyi Hong; Zhaoran Wang

arXiv:2501.18858 · cs.LG · June 10, 2025

Open Access

TL;DR

BRiTE introduces a probabilistic framework and reinforcement learning algorithm to improve reasoning in large language models, achieving better performance without human-annotated data.

Contribution

The paper proposes BRiTE, a novel reinforcement learning method that enhances LLM reasoning by bootstrapping rationales within a probabilistic graphical model.

Findings

01. Consistently improves performance on math and coding benchmarks.

02. Matches or exceeds supervised fine-tuning results.

03. Converges at a rate of $1/T$, where $T$ is the number of iterations.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model's parameters. Theoretically, we demonstrate that BRiTE converges at a rate of $1/T$, where $T$ is the number of iterations. Empirical…
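The two-step loop described in the abstract can be illustrated with a deliberately tiny toy. This is not the authors' implementation: it assumes a single question with three candidate rationales, a fixed hypothetical table `likelihood` giving the chance that each rationale leads to the correct answer, and it replaces the reinforcement-learning approximation of step one with an exact posterior computation, which is only feasible because the toy is so small. All names (`bootstrap_step`, `answer_probability`, `run`) are made up for illustration.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def bootstrap_step(rationale_probs, likelihood):
    """One toy iteration of the two-step scheme.

    Step 1: weight each rationale by how likely it is to yield the correct
            answer (exact posterior here; the paper approximates this step
            with reinforcement learning).
    Step 2: refit the rationale distribution to that posterior. For a free
            categorical distribution, the maximizer of the expected
            log-probability under the posterior is the posterior itself.
    """
    posterior = normalize([p * l for p, l in zip(rationale_probs, likelihood)])
    return posterior


def answer_probability(rationale_probs, likelihood):
    """Marginal probability of the correct answer, summed over rationales."""
    return sum(p * l for p, l in zip(rationale_probs, likelihood))


def run(rationale_probs, likelihood, iterations):
    """Repeat the two-step update and record the answer probability."""
    history = [answer_probability(rationale_probs, likelihood)]
    for _ in range(iterations):
        rationale_probs = bootstrap_step(rationale_probs, likelihood)
        history.append(answer_probability(rationale_probs, likelihood))
    return rationale_probs, history


# Hypothetical numbers: rationale 0 is usually right, rationale 2 rarely.
likelihood = [0.9, 0.5, 0.1]
initial_probs = [1 / 3, 1 / 3, 1 / 3]
final_probs, history = run(initial_probs, likelihood, iterations=20)
```

In this toy, probability mass shifts toward the rationale most likely to produce the correct answer, and the marginal answer probability in `history` never decreases from one iteration to the next. The sketch says nothing about the paper's $1/T$ rate, which concerns the actual algorithm rather than this simplified setting.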

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Balanced Selection