Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning
Purbesh Mitra, Sennur Ulukus

TL;DR
This paper introduces Semantic Soft Bootstrapping, a self-distillation method for long context reasoning in LLMs that improves accuracy without reinforcement learning, using a semantic context-based training pipeline.
Contribution
The work presents a novel self-distillation technique that leverages semantic contexts for training LLMs, eliminating the need for reinforcement learning with verifiable rewards.
Findings
Achieved 10.6% and 10% accuracy improvements on GSM8K and AIME2024 benchmarks.
Demonstrated effective training of LLMs using semantic soft bootstrapping without human intervention.
Outperformed traditional RLVR methods like GRPO in reasoning tasks.
Abstract
Long context reasoning via chain-of-thought (CoT) inference has been shown to enhance the cognitive capabilities of large language models (LLMs). Such models are usually trained with reinforcement learning with verifiable rewards (RLVR) on reasoning-based problems, such as math and programming. However, RLVR is limited by several bottlenecks, such as the lack of dense rewards and inadequate sample efficiency. As a result, it requires significant compute resources in the post-training phase. To overcome these limitations, in this work, we propose Semantic Soft Bootstrapping (SSB), a self-distillation technique in which the same base language model plays the role of both teacher and student, but receives different semantic contexts about the correctness of its outcome at training time. The model is first prompted with a math problem and several rollouts are generated. From…
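The teacher/student setup described above can be illustrated with a minimal sketch. The abstract excerpt here is truncated and does not state the training objective, so everything below is an assumption made for illustration: a token-level KL divergence between the teacher's and the student's next-token distributions, with hypothetical function names and toy logits standing in for real model outputs. In this reading, both sets of logits come from the same base model; the teacher's are produced with extra context in the prompt, the student's from the bare question.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean per-token KL(teacher || student) over one response.

    teacher_logits / student_logits: one list of vocabulary logits per token.
    Assumed objective for illustration only; not taken from the paper.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student sequences must have equal length")
    per_token = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: 2 tokens, vocabulary of 3 (values are made up).
teacher = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]  # same model, richer context
student = [[1.0, 1.0, 0.1], [0.2, 0.4, 1.0]]  # same model, bare question
loss = distillation_loss(teacher, student)
```

Because the target is a full distribution at every token rather than a single scalar reward per rollout, a loss of this shape gives a dense training signal, which is the contrast with RLVR that the abstract draws.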
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
