Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning

Purbesh Mitra; Sennur Ulukus

arXiv:2512.05105·cs.CL·December 5, 2025

Open Access · 1 Model · 1 Dataset

TL;DR

This paper introduces Semantic Soft Bootstrapping, a self-distillation method for long context reasoning in LLMs that improves accuracy without reinforcement learning, using a semantic context-based training pipeline.

Contribution

The work presents a novel self-distillation technique that leverages semantic contexts for training LLMs, eliminating the need for reinforcement learning with verifiable rewards.

Findings

01. Achieved accuracy improvements of 10.6% and 10% on the GSM8K and AIME2024 benchmarks, respectively.

02. Demonstrated effective training of LLMs using semantic soft bootstrapping without human intervention.

03. Outperformed traditional RLVR methods such as GRPO on reasoning tasks.

Abstract

Long context reasoning in large language models (LLMs) has been shown to enhance their cognitive capabilities via chain-of-thought (CoT) inference. Such models are usually trained via reinforcement learning with verifiable rewards (RLVR) on reasoning-based problems such as math and programming. However, RLVR is limited by several bottlenecks, such as the lack of dense rewards and inadequate sample efficiency. As a result, it requires significant compute resources in the post-training phase. To overcome these limitations, in this work, we propose Semantic Soft Bootstrapping (SSB), a self-distillation technique in which the same base language model plays the role of both teacher and student, but receives different semantic contexts about the correctness of its outcome at training time. The model is first prompted with a math problem and several rollouts are generated. From…
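The abstract is cut off before it states the training objective, so the following is only a generic illustration of soft-label distillation, not the paper's method. It assumes a teacher pass and a student pass of the same model each produce per-token logits over a shared vocabulary, and it computes the mean per-token KL divergence from teacher to student. The function names, the temperature parameter, and the toy logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert a list of logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def soft_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean per-token KL(teacher || student).

    Assumption: both arguments are lists of per-token logit vectors of the
    same shape. In a self-distillation setting the two would come from the
    same model run under different prompts; that pairing is illustrative.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same tokens")
    per_token = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: two token positions, vocabulary of size three.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
loss = soft_distillation_loss(teacher, student)
```

Because the target is a full distribution at every token position, a loss of this kind gives a learning signal per token rather than one scalar per rollout, which is the usual contrast drawn with sparse outcome rewards.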

Code & Models

Models

purbeshmitra/semantic-soft-bootstrapping
model · 5 downloads · 2 likes

Datasets

purbeshmitra/ssb_teacher_data
dataset · 23 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)