Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation

Yong Zhang; Bingyuan Zhang; Zhitao Li; Ming Li; Ning Cheng; Minchuan Chen; Tao Wei; Jun Ma; Shaojun Wang; Jing Xiao

arXiv:2502.12744·cs.CL·February 19, 2025

TL;DR

This paper introduces Self-Enhanced Reasoning Training (SERT), a method that activates and leverages the latent reasoning capabilities of small models through self-training on their own high-quality reasoning paths, improving their reasoning performance.

Contribution

The paper proposes SERT, a novel training approach that enhances small models' reasoning abilities by utilizing their latent reasoning paths without relying on chain-of-thought prompting.

Findings

1. SERT improves small models' reasoning performance in distillation tasks.
2. Small models can generate high-quality reasoning paths during sampling.
3. Self-training on latent reasoning paths enhances small models' reasoning capabilities.

Abstract

The rapid advancement of large language models (LLMs) has significantly enhanced their reasoning abilities, enabling increasingly complex tasks. However, these capabilities often diminish in smaller, more computationally efficient models like GPT-2. Recent research shows that reasoning distillation can help small models acquire reasoning capabilities, but most existing methods focus primarily on improving teacher-generated reasoning paths. Our observations reveal that small models can generate high-quality reasoning paths during sampling, even without chain-of-thought prompting, though these paths are often latent due to their low probability under standard decoding strategies. To address this, we propose Self-Enhanced Reasoning Training (SERT), which activates and leverages latent reasoning capabilities in small models through self-training on filtered, self-generated reasoning paths…
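The abstract describes collecting self-generated reasoning paths, filtering them, and training on what survives. The sketch below illustrates only that sample-then-filter loop; it is not the authors' implementation. The filtering criteria here (a minimum number of sentences, and a final number that matches a known answer) are assumptions made for illustration, since the abstract does not state the actual filters. All names (`toy_sampler`, `looks_like_reasoning`, `collect_self_training_pairs`) are hypothetical, and the sampler is a canned stand-in for sampling from a small model.

```python
import re
from typing import Callable, List, Optional, Tuple


def extract_answer(text: str) -> Optional[str]:
    """Return the last number that appears in the text, or None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


def looks_like_reasoning(text: str, min_steps: int = 2) -> bool:
    """Assumed heuristic: a reasoning path has at least `min_steps` sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    return len(sentences) >= min_steps


def collect_self_training_pairs(
    questions: List[str],
    gold_answers: List[str],
    sample_fn: Callable[[str, int], List[str]],
    num_samples: int = 8,
) -> List[Tuple[str, str]]:
    """Sample candidates per question and keep those passing both filters."""
    pairs = []
    for question, gold in zip(questions, gold_answers):
        for candidate in sample_fn(question, num_samples):
            # Keep only multi-step outputs whose final number matches the answer.
            if looks_like_reasoning(candidate) and extract_answer(candidate) == gold:
                pairs.append((question, candidate))
    return pairs


def toy_sampler(question: str, n: int) -> List[str]:
    """Hypothetical stand-in for sampling n outputs from a small model."""
    canned = [
        "7",  # direct answer, no reasoning
        "Tom has 3 apples. He buys 4 more. So he has 3 + 4 = 7",
        "He has 3 apples. He buys 4. The total is 8",  # wrong answer
        "The answer is 7",  # single sentence, no steps
    ]
    return canned[:n]


pairs = collect_self_training_pairs(
    ["Tom has 3 apples and buys 4 more. How many does he have?"],
    ["7"],
    toy_sampler,
    num_samples=4,
)
for question, path in pairs:
    print(path)
```

In this toy run only the multi-step path with the matching final number is kept. In a real setup the retained pairs would presumably serve as fine-tuning data for the same small model, which is the self-training step the abstract refers to.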

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Topic Modeling · Advanced Text Analysis Techniques