DART: Distilling Autoregressive Reasoning to Silent Thought
Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian

TL;DR
DART introduces a self-distillation framework that enables large language models to perform reasoning with non-autoregressive silent thought, significantly reducing inference latency while maintaining high reasoning performance.
Contribution
DART is the first to successfully distill autoregressive reasoning into a non-autoregressive Silent Thought (ST) process, using self-distillation and a lightweight Reasoning Evolvement Module (REM).
Findings
DART achieves comparable reasoning accuracy to autoregressive methods.
DART reduces inference latency without sacrificing performance.
In experiments, DART outperforms existing non-autoregressive baselines.
Abstract
Chain-of-Thought (CoT) reasoning has significantly advanced Large Language Models (LLMs) in solving complex tasks. However, its autoregressive paradigm leads to significant computational overhead, hindering its deployment in latency-sensitive applications. To address this, we propose DART (Distilling Autoregressive Reasoning to Silent Thought), a self-distillation framework that enables LLMs to replace autoregressive CoT with non-autoregressive Silent Thought (ST). Specifically, DART introduces two training pathways: the CoT pathway for traditional reasoning and the ST pathway for generating answers directly from a few ST tokens. The ST pathway utilizes a lightweight Reasoning Evolvement Module (REM) to align its hidden states with the CoT pathway, enabling the ST tokens to evolve into informative embeddings. During inference, only the ST…
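The hidden-state alignment described in the abstract can be illustrated with a small sketch. The abstract does not specify the REM architecture or the loss, so everything here is an assumption made for illustration: the REM is modeled as a residual linear map, the alignment term as a mean squared error against the CoT pathway's hidden state, and the combined objective as a weighted sum. The names rem_evolve, alignment_loss, combined_loss, and align_weight are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rem_evolve(h_st: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Hypothetical REM: a residual linear map applied to one ST hidden state.

    Assumption: the real module is only described as "lightweight" in the abstract.
    """
    projected = [
        sum(w * x for w, x in zip(row, h_st)) + b
        for row, b in zip(weight, bias)
    ]
    # Residual connection: evolved state = input + projection.
    return [x + p for x, p in zip(h_st, projected)]


def alignment_loss(h_student: Vector, h_teacher: Vector) -> float:
    """Mean squared error between an ST-pathway and a CoT-pathway hidden state.

    Assumption: MSE stands in for whatever alignment objective the paper uses.
    """
    if len(h_student) != len(h_teacher):
        raise ValueError("hidden states must have the same dimension")
    return sum((s - t) ** 2 for s, t in zip(h_student, h_teacher)) / len(h_teacher)


def combined_loss(ce_cot: float, ce_st: float, align: float,
                  align_weight: float = 1.0) -> float:
    """Assumed objective: both pathways' answer losses plus a weighted alignment term."""
    return ce_cot + ce_st + align_weight * align


# Toy values: the CoT pathway acts as the teacher, the ST pathway as the student.
h_cot = [1.0, 2.0]
h_st = [0.5, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]

evolved = rem_evolve(h_st, identity, zero_bias)
loss_before = alignment_loss(h_st, h_cot)
loss_after = alignment_loss(evolved, h_cot)
```

In this toy setup the evolved ST state matches the CoT state exactly, so the alignment term drops to zero. In a real training loop the CoT-pathway hidden states would typically be treated as fixed targets, with gradients flowing only through the ST pathway and the REM; that detail is also an assumption here.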
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning
Methods: ALIGN · Difficulty-Aware Rejection Tuning
