DART: Distilling Autoregressive Reasoning to Silent Thought

Nan Jiang; Ziming Wu; De-Chuan Zhan; Fuming Lai; Shaobing Lian

arXiv:2506.11752 · cs.CL · August 29, 2025

TL;DR

DART introduces a self-distillation framework that enables large language models to perform reasoning with non-autoregressive silent thought, significantly reducing inference latency while maintaining high reasoning performance.
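To show why the latency claim follows, the sketch below counts sequential forward passes under a deliberately simplified model. It assumes that decoding time is dominated by the number of passes that must run one after another, and that the few silent-thought tokens are appended to the prompt and processed together in the single prefill pass. The token counts (200 reasoning tokens, 10 answer tokens) and the names `Trace`, `sequential_passes`, and `step_reduction` are hypothetical illustrations, not figures or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Token counts for one query (hypothetical, illustrative values only)."""
    n_reasoning: int  # explicit CoT tokens decoded one by one (0 for the ST pathway)
    n_answer: int     # final answer tokens


def sequential_passes(trace: Trace) -> int:
    """Forward passes that must run one after another.

    Simplifying assumption: the prompt (plus any silent-thought tokens appended
    to it) is handled in one prefill pass that also yields the first generated
    token; every further generated token needs one more pass.
    """
    generated = trace.n_reasoning + trace.n_answer
    return max(generated, 1)  # at least the prefill pass always runs


def step_reduction(cot: Trace, st: Trace) -> float:
    """Ratio of sequential passes, autoregressive CoT versus silent thought."""
    return sequential_passes(cot) / sequential_passes(st)


cot = Trace(n_reasoning=200, n_answer=10)
st = Trace(n_reasoning=0, n_answer=10)  # the few ST tokens live in the prefill, not here
print(sequential_passes(cot), sequential_passes(st), step_reduction(cot, st))
```

This prints `210 10 21.0`. The model ignores the slightly longer prefill caused by the ST tokens and any per-pass cost differences, so it only indicates why latency drops, not by how much; measured latency should be taken from the paper's experiments.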

Contribution

DART is the first to successfully distill autoregressive reasoning into a non-autoregressive silent thought process using self-distillation and a reasoning evolvement module.

Findings

01. DART achieves reasoning accuracy comparable to autoregressive methods.
02. DART reduces inference latency without sacrificing performance.
03. In experiments, DART outperforms existing non-autoregressive baselines.

Abstract

Chain-of-Thought (CoT) reasoning has significantly advanced Large Language Models (LLMs) in solving complex tasks. However, its autoregressive paradigm leads to significant computational overhead, hindering its deployment in latency-sensitive applications. To address this, we propose DART (Distilling Autoregressive Reasoning to Silent Thought), a self-distillation framework that enables LLMs to replace autoregressive CoT with non-autoregressive Silent Thought (ST). Specifically, DART introduces two training pathways: the CoT pathway for traditional reasoning and the ST pathway for generating answers directly from a few ST tokens. The ST pathway utilizes a lightweight Reasoning Evolvement Module (REM) to align its hidden states with the CoT pathway, enabling the ST tokens to evolve into informative embeddings. During inference, only the ST…
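The abstract does not spell out the training objective, so the following is only a minimal sketch of one plausible reading of the two-pathway setup. It assumes the REM is a single residual linear layer, that alignment is a mean-squared error between the REM-evolved ST-pathway hidden state and the corresponding CoT-pathway hidden state (treated as a fixed teacher target), and that this term is added to the answer cross-entropy with a weight `align_weight`. All function names, the loss form, and the weighting are hypothetical; the paper may align different layers or token positions and use a different distance.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rem(h: Vector, w: Matrix, b: Vector) -> Vector:
    """Hypothetical lightweight REM: one residual linear layer, h + W h + b."""
    return [hi + yi + bi for hi, yi, bi in zip(h, matvec(w, h), b)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two hidden-state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable softmax cross-entropy for one answer token."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def two_pathway_loss(
    st_hidden: Vector,
    cot_hidden: Vector,
    w: Matrix,
    b: Vector,
    answer_logits: Vector,
    answer_id: int,
    align_weight: float = 1.0,
) -> float:
    """Hypothetical objective: answer loss on the ST pathway plus a term that
    pulls the REM-evolved ST hidden state toward the CoT-pathway hidden state.
    cot_hidden plays the teacher role and would be held fixed during training."""
    align = mse(rem(st_hidden, w, b), cot_hidden)
    return cross_entropy(answer_logits, answer_id) + align_weight * align


zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
loss = two_pathway_loss([1.0, 0.0], [1.0, 2.0], zero_w, zero_b, [0.0, 0.0], 0)
print(round(loss, 4))
```

With an identity REM (zero weights and bias) the alignment term is 2.0 and the answer loss is log 2, so the example prints `2.6931`. In an autodiff framework one would typically detach the CoT hidden state so that gradients flow only through the ST pathway and the REM, which is what "self-distillation" usually implies; the exact arrangement in DART may differ.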
