SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought
Jianwei Wang, Ziming Wu, Fuming Lai, Shaobing Lian, Ziqian Zeng

TL;DR
SynAdapt introduces a synthetic continuous chain-of-thought framework that enhances reasoning efficiency in large language models by guiding learning with synthetic CCoT and adaptively rethinking hard questions.
Contribution
The paper presents a novel synthetic CCoT generation method and an adaptive prompting strategy to improve reasoning accuracy and efficiency in large language models.
Findings
Achieves the best accuracy-efficiency trade-off across multiple benchmarks.
Effectively identifies hard questions using a difficulty classifier.
Enhances LLM reasoning by explicit CCoT guidance and adaptive re-thinking.
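The adaptive behaviour described in these findings can be pictured as a simple routing step. The sketch below is illustrative only and is not the authors' implementation: the names `adaptive_answer`, `ccot_answer`, `difficulty_score`, `rethink`, and the `threshold` value are all assumptions introduced here, and the toy stand-ins at the bottom replace what would be real model calls. It assumes the classifier scores a question together with its CCoT and that questions scoring at or above the threshold are sent to a slower re-thinking path.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class RoutedAnswer:
    answer: str
    difficulty: float
    rethought: bool


def adaptive_answer(
    question: str,
    ccot_answer: Callable[[str], Tuple[Any, str]],
    difficulty_score: Callable[[str, Any], float],
    rethink: Callable[[str], str],
    threshold: float = 0.5,  # hypothetical cut-off between easy and hard
) -> RoutedAnswer:
    """Answer with CCoT first; fall back to re-thinking for hard questions."""
    # Fast path: obtain the continuous CoT and a direct answer.
    ccot, fast_answer = ccot_answer(question)
    # The classifier sees both the question and its CCoT.
    score = difficulty_score(question, ccot)
    if score >= threshold:
        # Hard question: discard the fast answer and re-think.
        return RoutedAnswer(rethink(question), score, True)
    return RoutedAnswer(fast_answer, score, False)


# Toy stand-ins so the sketch runs without any model (all hypothetical).
def toy_ccot_answer(q: str) -> Tuple[Any, str]:
    return [0.0] * 4, "fast:" + q


def toy_difficulty(q: str, ccot: Any) -> float:
    # Pretend longer questions are harder; score is clipped to [0, 1].
    return min(1.0, len(q.split()) / 10)


def toy_rethink(q: str) -> str:
    return "slow:" + q


easy = adaptive_answer("2 + 2 ?", toy_ccot_answer, toy_difficulty, toy_rethink)
hard = adaptive_answer(
    "prove that the sum of two odd integers is always an even integer",
    toy_ccot_answer, toy_difficulty, toy_rethink,
)
print(easy.rethought, hard.rethought)
```

Raising `threshold` sends more questions down the fast CCoT path (favouring efficiency), while lowering it sends more to re-thinking (favouring accuracy). One reviewer notes that choosing this value is itself a difficulty of the approach.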
Abstract
While Chain-of-Thought (CoT) reasoning improves model performance, it incurs significant time costs due to the generation of discrete CoT tokens (DCoT). Continuous CoT (CCoT) offers a more efficient alternative, but existing CCoT methods are hampered by indirect fine-tuning, limited alignment, or inconsistent targets. To overcome these limitations, we propose SynAdapt, an innovative efficient reasoning framework. Specifically, SynAdapt generates the synthetic CCoT to serve as a precise and effective alignment target for LLMs. This synthetic CCoT explicitly guides the LLM to learn CCoT and derive accurate answers directly. Furthermore, relying solely on CCoT is insufficient for solving hard questions. To address this, SynAdapt integrates a difficulty classifier that leverages both question context and CCoT to identify hard questions. CCoT can effectively help…
Peer Reviews
Decision·ICLR 2026 Conference Withdrawn Submission
1. Novel, tightly-coupled design: The three-stage pipeline (find optimal CCoT for LLM A → train LLM B to mimic it → deploy B+A) is novel and interesting. 2. The authors conduct extensive experiments to prove the superiority of SynAdapt against baselines.
> Compute cost is only partially accounted for: Training LLM B requires n iterations, and each forward pass concatenates the full current CCoT (length m) with the question. A cost of O(n*m) is paid during synthesis, yet Table 1 quotes only the final inference length. If n≈4 and m≈512, the total FLOPs before seeing a single test example can already be large. A FLOPs count that includes the iterative stage is needed to argue for true efficiency. > Performance concerns: Although high token eff
The authors conduct wide-ranging experiments to evaluate the performance of their method. The proposed method is empirically shown to be effective on reasoning tasks in the authors' setting. The authors compare their method with various baselines on different datasets.
This paper introduces a quite complex framework built from components proposed in past works, while it lacks really interesting or important insights or findings. The effectiveness of the framework may be questionable. Based on Table 1, when fully using CCoT, the performance is basically the same as directly prompting the model to give the output. The better performance in the accuracy-sensitive scenario is because the full CoT of the model is used. So this naturally questions why we ever
1. The paper brings a new perspective to CCoT learning in LLMs, addressing weaknesses of prior partial or indirect alignment approaches. Specifically, SynAdapt’s use of synthetic, explicitly optimized CCoTs as full alignment targets for fine-tuning is clearly articulated and represents a concrete methodological advance. 2. The inclusion of an adaptive, CCoT-informed difficulty classifier is well-motivated and shows robust empirical performance over alternatives that rely on perplexity, prompting
1. Missing Discussion of Key Related Recent Works. [1] also proposed a module to classify the questions based on the questions’ complexity. [1] X. Chen, S. Zhou, K. Liang, and X. Liu, “Distilling reasoning ability from large language models with adaptive thinking,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–14, 2025. 2. Clarity and Interpretation of Mathematical Descriptions: - the motivation for aligning only the eot token’s hidden states is given for overfitting prevent
1. Adaptive reasoning is of practical use due to the need to save resources. 2. The paper is very well written and easy to follow. 3. Experiments have clearly demonstrated the capability of the proposed model in balancing effectiveness and efficiency for inference.
1. The proposed method is relatively weak in effectiveness. 2. CCoT method lacks explainability, which deviates from explainable inference in daily use. 3. The classifier introduces an additional hyperparameter to control easy and hard queries, which is difficult to set.
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
