AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Chenwei Lou; Zewei Sun; Xinnian Liang; Meng Qu; Wei Shen; Wenqi Wang; Yuntao Li; Qingping Yang; Shuangzhi Wu

arXiv:2505.11896·cs.LG·May 27, 2025

TL;DR

AdaCoT is an adaptive framework that lets large language models decide when to generate detailed reasoning steps, reducing computational cost on simpler queries while maintaining performance on complex ones.

Contribution

The paper presents a reinforcement learning approach with Selective Loss Masking for stable, Pareto-optimal adaptive reasoning in LLMs, balancing accuracy against efficiency.

Findings

01 Reduced CoT triggering rate to 3.18% on the test set

02 Decreased average response tokens by 69.06%

03 Maintained high performance on complex tasks

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities but often face challenges with tasks requiring sophisticated reasoning. While Chain-of-Thought (CoT) prompting significantly enhances reasoning, it indiscriminately generates lengthy reasoning steps for all queries, leading to substantial computational costs and inefficiency, especially for simpler inputs. To address this critical issue, we introduce AdaCoT (Adaptive Chain-of-Thought), a novel framework enabling LLMs to adaptively decide when to invoke CoT. AdaCoT frames adaptive reasoning as a Pareto optimization problem that seeks to balance model performance with the costs associated with CoT invocation (both frequency and computational overhead). We propose a reinforcement learning (RL)-based method, specifically utilizing Proximal Policy Optimization (PPO), to dynamically control the CoT triggering decision…
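To make the Pareto framing in the abstract concrete, the following is a minimal sketch of how one might pick out the non-dominated trade-offs between CoT triggering rate (a cost, lower is better) and task score (higher is better). It is not the paper's training procedure: the `Policy` type, the `pareto_front` helper, and every number in `candidates` are hypothetical, chosen purely for illustration.

```python
from typing import List, NamedTuple


class Policy(NamedTuple):
    """Hypothetical summary of one trained model variant."""
    name: str
    cot_rate: float  # fraction of queries that trigger CoT (lower is cheaper)
    score: float     # task performance (higher is better)


def pareto_front(policies: List[Policy]) -> List[Policy]:
    """Return the policies that no other policy dominates.

    q dominates p when q is at least as good on both objectives
    (cot_rate no higher, score no lower) and strictly better on one.
    """
    front = []
    for p in policies:
        dominated = any(
            q.cot_rate <= p.cot_rate
            and q.score >= p.score
            and (q.cot_rate < p.cot_rate or q.score > p.score)
            for q in policies
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.cot_rate)


# Made-up numbers, NOT results from the paper.
candidates = [
    Policy("never_cot", 0.0, 0.50),
    Policy("adaptive_a", 0.3, 0.62),
    Policy("adaptive_b", 0.5, 0.60),  # dominated by adaptive_a
    Policy("adaptive_c", 0.6, 0.64),
    Policy("always_cot", 1.0, 0.65),
]

front_names = [p.name for p in pareto_front(candidates)]
print(front_names)
```

In this toy example `adaptive_b` drops out because `adaptive_a` triggers CoT less often and scores higher; the remaining points are the trade-offs among which a practitioner would choose according to how much CoT cost is acceptable.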

Taxonomy

Topics: Reinforcement Learning in Robotics