Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models
Jinwu Hu, Dongjin Yang, Langyu Bian, Zhiquan Wen, Yufeng Wang, Yaofo Chen, Bin Xiao, Yuanqing Li, Mingkui Tan

TL;DR
This paper introduces CogER, a human-inspired hierarchical reasoning framework for LLMs that dynamically selects reasoning strategies based on query complexity, improving efficiency and accuracy across diverse tasks.
Contribution
CogER is a novel framework that models query difficulty assessment and strategy selection as a reinforcement learning problem, incorporating external tool invocation for enhanced reasoning.
Findings
Achieves at least 13% improvement in exact match on In-Domain tasks.
Attains 8% relative gain on Out-of-Domain tasks.
Outperforms state-of-the-art Test-Time scaling methods.
Abstract
Large language models (LLMs) have demonstrated impressive performance across various language tasks. However, existing LLM reasoning strategies mainly rely on the LLM itself with fast or slow mode (like o1 thinking) and thus struggle to balance reasoning efficiency and accuracy across queries of varying difficulties. In this paper, we propose Cognitive-Inspired Elastic Reasoning (CogER), a framework inspired by human hierarchical reasoning that dynamically selects the most suitable reasoning strategy for each query. Specifically, CogER first assesses the complexity of incoming queries and assigns them to one of several predefined levels, each corresponding to a tailored processing strategy, thereby addressing the challenge of unobservable query difficulty. To achieve automatic strategy selection, we model the process as a Markov Decision Process and train a CogER-Agent using…
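The level-based dispatch described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four level names follow the reasoning modes mentioned in the reviews (direct answer, concise CoT, extended CoT, tool-assisted reasoning), `assess_level` is a hypothetical keyword-and-length heuristic standing in for the trained CogER-Agent, and the thresholds are arbitrary.

```python
from enum import IntEnum


class Level(IntEnum):
    """Four complexity levels; the names are illustrative assumptions."""
    DIRECT_ANSWER = 1
    CONCISE_COT = 2
    EXTENDED_COT = 3
    TOOL_ASSISTED = 4


def assess_level(query: str) -> Level:
    """Hypothetical stand-in for the learned policy (keyword + length heuristic)."""
    if any(k in query.lower() for k in ("compute", "look up", "search")):
        return Level.TOOL_ASSISTED
    n_words = len(query.split())
    if n_words <= 8:
        return Level.DIRECT_ANSWER
    if n_words <= 25:
        return Level.CONCISE_COT
    return Level.EXTENDED_COT


# Placeholder strategies: each one only tags the query with the mode it would use.
STRATEGIES = {
    Level.DIRECT_ANSWER: lambda q: f"[direct] {q}",
    Level.CONCISE_COT: lambda q: f"[concise-cot] {q}",
    Level.EXTENDED_COT: lambda q: f"[extended-cot] {q}",
    Level.TOOL_ASSISTED: lambda q: f"[tool] {q}",
}


def route(query: str) -> str:
    """Assess the query, then dispatch it to the strategy for its level."""
    return STRATEGIES[assess_level(query)](query)
```

In the paper the assessment step is learned with reinforcement learning rather than hand-written; the sketch only shows the shape of "assess, then dispatch".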
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
1. **Well-Motivated Problem**: Dynamic resource allocation in test-time compute is an important practical challenge. The paper clearly articulates the inefficiency of one-size-fits-all reasoning strategies.
2. **Comprehensive System Design**: The framework includes multiple well-integrated components: complexity classification, MDP formulation, specialized reward functions (particularly R_hierarchy for cost-awareness), and CoTool for tool integration. Algorithm 1 provides clear implementation g
## 1. Inadequate Baseline Selection
**Missing critical routing baselines**: The paper omits comparisons to directly relevant work:
- **RouteLLM (ICLR 2025)**: Uses preference-based training for LLM routing with similar objectives

**Unfair comparisons**:
- DeepSeek-R1 is a closed-source 671B model tested under unknown conditions; should compare against open DeepSeek-R1-Distill (7B/14B/32B)
- No iso-compute baseline: should compare "always QwQ-32B" with same average compute budget as CogER

## 2
- The paper proposes categorizing queries by complexity into different reasoning modes, including direct answer, concise CoT, extended CoT, and tool-assisted reasoning. This method provides a novel mechanism to balance accuracy and computational efficiency.
- The approach demonstrates significant improvements across multiple benchmarks, achieving notable gains in accuracy, efficiency, and reduced latency compared to standard fixed or scaling-based strategies.
- By drawing inspiration from cogn
- The reward depends on $L_{\min}(\mathcal{S})$, the minimal sufficient level, but the paper does not explain how this unobservable quantity is obtained during training or evaluation.
- It is not clear how to handle tool errors and prompt injection, and how to avoid gaming of the format reward by printing tags without real gains.
- The MDP action space mixes high-level actions with the token vocabulary $\mathcal{V}$. It would be beneficial if the authors could further explain how actions are m
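To make the format-reward concern above concrete, here is one possible mitigation, sketched under assumptions: the format bonus is paid only when the output is well-formed and the extracted answer is correct, so printing tags alone earns nothing. The `<think>`/`<answer>` tag layout, the function name `gated_format_reward`, and the bonus value are all hypothetical; the paper's actual reward and output format may differ.

```python
import re

# Assumed tag layout; not taken from the paper.
PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


def gated_format_reward(output: str, gold: str, bonus: float = 0.1) -> float:
    """Return the format bonus only if tags are well-formed AND the answer is correct."""
    match = PATTERN.match(output.strip())
    if match is None:
        return 0.0  # malformed or missing tags: no bonus
    answer = match.group(2).strip()
    # Gate the bonus on correctness so tags without real gains earn nothing.
    return bonus if answer == gold.strip() else 0.0
```

Exact string match is used here for brevity; a real setup would likely need answer normalization, which is outside this sketch.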
1. Clear Motivation & Principled Design: The paper clearly targets the "one-size-fits-all" inefficiency. The 4-level hierarchy derived from cognitive science (Bloom's Taxonomy) provides a logical and well-founded structure.
2. Strong Experimental Results: CogER achieves SOTA accuracy on both ID and OOD tasks. The efficiency gains are significant, demonstrating, for example, over 4x lower latency than the top-performing DeepSeek-R1 baseline.
3. Effective Reward Function: The composite reward i
1. Lack of Controlled Routing Overhead Analysis: The paper does not quantify the specific latency overhead in the controlled environment. This makes it difficult to ascertain the precise efficiency trade-off, especially for simple L1 queries where the router's cost may be non-trivial.
2. Absence of Error Analysis: There is no breakdown analysis of the framework's failure cases. It is unclear whether errors stem from (1) the agent's incorrect routing or (2) the execution module's failure despi
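The kind of failure breakdown this review asks for could look roughly like the sketch below. It assumes each evaluation record carries an oracle label for the minimal sufficient level, which the reviews note the paper does not explain how to obtain; the `Record` fields and category names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    chosen_level: int  # level the router picked
    min_level: int     # assumed oracle label for the minimal sufficient level
    correct: bool      # whether the final answer was right


def attribute(record: Record) -> str:
    """Assign one outcome category to a single evaluation record."""
    if record.correct:
        return "correct"
    if record.chosen_level < record.min_level:
        return "routing_error"    # routed below what the query needed
    return "execution_error"      # level was sufficient, execution still failed


def breakdown(records: list[Record]) -> Counter:
    """Count outcomes per category across an evaluation set."""
    return Counter(attribute(r) for r in records)
```

Reporting these counts per dataset would separate router mistakes from execution-module mistakes, which is the distinction the reviewer raises.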
1. The authors propose a novel classification perspective and a new method for understanding problem difficulty.
2. Experiments were conducted on multiple In-domain and Out-of-Domain datasets, and detailed ablation studies were also included.
1. Regarding "the minimal level required for a given query," the paper does not provide a reasonable explanation. How is this obtained? If the question-solving ability is not stable (e.g., L2 might occasionally be able to solve it correctly), how to handle this? I believe this is a crucial point of the paper, but it is not discussed.
2. There is confusion between the cognitive hierarchy and tool requirements. Problem complexity and the need for tools are two orthogonal attributes. A very comple
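One way the minimal level could be estimated despite unstable solving ability is by repeated sampling: take the lowest level whose empirical pass rate clears a threshold. This is a sketch of that idea only, not the paper's procedure, which the reviewers say is unspecified. The `solve` callable, the sample count, and the threshold are hypothetical.

```python
from typing import Callable, Optional, Sequence


def estimate_min_level(
    solve: Callable[[int, str], bool],  # hypothetical: runs one attempt at a level
    query: str,
    levels: Sequence[int] = (1, 2, 3, 4),
    samples: int = 8,
    threshold: float = 0.75,
) -> Optional[int]:
    """Return the lowest level whose pass rate over `samples` runs meets `threshold`."""
    for level in levels:
        passes = sum(solve(level, query) for _ in range(samples))
        if passes / samples >= threshold:
            return level
    return None  # no level solved the query reliably enough
```

With a threshold above 0.5, a level that only occasionally succeeds (the L2 case raised above) is not counted as sufficient, at the cost of several attempts per level for every query.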
Taxonomy
Topics: Topic Modeling · Big Data and Digital Economy · Natural Language Processing Techniques
