Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger

Wenjun Li; Dexun Li; Kuicai Dong; Cong Zhang; Hao Zhang; Weiwen Liu; Yasheng Wang; Ruiming Tang; Yong Liu

arXiv:2502.12961·cs.CL·August 22, 2025

TL;DR

This paper introduces MeCo, a meta-cognition-based method enabling large language models to adaptively decide when to use external tools, thereby reducing unnecessary calls and errors without additional training.

Contribution

It presents a fine-tuning-free, minimal-cost meta-cognitive approach for LLMs to self-assess and improve external tool invocation decisions.

Findings

01

MeCo effectively detects LLMs' internal cognitive signals.

02

It significantly improves tool-use decision accuracy.

03

The method reduces unnecessary tool calls and errors.

Abstract

Large language models (LLMs) have shown remarkable emergent capabilities, transforming the execution of functional tasks by leveraging external tools for complex problems that require specialized processing or up-to-date data. While existing research expands LLMs' access to diverse tools (e.g., program interpreters, search engines, calculators), the necessity of using these tools is often overlooked, leading to indiscriminate tool invocation. This naive approach raises two key issues: increased latency due to unnecessary tool calls, and potential errors resulting from faulty interactions with external tools. In this paper, we introduce meta-cognition as a proxy for LLMs' self-assessment of their capabilities, reflecting the model's awareness of its own limitations. Based on this, we propose MeCo, an adaptive decision-making strategy for external tool use. MeCo quantifies metacognitive…
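
The adaptive decision described in the abstract can be illustrated with a thresholded gate. This is a minimal sketch, not the authors' implementation. It assumes that the meta-cognition score is the projection of a hidden-state vector onto a probe direction, that a higher score means the model can answer without help, and that a fixed threshold separates the two cases. The names `MetaCognitionGate`, `direction`, and `threshold` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MetaCognitionGate:
    """Hypothetical gate: call a tool only when the self-assessment score is low."""

    direction: Sequence[float]  # assumed probe direction in hidden-state space
    threshold: float            # assumed cut-off; would need tuning on held-out data

    def score(self, hidden_state: Sequence[float]) -> float:
        """Project the hidden state onto the probe direction (plain dot product)."""
        if len(hidden_state) != len(self.direction):
            raise ValueError("hidden_state and direction must have the same length")
        return sum(h * d for h, d in zip(hidden_state, self.direction))

    def needs_tool(self, hidden_state: Sequence[float]) -> bool:
        """Assumption: a score below the threshold means the model should use a tool."""
        return self.score(hidden_state) < self.threshold


gate = MetaCognitionGate(direction=[1.0, 0.0], threshold=0.5)
print(gate.needs_tool([0.8, 3.0]))   # score 0.8 is above the threshold
print(gate.needs_tool([0.2, -1.0]))  # score 0.2 is below the threshold
```

In this toy setup only the first coordinate of the hidden state contributes to the score, so the first query is answered directly and the second is routed to a tool. How the threshold should be chosen is left open here.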

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 5·Confidence 3

Strengths

The problem identification and motivation are excellent. The authors clearly articulate why indiscriminate tool use is problematic and provide compelling examples. The empirical results are strong, showing an 11% improvement in accuracy across various benchmarks. The approach is also practical - it requires no fine-tuning and can be easily integrated into existing systems.

Weaknesses

I do not see new technical contributions beyond applying existing RepE research to tool use. While the authors frame this as detecting "meta-cognition", it is functionally very similar to previous work on detecting other concepts like honesty or confidence. The main innovation seems to be in the framing rather than the technical approach. The decision mechanism is overly simplistic, using basic thresholds on the meta-cognition scores without any principled way to set these thresholds. T

Reviewer 02·Rating 6·Confidence 3

Strengths

Meta-Cognition Trigger Mechanism: The paper introduces a meta-cognition-oriented trigger mechanism for large language models (LLMs), which enables models to assess their own capabilities and invoke external tools only when needed. This approach optimizes efficiency by minimizing unnecessary tool usage.

Policy Utilization Effectiveness: By integrating meta-cognition evaluations into decision-making policies, the approach improves decision accuracy, proving more effective than prior methods in g

Weaknesses

Simplified Benchmarks: The paper primarily evaluates its approach on benchmarks that may not fully reflect real-world complexity. This can limit the broader applicability and relevance of its findings in practical scenarios.

Underexplored Limitations of Meta-Cognition Scoring: While the meta-cognition approach is promising, the paper does not deeply address cases where this scoring might fail or where it could lead to suboptimal decisions, particularly with ambiguous or highly nuanced queries.

Reviewer 03·Rating 5·Confidence 3

Strengths

* proposal of a new metric to help an LLM judge its own capabilities
* two new datasets for judging whether the use of external resources in the form of tools or RAG is necessary

Weaknesses

The paper lacks focus and flow. The components are only sometimes clearly described, and the text contains several contradictions (the abstract claims no fine-tuning, but fine-tuning is actually used). For example, the decision process shown in the motivation figure is never discussed (and might be wrong), and some discussions lack details, such as how the thresholds are determined. This makes the paper hard to follow and the contributions hard to grasp.

* the decision making process presented in Fig

Taxonomy

Topics: Topic Modeling

Methods: Balanced Selection