Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents

Qihao Wang; Yue Hu; Mingzhe Lu; Jiayue Wu; Yanbing Liu; Yuanmin Tang

arXiv:2601.20412 · cs.CL · January 29, 2026

Open Access

TL;DR

This paper introduces a cognitive load framework for evaluating large language models' tool-use capabilities, moving beyond accuracy metrics to diagnose performance bottlenecks and map model boundaries under varying task complexities.

Contribution

The paper presents a framework grounded in Cognitive Load Theory, comprising a Tool Interaction Graph that formalizes intrinsic load and a benchmark, ToolLoad-Bench, with parametrically adjustable load, to diagnose where model limitations arise.

Findings

01. Models exhibit performance cliffs at high cognitive loads.

02. The framework's predictions align closely with empirical results.

03. The framework enables precise mapping of model capability boundaries.

Abstract

The ability of Large Language Models (LLMs) to use external tools unlocks powerful real-world interactions, making rigorous evaluation essential. However, current benchmarks primarily report final accuracy, revealing what models can do but obscuring the cognitive bottlenecks that define their true capability boundaries. To move from simple performance scoring to a diagnostic tool, we introduce a framework grounded in Cognitive Load Theory. Our framework deconstructs task complexity into two quantifiable components: Intrinsic Load, the inherent structural complexity of the solution path, formalized with a novel Tool Interaction Graph; and Extraneous Load, the difficulty arising from ambiguous task presentation. To enable controlled experiments, we construct ToolLoad-Bench, the first benchmark with parametrically adjustable cognitive load. Our evaluation reveals distinct performance…
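To make the idea of a Tool Interaction Graph concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's formalization: the abstract does not give the graph definition or the load formula. The tool names, the function names `longest_chain` and `intrinsic_load_proxy`, and the scoring rule (tool calls + dependencies + longest dependency chain) are all assumptions made for this example.

```python
from collections import defaultdict


def longest_chain(edges, nodes):
    """Return the number of nodes on the longest dependency chain of a DAG."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    memo = {}

    def depth(node):
        # Depth of a node = 1 (itself) + deepest chain among its dependents.
        if node not in memo:
            memo[node] = 1 + max((depth(c) for c in children[node]), default=0)
        return memo[node]

    return max((depth(n) for n in nodes), default=0)


def intrinsic_load_proxy(nodes, edges):
    """Hypothetical structural-complexity score (NOT the paper's metric).

    Assumed rule: count of tool calls + count of dependencies
    + length of the longest dependency chain.
    """
    return len(nodes) + len(edges) + longest_chain(edges, nodes)


# Hypothetical task: nodes are tool calls, and an edge (a, b) means
# that b consumes the output of a.
nodes = ["search_flights", "get_weather", "book_flight", "send_email"]
edges = [
    ("search_flights", "book_flight"),
    ("get_weather", "book_flight"),
    ("book_flight", "send_email"),
]

print(intrinsic_load_proxy(nodes, edges))  # 4 nodes + 3 edges + chain of 3 -> 10
```

Under this assumed scoring, adding tool calls or lengthening the dependency chain raises the score, which is the kind of adjustable knob a benchmark with parametrically controlled load would need. The actual components and weights used in ToolLoad-Bench may differ.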

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education