Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents

Xuan Qi

arXiv:2604.02155·cs.CL·April 3, 2026

TL;DR

This study shows that brief chain-of-thought reasoning substantially improves function-calling accuracy in language agents, whereas extended reasoning can push accuracy below the no-reasoning baseline. It proposes a structured brief-CoT method to improve reliability.

Contribution

The paper identifies a non-monotonic effect of reasoning length on agent accuracy and introduces Function-Routing CoT (FR-CoT), a structured approach that improves reliability without budget tuning.

Findings

01

Brief reasoning (32 tokens) raises accuracy from 44.0% to 64.0% on Qwen2.5-1.5B-Instruct, a 45% relative gain over direct answers.

02

Long reasoning (256 tokens) degrades accuracy to 25.0%, well below the no-CoT baseline of 44.0%.

03

FR-CoT reduces hallucinations to 0% while matching the accuracy of brief reasoning.
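
The relative figures in these findings can be checked against the absolute accuracies reported in the abstract (44.0% with no CoT, 64.0% at 32 tokens, 25.0% at 256 tokens, over 200 tasks). The following is a minimal arithmetic sketch; the function and variable names are illustrative and do not come from the paper.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the change of `new` relative to `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Accuracies reported in the abstract for Qwen2.5-1.5B-Instruct.
N_TASKS = 200
NO_COT = 0.440      # budget of 0 tokens (direct answer)
BRIEF_COT = 0.640   # budget of 32 tokens
LONG_COT = 0.250    # budget of 256 tokens

brief_gain = relative_change(NO_COT, BRIEF_COT)
long_drop = relative_change(NO_COT, LONG_COT)

# Implied counts of correct tasks out of 200 at each budget.
correct = {name: round(acc * N_TASKS)
           for name, acc in [("no_cot", NO_COT),
                             ("brief", BRIEF_COT),
                             ("long", LONG_COT)]}

print(f"brief vs. no-CoT: {brief_gain:+.1%} relative")
print(f"long vs. no-CoT:  {long_drop:+.1%} relative")
print(f"correct tasks: {correct}")
```

The 45% figure is therefore a relative gain (20 percentage points on a 44-point baseline), not an absolute one.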

Abstract

How much should a language agent think before taking action? Chain-of-thought (CoT) reasoning is widely assumed to improve agent performance, but the relationship between reasoning length and accuracy in structured tool-use settings remains poorly understood. We present a systematic study of CoT budget effects on function-calling agents, sweeping six token budgets (0 to 512) across 200 tasks from the Berkeley Function Calling Leaderboard v3 Multiple benchmark. Our central finding is a striking non-monotonic pattern on Qwen2.5-1.5B-Instruct: brief reasoning (32 tokens) dramatically improves accuracy by 45% relative over direct answers, from 44.0% to 64.0%, while extended reasoning (256 tokens) degrades performance well below the no-CoT baseline, to 25.0% (McNemar p < 0.001). A three-way error decomposition reveals the mechanism. At d = 0, 30.5% of tasks fail because the model selects the…
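
The abstract cites a McNemar test, which compares two conditions evaluated on the same tasks by looking only at the discordant pairs (tasks that one condition solves and the other does not). Below is a minimal sketch of the exact two-sided version using only the standard library. The discordant counts are hypothetical placeholders, since the abstract does not report them, and this may not be the variant of the test the author used.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: tasks solved under condition A but not under condition B
    c: tasks solved under condition B but not under condition A
    Under the null hypothesis, each discordant pair falls on either
    side with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# HYPOTHETICAL discordant counts, for illustration only.
# They are chosen so that c - b = 40, matching the net gain of 40 tasks
# (88 -> 128 correct out of 200) between the 0- and 32-token budgets.
broken_by_brief_cot = 10   # assumed, not from the paper
fixed_by_brief_cot = 50    # assumed, not from the paper

p_value = mcnemar_exact(broken_by_brief_cot, fixed_by_brief_cot)
print(f"exact McNemar p = {p_value:.2e}")
```

Because the test conditions on paired outcomes, it suits this setting, where every budget is evaluated on the same 200 tasks.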
