Same Signal, Opposite Meaning: Direction-Informed Adaptive Learning for LLM Agents
Ziming Li, Jiatan Huang, Xiaoguang Guo, Guilin Wang, Chuxu Zhang

TL;DR
This paper introduces DIAL, a method for adaptive test-time compute in LLM agents that learns the correct direction of gating signals, improving performance across diverse environments and backbone models.
Contribution
It reveals the instability of fixed-direction gating signals and proposes a learned, environment-specific gating approach called DIAL.
Findings
DIAL outperforms fixed-direction baselines in success-cost trade-offs.
Gating signals can be unreliable because the same signal (e.g., uncertainty) can predict rollout benefit in one setting and rollout harm in another.
Environment-specific gating directions improve adaptive compute effectiveness.
Abstract
Adaptive test-time compute for LLM agents aims to invoke extra computation only when it improves performance. Existing methods typically use confidence-, uncertainty-, or difficulty-based gates, assuming a fixed direction from the gating signal, through compute need, to the value of computation. This makes gating a utility-calibration problem: gating signals should align with whether extra computation improves the final outcome over the base policy. We show that this alignment is unstable: the same signal predicts rollout benefit in one setting and rollout harm in another, with reversals across environments and backbones even when the task is fixed. Wrong-direction gates can therefore worsen performance by selecting precisely the harmful states. This reversal reflects a deeper distinction between compute need and compute suitability: a high uncertainty signal may indicate decision-difficult…
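The direction problem can be illustrated with a minimal sketch. This is not DIAL's actual procedure, which the abstract does not detail. It assumes a scalar gating signal, a fixed threshold, and a small per-environment calibration set of rollout gains (outcome with extra compute minus base-policy outcome). The names `estimate_direction` and `should_invoke` and the calibration numbers are hypothetical.

```python
from statistics import mean


def estimate_direction(signals, gains, threshold):
    """Hypothetical direction estimate for one environment.

    Returns +1 if states with signal above the threshold showed larger mean
    rollout gain than states at or below it, else -1.
    """
    high = [g for s, g in zip(signals, gains) if s > threshold]
    low = [g for s, g in zip(signals, gains) if s <= threshold]
    if not high or not low:
        raise ValueError("need calibration states on both sides of the threshold")
    return 1 if mean(high) >= mean(low) else -1


def should_invoke(signal, threshold, direction):
    """Invoke extra compute only on the side of the threshold linked to benefit."""
    high_side = signal > threshold
    return high_side if direction == 1 else not high_side


# Hypothetical calibration data: identical signal values, opposite relation to gain.
signals = [0.1, 0.2, 0.8, 0.9]
gains_env_a = [-0.1, 0.0, 0.3, 0.4]   # high signal -> rollouts help
gains_env_b = [0.2, 0.3, -0.2, -0.3]  # high signal -> rollouts hurt

dir_a = estimate_direction(signals, gains_env_a, threshold=0.5)
dir_b = estimate_direction(signals, gains_env_b, threshold=0.5)
```

With these numbers `dir_a` is `1` and `dir_b` is `-1`, so a state with signal `0.85` triggers extra compute in the first environment but not the second. A fixed-direction gate would apply the first rule everywhere and, in the second environment, select exactly the states where rollouts hurt.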
