TACT: Mitigating Overthinking and Overacting in Coding Agents via Activation Steering
Yuan Sui, Yulin Chen, Yibo Li, Xue Jiang, Yufei He, Yihong Dong, Xiaoxin He, Tianyu Gao, Bryan Hooi

TL;DR
This paper introduces TACT, a method that detects and corrects drift in language model agents by steering their internal activations, improving performance on complex software engineering tasks.
Contribution
The paper presents a novel activation steering technique to identify and mitigate overthinking and overacting in language model agents, enhancing their reliability.
Findings
Hidden states of overthinking, overacting, and calibrated steps are linearly separable (AUC 0.9).
Steering activations reduces agent errors and improves resolve rates.
TACT outperforms baselines on multiple benchmarks, reducing steps-to-resolve.
Abstract
When language model agents tackle complex software engineering tasks, they often degrade over long trajectories, a phenomenon we define as *agent drift*. We focus on two recurring failure modes: *overthinking*, where the agent repeatedly reasons over information it already has, and *overacting*, where it issues tool calls without integrating recent observations or acquiring new evidence. In this paper, we introduce TACT (Think-Act Calibration via activation Steering) to detect and mitigate agent drift in the residual stream before it surfaces as a behavioral failure. Specifically, we label trajectory steps as overthinking, overacting, or calibrated, and find that their hidden states separate linearly along two *drift axes*, each pointing from calibrated behavior toward one failure mode (AUC 0.9). To mitigate agent drift, we project each step's activation onto these axes at…
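
To make the drift-axis idea concrete, here is a minimal sketch of projecting an activation onto an axis and steering it back toward calibrated behavior. It is not the authors' implementation: the abstract is truncated before the steering rule is stated, so everything below is an assumption. In particular, the axis is estimated as a difference of class means, the steering step subtracts only the positive drift component, and the activations are synthetic random vectors rather than real residual-stream states. The names `drift_axis`, `drift_score`, `steer`, and `strength` are hypothetical.

```python
import numpy as np


def drift_axis(calibrated, drifted):
    """Unit vector from the calibrated mean toward the drifted mean.

    Assumption: a difference-of-means estimate; the paper may fit its axes differently.
    """
    direction = drifted.mean(axis=0) - calibrated.mean(axis=0)
    return direction / np.linalg.norm(direction)


def drift_score(activation, axis, origin):
    """Signed projection of (activation - origin) onto the drift axis."""
    return float((activation - origin) @ axis)


def steer(activation, axis, origin, strength=1.0):
    """Subtract the positive drift component along the axis.

    Assumption: activations already on the calibrated side are left unchanged.
    """
    score = drift_score(activation, axis, origin)
    return activation - strength * max(score, 0.0) * axis


# Synthetic stand-ins for hidden states (not real model activations).
rng = np.random.default_rng(0)
dim = 16
offset = np.zeros(dim)
offset[0] = 4.0  # shift the "overthinking" cluster along one coordinate
calibrated = rng.normal(size=(200, dim))
overthinking = rng.normal(size=(200, dim)) + offset

axis = drift_axis(calibrated, overthinking)
origin = calibrated.mean(axis=0)

h = overthinking[0]
h_steered = steer(h, axis, origin)
print("score before:", drift_score(h, axis, origin))
print("score after: ", drift_score(h_steered, axis, origin))
```

With two failure modes, the same functions would be applied once per axis (one for overthinking, one for overacting). A `strength` below 1.0 would only partially remove the drift component.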
