Wink: Recovering from Misbehaviors in Coding Agents
Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Matteo Paltenghi, Satish Chandra

TL;DR
This paper introduces Wink, a system that automatically detects and corrects misbehaviors in autonomous coding agents powered by large language models, significantly improving their reliability and reducing manual interventions.
Contribution
We propose a taxonomy of agent misbehaviors and develop Wink, a lightweight, asynchronous self-intervention system that effectively recovers from these issues at scale.
Findings
Wink resolves 90% of single-intervention misbehaviors
Reduces Tool Call Failures, Tokens per Session, and Engineer Interventions
Successfully deployed and tested on over 10,000 real-world trajectories
Abstract
Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to a wide range of misbehaviors, such as deviating from the user's instructions, getting stuck in repetitive loops, or failing to use tools correctly. These failures disrupt the development workflow and often require resource-intensive manual intervention. In this paper, we present a system for automatically recovering from agentic misbehaviors at scale. We first introduce a taxonomy of misbehaviors grounded in an analysis of production traffic, identifying three primary categories: Specification Drift, Reasoning Problems, and Tool Call Failures, which we find occur in about 30% of all agent trajectories. To address these issues, we developed a lightweight, asynchronous self-intervention system…
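The taxonomy and the trajectory-level rate quoted in the abstract can be illustrated with a small sketch. The enum members mirror the three categories named above. The function names, the input format (one set of labels per trajectory), and the sample data are assumptions made for illustration; they are not taken from the paper and do not describe how Wink detects or labels misbehaviors.

```python
from collections import Counter
from enum import Enum


class Misbehavior(Enum):
    """The three primary categories named in the abstract."""
    SPECIFICATION_DRIFT = "Specification Drift"
    REASONING_PROBLEMS = "Reasoning Problems"
    TOOL_CALL_FAILURES = "Tool Call Failures"


def misbehavior_rate(labels_per_trajectory):
    """Fraction of trajectories carrying at least one misbehavior label.

    Hypothetical helper: each element is the set of Misbehavior labels
    assigned to one trajectory (an empty set means no misbehavior).
    """
    if not labels_per_trajectory:
        return 0.0
    flagged = sum(1 for labels in labels_per_trajectory if labels)
    return flagged / len(labels_per_trajectory)


def category_counts(labels_per_trajectory):
    """Number of trajectories in which each category appears at least once."""
    counts = Counter()
    for labels in labels_per_trajectory:
        counts.update(set(labels))
    return counts


# Made-up sample data, for illustration only.
sample = [
    set(),
    {Misbehavior.TOOL_CALL_FAILURES},
    set(),
    {Misbehavior.SPECIFICATION_DRIFT, Misbehavior.TOOL_CALL_FAILURES},
]
rate = misbehavior_rate(sample)
counts = category_counts(sample)
```

A trajectory can carry more than one label, so the per-category counts may sum to more than the number of flagged trajectories.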
Taxonomy
Topics: Multi-Agent Systems and Negotiation · AI-based Problem Solving and Planning · Topic Modeling
