Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
Mehmet Iscan

TL;DR
This paper introduces RSCB-MC, a risk-sensitive memory controller for LLM-based coding agents that selectively retrieves and uses external memory to improve debugging safety and effectiveness.
Contribution
It reframes memory retrieval as a risk-sensitive control problem and develops a novel controller that balances reuse success with safety considerations.
Findings
RSCB-MC achieves a 62.5% success rate with 0.0% false positives in offline replay.
It reaches 60.5% proxy success with 0.0% false positives in hot-path validation.
In both settings, memory reuse is obtained without any recorded false-positive injections during coding-agent debugging.
Abstract
Large language model (LLM)-based coding agents increasingly rely on external memory to reuse prior debugging experience, repair traces, and repository-local operational knowledge. However, retrieved memory is useful only when the current failure is genuinely compatible with a previous one; superficial similarity in stack traces, terminal errors, paths, or configuration symptoms can lead to unsafe memory injection. This paper reframes issue-memory use as a selective, risk-sensitive control problem rather than a pure top-k retrieval problem. We introduce RSCB-MC, a risk-sensitive contextual bandit memory controller that decides whether an agent should use no memory, inject the top resolution, summarize multiple candidates, perform high-precision or high-recall retrieval, abstain, or ask for feedback. The system stores reusable issue knowledge through a pattern-variant-episode schema and…
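To make the action space in the abstract concrete, the following is a minimal sketch of a risk-sensitive selection step over the seven listed actions. It is an illustration under stated assumptions, not the paper's method: the names `MemoryAction`, `ActionEstimate`, and `select_action`, the linear risk penalty, and the `risk_weight` and `risk_cap` parameters are all hypothetical, and the excerpt does not say how RSCB-MC estimates reward or risk or how it learns them.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryAction(Enum):
    # The seven options named in the abstract; identifiers are hypothetical.
    NO_MEMORY = "no_memory"
    INJECT_TOP = "inject_top_resolution"
    SUMMARIZE = "summarize_candidates"
    HIGH_PRECISION = "high_precision_retrieval"
    HIGH_RECALL = "high_recall_retrieval"
    ABSTAIN = "abstain"
    ASK_FEEDBACK = "ask_for_feedback"


@dataclass
class ActionEstimate:
    # Assumed per-context estimates; a real controller would learn these.
    expected_reward: float       # estimated chance the action helps
    false_positive_risk: float   # estimated chance of unsafe memory injection


def select_action(estimates, risk_weight=2.0, risk_cap=0.2):
    """Pick the action with the best risk-penalized score.

    Actions whose estimated risk exceeds risk_cap are never chosen.
    If no action qualifies, the controller abstains.
    """
    best_action = MemoryAction.ABSTAIN
    best_score = float("-inf")
    for action, est in estimates.items():
        if est.false_positive_risk > risk_cap:
            continue  # too risky to consider, regardless of reward
        score = est.expected_reward - risk_weight * est.false_positive_risk
        if score > best_score:
            best_action, best_score = action, score
    return best_action


# Example: a superficially similar stack trace makes direct injection risky.
example = {
    MemoryAction.INJECT_TOP: ActionEstimate(0.9, 0.5),
    MemoryAction.SUMMARIZE: ActionEstimate(0.6, 0.1),
    MemoryAction.NO_MEMORY: ActionEstimate(0.3, 0.0),
}
chosen = select_action(example)
```

In this example `INJECT_TOP` has the highest expected reward but is excluded by the risk cap, so the sketch falls back to `SUMMARIZE`. This mirrors the abstract's point that surface similarity alone should not trigger memory injection.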
