Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture
Mehmet Iscan

TL;DR
This paper introduces RL Developer Memory, a memory architecture for reinforcement-learning coding agents that is built on the Model Context Protocol (MCP), adds safety gates to memory selection, and is evaluated on a benchmark of RL algorithm bugs.
Contribution
It presents a new memory architecture that treats memory selection as a logged decision process, integrating safety gates and telemetry for improved RL coding agent reliability.
Findings
Achieved 80% expected-decision accuracy on a 200-case benchmark.
The full shadow/OPE configuration suppresses hard negatives effectively.
Static validation passed all checks; dynamic tests passed most cases.
Abstract
Large language model (LLM) coding agents increasingly operate over repositories, terminals, tests, and execution traces across long software-engineering episodes. Persistent memory is useful, but static vector stores or generic retrieval-augmented generation (RAG) are insufficient for reinforcement-learning (RL) code development, where small details can alter Bellman targets, terminal masks, gradient flow, or validation claims. This paper presents RL Developer Memory, a local-first, Model Context Protocol (MCP)-native developer-memory architecture for RL coding agents. It treats memory selection as a logged contextual decision process: issue_match ranks candidates and records telemetry, issue_feedback maps raw labels to bounded rewards, and issue_record_resolution links verified resolutions to earlier retrieval events. A deterministic ranker remains deployed, while a contextual-bandit…
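The three tool names in the abstract suggest a simple logged loop: rank and record, attach a bounded reward, then link a verified resolution back to the retrieval event. The following is a minimal sketch of that loop, not the paper's implementation. The `MemoryLog` class, the `RetrievalEvent` fields, the token-overlap ranker, and the label-to-reward table are all assumptions made for illustration; only the names `issue_match`, `issue_feedback`, and `issue_record_resolution` come from the abstract.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical label-to-reward table; the paper's actual mapping is not shown here.
REWARD_BY_LABEL = {"verified_fix": 1.0, "helpful": 0.5, "irrelevant": 0.0, "harmful": -1.0}


def normalize_feedback(label: str) -> float:
    """Map a raw feedback label to a reward clipped to [-1, 1]; unknown labels get 0."""
    raw = REWARD_BY_LABEL.get(label, 0.0)
    return max(-1.0, min(1.0, raw))


@dataclass
class RetrievalEvent:
    """One logged retrieval decision (field names are illustrative)."""
    event_id: int
    query: str
    ranked_ids: List[str]
    reward: Optional[float] = None
    resolution: Optional[str] = None


class MemoryLog:
    """Toy in-memory store that logs every ranking decision it makes."""

    def __init__(self, memories: Dict[str, str]):
        self.memories = memories
        self.events: Dict[int, RetrievalEvent] = {}
        self._ids = itertools.count(1)

    def issue_match(self, query: str, k: int = 3) -> RetrievalEvent:
        # Deterministic stand-in ranker: token overlap, ties broken by memory id.
        q = set(query.lower().split())
        scored = sorted(
            self.memories.items(),
            key=lambda kv: (-len(q & set(kv[1].lower().split())), kv[0]),
        )
        event = RetrievalEvent(next(self._ids), query, [mid for mid, _ in scored[:k]])
        self.events[event.event_id] = event  # telemetry: what was shown, and for which query
        return event

    def issue_feedback(self, event_id: int, label: str) -> float:
        # Attach a bounded reward to the earlier retrieval event.
        reward = normalize_feedback(label)
        self.events[event_id].reward = reward
        return reward

    def issue_record_resolution(self, event_id: int, resolution: str) -> None:
        # Link a verified resolution back to the retrieval that preceded it.
        self.events[event_id].resolution = resolution


log = MemoryLog({
    "m1": "terminal mask missing in td target",
    "m2": "learning rate too high",
    "m3": "replay buffer overwrites terminal flag",
})
event = log.issue_match("td target ignores terminal mask", k=2)
log.issue_feedback(event.event_id, "verified_fix")
log.issue_record_resolution(event.event_id, "multiplied bootstrap term by (1 - done)")
```

Because each event keeps the query, the ranked candidates, and the reward together, a log of this shape could later be replayed to compare an alternative ranking policy offline while the deterministic ranker stays in use.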
