Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture

Mehmet Iscan

arXiv:2605.01567 · cs.SE · May 5, 2026

TL;DR

This paper introduces RL Developer Memory, a local-first, Model Context Protocol (MCP)-native memory architecture for reinforcement-learning coding agents. It treats memory selection as a logged decision process with safety gates and is evaluated on a benchmark of RL algorithm bugs.

Contribution

The paper presents a memory architecture that treats memory selection as a logged decision process and combines safety gates with telemetry to make RL coding agents more reliable.

Findings

01. Achieved 80% expected-decision accuracy on a 200-case benchmark.

02. The full shadow/OPE configuration suppresses hard negatives effectively.

03. Static validation passed all checks; dynamic tests passed most cases.

Abstract

Large language model (LLM) coding agents increasingly operate over repositories, terminals, tests, and execution traces across long software-engineering episodes. Persistent memory is useful, but static vector stores or generic retrieval-augmented generation (RAG) are insufficient for reinforcement-learning (RL) code development, where small details can alter Bellman targets, terminal masks, gradient flow, or validation claims. This paper presents RL Developer Memory, a local-first, Model Context Protocol (MCP)-native developer-memory architecture for RL coding agents. It treats memory selection as a logged contextual decision process: issue_match ranks candidates and records telemetry, issue_feedback maps raw labels to bounded rewards, and issue_record_resolution links verified resolutions to earlier retrieval events. A deterministic ranker remains deployed, while a contextual-bandit…
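
The three tool roles named in the abstract can be illustrated with a minimal Python sketch. Only the names issue_match, issue_feedback, and issue_record_resolution come from the abstract; the MemoryStore class, the token-overlap ranker, the label-to-reward table, and the [-1, 1] reward bound are all hypothetical stand-ins chosen for illustration, not the paper's implementation or its MCP tool signatures.

```python
from dataclasses import dataclass, field
import itertools

# Hypothetical label-to-reward table (assumption; the paper's mapping is not shown here).
REWARD_BY_LABEL = {"resolved": 1.0, "helpful": 0.5, "irrelevant": -0.5, "harmful": -1.0}


@dataclass
class MemoryStore:
    """Toy in-process stand-in for a developer-memory server (hypothetical)."""

    memories: dict[str, str]  # memory_id -> stored note text
    events: dict[int, dict] = field(default_factory=dict)  # retrieval telemetry log
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def issue_match(self, query: str, k: int = 3) -> tuple[int, list[str]]:
        """Rank memories deterministically and log the retrieval event."""
        q_tokens = set(query.lower().split())
        # Toy ranker: token overlap, with ties broken by memory id for determinism.
        ranked = sorted(
            self.memories,
            key=lambda m: (-len(q_tokens & set(self.memories[m].lower().split())), m),
        )[:k]
        event_id = next(self._ids)
        self.events[event_id] = {
            "query": query, "ranked": ranked, "rewards": [], "resolution": None,
        }
        return event_id, ranked

    def issue_feedback(self, event_id: int, label: str) -> float:
        """Map a raw label to a bounded reward and attach it to the event."""
        reward = REWARD_BY_LABEL.get(label, 0.0)  # unknown labels count as neutral
        reward = max(-1.0, min(1.0, reward))      # keep the reward inside [-1, 1]
        self.events[event_id]["rewards"].append(reward)
        return reward

    def issue_record_resolution(self, event_id: int, resolution: str, verified: bool) -> None:
        """Link a verified resolution back to the earlier retrieval event."""
        if not verified:
            raise ValueError("only verified resolutions are recorded")
        self.events[event_id]["resolution"] = resolution


store = MemoryStore({
    "m1": "terminal mask bug in bellman target",
    "m2": "learning rate schedule",
})
event_id, ranked = store.issue_match("bellman target uses wrong terminal mask", k=1)
reward = store.issue_feedback(event_id, "resolved")
store.issue_record_resolution(event_id, "apply (1 - done) mask to bootstrap term", verified=True)
```

Keying every feedback label and resolution to the event id returned by issue_match is what makes the retrieval log usable as decision data afterward: each logged ranking carries its own bounded rewards and, where available, a verified outcome.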
