ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment

Natchaya Temyingyong; Daman Jain; Neeraj Kumarsahu; Prabhat Kumar; Rachata Phondi; Wachiravit Modecrua; Krittanon Kaewtawee; Krittin Pachtrachai; Touchapon Kraisingkorn

arXiv:2512.24040 · cs.AI · January 1, 2026


TL;DR

ROAD introduces a debugging-based optimization framework for LLM agents that transforms failure logs into structured decision protocols, improving performance efficiently without relying on large labeled datasets.

Contribution

The paper presents ROAD, a novel multi-agent framework that converts failure logs into decision protocols, enabling data-efficient optimization without curated datasets.

Findings

01. Achieved a 5.6% increase in success rate on benchmark tests.
02. Improved search accuracy by 3.8% within three iterations.
03. Improved agent performance by approximately 19% on retail reasoning tasks.

Abstract

Automatic Prompt Optimization (APO) has emerged as a critical technique for enhancing Large Language Model (LLM) performance, yet current state-of-the-art methods typically rely on large, labeled gold-standard development sets to compute fitness scores for evolutionary or Reinforcement Learning (RL) approaches. In real-world software engineering, however, such curated datasets are rarely available during the initial cold start of agent development, where engineers instead face messy production logs and evolving failure modes. We present ROAD (Reflective Optimization via Automated Debugging), a novel framework that bypasses the need for refined datasets by treating optimization as a dynamic debugging investigation rather than a stochastic search. Unlike traditional mutation strategies, ROAD utilizes a specialized multi-agent architecture, comprising an Analyzer for root-cause analysis,…
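The abstract frames prompt optimization as a debugging investigation over failure logs rather than a stochastic search scored against a labeled development set. The sketch below illustrates that idea in its simplest form and is not the authors' implementation: keyword matching stands in for the LLM-based Analyzer, and the names `FailureLog`, `ROOT_CAUSES`, `analyze`, `build_protocol`, and `optimize_prompt`, along with the example root causes and rules, are hypothetical. The point it shows is that one iteration needs only failing runs, not a gold-standard set or a fitness score.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailureLog:
    # Hypothetical record of one failed agent run taken from production logs.
    task_id: str
    message: str


# Hypothetical keyword rules standing in for an LLM-based Analyzer.
# Each root-cause label maps to (trigger phrases, remedial protocol rule).
ROOT_CAUSES = {
    "missing_lookup": (("not found", "unknown order"),
                       "Look up the order ID before answering order questions."),
    "policy_violation": (("refund denied", "policy"),
                         "Check the refund policy before promising a refund."),
}


def analyze(logs):
    """Toy root-cause analysis: label each failure by its first keyword match."""
    counts = Counter()
    for log in logs:
        text = log.message.lower()
        for label, (triggers, _rule) in ROOT_CAUSES.items():
            if any(t in text for t in triggers):
                counts[label] += 1
                break
        else:
            # No rule matched; keep the failure visible for manual review.
            counts["unclassified"] += 1
    return counts


def build_protocol(counts, min_count=2):
    """Turn recurring root causes into numbered decision-protocol steps."""
    rules = [ROOT_CAUSES[label][1]
             for label, n in counts.most_common()
             if label in ROOT_CAUSES and n >= min_count]
    return "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))


def optimize_prompt(base_prompt, logs):
    """One debugging iteration: analyze failures, then append a protocol."""
    protocol = build_protocol(analyze(logs))
    if not protocol:
        return base_prompt
    return f"{base_prompt}\n\nDecision protocol:\n{protocol}"


if __name__ == "__main__":
    logs = [
        FailureLog("t1", "Error: unknown order 123"),
        FailureLog("t2", "Order not found"),
        FailureLog("t3", "Refund denied by policy"),
        FailureLog("t4", "timeout"),
    ]
    print(optimize_prompt("You are a retail support agent.", logs))
```

With these four logs, only the order-lookup failure recurs at least twice, so it is the only one promoted into a protocol step. The `min_count` threshold is an illustrative choice to avoid turning one-off failures into rules; a real system would likely replace both the keyword table and the threshold with model-driven analysis.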


Code & Models

Models

Hugging Face: amityco/amity-sigma-thinking-v3r (model · 1.4k downloads · 1 like)


Taxonomy

Topics: Software Engineering Research · Advanced Software Engineering Methodologies · Software Engineering Techniques and Practices