The Observability Gap: Why Output-Level Human Feedback Fails for LLM Coding Agents

Yinghao Wang; Cheng Wang

arXiv:2603.26942·cs.HC·March 31, 2026

TL;DR

This paper examines why output-only human feedback fails to guide LLM-based coding agents. It identifies a structural observability gap that prevents effective learning and argues for intermediate, code-level feedback mechanisms.

Contribution

The study identifies a fundamental observability gap in output-only feedback for LLM coding agents and demonstrates that adding minimal code-level information can restore effective learning.

Findings

01

Under output-only feedback, the agent achieved 0% full-scene success on a Blender-based 3D scene generation task, across multiple instruction granularities.

02

A structural observability gap causes persistent failure modes.

03

Adding minimal code-level feedback restores convergence.

Abstract

Large language model (LLM) multi-agent coding systems typically fix agent capabilities at design time. We study an alternative setting, earned autonomy, in which a coding agent starts with zero pre-defined functions and incrementally builds a reusable function library through lightweight human feedback on visual output alone. We evaluate this setup in a Blender-based 3D scene generation task requiring both spatial reasoning and programmatic geometric control. Although the agent rediscovered core utility functions comparable to a human reference implementation, it achieved 0% full-scene success under output-only feedback across multiple instruction granularities, where success required satisfying object completeness, ground contact, collision avoidance, and scale plausibility simultaneously. Our analysis identifies a structural observability gap: bugs originate in code logic and…
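The abstract defines full-scene success as four criteria holding simultaneously. The sketch below illustrates that conjunction only; it is not the authors' evaluation code. Every name in it (`Box`, `scene_checks`, `size_limits`, the tolerances) is hypothetical, objects are reduced to axis-aligned bounding boxes, and "ground contact" is simplified to mean that every object rests on a flat ground plane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box for one scene object (hypothetical representation)."""
    name: str
    min_xyz: tuple
    max_xyz: tuple


def overlaps(a, b, eps=1e-6):
    """True if the two boxes interpenetrate on all three axes (touching faces do not count)."""
    return all(
        a.min_xyz[i] < b.max_xyz[i] - eps and b.min_xyz[i] < a.max_xyz[i] - eps
        for i in range(3)
    )


def scene_checks(boxes, required, size_limits, ground_z=0.0, tol=0.01):
    """Evaluate the four criteria and their conjunction.

    required:    set of object names the scene must contain (assumed input)
    size_limits: name -> (min_height, max_height), an assumed plausibility rule
    """
    names = {b.name for b in boxes}
    checks = {
        # Object completeness: every required object is present.
        "completeness": required <= names,
        # Ground contact (simplified): each box bottom sits on the ground plane.
        "ground_contact": all(abs(b.min_xyz[2] - ground_z) <= tol for b in boxes),
        # Collision avoidance: no pair of boxes interpenetrates.
        "collision_free": not any(
            overlaps(a, b) for i, a in enumerate(boxes) for b in boxes[i + 1:]
        ),
        # Scale plausibility: heights fall inside the assumed per-object range.
        "scale_plausible": all(
            size_limits[b.name][0] <= b.max_xyz[2] - b.min_xyz[2] <= size_limits[b.name][1]
            for b in boxes
            if b.name in size_limits
        ),
    }
    # Full-scene success requires all four criteria at once.
    checks["full_scene_success"] = all(checks.values())
    return checks


limits = {"table": (0.5, 1.2), "chair": (0.7, 1.3)}
good_scene = [
    Box("table", (0.0, 0.0, 0.0), (1.0, 1.0, 0.8)),
    Box("chair", (1.5, 0.0, 0.0), (2.0, 0.5, 1.0)),
]
floating_scene = [
    Box("table", (0.0, 0.0, 0.0), (1.0, 1.0, 0.8)),
    Box("chair", (1.5, 0.0, 0.3), (2.0, 0.5, 1.3)),
]
good_result = scene_checks(good_scene, {"table", "chair"}, limits)
floating_result = scene_checks(floating_scene, {"table", "chair"}, limits)
```

Because the criteria are combined with a logical AND, a single violation (here, a chair floating above the ground) fails the whole scene even when the other three checks pass. This is one way to read how a strict joint criterion can coexist with the agent producing reasonable individual utility functions.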
