Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance
Elias Calboreanu

TL;DR
This paper presents an empirical case study of iterative, agent-driven auditing of prompt specifications in a large LLM-managed multi-agent system, revealing defect patterns and convergence behavior.
Contribution
The paper introduces a structured audit protocol, a defect taxonomy, and observations on how iterative auditing behaves in complex prompt-engineering environments.
Findings
51 prompt-specification defects surfaced across nine audit rounds.
Convergence was non-monotonic, driven by cascading edits and scope expansion.
Single-file review missed defect classes found in expanded-scope rounds.
Abstract
Prompt specifications for multi-agent large language model (LLM) systems carry data contracts and integration logic across many interdependent files but are rarely subjected to structured-inspection rigor. This paper reports a single-system empirical case study of iterative, agent-driven auditing applied to AEGIS (Autonomous Engineering Governance and Intelligence System), a production seven-lane orchestration pipeline whose prompt-specification surface comprises approximately 7150 lines: 6907 across seven lane PROMPT.md files and a 245-line shared Ticket Contract. Nine sequential audit rounds, executed by Claude sub-agents using a checklist-driven walkthrough adapted from Weinberg and Freedman, surfaced 51 prompt-specification consistency defects, distinct from the 51 STRIDE-categorized adversarial code findings reported in the companion preprint. Per-round counts were 15, 8, 12, 2, 8,…
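The per-round counts quoted above already show the non-monotonic convergence named in the findings. The sketch below makes that check concrete. It assumes a deliberately simple criterion: a round "breaks" convergence when it finds more defects than the round before it. This is not necessarily the paper's own criterion. It uses only the five per-round values visible in the abstract, because the remaining rounds are elided. The helper names `rising_rounds` and `is_monotone_nonincreasing` are hypothetical.

```python
# Sketch under a simplifying assumption: convergence is "monotonic" when each
# audit round finds no more defects than the previous round.
# Only the first five per-round counts quoted in the abstract are used here;
# later rounds are elided in the source and are not guessed.


def rising_rounds(counts):
    """Return 1-indexed round numbers whose defect count exceeds the prior round's."""
    return [i + 1 for i in range(1, len(counts)) if counts[i] > counts[i - 1]]


def is_monotone_nonincreasing(counts):
    """True if no round finds more defects than the round before it."""
    return not rising_rounds(counts)


first_five_rounds = [15, 8, 12, 2, 8]
print(rising_rounds(first_five_rounds))              # [3, 5]
print(is_monotone_nonincreasing(first_five_rounds))  # False
```

Rounds 3 and 5 each find more defects than their predecessors. This matches the rebounds the findings attribute to cascading edits and scope expansion.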
