Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents
Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan (Emily) Xue

TL;DR
The paper introduces the Insights Generator (IG), a multi-agent system that produces systematic, corpus-level diagnostic insights from LLM agent execution traces to support failure analysis and performance improvement.
Contribution
The paper formalizes the problem of corpus-level trace diagnostics and presents a multi-agent system that generates evidence-backed insights to support failure analysis of LLM agents.
Findings
Human experts using IG reports improve scaffold performance by 30.4 percentage points.
Coding agents leveraging IG insights show consistent performance gains.
IG's detection coverage is comparable to that of existing approaches.
Abstract
Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individual traces span tens of thousands of tokens. We formalize the problem of corpus-level trace diagnostics. Given a corpus of execution traces, the goal is to produce grounded natural-language insights that characterize systematic behavioral patterns across trace groups, each linked to supporting evidence. We present the Insights Generator (IG), a multi-agent system that answers diagnostic questions by proposing and testing hypotheses across the trace corpus to produce an evidence-backed insights report. We evaluate IG across qualitative and objective dimensions, spanning rubric-based report assessment and downstream…
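To make the problem statement above concrete, here is a minimal sketch of what testing one hypothesis across a trace corpus could look like. It is not the IG system: the names Trace, Insight, and evaluate_hypothesis, the success/failure grouping, and the substring predicate are all assumptions made for illustration. The sketch only shows the general shape of an insight that compares a behavior across trace groups and links back to supporting trace IDs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    # Hypothetical trace record; real traces would be far longer and structured.
    trace_id: str
    text: str
    succeeded: bool


@dataclass
class Insight:
    # Hypothetical insight: a claim, its rate in each group, and evidence links.
    claim: str
    rate_in_failures: float
    rate_in_successes: float
    evidence_ids: List[str] = field(default_factory=list)


def evaluate_hypothesis(
    claim: str,
    predicate: Callable[[Trace], bool],
    corpus: List[Trace],
    max_evidence: int = 3,
) -> Insight:
    """Measure how often a behavior occurs in failed vs. successful traces."""
    failures = [t for t in corpus if not t.succeeded]
    successes = [t for t in corpus if t.succeeded]
    hits_fail = [t for t in failures if predicate(t)]
    hits_succ = [t for t in successes if predicate(t)]

    def rate(hits: List[Trace], group: List[Trace]) -> float:
        # Guard against empty groups.
        return len(hits) / len(group) if group else 0.0

    return Insight(
        claim=claim,
        rate_in_failures=rate(hits_fail, failures),
        rate_in_successes=rate(hits_succ, successes),
        # Keep a few failing traces as supporting evidence for the claim.
        evidence_ids=[t.trace_id for t in hits_fail[:max_evidence]],
    )


# Toy corpus (invented data, for illustration only).
corpus = [
    Trace("t1", "call search; timeout; retry; timeout", False),
    Trace("t2", "call search; ok; answer", True),
    Trace("t3", "call db; timeout; give up", False),
    Trace("t4", "call db; ok; answer", True),
    Trace("t5", "call search; ok; wrong answer", False),
]

insight = evaluate_hypothesis(
    "Tool timeouts are associated with failed runs",
    lambda t: "timeout" in t.text,
    corpus,
)
print(insight)
```

In this toy corpus the behavior appears in two of three failed traces and in no successful trace, and the insight cites t1 and t3 as evidence. A real system would need a much richer predicate than substring matching, which is where an LLM-based analysis step would plausibly come in.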
