LIDL: LLM Integration Defect Localization via Knowledge Graph-Enhanced Multi-Agent Analysis
Gou Tan, Zilong He, Min Li, Pengfei Chen, Jieke Shi, Zhensu Sun, Ting Zhang, Danwen Chen, Lwin Khin Shar, Chuanfu Zhang, David Lo

TL;DR
LIDL is a multi-agent framework that localizes defects in LLM-integrated software by combining knowledge graphs, multi-source error evidence, and counterfactual reasoning, and it outperforms five state-of-the-art baselines across all reported metrics.
Contribution
This paper introduces LIDL, a multi-agent defect localization approach specifically designed for LLM-integrated software, addressing the limitations of existing techniques in handling cross-layer dependencies and semantic reasoning.
Findings
LIDL achieves a Top-3 accuracy of 0.64, outperforming baselines.
LIDL reduces defect localization cost by 92.5%.
LIDL outperforms five state-of-the-art baselines across all metrics.
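Top-k accuracy, the headline metric above, is commonly defined in defect localization as the fraction of defects for which at least one truly faulty location appears among the first k ranked candidates. The sketch below assumes that definition. The summary does not state the paper's exact granularity (file, function, or artifact), so the function name `top_k_accuracy` and the location strings are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def top_k_accuracy(
    rankings: Sequence[Sequence[str]],
    ground_truth: Sequence[set[str]],
    k: int = 3,
) -> float:
    """Fraction of defects whose top-k ranked candidates contain a true faulty location.

    Assumed definition; the paper's exact protocol may differ.
    """
    if len(rankings) != len(ground_truth):
        raise ValueError("rankings and ground_truth must have the same length")
    if not rankings:
        return 0.0
    # A defect counts as a hit if any of its first k candidates is a real faulty location.
    hits = sum(
        1
        for ranked, truth in zip(rankings, ground_truth)
        if any(loc in truth for loc in ranked[:k])
    )
    return hits / len(rankings)


# Hypothetical ranked outputs for three defects, with their true faulty locations.
rankings = [
    ["prompt.py:12", "config.yaml:3", "client.py:40"],
    ["client.py:8", "agent.py:55", "utils.py:1", "prompt.py:7"],
    ["config.yaml:9"],
]
truth = [{"config.yaml:3"}, {"prompt.py:7"}, {"config.yaml:9"}]

print(top_k_accuracy(rankings, truth, k=1))  # 1 of 3 defects hit at rank 1
print(top_k_accuracy(rankings, truth, k=3))  # 2 of 3 defects hit within the top 3
```

Under this definition, a Top-3 accuracy of 0.64 means that for 64% of the evaluated defects, a correct location appeared among the first three candidates.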
Abstract
LLM-integrated software, which embeds or interacts with large language models (LLMs) as functional components, exhibits probabilistic and context-dependent behaviors that fundamentally differ from those of traditional software. This shift introduces a new category of integration defects that arise not only from code errors but also from misaligned interactions among LLM-specific artifacts, including prompts, API calls, configurations, and model outputs. However, existing defect localization techniques are ineffective at identifying these LLM-specific integration defects because they fail to capture cross-layer dependencies across heterogeneous artifacts, cannot exploit incomplete or misleading error traces, and lack semantic reasoning capabilities for identifying root causes. To address these challenges, we propose LIDL, a multi-agent framework for defect localization in…
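The abstract's point about cross-layer dependencies and misleading error traces can be illustrated with a toy model. Suppose a response parser crashes, yet the real cause is an upstream configuration value that truncates the model's output. The sketch below is a minimal illustration under stated assumptions and does not reproduce LIDL's method. It builds a small dependency graph over heterogeneous artifacts (code, model output, API call, prompt, configuration) and walks it upstream with breadth-first search from the node where the error surfaced. The class `ArtifactGraph`, the artifact kinds, the node names, and the distance-based ordering are all hypothetical. The summary does not describe LIDL's actual knowledge-graph schema, agent roles, or ranking.

```python
from collections import defaultdict, deque


class ArtifactGraph:
    """Hypothetical cross-artifact dependency graph (not LIDL's actual schema)."""

    def __init__(self) -> None:
        self.upstream: dict[str, set[str]] = defaultdict(set)  # node -> nodes it depends on
        self.kind: dict[str, str] = {}  # node -> artifact type

    def add(self, node: str, kind: str) -> None:
        self.kind[node] = kind

    def depends_on(self, node: str, dep: str) -> None:
        self.upstream[node].add(dep)

    def candidates(self, symptom: str) -> list[tuple[str, str, int]]:
        """Breadth-first walk upstream from the failing node.

        Returns (node, kind, hops) for every reachable upstream artifact,
        ordered by hop distance and then by name.
        """
        hops = {symptom: 0}
        queue = deque([symptom])
        while queue:
            current = queue.popleft()
            for dep in sorted(self.upstream[current]):
                if dep not in hops:
                    hops[dep] = hops[current] + 1
                    queue.append(dep)
        found = [(n, self.kind.get(n, "unknown"), h) for n, h in hops.items() if n != symptom]
        return sorted(found, key=lambda item: (item[2], item[0]))


g = ArtifactGraph()
g.add("code:parse_response", "code")
g.add("output:chat_completion", "model_output")
g.add("api:client.chat", "api_call")
g.add("prompt:summary_template", "prompt")
g.add("config:max_tokens", "config")
g.depends_on("code:parse_response", "output:chat_completion")
g.depends_on("output:chat_completion", "api:client.chat")
g.depends_on("api:client.chat", "prompt:summary_template")
g.depends_on("api:client.chat", "config:max_tokens")

# The stack trace points at the parser, but the candidates span other layers.
for node, kind, hop in g.candidates("code:parse_response"):
    print(hop, kind, node)
```

Here a trace-only technique would stop at `code:parse_response`. The graph, by contrast, surfaces the prompt and the `max_tokens` setting as upstream suspects. Ranking those suspects by semantic evidence rather than by hop count is the kind of reasoning the abstract attributes to LIDL's agents.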
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
