LLM-Based Automated Diagnosis Of Integration Test Failures At Google
Celal Ziftci, Ray Liu, Spencer Greene, Livio Dalloro

TL;DR
Auto-Diagnose uses large language models to analyze and diagnose integration test failures at Google, reducing diagnosis time and surfacing results inside the code review workflow developers already use.
Contribution
The paper introduces Auto-Diagnose, an LLM-based tool that automates root cause analysis of integration test failures. The tool is integrated into Critique, Google's internal code review system, and the paper reports high diagnostic accuracy and positive user feedback.
Findings
90.14% accuracy in diagnosing root causes
Used across 52,635 failing tests at Google
Rated 'Not helpful' in only 5.8% of cases
Abstract
Integration testing is critical for the quality and reliability of complex software systems. However, diagnosing integration test failures presents significant challenges due to the massive volume, unstructured nature, and heterogeneity of the logs these tests generate. These factors lead to a high cognitive load and a low signal-to-noise ratio, making diagnosis difficult and time-consuming. Developers consistently complain about these difficulties and report spending substantially more time diagnosing integration test failures than unit test failures. To address these shortcomings, we introduce Auto-Diagnose, a novel diagnosis tool that leverages LLMs to help developers efficiently determine the root cause of integration test failures. Auto-Diagnose analyzes failure logs, produces concise summaries with the most relevant log lines, and is integrated into Critique, Google's internal code review system,…
