TL;DR
This paper presents a practical, industry-oriented approach to automated code review that leverages advanced context extraction, multi-role LLMs, and filtering techniques to improve defect detection and reduce false alarms in real-world C++ codebases.
Contribution
It introduces a comprehensive automation pipeline with novel algorithms and prompt designs, addressing key challenges in defect-focused code review for large-scale industrial applications.
Findings
Achieves a 2x improvement over standard LLMs in defect detection.
Real-world validation on industry C++ codebases with hundreds of thousands of lines.
Framework design is language-agnostic, enabling broader applicability.
Abstract
The complexity of code reviews has driven efforts to automate review comments, but prior approaches oversimplify this task by treating it as snippet-level code-to-text generation and relying on text similarity metrics like BLEU for evaluation. These methods overlook repository context, real-world merge request evaluation, and defect detection, limiting their practicality. To address these issues, we explore the full automation pipeline within the online recommendation service of a company with nearly 400 million daily active users, analyzing industry-grade C++ codebases comprising hundreds of thousands of lines of code. We identify four key challenges: 1) capturing relevant context, 2) improving key bug inclusion (KBI), 3) reducing false alarm rates (FAR), and 4) integrating human workflows. To tackle these, we propose 1) code slicing algorithms for context extraction, 2) a multi-role…
