Reducing False Positives in Static Bug Detection with LLMs: An Empirical Study in Industry

Xueying Du; Jiayi Feng; Yi Zou; Wei Xu; Jie Ma; Wei Zhang; Sisi Liu; Xin Peng; Yiling Lou

arXiv:2601.18844 · cs.SE · January 28, 2026

TL;DR

This study empirically evaluates how well large language models (LLMs) reduce the false positives that static analysis tools produce on industrial software. It finds that LLMs filter out most false alarms while substantially cutting the cost of manual inspection.

Contribution

The first comprehensive empirical analysis of LLM-based false alarm reduction techniques in an industrial setting (Tencent), based on real-world data and developer insights.

Findings

01

LLMs can eliminate 94-98% of false positives in industrial static analysis.

02

LLM-based methods are highly cost-effective, reducing manual inspection time significantly.

03

False positives in industrial static analysis demand substantial manual effort, which LLMs can substantially reduce.
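Finding 01 reports a share of false positives eliminated, but this page does not say how that share is computed. The following is a minimal Python sketch under an assumed, common definition: the fraction of warnings labeled as false alarms that an LLM filter suppresses, reported alongside the fraction of genuine bugs the filter keeps. The SatWarning record, the filter_metrics function, and the toy counts are hypothetical illustrations, not the paper's implementation or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SatWarning:
    # Hypothetical record for one static analysis warning (not from the paper).
    is_true_bug: bool      # ground-truth label from manual developer inspection
    llm_says_false: bool   # True if the LLM filter judges the warning a false alarm


def filter_metrics(warnings):
    """Return (false positive reduction rate, true bug retention rate).

    Assumed definitions, not necessarily the paper's:
      reduction = false alarms suppressed / all false alarms
      retention = true bugs kept / all true bugs
    """
    false_alarms = [w for w in warnings if not w.is_true_bug]
    true_bugs = [w for w in warnings if w.is_true_bug]
    if not false_alarms or not true_bugs:
        raise ValueError("need at least one false alarm and one true bug")
    # Share of false alarms the filter removes from the developer's queue.
    reduction = sum(w.llm_says_false for w in false_alarms) / len(false_alarms)
    # Share of real bugs the filter leaves visible (not wrongly suppressed).
    retention = sum(not w.llm_says_false for w in true_bugs) / len(true_bugs)
    return reduction, retention


# Toy data (illustrative only): 50 false alarms, 48 suppressed;
# 10 true bugs, 9 kept and 1 wrongly suppressed.
toy = (
    [SatWarning(False, True)] * 48
    + [SatWarning(False, False)] * 2
    + [SatWarning(True, False)] * 9
    + [SatWarning(True, True)] * 1
)
print(filter_metrics(toy))  # (0.96, 0.9)
```

Reporting retention next to the reduction rate matters because a filter that suppresses every warning would trivially score 100% on false positive reduction while hiding every real bug.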

Abstract

Static analysis tools (SATs) are widely adopted in both academia and industry for improving software quality, yet their practical use is often hindered by high false positive rates, especially in large-scale enterprise systems. These false alarms demand substantial manual inspection, creating severe inefficiencies in industrial code review. While recent work has demonstrated the potential of large language models (LLMs) for false alarm reduction on open-source benchmarks, their effectiveness in real-world enterprise settings remains unclear. To bridge this gap, we conduct the first comprehensive empirical study of diverse LLM-based false alarm reduction techniques in an industrial context at Tencent, one of the largest IT companies in China. Using data from Tencent's enterprise-customized SAT on its large-scale Advertising and Marketing Services software, we construct a dataset of 433…

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research