Sifting the Noise: A Comparative Study of LLM Agents in Vulnerability False Positive Filtering
Yunpeng Xiong, Ting Zhang

TL;DR
This study compares three LLM-based agent frameworks for filtering false positives in static security testing, showing they can significantly reduce noise but with varying effectiveness depending on model strength and vulnerability type.
Contribution
It provides a comprehensive comparison of LLM agent architectures for vulnerability false positive filtering, highlighting their strengths, limitations, and cost considerations.
Findings
LLM agents can reduce the false positive rate from over 92% to as low as 6.3%.
Performance varies significantly with backbone model strength and vulnerability type.
Aggressive false positive reduction may suppress true vulnerabilities.
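The trade-off in the findings above can be illustrated with a small calculation. The following is a minimal sketch, not the paper's evaluation code: the `Alert` class, the `evaluate_filter` function, and the alert counts are all hypothetical, and "false positive share" is assumed here to mean the fraction of reported alerts that are not real vulnerabilities, which may differ from the metric definition used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One SAST alert with a ground-truth label and the filter's decision."""
    is_true_vulnerability: bool  # ground truth, e.g. from a labeled benchmark
    kept_by_filter: bool         # True if the agent kept the alert


def fp_share(alerts):
    """Fraction of the given alerts that are false positives."""
    if not alerts:
        return 0.0
    false_positives = sum(1 for a in alerts if not a.is_true_vulnerability)
    return false_positives / len(alerts)


def evaluate_filter(alerts):
    """Compare alert quality before and after filtering."""
    kept = [a for a in alerts if a.kept_by_filter]
    true_total = sum(1 for a in alerts if a.is_true_vulnerability)
    true_kept = sum(1 for a in kept if a.is_true_vulnerability)
    return {
        "fp_share_before": fp_share(alerts),
        "fp_share_after": fp_share(kept),
        # Share of real vulnerabilities that survive the filter.
        "true_vulns_retained": true_kept / true_total if true_total else 1.0,
    }


# Hypothetical data: 100 alerts, 10 real vulnerabilities, 90 false positives.
alerts = (
    [Alert(True, True)] * 8        # real, kept
    + [Alert(True, False)] * 2     # real, wrongly suppressed
    + [Alert(False, True)] * 2     # false positive, kept
    + [Alert(False, False)] * 88   # false positive, removed
)
metrics = evaluate_filter(alerts)
print(metrics)
```

In this made-up scenario the filter removes most of the noise, yet two of the ten real vulnerabilities are also suppressed. Reporting retention alongside the false positive share makes that cost visible.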
Abstract
Static Application Security Testing (SAST) tools are essential for identifying software vulnerabilities, but they often produce a high volume of false positives (FPs), imposing a substantial manual triage burden on developers. Recent advances in Large Language Model (LLM) agents offer a promising direction by enabling iterative reasoning, tool use, and environment interaction to refine SAST alerts. However, the comparative effectiveness of different LLM-based agent architectures for FP filtering remains poorly understood. In this paper, we present a comparative study of three state-of-the-art LLM-based agent frameworks, i.e., Aider, OpenHands, and SWE-agent, for vulnerability FP filtering. We evaluate these frameworks using the vulnerabilities from the OWASP Benchmark and real-world open-source Java projects. The experimental results show that LLM-based agents can remove the majority of…
Taxonomy
Topics: Web Application Security Vulnerabilities · Software Testing and Debugging Techniques · Software Engineering Research
