Sifting the Noise: A Comparative Study of LLM Agents in Vulnerability False Positive Filtering

Yunpeng Xiong; Ting Zhang

arXiv:2601.22952·cs.SE·February 2, 2026

Open Access

TL;DR

This study compares three LLM-based agent frameworks (Aider, OpenHands, and SWE-agent) for filtering false positives from Static Application Security Testing (SAST) tools. The agents can remove most of the noise, but their effectiveness varies with the strength of the backbone model and the vulnerability type.

Contribution

The paper provides a side-by-side comparison of three LLM agent frameworks (Aider, OpenHands, and SWE-agent) for vulnerability false positive filtering, highlighting their strengths, limitations, and cost considerations.

Findings

01  LLM agents can reduce the false positive rate from over 92% to as low as 6.3%.

02  Performance varies significantly with backbone model strength and vulnerability type.

03  Aggressive false positive reduction may suppress true vulnerabilities.
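
The tension between the first and third findings can be made concrete with a small calculation. The sketch below is illustrative only and is not the paper's evaluation code: the `Alert` class, the function names, and the toy data are hypothetical, and it assumes "false positive rate" means the share of reported alerts that are not real vulnerabilities, which may differ from the exact definition the authors use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One SAST alert with a ground-truth label and a filter verdict (hypothetical schema)."""
    alert_id: str
    is_true_vulnerability: bool  # ground truth, e.g. from a labeled benchmark
    kept_by_filter: bool         # True if the filter kept the alert for triage


def false_positive_rate(alerts):
    """Share of the given alerts that are NOT real vulnerabilities (assumed definition)."""
    if not alerts:
        return 0.0
    return sum(not a.is_true_vulnerability for a in alerts) / len(alerts)


def true_positive_retention(alerts):
    """Share of real vulnerabilities that survive filtering."""
    real = [a for a in alerts if a.is_true_vulnerability]
    if not real:
        return 1.0
    return sum(a.kept_by_filter for a in real) / len(real)


def evaluate_filter(alerts):
    """Compare alert quality before and after filtering."""
    kept = [a for a in alerts if a.kept_by_filter]
    return {
        "fp_rate_before": false_positive_rate(alerts),
        "fp_rate_after": false_positive_rate(kept),
        "tp_retention": true_positive_retention(alerts),
    }


# Toy data, invented for illustration: 8 false alarms and 2 real vulnerabilities.
# The filter drops 7 false alarms but also drops one real vulnerability.
demo_alerts = (
    [Alert(f"a{i}", False, False) for i in range(7)]
    + [
        Alert("a7", False, True),
        Alert("a8", True, True),
        Alert("a9", True, False),
    ]
)

if __name__ == "__main__":
    print(evaluate_filter(demo_alerts))
```

Reporting the retention figure next to the post-filter false positive rate shows whether a filter lowered noise by discarding real findings, which is the risk the third finding describes.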

Abstract

Static Application Security Testing (SAST) tools are essential for identifying software vulnerabilities, but they often produce a high volume of false positives (FPs), imposing a substantial manual triage burden on developers. Recent advances in Large Language Model (LLM) agents offer a promising direction by enabling iterative reasoning, tool use, and environment interaction to refine SAST alerts. However, the comparative effectiveness of different LLM-based agent architectures for FP filtering remains poorly understood. In this paper, we present a comparative study of three state-of-the-art LLM-based agent frameworks, i.e., Aider, OpenHands, and SWE-agent, for vulnerability FP filtering. We evaluate these frameworks using the vulnerabilities from the OWASP Benchmark and real-world open-source Java projects. The experimental results show that LLM-based agents can remove the majority of…

Taxonomy

Topics: Web Application Security Vulnerabilities · Software Testing and Debugging Techniques · Software Engineering Research