SastBench: A Benchmark for Testing Agentic SAST Triage
Jake Feiglin, Guy Dar

TL;DR
SastBench is a new benchmark that evaluates the effectiveness of agents in triaging SAST tool findings, combining real vulnerabilities with false positives to better reflect real-world cybersecurity challenges.
Contribution
The paper introduces SastBench, a novel benchmark for testing SAST triage agents using real CVEs and filtered findings, addressing limitations of existing benchmarks.
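As a rough illustration of what evaluating a triage agent on such a benchmark could look like, the sketch below scores an agent's keep/dismiss decisions against labeled findings. It is not the paper's harness: the `Finding` fields, the `score_triage` helper, the sample data, and the choice of precision and recall as metrics are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Finding:
    """One SAST finding. Field names are hypothetical, not SastBench's schema."""
    finding_id: str
    is_true_positive: bool  # True: tied to a real CVE; False: filtered tool finding


def score_triage(
    findings: Iterable[Finding],
    agent: Callable[[Finding], bool],
) -> Dict[str, float]:
    """Compare an agent's verdicts (True = real vulnerability) with the labels."""
    tp = fp = fn = tn = 0
    for finding in findings:
        flagged = agent(finding)
        if flagged and finding.is_true_positive:
            tp += 1
        elif flagged and not finding.is_true_positive:
            fp += 1
        elif not flagged and finding.is_true_positive:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when the agent flags nothing
    # or the dataset contains no true positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Made-up sample: two CVE-backed findings and three approximate false positives.
sample = [
    Finding("f1", True),
    Finding("f2", True),
    Finding("f3", False),
    Finding("f4", False),
    Finding("f5", False),
]

# A trivial baseline "agent" that keeps every finding, i.e. performs no triage.
keep_everything = lambda finding: True
baseline = score_triage(sample, keep_everything)
```

Because the agent is just a callable from a finding to a verdict, any triage system can be plugged in without changing the scoring code, which is one simple way to read the agent-agnostic design mentioned in the abstract.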
Findings
Different agents show varying performance on SastBench.
The benchmark reveals strengths and weaknesses of current SAST triage methods.
Analysis guides future development of more effective triage agents.
Abstract
SAST (Static Application Security Testing) tools are among the most widely used techniques in defensive cybersecurity, employed by commercial and non-commercial organizations to identify potential vulnerabilities in software. Despite their great utility, they generate numerous false positives, requiring costly manual filtering (aka triage). While LLM-powered agents show promise for automating cybersecurity tasks, existing benchmarks fail to emulate real-world SAST finding distributions. We introduce SastBench, a benchmark for evaluating SAST triage agents that combines real CVEs as true positives with filtered SAST tool findings as approximate false positives. SastBench features an agent-agnostic design. We evaluate different agents on the benchmark and present a comparative analysis of their performance, provide a detailed analysis of the dataset, and discuss the implications for…
Taxonomy
Topics: Web Application Security Vulnerabilities · Security and Verification in Computing · Information and Cyber Security
