Towards Effective Complementary Security Analysis using Large Language Models
Jonas Wagner, Simon Müller, Christian Näther, Jan-Philipp Steghöfer, Andreas Both

TL;DR
This paper explores using Large Language Models (LLMs) with advanced prompting techniques to identify false positives in static application security testing (SAST) reports, with promising results on both the OWASP Benchmark (v1.2) and a real-world software project.
Contribution
It proposes using LLMs with advanced prompting techniques, such as Chain-of-Thought and Self-Consistency, to identify false positives among SAST findings while trying to maintain a perfect true positive rate.
Findings
Advanced prompting techniques, such as Chain-of-Thought and Self-Consistency, substantially improve FP detection.
Some LLMs identified approximately 62.5% of FPs in the OWASP Benchmark dataset without missing genuine weaknesses.
Combining detections from different LLMs would increase FP detection to approximately 78.9%.
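The last finding can be illustrated with a small sketch. It assumes that "combining" means taking the union of the findings each model labels as a false positive; the summary does not state the exact combination rule. All function names, model names, and data below are hypothetical and not taken from the paper.

```python
from typing import Dict, Set


def combine_fp_detections(flags_by_model: Dict[str, Set[int]]) -> Set[int]:
    """Return every finding ID that at least one model labeled a false positive."""
    combined: Set[int] = set()
    for flagged in flags_by_model.values():
        combined |= flagged
    return combined


def fp_detection_rate(flagged: Set[int], actual_fps: Set[int]) -> float:
    """Share of the actual false positives that were flagged."""
    if not actual_fps:
        return 0.0
    return len(flagged & actual_fps) / len(actual_fps)


def wrongly_dismissed(flagged: Set[int], actual_tps: Set[int]) -> Set[int]:
    """Genuine weaknesses flagged as FPs; must stay empty for a perfect TPR."""
    return flagged & actual_tps


# Hypothetical toy data, not results from the paper.
ACTUAL_FPS = set(range(1, 11))  # findings 1..10 are false positives
ACTUAL_TPS = {11, 12, 13}       # findings 11..13 are genuine weaknesses
FLAGS = {"model_a": {1, 2, 3, 4}, "model_b": {3, 4, 5, 6, 7}}

combined = combine_fp_detections(FLAGS)
print(fp_detection_rate(FLAGS["model_a"], ACTUAL_FPS))  # 0.4
print(fp_detection_rate(combined, ACTUAL_FPS))          # 0.7
print(wrongly_dismissed(combined, ACTUAL_TPS))          # set()
```

Under this assumption the union can only raise the FP detection rate, but it also accumulates every model's mistakes, so the check for wrongly dismissed true positives matters more as models are added.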
Abstract
A key challenge in security analysis is the manual evaluation of potential security weaknesses generated by static application security testing (SAST) tools. Numerous false positives (FPs) in these reports reduce the effectiveness of security analysis. We propose using Large Language Models (LLMs) to improve the assessment of SAST findings. We investigate the ability of LLMs to reduce FPs while trying to maintain a perfect true positive rate, using datasets extracted from the OWASP Benchmark (v1.2) and a real-world software project. Our results indicate that advanced prompting techniques, such as Chain-of-Thought and Self-Consistency, substantially improve FP detection. Notably, some LLMs identified approximately 62.5% of FPs in the OWASP Benchmark dataset without missing genuine weaknesses. Combining detections from different LLMs would increase this FP detection to approximately…
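Self-Consistency, as generally described, samples several independent Chain-of-Thought answers for the same prompt and keeps the most frequent verdict. The sketch below shows only the voting step and takes the sampled verdicts as given. The function name, the "FP"/"TP" labels, the `min_fp_share` parameter, and the choice to resolve ties in favor of keeping the finding are assumptions made for illustration, not the paper's configuration.

```python
from collections import Counter
from typing import Sequence


def self_consistent_verdict(samples: Sequence[str], min_fp_share: float = 0.5) -> str:
    """Aggregate repeated verdicts ("FP" or "TP") for one SAST finding.

    The finding is dismissed as a false positive only if the share of "FP"
    votes is strictly greater than min_fp_share. Ties and empty input fall
    back to "TP", so the finding stays in the report for manual review.
    """
    if not samples:
        return "TP"
    fp_share = Counter(samples)["FP"] / len(samples)
    return "FP" if fp_share > min_fp_share else "TP"


# Hypothetical verdicts from three sampled reasoning paths.
print(self_consistent_verdict(["FP", "FP", "TP"]))       # FP
print(self_consistent_verdict(["FP", "TP"]))             # TP
print(self_consistent_verdict(["FP", "FP", "TP"], 0.9))  # TP
```

Raising `min_fp_share` trades FP detection for caution: fewer findings are dismissed, which fits the stated goal of not missing genuine weaknesses.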
