Think Broad, Act Narrow: CWE Identification with Multi-Agent Large Language Models
Mohammed Sayagh, Mohammad Ghafari

TL;DR
This paper introduces a multi-agent large language model approach for more accurate vulnerability and CWE identification in functions, addressing limitations of existing methods by incorporating contextual analysis and multi-step decision making.
Contribution
It presents a novel multi-agent LLM framework that improves CWE detection accuracy by combining exhaustive search, external context analysis, and informed decision making.
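The combination named above (search broadly, consult context outside the function, then decide) can be illustrated with a toy sketch. This is not the authors' implementation: the two "agents" are hypothetical keyword stubs standing in for LLM calls, and the names `broad_search`, `narrow_with_context`, `identify_cwes`, the context facts, and the `REFUTED_BY` mapping are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for an LLM agent: maps source code to candidate CWE IDs.
Agent = Callable[[str], Set[str]]


def memory_agent(code: str) -> Set[str]:
    # Stub: flags out-of-bounds write/read whenever memcpy appears.
    return {"CWE-787", "CWE-125"} if "memcpy" in code else set()


def input_agent(code: str) -> Set[str]:
    # Stub: flags input validation and out-of-bounds write when a length is used.
    return {"CWE-20", "CWE-787"} if "len" in code else set()


def broad_search(code: str, agents: List[Agent]) -> Set[str]:
    """Think broad: keep every CWE that any agent proposes."""
    candidates: Set[str] = set()
    for agent in agents:
        candidates |= agent(code)
    return candidates


def narrow_with_context(
    candidates: Set[str],
    context: Dict[str, bool],
    refuted_by: Dict[str, str],
) -> Set[str]:
    """Act narrow: drop a candidate when a context fact outside the function refutes it."""
    return {
        cwe
        for cwe in candidates
        if not context.get(refuted_by.get(cwe, ""), False)
    }


def identify_cwes(
    code: str,
    agents: List[Agent],
    context: Dict[str, bool],
    refuted_by: Dict[str, str],
) -> List[str]:
    """Final decision: the candidates that survive the context check, sorted."""
    return sorted(narrow_with_context(broad_search(code, agents), context, refuted_by))


SNIPPET = "void copy(char *dst, const char *src, size_t len) { memcpy(dst, src, len); }"
AGENTS: List[Agent] = [memory_agent, input_agent]
# Assumed facts gathered from callers, i.e. from outside the function body.
CONTEXT = {
    "caller_validates_input": True,
    "caller_bounds_len": False,
    "src_is_len_sized": True,
}
# Assumed mapping: which context fact, if true, rules out which CWE.
REFUTED_BY = {
    "CWE-20": "caller_validates_input",
    "CWE-787": "caller_bounds_len",
    "CWE-125": "src_is_len_sized",
}

if __name__ == "__main__":
    print(identify_cwes(SNIPPET, AGENTS, CONTEXT, REFUTED_BY))
```

In this toy run the broad step proposes three CWEs and the context step keeps only CWE-787, which mirrors in miniature the shrinking candidate list reported in the findings. A real system would replace the stubs with model calls and derive the context from the surrounding code base.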
Findings
40.9% accuracy in CWE identification on the PrimeVul dataset
Significant reduction in false positives from 6-9 to 1-2 CWEs
Correctly identified the true CWE in 9 out of 10 synthetic cases
Abstract
Machine learning and large language models (LLMs) for vulnerability detection have received significant attention in recent years. Unfortunately, state-of-the-art techniques show that LLMs are unsuccessful in even distinguishing a vulnerable function from its benign counterpart, due to three main problems. First, vulnerability detection requires deep analysis, which LLMs often struggle with when making a one-shot prediction. Second, existing techniques typically perform function-level analysis, whereas effective vulnerability detection requires contextual information beyond the function scope. Third, the focus on binary classification can result in identifying a vulnerability but associating it with the wrong security weakness (CWE), which may mislead developers. We propose a novel multi-agent LLM approach to address the challenges of identifying CWEs. This approach consists of three steps: (1) a team of…
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Adversarial Robustness in Machine Learning
