From LLMs to Agents: A Comparative Evaluation of LLMs and LLM-based Agents in Security Patch Detection
Junxiao Han, Zheng Yu, Lingfeng Bao, Jiakun Liu, Yao Wan, Jianwei Yin, Shuiguang Deng, and Song Han

TL;DR
This paper systematically evaluates LLMs and LLM-based agents for security patch detection, revealing that data augmentation improves accuracy and agents reduce false positives, offering insights into their practical security applications.
Contribution
It provides a comprehensive comparison of LLM-based methods and agents for security patch detection, highlighting their strengths and limitations in reducing false positives and improving accuracy.
Findings
Data-Aug LLM achieves the best overall performance.
ReAct Agent has the lowest false positive rate.
Baseline methods have higher false positive rates despite accuracy.
Abstract
The widespread adoption of open-source software (OSS) has accelerated software innovation but also increased security risks due to the rapid propagation of vulnerabilities and silent patch releases. In recent years, large language models (LLMs) and LLM-based agents have demonstrated remarkable capabilities in various software engineering (SE) tasks, enabling them to effectively address software security challenges such as vulnerability detection. However, systematic evaluation of the capabilities of LLMs and LLM-based agents in security patch detection remains limited. To bridge this gap, we conduct a comprehensive evaluation of the performance of LLMs and LLM-based agents for security patch detection. Specifically, we investigate three methods: Plain LLM (a single LLM with a system prompt), Data-Aug LLM (data augmentation based on the Plain LLM), and the ReAct Agent (leveraging the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Information and Cyber Security · Web Application Security Vulnerabilities
