From LLMs to Agents: A Comparative Evaluation of LLMs and LLM-based Agents in Security Patch Detection

Junxiao Han; Zheng Yu; Lingfeng Bao; Jiakun Liu; Yao Wan; Jianwei Yin; Shuiguang Deng; and Song Han

arXiv:2511.08060 · cs.CR · November 12, 2025

TL;DR

This paper systematically evaluates LLMs and LLM-based agents for security patch detection, revealing that data augmentation improves accuracy and agents reduce false positives, offering insights into their practical security applications.

Contribution

The paper provides a comprehensive comparison of plain LLMs, data-augmented LLMs, and LLM-based agents for security patch detection, highlighting their strengths and limitations in reducing false positives and improving accuracy.

Findings

01. Data-Aug LLM achieves the best overall performance.

02. The ReAct Agent has the lowest false positive rate.

03. Baseline methods show higher false positive rates, despite their accuracy.
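Findings 02 and 03 hinge on the difference between accuracy and false positive rate. The sketch below is a minimal illustration, not the paper's evaluation code: it treats security patches as the positive class, and the `Confusion` class and the counts in the usage lines are hypothetical, chosen only to show how two detectors can reach identical accuracy while one raises five times as many false alarms.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix counts for a security patch detector."""
    tp: int  # security patches correctly flagged
    fp: int  # non-security commits wrongly flagged as security patches
    tn: int  # non-security commits correctly passed over
    fn: int  # security patches the detector missed

    def accuracy(self) -> float:
        # Share of all commits classified correctly.
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def false_positive_rate(self) -> float:
        # Share of non-security commits wrongly flagged: FP / (FP + TN).
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0


if __name__ == "__main__":
    # Illustrative counts only (not from the paper): 50 security patches
    # and 50 non-security commits per detector.
    eager = Confusion(tp=48, fp=10, tn=40, fn=2)     # flags aggressively
    cautious = Confusion(tp=40, fp=2, tn=48, fn=10)  # flags conservatively
    for name, c in [("eager", eager), ("cautious", cautious)]:
        print(f"{name}: accuracy={c.accuracy():.2f} fpr={c.false_positive_rate():.2f}")
    # prints:
    # eager: accuracy=0.88 fpr=0.20
    # cautious: accuracy=0.88 fpr=0.04
```

Because accuracy pools both classes, it cannot distinguish a detector that over-flags ordinary commits from one that misses patches; reporting the false positive rate separately exposes that trade-off.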

Abstract

The widespread adoption of open-source software (OSS) has accelerated software innovation but also increased security risks due to the rapid propagation of vulnerabilities and silent patch releases. In recent years, large language models (LLMs) and LLM-based agents have demonstrated remarkable capabilities in various software engineering (SE) tasks, enabling them to effectively address software security challenges such as vulnerability detection. However, systematic evaluation of the capabilities of LLMs and LLM-based agents in security patch detection remains limited. To bridge this gap, we conduct a comprehensive evaluation of the performance of LLMs and LLM-based agents for security patch detection. Specifically, we investigate three methods: Plain LLM (a single LLM with a system prompt), Data-Aug LLM (data augmentation based on the Plain LLM), and the ReAct Agent (leveraging the…

Taxonomy

Topics: Software Engineering Research · Information and Cyber Security · Web Application Security Vulnerabilities