A Systematic Study of LLM-Based Architectures for Automated Patching
Qingxiao Xu, Ze Sheng, Zhicheng Chen, Jeff Huang

TL;DR
This study systematically compares four LLM-based automated patching architectures, revealing that design choices significantly impact effectiveness, efficiency, and robustness, with general-purpose code agents showing the best overall performance.
Contribution
The paper provides a controlled evaluation of different LLM patching architectures using a unified benchmark, highlighting the importance of architectural design over model capability.
Findings
Multi-agent systems improve generalization but have higher overhead.
General-purpose code agents outperform other architectures overall.
Fixed workflows are efficient but less robust.
Abstract
Large language models (LLMs) have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs, the field lacks a systematic comparison of patching architectures. In this paper, we present a controlled evaluation of four LLM-based patching paradigms -- fixed workflow, single-agent system, multi-agent system, and general-purpose code agents -- using a unified benchmark and evaluation framework. We analyze patch correctness, failure modes, token usage, and execution time across real-world vulnerability tasks. Our results reveal clear architectural trade-offs: fixed workflows are efficient but brittle, single-agent systems balance flexibility and cost, and multi-agent designs improve generalization at the expense of substantially higher overhead…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software System Performance and Reliability · Software Reliability and Analysis Research
