Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents
Ratnadira Widyasari, Martin Weyssow, Ivana Clairine Irsan, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, David Lo

TL;DR
VulTrial introduces a courtroom-inspired multi-agent framework that, with GPT-4o as its base LLM, improves vulnerability detection in source code, generates explanations for its verdicts, and uncovers zero-day vulnerabilities.
Contribution
The paper presents VulTrial, a novel multi-agent approach with role-specific instruction tuning that significantly enhances vulnerability detection over previous methods.
Findings
Almost doubles the efficacy of the prior best-performing baselines (with GPT-4o)
Effective across multiple LLMs including open-source models
Generates high-quality explanations and uncovers zero-day vulnerabilities
Abstract
Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions share significant similarities. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to identify vulnerable code and to provide explanations. It employs four role-specific agents: a security researcher, a code author, a moderator, and a review board. Using GPT-4o as the base LLM, VulTrial almost doubles the efficacy of the prior best-performing baselines. Additionally, we show that role-specific instruction tuning with small quantities of data further boosts VulTrial's efficacy significantly. Our extensive experiments demonstrate the efficacy of VulTrial across different LLMs, including an open-source, in-house-deployable model (LLaMA-3.1-8B), as well as the high quality of its generated explanations and its ability to…
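To make the four-role setup concrete, here is a minimal sketch of how such a mock-court session could be orchestrated. It assumes a turn order of security researcher, then code author, then moderator, with the review board issuing a final verdict; that order, the prompts, the `run_trial` and `fake_llm` helpers, and the convention that a verdict starting with "VULNERABLE" means the code is flagged are all hypothetical illustrations, not VulTrial's actual prompts or implementation. The LLM is abstracted as any callable taking a role name and a prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical LLM interface: (role, prompt) -> response text.
LLM = Callable[[str, str], str]


@dataclass
class Transcript:
    """Ordered record of (role, message) turns in the mock trial."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.entries.append((role, text))

    def render(self) -> str:
        return "\n".join(f"[{role}] {text}" for role, text in self.entries)


def run_trial(code: str, llm: LLM, rounds: int = 1) -> Tuple[bool, Transcript]:
    """Run a hypothetical mock-court session; return (is_vulnerable, transcript)."""
    transcript = Transcript()
    for _ in range(rounds):
        # Prosecution: the security researcher argues the code is vulnerable.
        claim = llm("security researcher", f"Code:\n{code}\n\nHistory:\n{transcript.render()}")
        transcript.add("security researcher", claim)
        # Defense: the code author responds to the researcher's claim.
        rebuttal = llm("code author", f"Code:\n{code}\n\nHistory:\n{transcript.render()}")
        transcript.add("code author", rebuttal)
        # The moderator summarizes both sides of the round.
        summary = llm("moderator", transcript.render())
        transcript.add("moderator", summary)
    # The review board reads the full record and issues the final verdict.
    verdict = llm("review board", f"Code:\n{code}\n\nHistory:\n{transcript.render()}")
    transcript.add("review board", verdict)
    # Assumed convention: verdicts beginning with "VULNERABLE" flag the code.
    is_vulnerable = verdict.strip().upper().startswith("VULNERABLE")
    return is_vulnerable, transcript


# Usage with a canned stand-in for a real model (hypothetical responses).
def fake_llm(role: str, prompt: str) -> str:
    canned = {
        "security researcher": "strcpy into a fixed 8-byte buffer may overflow.",
        "code author": "Callers are expected to pass short strings.",
        "moderator": "Dispute over whether caller-side length checks suffice.",
        "review board": "VULNERABLE: the function performs no length check.",
    }
    return canned[role]


verdict, t = run_trial("void f(char *s) { char b[8]; strcpy(b, s); }", fake_llm)
print(verdict)
print(t.render())
```

Separating the debate turns from the final judgment keeps the adversarial exchange (researcher versus author) distinct from the decision step, which mirrors the courtroom framing; swapping `fake_llm` for a real model client is the only change needed to experiment with different base LLMs.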
Taxonomy
Methods: Attention Is All You Need · Softmax · Cosine Annealing · Attention Dropout · Residual Connection · Linear Layer · Weight Decay · Dropout
