Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization

Marco Simoni; Aleksandar Fontana; Giulio Rossolini; Andrea Saracino

arXiv:2507.03051·cs.CR·July 8, 2025


TL;DR

This paper investigates how Group Relative Policy Optimization (GRPO), a reinforcement learning technique, can improve both the reasoning and the detection performance of Large Language Models on software vulnerability detection, and reports that it surpasses standard supervised finetuning.

Contribution

The paper applies GRPO to LLM-based vulnerability detection, redefining the method's advantage functions and reward signals for the task and demonstrating improvements in both detection performance and reasoning.

Findings

01 GRPO enhances LLM generalization in vulnerability detection

02 RL-based training improves reasoning abilities of LLMs

03 Performance surpasses standard supervised finetuning

Abstract

Improving and understanding the training dynamics and reasoning of Large Language Models (LLMs) has become essential for their deployment in AI-based security tools, such as software vulnerability detection. In this work, we present an extensive study aimed at advancing recent RL-based finetuning techniques for LLMs in the context of vulnerability detection. We start by highlighting key limitations of commonly adopted LLMs, such as their tendency to over-predict certain types of vulnerabilities while failing to detect others. To address this challenge, we explore the use of Group Relative Policy Optimization (GRPO), a recent policy-gradient method, for guiding LLM behavior through structured, rule-based rewards. We enable its application to the vulnerability detection task by redefining its advantage functions and reward signals using annotations from widely used datasets in the…
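To make the idea of rule-based rewards and group-relative advantages concrete, here is a minimal Python sketch. It uses the standard GRPO formulation (each sampled completion's reward is normalized against the mean and standard deviation of its own group), not the redefined advantage functions the abstract mentions, whose details are not given here. The `<answer>` tag format, the reward values, and the function names `rule_based_reward` and `group_relative_advantages` are all illustrative assumptions.

```python
import re
from statistics import mean, pstdev


def rule_based_reward(completion: str, label: str) -> float:
    """Hypothetical rule-based reward (assumed, not from the paper).

    +0.5 if the completion contains a well-formed <answer> tag,
    +1.0 more if the tagged verdict matches the dataset annotation.
    """
    match = re.search(
        r"<answer>\s*(vulnerable|safe)\s*</answer>", completion, re.IGNORECASE
    )
    if match is None:
        return 0.0
    reward = 0.5  # format reward
    if match.group(1).lower() == label:
        reward += 1.0  # correctness reward
    return reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standard GRPO advantage: normalize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of completions sampled for one code snippet labeled "vulnerable".
group = [
    "The buffer is unchecked. <answer>vulnerable</answer>",
    "Looks fine to me. <answer>safe</answer>",
    "I am not sure.",
]
rewards = [rule_based_reward(c, "vulnerable") for c in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below receive a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.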
