From SFT to RL: Demystifying the Post-Training Pipeline for LLM-based Vulnerability Detection

Youpeng Li; Fuxun Yu; Xinda Wang

arXiv:2602.14012 · cs.CR · May 5, 2026

TL;DR

This paper systematically investigates post-training techniques for LLM-based vulnerability detection (VD). It shows that on-policy RL with GRPO outperforms SFT, off-policy preference optimization, and specialized VD LLMs, and it derives VD-specific guidelines for data curation, training stages, reward design, and evaluation.

Contribution

The authors present what they describe as the first comprehensive study of the post-training pipeline for LLM-based vulnerability detection, identifying effective strategies and insights that go beyond common practice.

Findings

01 On-policy RL with GRPO consistently outperforms SFT, off-policy preference optimization, and specialized VD LLMs.

02 SFT on rejection-sampled data is more effective than rationalization-based supervision, which can introduce hallucinations.

03 Evaluation based on root-cause analysis offers a more robust assessment.
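
For readers unfamiliar with GRPO (finding 01), its core idea is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than by a learned value model. The sketch below shows only that group-relative advantage step. The binary rewards, the function name `group_relative_advantages`, the choice of population standard deviation, and the `eps` constant are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar per response sampled for the SAME prompt.
    Responses better than the group average get a positive advantage,
    worse ones a negative advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps avoids division by zero when every response earns the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of four responses: 1.0 = correct verdict, 0.0 = wrong
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # A group where all responses agree carries no learning signal
    print(group_relative_advantages([1.0, 1.0, 1.0]))
```

Note that a group in which every response receives the same reward yields all-zero advantages, so prompts that are uniformly too easy or too hard contribute no gradient under this scheme.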

Abstract

The integration of LLMs into vulnerability detection (VD) has shifted the field toward more interpretable and context-aware analysis. While post-training techniques have shown promise in general coding tasks, their systematic application to VD remains underexplored. In this paper, we present the first comprehensive investigation into the post-training pipeline for LLM-based VD, demonstrating that on-policy RL with GRPO consistently outperforms SFT, off-policy preference optimization methods, and specialized VD LLMs. Our study further reveals VD-specific post-training guidelines and insights beyond common practices: (1) For data curation, contrary to the widespread use of rationalization-based supervision in prior VD work, SFT based on rejection sampling proves more effective, as rationalization can introduce hallucinations; in RL training, the inherently skewed difficulty distribution…
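
As a rough illustration of the rejection-sampling idea the abstract contrasts with rationalization: rather than asking a model to justify a known label after the fact, one samples several reasoning traces without showing the label and keeps only those whose final verdict agrees with it. The sketch below assumes that simple agreement criterion. The `Trace` type, `filter_traces`, and `max_keep` are hypothetical names, and the paper's actual acceptance criteria may differ.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One sampled model response for a code snippet (hypothetical schema)."""
    reasoning: str
    predicted_vulnerable: bool


def filter_traces(candidates, label, max_keep=1):
    """Keep up to `max_keep` traces whose verdict matches the ground-truth label.

    Traces that reach the wrong verdict are rejected, so the SFT set only
    contains reasoning the model produced itself and that ended correctly.
    """
    accepted = [t for t in candidates if t.predicted_vulnerable == label]
    return accepted[:max_keep]


if __name__ == "__main__":
    samples = [
        Trace("bounds check is missing before memcpy", True),
        Trace("input is validated, no issue found", False),
        Trace("length is attacker-controlled", True),
    ]
    # Ground truth for this hypothetical snippet: vulnerable
    for trace in filter_traces(samples, label=True, max_keep=2):
        print(trace.reasoning)
```

If no sampled trace matches the label, the function returns an empty list and the example is simply left out of the SFT set under this scheme.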
