Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

André Storhaug; Jiamou Sun; Jingyue Li

arXiv:2602.12500·cs.SE·February 16, 2026

TL;DR

Favia is a forensic framework that combines scalable ranking and deep semantic reasoning with LLMs to accurately identify vulnerability-fixing commits in large code repositories, outperforming existing methods.

Contribution

The paper introduces Favia, an agent-based framework that improves vulnerability-fix identification by integrating efficient candidate ranking with deep LLM reasoning, addressing the poor precision-recall trade-offs of prior approaches.
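The rank-then-reason structure described above can be illustrated with a minimal sketch. This is not Favia's implementation: the `Commit` type, the Jaccard token-overlap ranker, the `top_k` cutoff, and the `verify` callback (a stand-in for the LLM reasoning stage) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Commit:
    # Hypothetical commit record; field names are assumptions for this sketch.
    sha: str
    message: str
    diff: str


def tokens(text: str) -> Set[str]:
    """Lowercase word/identifier tokens of a text."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def rank_candidates(cve_description: str, commits: List[Commit], top_k: int) -> List[Commit]:
    """Cheap stage: order commits by Jaccard token overlap with the CVE text."""
    query = tokens(cve_description)

    def score(commit: Commit) -> float:
        doc = tokens(commit.message + " " + commit.diff)
        union = query | doc
        return len(query & doc) / len(union) if union else 0.0

    return sorted(commits, key=score, reverse=True)[:top_k]


def identify_fixes(
    cve_description: str,
    commits: List[Commit],
    verify: Callable[[str, Commit], bool],
    top_k: int = 2,
) -> List[str]:
    """Expensive stage: run the verifier only on the ranked shortlist."""
    shortlist = rank_candidates(cve_description, commits, top_k)
    return [c.sha for c in shortlist if verify(cve_description, c)]


# Toy data, invented for this example.
cve = "Heap buffer overflow in parse_header allows remote attackers to crash the server"
commits = [
    Commit("b2", "Update README", "docs: add build instructions"),
    Commit("c3", "Refactor parse_header logging", "log_debug(header)"),
    Commit("a1", "Fix buffer overflow in parse_header", "if (len > MAX_HEADER) return -1;"),
]


def toy_verify(cve_description: str, commit: Commit) -> bool:
    # Placeholder for deep semantic reasoning: accept commits that add a bounds check.
    return "max_header" in commit.diff.lower()


shortlist_shas = [c.sha for c in rank_candidates(cve, commits, top_k=2)]
result = identify_fixes(cve, commits, toy_verify, top_k=2)
print(shortlist_shas, result)
```

The point of the split is cost: the ranker touches every commit but is cheap, while the verifier is assumed to be expensive and therefore sees only the shortlist. In the toy data, the logging refactor survives ranking because it mentions `parse_header`, and only the second stage rejects it.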

Findings

1. Favia achieves better precision-recall trade-offs than baselines.

2. It effectively identifies complex, indirect, and multi-file fixes.

3. It outperforms state-of-the-art methods on large-scale datasets.

Abstract

Identifying vulnerability-fixing commits corresponding to disclosed CVEs is essential for secure software maintenance but remains challenging at scale, as large repositories contain millions of commits of which only a small fraction address security issues. Existing automated approaches, including traditional machine learning techniques and recent large language model (LLM)-based methods, often suffer from poor precision-recall trade-offs. We find that these approaches are frequently evaluated on randomly sampled commits, which substantially underestimates real-world difficulty, where candidate commits are already security-relevant and highly similar. We propose Favia, a forensic, agent-based framework for vulnerability-fix identification that combines scalable candidate ranking with deep and iterative semantic reasoning. Favia first employs an efficient ranking stage to narrow the search space of…

Taxonomy

Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Testing and Debugging Techniques