Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs

Kevin Lira; Baldoino Fonseca; Davy Baía; Márcio Ribeiro; Wesley K. G. Assunção

arXiv:2604.08417 · cs.SE · April 10, 2026

TL;DR

This study evaluates the effectiveness, cost, and explanation quality of four modern LLMs in detecting interprocedural vulnerabilities across C, C++, and Python, highlighting Gemini 3 Flash's cost-effectiveness and Claude Haiku 4.5's high accuracy.

Contribution

The study systematically assesses how recent LLMs perform at interprocedural vulnerability detection across multiple languages, emphasizing the importance of context and offering practical insights.

Findings

01

Gemini 3 Flash achieves F1 >= 0.978 for C vulnerabilities at low cost.

02

Claude Haiku 4.5 correctly explains vulnerabilities in 93.6% of cases.

03

Interprocedural context improves detection effectiveness across languages.
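To make the idea of interprocedural context concrete, here is a minimal sketch of how a target function's source could be combined with the source of its callers and callees before being shown to a model. This summary does not describe the paper's actual prompts or context levels, so everything here is an assumption for illustration: the `Function` structure, the `build_context` helper, and the C snippets `copy_name` and `handle_request` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A function plus its direct call-graph neighbors (hypothetical structure)."""
    name: str
    source: str
    callers: list = field(default_factory=list)
    callees: list = field(default_factory=list)


def build_context(target, include_callers=False, include_callees=False):
    """Concatenate the target's source with optional caller/callee sources."""
    parts = [f"// Target: {target.name}\n{target.source}"]
    if include_callers:
        parts += [f"// Caller: {f.name}\n{f.source}" for f in target.callers]
    if include_callees:
        parts += [f"// Callee: {f.name}\n{f.source}" for f in target.callees]
    return "\n\n".join(parts)


# Hypothetical C snippets: copy_name looks harmless in isolation, but its
# caller passes a 16-byte buffer together with unbounded, untrusted input.
copy_name = Function(
    name="copy_name",
    source="void copy_name(char *dst, const char *src) { strcpy(dst, src); }",
)
handle_request = Function(
    name="handle_request",
    source="void handle_request(const char *input) "
           "{ char buf[16]; copy_name(buf, input); }",
)
copy_name.callers.append(handle_request)

function_only = build_context(copy_name)
with_callers = build_context(copy_name, include_callers=True)
```

In this sketch, `function_only` contains only the `strcpy` call, with no information about the destination buffer size, while `with_callers` also exposes the fixed 16-byte buffer and the untrusted input, which is the kind of cross-function evidence the third finding refers to.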

Abstract

Large Language Models (LLMs) have emerged as a promising approach to automated vulnerability detection. However, most prior studies have used LLMs to detect vulnerabilities only within single functions, overlooking vulnerabilities that arise from data and control flows spanning multiple functions. Leveraging the context provided by callers and callees may therefore help identify such vulnerabilities. This study empirically investigates the detection effectiveness, inference cost, and explanation quality of four modern LLMs (Claude Haiku 4.5, GPT-4.1 Mini, GPT-5 Mini, and Gemini 3 Flash) in detecting vulnerabilities related to interprocedural dependencies. To that end, we conducted an empirical study on 509 vulnerabilities from the ReposVul dataset, systematically varying the level of interprocedural…
