Learning Generalizable Multimodal Representations for Software Vulnerability Detection

Zeming Dong; Yuejun Guo; Qiang Hu; Yao Zhang; Maxime Cordy; Hao Liu; Mike Papadakis; Yongqiang Lyu

arXiv:2604.25711 · cs.SE · May 1, 2026

TL;DR

This paper introduces MultiVul, a multimodal contrastive learning framework that aligns code and comment representations to improve software vulnerability detection across large language models.

Contribution

It presents a novel multimodal contrastive approach that leverages code-comment pairs to enhance vulnerability detection and generalization.

Findings

01. Achieves up to 27.07% F1 improvement over prompting-based methods.

02. Outperforms code-only fine-tuning by 13.37% in F1 score.

03. Maintains comparable inference efficiency.

Abstract

Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic, while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information embedded in comments and thus limiting their generalization across complex code structures and logical relationships. To address this, we propose MultiVul, a multimodal contrastive framework that aligns code and comment representations through dual similarity learning and consistency regularization, augmented with diverse code-text pairs to improve robustness. Experiments on the widely adopted DiverseVul and Devign datasets across four large language models (LLMs) (i.e., DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, and CodeLlama-7B) show that MultiVul achieves up to…
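
To make the idea of aligning code and comment representations concrete, here is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over paired embeddings. It is not the authors' objective: the abstract does not spell out the dual similarity learning or consistency regularization terms, so the function names, the temperature value, and the toy embeddings below are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(code_embs, comment_embs, temperature=0.07):
    """Generic code<->comment contrastive loss (illustrative, hypothetical).

    Row i of code_embs and row i of comment_embs form a positive pair;
    every other comment in the batch serves as a negative for that code.
    """
    code = [l2_normalize(v) for v in code_embs]
    text = [l2_normalize(v) for v in comment_embs]
    # Cosine similarity matrix scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(c, t)) / temperature for t in text]
           for c in code]
    sim_transposed = [list(col) for col in zip(*sim)]
    # Average the code-to-comment and comment-to-code directions.
    return 0.5 * (cross_entropy_diagonal(sim) + cross_entropy_diagonal(sim_transposed))


# Toy batch: two code embeddings and their matching comment embeddings.
code_batch = [[1.0, 0.0], [0.0, 1.0]]
comment_batch = [[0.9, 0.1], [0.1, 0.9]]

aligned = symmetric_contrastive_loss(code_batch, comment_batch)
shuffled = symmetric_contrastive_loss(code_batch, comment_batch[::-1])
print(aligned < shuffled)
```

In this toy batch the loss is lower when each code snippet is paired with its own comment than when the comments are swapped, which is the signal a contrastive objective of this kind uses to pull matching code and comment representations together.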
