Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows

Jie Lin; David Mohaisen

arXiv:2502.00064·cs.CR·February 4, 2025

TL;DR

This paper evaluates how the length of tokenized Java code affects the accuracy of ten major large language models in vulnerability detection, highlighting robustness in some models and suggesting preprocessing techniques for improvement.

Contribution

It provides a comparative analysis of LLM performance based on input length and offers recommendations for future model development and preprocessing strategies.

Findings

01. GPT-4, Mistral, and Mixtral are robust to input length variations.

02. Other models show performance degradation with longer tokenized code.

03. Preprocessing techniques can improve vulnerability detection accuracy.
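
To make the third finding concrete, here is a minimal sketch of one possible preprocessing step: removing Java comments and blank lines while leaving string and character literals untouched. The paper does not specify a preprocessing method, so the function names (`reduce_java_source`, `rough_token_count`), the sample snippet, and the regex-based token proxy are all assumptions for illustration. A real LLM tokenizer would give different counts, and Java text blocks (`"""`) are not handled.

```python
import re

# Match literals first so that comment markers inside strings are left alone.
_JAVA_TOKEN = re.compile(
    r'"(?:\\.|[^"\\\n])*"'    # string literal (kept)
    r"|'(?:\\.|[^'\\\n])*'"   # char literal (kept)
    r"|//[^\n]*"              # line comment (dropped)
    r"|/\*.*?\*/",            # block comment (dropped)
    re.DOTALL,
)


def reduce_java_source(source: str) -> str:
    """Drop comments, trailing whitespace, and blank lines; keep indentation."""
    def keep_or_drop(match):
        text = match.group(0)
        # Comments start with '/', literals start with a quote character.
        return " " if text.startswith("/") else text

    stripped = _JAVA_TOKEN.sub(keep_or_drop, source)
    lines = [line.rstrip() for line in stripped.splitlines()]
    return "\n".join(line for line in lines if line.strip())


def rough_token_count(text: str) -> int:
    """Crude proxy for token count: words plus individual punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# Hypothetical sample, not taken from the paper's dataset.
SAMPLE = '''// Build the query
String q = "SELECT * FROM users WHERE id = " + id; /* unsafe */
String url = "http://example.com";

'''

compact = reduce_java_source(SAMPLE)
print(compact)
print(rough_token_count(SAMPLE), "->", rough_token_count(compact))
```

Comments can carry hints that matter for vulnerability detection, so whether stripping them helps or hurts accuracy is an empirical question this sketch does not answer.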

Abstract

This study examines the impact of tokenized Java code length on the accuracy and explicitness of ten major LLMs in vulnerability detection. Using chi-square tests and known ground truth, we found inconsistencies across models: some, like GPT-4, Mistral, and Mixtral, showed robustness, while others exhibited a significant link between tokenized length and performance. We recommend future LLM development focus on minimizing the influence of input length for better vulnerability detection. Additionally, preprocessing techniques that reduce token count while preserving code structure could enhance LLM accuracy and explicitness in these tasks.
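
As an illustration of the kind of chi-square test the abstract refers to, the sketch below checks whether detection correctness is independent of tokenized-length bin for a single model. The three length bins and all counts are invented for the example and are not results from the paper; the paper's actual binning is not given here. The helper names are hypothetical, and the p-value uses the closed-form chi-square survival function that holds only for even degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_even_dof(statistic, dof):
    """Chi-square survival function; this closed form is valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive, even dof")
    half = statistic / 2.0
    term, series = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        series += term
    return math.exp(-half) * series


# Hypothetical counts for one model: rows are length bins (short, medium, long),
# columns are (correct, incorrect) vulnerability verdicts.
observed = [
    [80, 20],
    [70, 30],
    [50, 50],
]

stat, dof = chi_square_independence(observed)
p = p_value_even_dof(stat, dof)
print(f"chi2={stat:.3f}, dof={dof}, p={p:.2e}")
```

With these made-up counts the test would reject independence at the 0.05 level, which is the pattern the abstract describes for the length-sensitive models; a robust model would instead show similar correct/incorrect proportions across bins and a large p-value.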

Taxonomy

Topics: Web Application Security Vulnerabilities · Network Security and Intrusion Detection · Topic Modeling

Methods: Attention Is All You Need · Label Smoothing · Layer Normalization · Linear Layer · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam