Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review
Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos, Diomidis Spinellis

TL;DR
This study reveals how Large Language Models used in automated code review are susceptible to framing effects and contextual bias injection, which can be exploited to reintroduce vulnerabilities in real-world pipelines.
Contribution
The paper demonstrates that framing effects are prevalent in LLM-based vulnerability detection and introduces a novel iterative refinement attack that reaches a 100% success rate in biasing security judgments.
Findings
Framing effects are widespread in LLM-based vulnerability detection.
Template-based attacks are generally ineffective and may backfire.
Iterative refinement attacks can achieve 100% success in biasing security judgments.
Abstract
Automated Code Review (ACR) systems integrating Large Language Models (LLMs) are increasingly adopted in software development workflows, ranging from interactive assistants to autonomous agents in CI/CD pipelines. In this paper, we study how LLM-based vulnerability detection in ACR is affected by the framing effect: the tendency to let the presentation of information override its semantic content in forming judgments. We examine whether adversaries can exploit this through contextual-bias injection: crafting PR metadata to bias ACR security judgments as a supply-chain attack vector against real-world ACR pipelines. To this end, we first conduct a large-scale exploratory study across six LLMs under five framing conditions, establishing the framing effect as a systematic and widespread phenomenon in LLM-based vulnerability detection, with bug-free framing producing the strongest effect.…
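One way to picture how a framing effect of this kind could be quantified is to compare the rate at which a reviewer model flags a known-vulnerable change under each framing against a baseline framing. The sketch below is illustrative only and is not the authors' evaluation code: the function names, the "neutral" baseline label, the "bug_free" label spelling, and the toy records are all assumptions introduced here, and the numbers are invented for the example rather than taken from the paper.

```python
from collections import defaultdict


def detection_rates(records):
    """Return the fraction of reviews that flagged the vulnerability, per framing.

    `records` is an iterable of (framing, flagged) pairs, where `flagged`
    is True when the reviewer reported the known vulnerability.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for framing, flagged in records:
        totals[framing] += 1
        hits[framing] += int(flagged)
    return {framing: hits[framing] / totals[framing] for framing in totals}


def framing_shift(records, baseline="neutral"):
    """Return each framing's detection rate minus the baseline rate.

    A negative value means the framing suppressed detections
    relative to the (hypothetical) baseline condition.
    """
    rates = detection_rates(records)
    base = rates[baseline]
    return {f: rate - base for f, rate in rates.items() if f != baseline}


# Toy, made-up observations: NOT results from the paper.
records = [
    ("neutral", True), ("neutral", True), ("neutral", True), ("neutral", False),
    ("bug_free", True), ("bug_free", False), ("bug_free", False), ("bug_free", False),
]

rates = detection_rates(records)
shifts = framing_shift(records)
print(rates)
print(shifts)
```

A per-framing difference against a shared baseline keeps the comparison simple; a real study would likely also control for model, sample, and repeated runs, which this sketch does not attempt.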
