Model See, Model Do? Exposure-Aware Evaluation of Bug-vs-Fix Preference in Code LLMs

Ali Al-Kaswan; Claudio Spiess; Prem Devanbu; Arie van Deursen; Maliheh Izadi

arXiv:2601.10496·cs.SE·January 16, 2026

TL;DR

This paper introduces an exposure-aware evaluation framework that measures how a code LLM's prior exposure to buggy or fixed code shapes its preference between the two, revealing biases and the risk of propagating memorized errors.

Contribution

The paper presents an exposure-aware evaluation method that combines Data Portraits membership testing with the ManySStuBs4J benchmark to analyze whether models favor bugs or fixes depending on training-data exposure.

Findings

01. Models reproduce bugs more often than fixes.

02. Likelihood metrics favor fixed code regardless of exposure.

03. Exposure influences model bias and propagation of errors.

Abstract

Large language models are increasingly used for code generation and debugging, but their outputs can still contain bugs that originate from training data. Whether an LLM prefers correct code or a familiar incorrect version may depend on what it was exposed to during training. We introduce an exposure-aware evaluation framework that quantifies how prior exposure to buggy versus fixed code influences a model's preference. Using the ManySStuBs4J benchmark, we apply Data Portraits for membership testing on the Stack-V2 corpus to estimate whether each buggy and fixed variant was seen during training. We then stratify examples by exposure and compare model preference using code completion as well as multiple likelihood-based scoring metrics. We find that most examples (67%) have neither variant in the training data, and when only one is present, fixes are more…
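
The stratification step described in the abstract can be illustrated with a minimal Python sketch. It assumes a membership test has already produced a seen/unseen flag for each variant and that per-token log-probabilities are available for both lines. The `Example` class, the function names, and the choice of mean log-probability as the scoring metric are hypothetical and serve only as illustration; they are not the paper's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    """One bug/fix pair (hypothetical container, not from the paper)."""
    bug_seen: bool             # membership-test result for the buggy variant
    fix_seen: bool             # membership-test result for the fixed variant
    bug_logprobs: List[float]  # per-token log-probabilities of the buggy line
    fix_logprobs: List[float]  # per-token log-probabilities of the fixed line


def exposure_stratum(ex: Example) -> str:
    """Assign a pair to one of four exposure strata."""
    if ex.bug_seen and ex.fix_seen:
        return "both"
    if ex.bug_seen:
        return "bug_only"
    if ex.fix_seen:
        return "fix_only"
    return "neither"


def mean_logprob(logprobs: List[float]) -> float:
    """Assumed scoring metric: average token log-probability."""
    if not logprobs:
        raise ValueError("logprobs must not be empty")
    return sum(logprobs) / len(logprobs)


def fix_preference_by_stratum(examples: List[Example]) -> Dict[str, float]:
    """Fraction of pairs per stratum where the fix scores above the bug."""
    wins: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for ex in examples:
        stratum = exposure_stratum(ex)
        totals[stratum] += 1
        if mean_logprob(ex.fix_logprobs) > mean_logprob(ex.bug_logprobs):
            wins[stratum] += 1
    return {s: wins[s] / totals[s] for s in totals}
```

Calling `fix_preference_by_stratum` on a list of `Example` objects returns one fix-preference rate per stratum that actually occurs in the data, so rates for "bug_only" and "fix_only" pairs can be compared directly. Swapping `mean_logprob` for a different scoring function changes the metric without touching the stratification.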

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling