Coding in a Bubble? Evaluating LLMs in Resolving Context Adaptation Bugs During Code Adaptation

Tanghaoran Zhang; Xinjun Mao; Shangwen Wang; Yuxin Zhao; Yao Lu; Zezhou Tang; Wenyu Xu; Longfei Sun; Changrong Xie; Kang Yang; Yue Yu

arXiv:2601.06497·cs.SE·January 13, 2026

Open Access

TL;DR

This paper evaluates the ability of large language models to resolve context adaptation bugs in code, revealing significant limitations and proposing a framework to generate and analyze such bugs.

Contribution

The paper introduces CtxBugGen, a novel framework for generating context adaptation bugs to evaluate LLMs, and provides an empirical assessment of their performance in resolving these bugs.

Findings

1. LLMs perform poorly in resolving CtxBugs, with the best achieving only 55.93% Pass@1.

2. Presence of CtxBugs reduces LLMs' code adaptation performance by up to 30%.

3. LLMs often overlook and replicate CtxBugs, indicating a weakness in cross-context reasoning.
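
The Pass@1 figure above can be read with a small sketch of how such a score is commonly computed. This page does not state how the paper calculates Pass@1, so the code below assumes the widely used unbiased pass@k estimator (for a task with n samples of which c pass, pass@k = 1 - C(n-c, k) / C(n, k)); the function names `pass_at_k` and `mean_pass_at_k` are illustrative and not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of generated samples, c: number that pass all tests.
    Assumption: this is the commonly used estimator, not necessarily
    the exact procedure used in the paper.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over tasks; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n per task, so a benchmark-level Pass@1 is the average fraction of single attempts that pass.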

Abstract

Code adaptation is a fundamental but challenging task in software development, requiring developers to modify existing code for new contexts. A key challenge is to resolve Context Adaptation Bugs (CtxBugs), which occur when code correct in its original context violates constraints in the target environment. Unlike isolated bugs, CtxBugs cannot be resolved through local fixes and require cross-context reasoning to identify semantic mismatches. Overlooking them may lead to critical failures in adaptation. Although Large Language Models (LLMs) show great potential in automating code-related tasks, their ability to resolve CtxBugs remains a significant and unexplored obstacle to their practical use in code adaptation. To bridge this gap, we propose CtxBugGen, a novel framework for generating CtxBugs to evaluate LLMs. Its core idea is to leverage LLMs' tendency to generate plausible but…
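
To make the notion of a CtxBug concrete, here is a minimal hypothetical illustration. It is not taken from the paper or its benchmark: the functions, the `TARGET_CONFIG` dictionary, and the seconds-versus-milliseconds mismatch are all invented for this sketch. The reused check is correct where timeouts are stored in seconds, but silently wrong in a target context that stores them in milliseconds.

```python
# Original context (hypothetical): timeouts are stored in seconds.
def is_expired_original(elapsed_seconds: float, timeout_seconds: float) -> bool:
    return elapsed_seconds > timeout_seconds


# Target context (hypothetical): the configuration stores milliseconds.
TARGET_CONFIG = {"timeout_ms": 1500}


def is_expired_naive(elapsed_seconds: float) -> bool:
    # Naive adaptation: only the lookup was changed. The comparison still
    # treats the configured value as seconds, so the code runs without
    # error but violates the target context's unit convention.
    return elapsed_seconds > TARGET_CONFIG["timeout_ms"]


def is_expired_adapted(elapsed_seconds: float) -> bool:
    # Context-aware adaptation: convert to the target context's unit first.
    return elapsed_seconds * 1000 > TARGET_CONFIG["timeout_ms"]
```

Nothing in `is_expired_naive` looks wrong in isolation; the mismatch only becomes visible when the snippet is read together with the target configuration, which is the kind of cross-context reasoning the abstract describes.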

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability