Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse
Aaron Imani, Mohammad Moshirpour, Iftekhar Ahmed

TL;DR
This study investigates how Large Language Models internally represent comments in source code and how manipulating these representations affects their performance on software engineering tasks.
Contribution
It introduces a concept-level interpretability approach using Concept Activation Vectors to analyze comment internalization in LLMs for SE tasks, revealing task-dependent effects.
Findings
LLMs internalize comments as distinct latent concepts
Different comment types are internally differentiated by LLMs
Manipulating comment concepts significantly impacts model performance
Abstract
While comments are non-functional elements of source code, Large Language Models (LLMs) frequently rely on them to perform Software Engineering (SE) tasks. Yet where in the model this reliance resides, and how it affects performance, remain poorly understood. We present the first concept-level interpretability study of LLMs in SE, analyzing three tasks (code completion, translation, and refinement) through the lens of internal comment representation. Using Concept Activation Vectors (CAVs), we show that LLMs not only internalize comments as distinct latent concepts but also differentiate between subtypes such as Javadocs, inline, and multiline comments. By systematically activating and deactivating these concepts in the LLMs' embedding space, we observe significant, model-specific, and task-dependent shifts in performance ranging from -90% to +67%. Finally, we conducted a controlled…
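The idea of activating and deactivating a concept direction in embedding space can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that embeddings are plain lists of floats, that the concept direction is estimated as a normalized difference of class means (a simplification; CAVs are more commonly taken from a trained linear classifier), and that "activation" means adding a scaled direction while "deactivation" means projecting it out. All function names, the synthetic data, and the `alpha` parameter are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def concept_direction(with_concept, without_concept):
    """Assumed stand-in for a CAV: normalized difference of class means."""
    mu_pos = mean_vector(with_concept)
    mu_neg = mean_vector(without_concept)
    return unit([p - n for p, n in zip(mu_pos, mu_neg)])


def activate(h, cav, alpha):
    """Push embedding h along the concept direction by alpha."""
    return [x + alpha * c for x, c in zip(h, cav)]


def deactivate(h, cav):
    """Remove the component of h that lies along the concept direction."""
    proj = dot(h, cav)
    return [x - proj * c for x, c in zip(h, cav)]


# Synthetic embeddings: the "commented code" class is shifted along dimension 0.
rng = random.Random(0)
DIM = 8


def synthetic(shift, count=200):
    return [
        [rng.gauss(0.0, 1.0) + (shift if i == 0 else 0.0) for i in range(DIM)]
        for _ in range(count)
    ]


commented = synthetic(3.0)    # hypothetical embeddings of code with comments
uncommented = synthetic(0.0)  # hypothetical embeddings of code without comments

cav = concept_direction(commented, uncommented)
h = uncommented[0]
boosted = activate(h, cav, alpha=2.0)
removed = deactivate(h, cav)
```

In a real model the embeddings would come from a chosen layer and the edited vector would be written back before the forward pass continues; both steps are outside the scope of this sketch.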
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Topic Modeling
