Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse

Aaron Imani; Mohammad Moshirpour; Iftekhar Ahmed

arXiv:2512.16790·cs.SE·December 19, 2025

TL;DR

This study investigates how Large Language Models internally represent comments in source code and how manipulating these representations affects their performance on software engineering tasks.

Contribution

The paper introduces a concept-level interpretability approach that uses Concept Activation Vectors to analyze how LLMs internalize comments in SE tasks, and it reveals task-dependent effects.

Findings

01. LLMs internalize comments as distinct latent concepts.

02. Different comment types are internally differentiated by LLMs.

03. Manipulating comment concepts significantly impacts model performance.

Abstract

While comments are non-functional elements of source code, Large Language Models (LLMs) frequently rely on them to perform Software Engineering (SE) tasks. Yet where in the model this reliance resides, and how it affects performance, remain poorly understood. We present the first concept-level interpretability study of LLMs in SE, analyzing three tasks (code completion, translation, and refinement) through the lens of internal comment representation. Using Concept Activation Vectors (CAVs), we show that LLMs not only internalize comments as distinct latent concepts but also differentiate between subtypes such as Javadocs, inline, and multiline comments. By systematically activating and deactivating these concepts in the LLMs' embedding space, we observed significant, model-specific, and task-dependent shifts in performance ranging from -90% to +67%. Finally, we conducted a controlled…
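To make the idea of activating and deactivating a concept more concrete, here is a minimal sketch on synthetic data. It is not the authors' implementation, since this page does not describe their exact procedure. It assumes a difference-of-means direction as the concept vector (a simpler stand-in for the linear-classifier normal used in the original TCAV formulation), and all names (`compute_cav`, `steer`, `remove_concept`, the synthetic activations) are hypothetical.

```python
import numpy as np

def compute_cav(concept_acts, other_acts):
    """Unit vector from the mean non-concept activation to the mean concept activation.

    Assumption: difference of means replaces the trained linear classifier
    that TCAV normally uses to obtain the concept direction.
    """
    direction = concept_acts.mean(axis=0) - other_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def steer(hidden, cav, alpha):
    """Shift hidden states along the CAV: alpha > 0 activates, alpha < 0 suppresses."""
    return hidden + alpha * cav

def remove_concept(hidden, cav):
    """Project out the component of each hidden state that lies along the CAV."""
    return hidden - np.outer(hidden @ cav, cav)

# Synthetic stand-ins for layer activations (hypothetical data, not model output).
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[0] = 1.0  # pretend the "comment" concept lives on axis 0

with_comments = rng.normal(size=(200, dim)) + 3.0 * true_dir
without_comments = rng.normal(size=(200, dim))

cav = compute_cav(with_comments, without_comments)

hidden = rng.normal(size=(5, dim))
activated = steer(hidden, cav, alpha=4.0)   # push states toward the concept
ablated = remove_concept(hidden, cav)       # zero out the concept component
```

In a real model the activations would come from a chosen layer (for example via a forward hook), and the steered states would be written back before the forward pass continues. The choice of layer and of `alpha` is left open here because the page does not specify them.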

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Topic Modeling