Toward a Theory of Causation for Interpreting Neural Code Models
David N. Palacio, Alejandro Velasco, Nathan Cooper, Alvaro Rodriguez, Kevin Moran, Denys Poshyvanyk

TL;DR
This paper introduces $do_{code}$, a causal inference-based interpretability method for neural code models, revealing their sensitivity to code syntax and potential biases, thus advancing understanding of their decision-making processes.
Contribution
The paper presents a novel $do_{code}$ interpretability framework tailored for neural code models, grounded in causal inference, to explain model predictions and identify biases.
Findings
NCMs are sensitive to code syntax changes
Most models predict tokens related to code blocks (e.g., brackets, parentheses, semicolons) with less confounding bias than other programming language constructs
$do_{code}$ helps detect confounding biases in NCMs
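To make the idea of confounding bias in the findings above concrete, here is a minimal sketch of backdoor adjustment by stratification, a standard causal-inference technique. It is not the paper's implementation: the function names, the choice of sequence length as the confounder, the syntax intervention as the treatment, the loss values, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows: (treated, stratum, loss).
# treated = 1 if a syntax intervention was applied to the input, else 0.
# stratum = an assumed confounder (here, a sequence-length bucket).
# loss    = an assumed model outcome for that input.
ROWS = [
    (1, "short", 2.0), (1, "short", 2.0), (1, "short", 2.0), (0, "short", 1.0),
    (1, "long", 4.0), (0, "long", 3.0), (0, "long", 3.0), (0, "long", 3.0),
]


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and control rows,
    ignoring the confounder entirely."""
    treated = [y for t, _, y in rows if t == 1]
    control = [y for t, _, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Backdoor adjustment: sum over strata z of
    P(z) * (E[y | t=1, z] - E[y | t=0, z])."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for t, z, y in rows:
        strata[z][t].append(y)

    total = len(rows)
    effect = 0.0
    for z, groups in strata.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"stratum {z!r} lacks treated or control rows")
        weight = (len(groups[0]) + len(groups[1])) / total
        effect += weight * (mean(groups[1]) - mean(groups[0]))
    return effect


if __name__ == "__main__":
    print("naive:", naive_effect(ROWS))
    print("adjusted:", adjusted_effect(ROWS))
```

In this toy data the intervention raises the loss by the same amount inside each length bucket, but treated inputs are mostly short and control inputs mostly long, so the naive comparison hides the effect while the stratified estimate recovers it. The sketch assumes the chosen stratum variable captures all relevant confounding, which is a modeling assumption rather than something the code can verify.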
Abstract
Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces $do_{code}$, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. $do_{code}$ is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of $do_{code}$ are extensible to exploring different model properties, we provide a concrete…
Taxonomy
Topics: Natural Language Processing Techniques
