Toward a Theory of Causation for Interpreting Neural Code Models

David N. Palacio; Alejandro Velasco; Nathan Cooper; Alvaro Rodriguez; Kevin Moran; Denys Poshyvanyk

arXiv:2302.03788 · cs.SE · March 29, 2024 · 6 cites

TL;DR

This paper introduces $do_{code}$, a causal inference-based interpretability method for neural code models, revealing their sensitivity to code syntax and potential biases, thus advancing understanding of their decision-making processes.

Contribution

The paper presents a novel $do_{code}$ interpretability framework tailored for neural code models, grounded in causal inference, to explain model predictions and identify biases.

Findings

01 NCMs are sensitive to changes in code syntax

02 Most models predict tokens related to code blocks with less confounding bias than other programming language constructs

03 $do_{code}$ helps detect confounding biases in NCMs

Abstract

Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces $do_{code}$, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. $do_{code}$ is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of $do_{code}$ are extensible to exploring different model properties, we provide a concrete…

Taxonomy

Topics: Natural Language Processing Techniques