What do pre-trained code models know about code?

Anjan Karmakar; Romain Robbes

arXiv:2108.11308·cs.SE·August 26, 2021

TL;DR

This paper investigates what pre-trained code models understand about source code by using diagnostic probing tasks to analyze their knowledge of code properties across different models.

Contribution

It introduces four probing tasks to evaluate the understanding of code in various pre-trained models and compares their capabilities across different layers and properties.

Findings

01. GraphCodeBERT performs most consistently overall.

02. BERT performs surprisingly well on some code tasks.

03. Probing reveals model deficiencies and layer-specific information.

Abstract

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation and code summarization, among others. However, whether the vector representations from these pre-trained models encode the characteristics of source code well enough to be applicable to a broad spectrum of downstream tasks remains an open question. One way to investigate this is with diagnostic tasks called probes. In this paper, we construct four probing tasks (probing for surface-level, syntactic, structural, and semantic information) for pre-trained code models. We show how probes can be used to identify whether models are deficient in (understanding) certain code properties, to characterize different model layers, and to gain insight into model sample efficiency. We probe four models that vary in their expected knowledge…
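
As a rough illustration of the probing idea described in the abstract, the sketch below trains a small classifier on frozen vectors and compares how well a property can be decoded from two different "layers". It is a generic sketch, not the authors' setup: the choice of a logistic-regression probe, the names `train_linear_probe`, `probe_accuracy`, `layer_a`, and `layer_b`, and the synthetic data standing in for model embeddings are all assumptions made for this example.

```python
import numpy as np


def train_linear_probe(X, y, epochs=300, lr=0.5):
    """Fit a logistic-regression probe on frozen features X (n, d), binary labels y."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the logits
        grad = probs - y                             # gradient of the log loss
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X, y, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    preds = (X @ w + b > 0).astype(int)
    return float((preds == y).mean())


rng = np.random.default_rng(0)
n, d = 400, 16
y = rng.integers(0, 2, size=n)  # hypothetical binary code property

# Synthetic stand-ins for frozen embeddings (assumption, not real model output):
# layer_a encodes the property along one direction, layer_b is pure noise.
direction = rng.normal(size=d)
layer_a = rng.normal(size=(n, d)) + np.outer(2 * y - 1, direction)
layer_b = rng.normal(size=(n, d))

split = 300  # first 300 examples train the probe, the rest evaluate it
results = {}
for name, X in {"layer_a": layer_a, "layer_b": layer_b}.items():
    w, b = train_linear_probe(X[:split], y[:split])
    results[name] = probe_accuracy(X[split:], y[split:], w, b)

print(results)
```

Only the probe's weights are updated; the representations stay fixed, so a large gap in held-out accuracy between the two inputs suggests that one of them carries the property and the other does not. With real models, the synthetic arrays would be replaced by per-layer embeddings extracted from the frozen network.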

Code & Models

Repositories

giganticode/probes (PyTorch, official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Dense Connections · Dropout · Weight Decay · Residual Connection · Multi-Head Attention · Adam · Softmax