TL;DR
This paper investigates what pre-trained models of source code know about code, using diagnostic probing tasks to measure how well their representations encode different code properties.
Contribution
It introduces four probing tasks (surface-level, syntactic, structural, and semantic) for evaluating how well pre-trained models understand code, and compares the models across layers and code properties.
Findings
GraphCodeBERT performs most consistently overall.
BERT, although pre-trained on natural language rather than code, performs surprisingly well on some code probing tasks.
Probing reveals model deficiencies and layer-specific information.
Abstract
Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation and code summarization. However, whether the vector representations from these pre-trained models encode the characteristics of source code comprehensively enough to be applicable to a broad spectrum of downstream tasks remains an open question. One way to investigate this is with diagnostic tasks called probes. In this paper, we construct four probing tasks (probing for surface-level, syntactic, structural, and semantic information) for pre-trained code models. We show how probes can be used to identify whether models are deficient in understanding certain code properties, to characterize different model layers, and to gain insight into model sample efficiency. We probe four models that vary in their expected knowledge…
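The abstract describes probes only at a high level. The sketch below illustrates the general idea of layer-wise probing. It is not the paper's code, tasks, or hyperparameters. It trains a simple logistic-regression probe on frozen per-layer feature vectors and reports held-out accuracy for each layer. The feature matrices are synthetic stand-ins for the layer outputs of a pre-trained code model. The helper names `train_linear_probe`, `probe_all_layers`, and `make_synthetic_layers` are hypothetical, and using a linear probe is itself an assumption.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=200, weight_decay=1e-3):
    """Fit a logistic-regression probe on frozen features X (n, d) with binary labels y (n,)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = np.clip(X @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-logits))
        err = p - y  # gradient of the cross-entropy loss w.r.t. the logits
        w -= lr * (X.T @ err / n + weight_decay * w)
        b -= lr * err.mean()
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    preds = (X @ w + b) > 0
    return float((preds == y.astype(bool)).mean())


def probe_all_layers(layer_feats, y, train_frac=0.8, seed=0):
    """Train one probe per layer on the same split and return held-out accuracies."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    scores = []
    for X in layer_feats:
        # Standardize with training-split statistics only.
        mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-8
        Xn = (X - mu) / sd
        w, b = train_linear_probe(Xn[tr], y[tr])
        scores.append(probe_accuracy(w, b, Xn[te], y[te]))
    return scores


def make_synthetic_layers(n, d, strengths, seed=0):
    """Hypothetical stand-in for model layers: each layer encodes a binary
    property in feature 0 with a different signal strength."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    layers = []
    for s in strengths:
        X = rng.normal(size=(n, d))
        X[:, 0] += s * (2 * y - 1)  # shift feature 0 by +s or -s depending on the label
        layers.append(X)
    return layers, y


if __name__ == "__main__":
    layers, y = make_synthetic_layers(400, 16, [0.0, 0.5, 1.5, 3.0], seed=1)
    for i, acc in enumerate(probe_all_layers(layers, y)):
        print(f"layer {i}: probe accuracy {acc:.2f}")
```

Higher held-out accuracy at a layer is usually read as evidence that the property is more easily recoverable from that layer's representations. Keeping the probe simple (here, linear) helps attribute that accuracy to the representation rather than to the probe's own capacity. Repeating the run with smaller training splits gives a rough view of sample efficiency.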
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Dense Connections · Dropout · Weight Decay · Residual Connection · Multi-Head Attention · Adam · Softmax
