INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers
Anjan Karmakar, Romain Robbes

TL;DR
This paper introduces INSPECT, a probing framework to analyze what pre-trained source code models learn about code characteristics, revealing structural advantages and opportunities for improvement in code understanding.
Contribution
The paper presents a comprehensive probing framework with 15 tasks to evaluate pre-trained code models' understanding of code features, highlighting structural model strengths and gaps.
Findings
Models that incorporate structural information, such as GraphCodeBERT, represent source code characteristics better.
BERT is competitive with specialized models on some probing tasks.
Opportunities exist to enhance pre-training for better code understanding.
Abstract
Pre-trained models of source code have recently been successfully applied to a wide variety of Software Engineering tasks; they have also seen some adoption in practice, e.g. for code completion. Yet, we still know very little about what these pre-trained models learn about source code. In this article, we use probing--simple diagnostic tasks that do not further train the models--to discover to what extent pre-trained models learn about specific aspects of source code. We use an extensible framework to define 15 probing tasks that exercise surface, syntactic, structural and semantic characteristics of source code. We probe 8 pre-trained source code models, as well as a natural language model (BERT) as our baseline. We find that models that incorporate some structural information (such as GraphCodeBERT) have a better representation of source code characteristics. Surprisingly,…
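The probing idea in the abstract (a simple diagnostic classifier trained on top of a model that is itself never updated) can be illustrated with a minimal sketch. This is not the INSPECT implementation. The "embeddings" below are synthetic random vectors standing in for frozen model outputs, and the function names `train_linear_probe` and `probe_accuracy` are hypothetical. The sketch only shows the general pattern: if a small linear probe can predict a property from frozen features, the property is plausibly encoded in them, and a control set with no encoded signal gives a reference point.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe on frozen features.

    Only the probe weights (w, b) are updated; the features are treated as
    fixed outputs of a pre-trained encoder.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the mean log-loss w.r.t. the logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(features, labels, w, b):
    """Fraction of examples whose binary label the probe predicts correctly."""
    preds = (features @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())


# ASSUMPTION: synthetic data replaces real model embeddings and real task labels.
rng = np.random.default_rng(0)
n, d = 400, 16
labels = rng.integers(0, 2, size=n).astype(float)

# "Informative" features: the property is linearly encoded in dimension 0.
signal_features = rng.normal(size=(n, d))
signal_features[:, 0] += 2.0 * (labels * 2 - 1)

# Control features: pure noise, carrying no information about the labels.
control_features = rng.normal(size=(n, d))

train, test = slice(0, 300), slice(300, n)

w, b = train_linear_probe(signal_features[train], labels[train])
acc_signal = probe_accuracy(signal_features[test], labels[test], w, b)

w_c, b_c = train_linear_probe(control_features[train], labels[train])
acc_control = probe_accuracy(control_features[test], labels[test], w_c, b_c)

print(f"probe accuracy (property encoded): {acc_signal:.2f}")
print(f"probe accuracy (control):          {acc_control:.2f}")
```

In a real probing study, the feature matrix would come from a frozen pre-trained model's hidden states and the labels from a task such as one of the 15 the paper defines. Keeping the probe small is a deliberate choice: a high score is then more likely to reflect what the representation already contains than what the probe itself learned.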
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · WordPiece · Weight Decay · Linear Layer · Dense Connections · Linear Warmup With Linear Decay · Adam · Attention Dropout
