What does Transformer learn about source code?

Kechi Zhang; Ge Li; Zhi Jin

arXiv:2207.08466 · cs.SE · July 19, 2022 · 6 cites

Open Access

TL;DR

This paper investigates what structural information transformer models learn about source code by proposing methods to extract and analyze program graphs from the models, demonstrating their effectiveness in code understanding tasks.

Contribution

It introduces the aggregated attention score and aggregated attention graph methods to automatically extract meaningful program graphs from transformer models trained on source code.
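
The idea of pooling attention and reading a graph off it can be illustrated with a small sketch. This page does not give the paper's actual definitions, so everything below is an assumption made for illustration: the function names `aggregate_attention` and `extract_graph`, the choice of mean pooling from subtokens to code tokens, and the fixed threshold used to keep edges are all hypothetical and may differ from the authors' method.

```python
def aggregate_attention(attn, groups):
    """Pool a subtoken-level attention matrix up to code-token level.

    attn[i][j] : attention weight from subtoken i to subtoken j
                 (a single head of a single layer, as plain nested lists)
    groups[i]  : index of the code token that subtoken i belongs to
    Mean pooling is an assumption of this sketch, not the paper's definition.
    """
    n_tokens = max(groups) + 1
    sums = [[0.0] * n_tokens for _ in range(n_tokens)]
    counts = [[0] * n_tokens for _ in range(n_tokens)]
    for i, row in enumerate(attn):
        for j, weight in enumerate(row):
            sums[groups[i]][groups[j]] += weight
            counts[groups[i]][groups[j]] += 1
    # Average each cell; cells with no contributing subtokens stay at 0.0.
    return [
        [sums[p][q] / counts[p][q] if counts[p][q] else 0.0
         for q in range(n_tokens)]
        for p in range(n_tokens)
    ]


def extract_graph(token_attn, threshold):
    """Keep directed edges (p, q), p != q, whose pooled attention
    reaches the (hypothetical) threshold."""
    return sorted(
        (p, q)
        for p, row in enumerate(token_attn)
        for q, weight in enumerate(row)
        if p != q and weight >= threshold
    )


# Toy input: three subtokens, the first two belonging to the same code token.
toy_attn = [
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
]
toy_groups = [0, 0, 1]
toy_edges = extract_graph(aggregate_attention(toy_attn, toy_groups), threshold=0.45)
```

In this toy case the pooled attention from token 0 to token 1 is 0.5, while the reverse direction is about 0.4, so only the edge from token 0 to token 1 survives the 0.45 threshold. In practice the attention matrix would come from a trained model, and the resulting edges could be compared against syntax-tree or data-flow edges.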

Findings

01. Automatically extracted graphs are meaningful and effective.

02. Semantic graphs improve variable misuse detection.

03. Provides new insights into transformer understanding of code.

Abstract

In source code processing, transformer-based representation models have proven highly effective and achieve state-of-the-art (SOTA) performance on many tasks. Although transformer models process source code as a sequence, evidence suggests that they may also capture structural information (e.g., in the syntax tree, data flow, control flow, etc.). We propose the aggregated attention score, a method for investigating the structural information learned by the transformer. We also put forward the aggregated attention graph, a new way to automatically extract program graphs from pre-trained models. We evaluate our methods from multiple perspectives. Furthermore, based on our empirical findings, we use the automatically extracted graphs to replace the ingenious, manually designed graphs in the Variable Misuse task. Experimental results show that the…

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability