Code Representation Learning with Prüfer Sequences

Tenzin Jinpa; Yong Gao

arXiv:2111.07263 · cs.AI · November 16, 2021

TL;DR

This paper introduces a novel code representation using Prüfer sequences of Abstract Syntax Trees, enabling more effective deep learning models for code summarization by capturing structural information efficiently.

Contribution

The paper proposes a lossless, concise AST encoding method with Prüfer sequences, improving deep learning performance in code understanding tasks.
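The "lossless" property can be illustrated with the textbook Prüfer construction. The sketch below is not taken from the paper: it assumes nodes are labeled 0..n-1, and the function names `prufer_encode` and `prufer_decode` are hypothetical. It encodes a labeled tree as a sequence of n-2 labels and then rebuilds the same edge set from that sequence alone.

```python
import heapq
from collections import defaultdict


def prufer_encode(edges, n):
    """Return the Prüfer sequence of a labeled tree on nodes 0..n-1."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of current leaves, so the smallest-labeled leaf is removed first.
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        neighbor = next(iter(adj[leaf]))   # a leaf has exactly one neighbor
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        if len(adj[neighbor]) == 1:        # the neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return seq


def prufer_decode(seq):
    """Rebuild the edge list of the tree encoded by a Prüfer sequence."""
    n = len(seq) + 2
    # A node's degree is one more than its number of occurrences in seq.
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


# A small example tree: node 3 has children 0, 1, 2 and is attached to 4, which is attached to 5.
tree_edges = [(0, 3), (1, 3), (2, 3), (3, 4), (4, 5)]
sequence = prufer_encode(tree_edges, 6)
rebuilt = prufer_decode(sequence)
```

The sequence has n-2 entries for a tree with n nodes, which is where the conciseness comes from. How the paper assigns labels to AST nodes and attaches lexical tokens is not shown in this excerpt, so the labeling here is only an assumption.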

Findings

01. Outperforms recent deep-learning models in code summarization tasks

02. Provides a lossless and concise structural code representation

03. Enhances the exploitation of syntactic roles in code modeling

Abstract

An effective and efficient encoding of the source code of a computer program is critical to the success of sequence-to-sequence deep neural network models for tasks in computer program comprehension, such as automated code summarization and documentation. A significant challenge is to find a sequential representation that captures the structural/syntactic information in a computer program and facilitates the training of the learning models. In this paper, we propose to use the Prüfer sequence of the Abstract Syntax Tree (AST) of a computer program to design a sequential representation scheme that preserves the structural information in an AST. Our representation makes it possible to develop deep-learning models in which signals carried by lexical tokens in the training examples can be exploited automatically and selectively based on their syntactic role and importance. Unlike other…
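As a rough illustration of turning an AST into a Prüfer sequence, the sketch below parses a small function with Python's standard `ast` module and reads the sequence out as node-type names. Several choices here are assumptions rather than the paper's scheme: the use of Python ASTs, labeling nodes in visit order, the helper names `ast_to_tree` and `prufer_encode`, and the omission of lexical tokens.

```python
import ast
import heapq
from collections import defaultdict


def ast_to_tree(source):
    """Parse source and return (edges, names) with one integer label per AST node."""
    root = ast.parse(source)
    names, edges = [], []
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        # Assumption: label each node by visit order. Labeling per visit
        # (not per object) keeps the graph a tree even if the parser
        # reuses a node object such as ast.Load in several places.
        label = len(names)
        names.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, label))
        for child in ast.iter_child_nodes(node):
            stack.append((child, label))
    return edges, names


def prufer_encode(edges, n):
    """Textbook Prüfer encoding of a labeled tree on nodes 0..n-1."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)       # smallest-labeled leaf
        neighbor = next(iter(adj[leaf]))
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        if len(adj[neighbor]) == 1:
            heapq.heappush(leaves, neighbor)
    return seq


source = "def add(a, b):\n    return a + b\n"
edges, names = ast_to_tree(source)
label_sequence = prufer_encode(edges, len(names))
# Readable view: the node type behind each label in the sequence.
type_sequence = [names[label] for label in label_sequence]
```

Because each node appears in the sequence once less than its degree, nodes with many children dominate the sequence and leaf nodes never appear. The type-name view is only for reading: recovering the tree requires the integer labels.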

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Software Testing and Debugging Techniques