Modular Tree Network for Source Code Representation Learning

Wenhan Wang; Ge Li; Sijie Shen; Xin Xia; Zhi Jin

arXiv:2104.00196 · cs.SE · April 2, 2021

TL;DR

This paper introduces a Modular Tree Network (MTN) that dynamically composes neural units based on abstract syntax trees to better capture program structure, improving performance in program classification and clone detection.

Contribution

The paper presents MTN, a model that captures the semantics of different AST substructure types by dynamically composing neural units, and that outperforms existing models on program classification and code clone detection.

Findings

01

Achieves state-of-the-art results in program classification.

02

Outperforms existing models in code clone detection.

03

Effectively leverages detailed structural information of source code.

Abstract

Learning representation for source code is a foundation of many program analysis tasks. In recent years, neural networks have already shown success in this area, but most existing models did not make full use of the unique structural information of programs. Although abstract syntax tree-based neural models can handle the tree structure in the source code, they cannot capture the richness of different types of substructure in programs. In this paper, we propose a modular tree network (MTN) which dynamically composes different neural network units into tree structures based on the input abstract syntax tree. Different from previous tree-structural neural network models, MTN can capture the semantic differences between types of AST substructures. We evaluate our model on two tasks: program classification and code clone detection. Our model achieves the best performance compared with…
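
The idea of composing different units according to the input AST can be illustrated with a minimal sketch. This is not the authors' architecture: the `Unit` class, the `ModularTreeEncoder` class, the dimension `DIM`, the random untrained weights, and the use of Python's built-in `ast` module as the parser are all assumptions made for illustration. The sketch only shows the general pattern of picking a separate parameter set per node type and assembling the units bottom-up along the tree.

```python
import ast
import math
import random

DIM = 4  # hypothetical embedding size, chosen only for illustration


class Unit:
    """A tiny stand-in 'neural unit': tanh(W @ sum(children) + b).

    Weights are random and untrained; a real model would learn them.
    """

    def __init__(self, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]
        self.b = [rng.uniform(-0.5, 0.5) for _ in range(DIM)]

    def __call__(self, child_states):
        # Sum the child vectors; a leaf has no children, so the sum is zero.
        total = [sum(c[i] for c in child_states) for i in range(DIM)]
        return [
            math.tanh(sum(self.w[i][j] * total[j] for j in range(DIM)) + self.b[i])
            for i in range(DIM)
        ]


class ModularTreeEncoder:
    """Keeps one Unit per AST node type and composes them along the tree."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.units = {}  # node type name -> Unit

    def unit_for(self, node_type):
        # Create the unit for this node type on first use.
        if node_type not in self.units:
            self.units[node_type] = Unit(self.rng)
        return self.units[node_type]

    def encode(self, node):
        # Encode children first, then apply the unit selected by node type.
        children = [self.encode(c) for c in ast.iter_child_nodes(node)]
        return self.unit_for(type(node).__name__)(children)


encoder = ModularTreeEncoder()
vec = encoder.encode(ast.parse("x = a + b"))
print(sorted(encoder.units))  # node types that received their own unit
print(vec)                    # root vector of length DIM
```

Because the unit is chosen by node type, an `Assign` node and a `BinOp` node are processed with different parameters, whereas a tree model with a single shared unit would treat them identically. In a trained setting, the root vector would feed a classifier or a similarity function for tasks such as those named in the abstract.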

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques