Convolutional Neural Networks over Tree Structures for Programming Language Processing
Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin

TL;DR
This paper introduces a tree-based convolutional neural network (TBCNN) that processes programs by convolving over their abstract syntax trees to capture structural information. It outperforms baseline neural models in program classification and is also applied to detecting code patterns.
Contribution
The paper presents a new TBCNN architecture that applies convolution over program syntax trees, specifically tailored for programming language analysis tasks.
Findings
TBCNN outperforms baseline neural models in program classification.
TBCNN effectively detects specific code patterns.
TBCNN demonstrates versatility across different program analysis tasks.
Abstract
Programming language processing (similar to natural language processing) is a hot research topic in the field of software engineering; it has also aroused growing interest in the artificial intelligence community. However, different from a natural language sentence, a program contains rich, explicit, and complicated structural information. Hence, traditional NLP models may be inappropriate for programs. In this paper, we propose a novel tree-based convolutional neural network (TBCNN) for programming language processing, in which a convolution kernel is designed over programs' abstract syntax trees to capture structural information. TBCNN is a generic architecture for programming language processing; our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets of certain patterns. TBCNN…
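To make the idea of "a convolution kernel over an abstract syntax tree" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes Python's built-in ast module as the parser, a toy scalar feature per node type in place of learned embeddings, a window covering one node and its direct children, and hand-picked weights. The names node_feature, tree_conv, and tbcnn_like_feature are hypothetical.

```python
import ast
import math


def node_feature(node):
    """Toy deterministic scalar feature for an AST node type.

    Assumption: stands in for a learned vector embedding of the node type.
    """
    name = type(node).__name__
    return (sum(ord(c) for c in name) % 17) / 17.0 - 0.5


def tree_conv(node, w_top=0.8, w_left=0.3, w_right=-0.3, bias=0.0):
    """Apply one convolution window covering `node` and its direct children.

    Assumption: the weight for each child is interpolated between w_left and
    w_right by the child's position, so the kernel handles any child count.
    """
    children = list(ast.iter_child_nodes(node))
    total = w_top * node_feature(node) + bias
    n = len(children)
    for i, child in enumerate(children):
        # pos is 0.0 for the leftmost child and 1.0 for the rightmost one
        pos = 0.5 if n == 1 else i / (n - 1)
        w_child = (1.0 - pos) * w_left + pos * w_right
        total += w_child * node_feature(child)
    return math.tanh(total)


def tbcnn_like_feature(source):
    """Slide the window over every AST node, then max-pool the responses."""
    tree = ast.parse(source)
    responses = [tree_conv(n) for n in ast.walk(tree)]
    return max(responses)


if __name__ == "__main__":
    program = "def add(a, b):\n    return a + b\n"
    print(tbcnn_like_feature(program))
```

The max-pooling step turns a tree of any size into a fixed-size value, which is what lets a downstream classifier accept whole programs. A trained model would use vector features and weight matrices learned from data rather than the fixed scalars assumed here.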
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Software Testing and Debugging Techniques
Methods: Convolution
