Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting
Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, Rongxin Wu

TL;DR
This paper introduces BASTS, a code summarization method that splits a method's code into blocks using the dominator tree of its Control Flow Graph, builds a split AST for each block, and uses Tree-LSTM and Transformer models to generate more accurate summaries.
Contribution
The paper proposes BASTS, a new approach that leverages block-wise AST splitting and local syntax encoding to enhance code summarization performance.
Findings
BASTS outperforms state-of-the-art methods on benchmark datasets.
The block-wise AST splitting improves the quality of code summaries.
Pre-training of Tree-LSTM captures local syntax effectively.
Abstract
Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries. However, existing AST based methods suffer from the difficulty of training and generate inadequate code summaries. In this paper, we present the Block-wise Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes the rich tree-form syntax structure in ASTs, for improving code summarization. BASTS splits the code of a method based on the blocks in the dominator tree of the Control Flow Graph, and generates a split AST for each code split. Each split AST is then modeled by a Tree-LSTM using a pre-training strategy to capture local non-linear syntax encoding. The learned syntax…
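The dominator-tree step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy control flow graph, its node names, and the helper functions dominators, immediate_dominators, and dominator_tree are all assumptions made for illustration, and the simple iterative fixed-point algorithm stands in for whatever tooling BASTS actually uses. How BASTS groups dominator-tree nodes into code splits is not specified in the text above, so the sketch stops at the tree itself.

```python
from typing import Dict, List, Set

# Hypothetical CFG for a method with an if/else followed by a loop.
# Every node must appear as a key; values are successor lists.
CFG: Dict[str, List[str]] = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["loop_head"],
    "loop_head": ["loop_body", "exit"],
    "loop_body": ["loop_head"],
    "exit": [],
}


def dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iterative data-flow computation of each node's dominator set."""
    nodes = set(cfg)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = (set.intersection(*incoming) | {n}) if incoming else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, str]:
    """Map each non-entry node to its closest strict dominator."""
    dom = dominators(cfg, entry)
    idom: Dict[str, str] = {}
    for n in cfg:
        strict = dom[n] - {n}
        if n != entry and strict:
            # Strict dominators form a chain; the closest one has the
            # largest dominator set of its own.
            idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


def dominator_tree(idom: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the immediate-dominator map into parent -> sorted children."""
    tree: Dict[str, List[str]] = {}
    for node, parent in idom.items():
        tree.setdefault(parent, []).append(node)
    return {parent: sorted(children) for parent, children in tree.items()}


IDOM = immediate_dominators(CFG, "entry")
TREE = dominator_tree(IDOM)
```

In this toy graph the join node is immediately dominated by the condition rather than by either branch, because control can reach it through both the then and the else path. A splitter working on the dominator tree could use such parent-child relations to decide which statements belong to the same block.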
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Web Data Mining and Analysis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Byte Pair Encoding · Attention Is All You Need · Label Smoothing · Dropout · Residual Connection
