TL;DR
code2seq leverages the syntactic structure of source code, representing a snippet as a set of paths through its abstract syntax tree (AST) and using attention over those paths, to improve sequence generation tasks such as code summarization and documentation.
Contribution
It presents a new approach that encodes source code as a set of AST paths and outperforms previous models on code-to-sequence generation tasks across multiple datasets.
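To make "AST paths" concrete, here is a minimal sketch using Python's standard-library `ast` module. It is an illustration under stated assumptions, not the authors' path extractor. Terminals are taken to be `Name`, `arg`, and `Constant` nodes. A path runs from one terminal up to the lowest common ancestor and back down to another terminal, recorded as node type names. The `max_length` cap on the number of edges is a hypothetical parameter.

```python
import ast
from itertools import combinations


def _terminals(tree):
    """Collect (token, root-to-leaf chain of AST node objects) for each terminal.

    Assumption: only Name, arg, and Constant nodes count as terminals here.
    """
    results = []

    def visit(node, ancestors):
        chain = ancestors + [node]
        if isinstance(node, ast.Name):
            results.append((node.id, chain))
        elif isinstance(node, ast.arg):
            results.append((node.arg, chain))
        elif isinstance(node, ast.Constant):
            results.append((repr(node.value), chain))
        for child in ast.iter_child_nodes(node):
            visit(child, chain)

    visit(tree, [])
    return results


def ast_paths(source, max_length=8):
    """Return (start_token, node_type_path, end_token) for each terminal pair.

    max_length is a hypothetical cap on the number of edges in a path.
    """
    tree = ast.parse(source)
    paths = []
    for (tok_a, chain_a), (tok_b, chain_b) in combinations(_terminals(tree), 2):
        # Length of the shared prefix; compare node identity, not type names.
        common = 0
        while (common < min(len(chain_a), len(chain_b))
               and chain_a[common] is chain_b[common]):
            common += 1
        up = chain_a[common - 1:][::-1]  # terminal a up to the common ancestor
        down = chain_b[common:]          # below the common ancestor down to b
        nodes = up + down
        if len(nodes) - 1 > max_length:
            continue
        names = tuple(type(n).__name__ for n in nodes)
        paths.append((tok_a, names, tok_b))
    return paths


if __name__ == "__main__":
    for path in ast_paths("def add(a, b):\n    return a + b\n"):
        print(path)
```

For `a + b`, one extracted path is `('a', ('Name', 'BinOp', 'Name'), 'b')`. Each such triple, rather than a flat token sequence, is the unit the model encodes.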
Findings
Significantly outperforms previous models on code summarization tasks.
Effective across two programming languages and four datasets.
Utilizes syntactic structure for better code representation.
Abstract
The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present code2seq: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of compositional paths in its abstract syntax tree (AST) and uses attention to select the relevant paths while decoding. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to M examples. Our model significantly outperforms previous models that were specifically…
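The decoding step described in the abstract, using attention to select the relevant paths, can be sketched as a single attention step. This sketch makes several assumptions. Each path is taken to be already encoded as a fixed-length vector, and the path encoder is omitted. Scores are plain dot products followed by a softmax, which may differ from the paper's exact scoring function. The names `softmax` and `attend` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: Sequence[float],
           path_vectors: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """One decoding step of attention over encoded AST paths.

    Assumption: score_i = dot(decoder_state, path_i). A learned scoring
    matrix could replace this plain dot product.
    Returns (attention weights, context vector).
    """
    if not path_vectors:
        raise ValueError("need at least one encoded path")
    scores = [sum(h * z for h, z in zip(decoder_state, vec)) for vec in path_vectors]
    weights = softmax(scores)
    dim = len(path_vectors[0])
    # Context vector: attention-weighted sum of the path vectors.
    context = [sum(w * vec[d] for w, vec in zip(weights, path_vectors))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    state = [1.0, 0.0]                            # hypothetical decoder state
    paths = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # hypothetical path encodings
    weights, context = attend(state, paths)
    print(weights, context)
```

Here the first path is most aligned with the decoder state, so it receives the largest weight. The resulting context vector is what a decoder would combine with its state to predict the next output token.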
