Revisiting File Context for Source Code Summarization
Aakash Bansal, Chia-Yi Su, and Collin McMillan

TL;DR
This paper explores enhancing source code summarization by incorporating file context into neural models, demonstrating improvements especially on challenging examples where traditional methods fall short.
Contribution
The paper introduces a novel Transformer-based approach that encodes file context, improving code summarization performance over existing methods.
Findings
File context improves summarization on difficult examples.
The proposed model outperforms baseline approaches.
Encoding file context captures additional relevant information.
Abstract
Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself; that information often resides in other nearby code. In this paper, we revisit the idea of "file context" for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several…
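As a rough illustration of the file-context idea described in the abstract, the sketch below separates one target subroutine from the other subroutines in the same file and keeps only their signatures as context. It is not the paper's model or preprocessing: the function name `split_target_and_context`, the choice of signatures as the "select information", the `max_context` cap, and the use of Python source with the standard `ast` module are all assumptions made for illustration.

```python
import ast

# Hypothetical example file containing three subroutines.
SAMPLE_FILE = '''
def open_conn(host, port):
    return (host, port)

def close_conn(conn):
    return None

def send(conn, data):
    return len(data)
'''


def split_target_and_context(source, target_name, max_context=3):
    """Return (target_code, context_signatures) for one subroutine in a file.

    Assumption: "file context" is approximated here by the signatures of the
    other top-level functions in the same file, capped at max_context entries.
    """
    tree = ast.parse(source)
    functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]

    target = next((f for f in functions if f.name == target_name), None)
    if target is None:
        raise ValueError(f"no top-level function named {target_name!r}")

    # Source text of the subroutine to be summarized.
    target_code = ast.get_source_segment(source, target)

    # One short signature string per sibling subroutine, in file order.
    context = []
    for func in functions:
        if func.name == target_name:
            continue
        args = ", ".join(arg.arg for arg in func.args.args)
        context.append(f"def {func.name}({args})")

    return target_code, context[:max_context]
```

Calling `split_target_and_context(SAMPLE_FILE, "send")` yields the body of `send` as the primary encoder input and the signatures of `open_conn` and `close_conn` as candidate context. How a model would actually encode and attend over that context is left open here.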
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Dense Connections · Softmax · Linear Layer · Multi-Head Attention · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout
