Revisiting File Context for Source Code Summarization

Aakash Bansal, Chia-Yi Su, and Collin McMillan

arXiv:2309.02326 · cs.SE · September 6, 2023

TL;DR

This paper revisits file context for source code summarization: a neural model encodes select information from other subroutines in the same file as the subroutine being summarized. It reports improvements especially on challenging examples, where models that see only the subroutine itself fall short.

Contribution

The paper proposes a modification of the Transformer architecture that is purpose-built to encode file context, and reports improved code summarization performance over several baseline approaches.

Findings

01. File context improves summarization on difficult examples.

02. The proposed model outperforms baseline approaches.

03. Encoding file context captures additional relevant information.

Abstract

Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself -- that information often resides in other nearby code. In this paper, we revisit the idea of "file context" for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several…
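
To make the idea of file context concrete, the following is a minimal sketch of how a training example might pair a target subroutine with other subroutines from the same file. It is an illustration only, not the paper's model or preprocessing: it assumes Python source parsed with the standard-library ast module, and the names split_subroutines, build_example, and max_context, as well as the "take the first few" selection rule, are hypothetical choices made for this example.

```python
import ast


def split_subroutines(file_source):
    """Map each top-level function name to its source text."""
    tree = ast.parse(file_source)
    return {
        node.name: ast.get_source_segment(file_source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def build_example(file_source, target, max_context=2):
    """Pair the target subroutine with a few other subroutines from its file.

    Assumption: context is simply the first `max_context` other functions in
    file order. How to select context is a design decision, not shown here.
    """
    subs = split_subroutines(file_source)
    context = [code for name, code in subs.items() if name != target]
    return {"target": subs[target], "file_context": context[:max_context]}


# A toy file: the names below are made up for illustration and never executed.
FILE_SOURCE = '''
def open_conn(host):
    return Socket(host)

def send(conn, msg):
    conn.write(msg)

def close(conn):
    conn.shutdown()
'''

example = build_example(FILE_SOURCE, "send")
print(example["target"])
for snippet in example["file_context"]:
    print("--- context ---")
    print(snippet)
```

In this toy file, the body of send alone says little about what conn is; the neighboring open_conn and close functions supply that information, which is the kind of nearby code the abstract refers to. A real system would feed both parts to separate encoder inputs rather than print them.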

Code & Models

Repositories

apcl-research/transformerfc (tf, Official)

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Dense Connections · Softmax · Linear Layer · Multi-Head Attention · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout