Adapting Neural Text Classification for Improved Software Categorization

Alexander LeClair; Zachary Eberhart; Collin McMillan

arXiv:1806.01742·cs.SE·June 18, 2018

TL;DR

This paper enhances neural text classification methods for software categorization, addressing the challenges of applying NLP techniques directly to source code and comments, and demonstrates improved accuracy through tailored adaptations.

Contribution

The paper introduces specific adaptations to neural classification algorithms that significantly improve software categorization accuracy over existing methods.

Findings

01. Achieved higher classification accuracy than previous techniques.
02. Neural adaptations perform well on both Debian programs and annotated C/C++ libraries.
03. Proposed method outperforms standard neural text classifiers on software data.

Abstract

Software Categorization is the task of organizing software into groups that broadly describe the behavior of the software, such as "editors" or "science." Categorization plays an important role in several maintenance tasks, such as repository navigation and feature elicitation. Current approaches attempt to cast the problem as text classification, to make use of the rich body of literature from the NLP domain. However, as we will show in this paper, text classification algorithms are generally not applicable off-the-shelf to source code; we found that they work well when high-level project descriptions are available, but suffer very large performance penalties when classifying source code and comments only. We propose a set of adaptations to a state-of-the-art neural classification algorithm and perform two evaluations: one with reference data from Debian end-user programs, and one with…

Code & Models

Repositories

paqs2020/paqs2020