TL;DR
This paper adapts a neural text classification algorithm to software categorization. Off-the-shelf text classifiers work well when high-level project descriptions are available but lose substantial accuracy on source code and comments alone; the proposed adaptations are reported to improve accuracy in that setting.
Contribution
The paper proposes a set of adaptations to a state-of-the-art neural classification algorithm that improve software categorization accuracy over existing methods.
Findings
The adapted classifier achieves higher classification accuracy than previous techniques.
The adaptations perform well on both evaluation sets: Debian end-user programs and annotated C/C++ libraries.
The proposed method outperforms standard neural text classifiers on software data.
Abstract
Software Categorization is the task of organizing software into groups that broadly describe the behavior of the software, such as "editors" or "science." Categorization plays an important role in several maintenance tasks, such as repository navigation and feature elicitation. Current approaches attempt to cast the problem as text classification, to make use of the rich body of literature from the NLP domain. However, as we will show in this paper, text classification algorithms are generally not applicable off-the-shelf to source code; we found that they work well when high-level project descriptions are available, but suffer very large performance penalties when classifying source code and comments only. We propose a set of adaptations to a state-of-the-art neural classification algorithm and perform two evaluations: one with reference data from Debian end-user programs, and one with…
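To make "casting software categorization as text classification" concrete, here is a minimal sketch in Python. It is not the paper's neural method or its adaptations: a multinomial naive Bayes classifier is swapped in as the simplest possible text classifier, and the names `tokenize_source` and `NaiveBayes`, as well as the four toy training snippets, are assumptions made up for illustration. The only idea taken from the abstract is that source code and comments are treated as text and mapped to category labels such as "editors" or "science."

```python
import math
import re
from collections import Counter, defaultdict


def tokenize_source(text):
    """Split source code and comments into lowercase word tokens.

    Identifiers are broken on underscores and camelCase boundaries,
    so "openTextBuffer" yields "open", "text", "buffer".
    """
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        # Acronym runs (e.g. "HTTP") or capitalized/lowercase word parts.
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        tokens.extend(p.lower() for p in parts)
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes over source-code tokens (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            tokens = tokenize_source(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize_source(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                # Laplace smoothing so unseen tokens do not zero out a class.
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: code and comments only, no project descriptions.
train_docs = [
    "/* insert text at cursor */ void insertText(Buffer *buf, const char *text);",
    "// undo last edit in the text buffer\nvoid undo_edit(Buffer *buf);",
    "/* solve linear system with matrix A */ double *solveLinear(Matrix *a, double *b);",
    "// compute eigenvalues of a matrix\nvoid compute_eigenvalues(Matrix *m);",
]
train_labels = ["editors", "editors", "science", "science"]

clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("void deleteText(Buffer *buf); // remove text from buffer"))
```

Splitting identifiers into sub-words is one plausible way to narrow the vocabulary gap between source code and natural-language text; whether the paper's adaptations do anything similar is not stated in the abstract.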
