On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers
Christian D. Newman, Reem S. AlSuhaibani, Michael J. Decker, Anthony Peruma, Dishant Kaushik, Mohamed Wiem Mkaouer, Emily Hill

TL;DR
This paper analyzes naming patterns in source code identifiers using part-of-speech sequences, examining their structure, semantics, and how well current models can automatically identify these patterns to aid code comprehension.
Contribution
It establishes common naming patterns across identifier types, analyzes their impact on understanding code, and evaluates the accuracy of state-of-the-art POS tagging techniques for modeling identifiers.
Findings
Identified common naming patterns in class and attribute identifiers
Analyzed how patterns influence code comprehension
Evaluated POS tagging accuracy and its limitations
Abstract
Identifiers make up a majority of the text in code. They are one of the most basic mediums through which developers describe the code they create and understand the code that others create. Therefore, understanding the patterns latent in identifier naming practices and how accurately we are able to automatically model these patterns is vital if researchers are to support developers and automated analysis approaches in comprehending and creating identifiers correctly and optimally. This paper investigates identifiers by studying sequences of part-of-speech annotations, referred to as grammar patterns. This work advances our understanding of these patterns and our ability to model them by 1) establishing common naming patterns in different types of identifiers, such as class and attribute names; 2) analyzing how different patterns influence comprehension; and 3) studying the accuracy of…
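To make the term "grammar pattern" concrete, the following is a minimal sketch of how an identifier could be turned into a sequence of part-of-speech annotations. It is not the paper's tooling: the toy lexicon, the tag abbreviations (V for verb, N for noun, NM for noun modifier, P for preposition), and the function names are all assumptions chosen for illustration, and a real study would rely on a trained POS tagger rather than a lookup table.

```python
import re

# Hypothetical toy lexicon, for illustration only. The tag names are assumed
# abbreviations: V = verb, N = noun, NM = noun modifier, P = preposition.
TOY_LEXICON = {
    "get": "V", "set": "V",
    "user": "N", "name": "N", "list": "N", "count": "N", "employee": "N",
    "max": "NM",
    "to": "P", "of": "P",
}


def split_identifier(identifier):
    """Split an identifier on underscores and camelCase boundaries."""
    words = []
    for part in identifier.split("_"):
        # Acronym runs (HTTP), capitalized or lowercase words, digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def grammar_pattern(identifier, lexicon=TOY_LEXICON):
    """Return the identifier's POS tag sequence as a space-separated string."""
    # Words missing from the toy lexicon are marked "?" rather than guessed.
    tags = [lexicon.get(word, "?") for word in split_identifier(identifier)]
    # Simplifying assumption: a noun directly followed by another noun
    # is treated as a modifier of that following noun.
    for i in range(len(tags) - 1):
        if tags[i] == "N" and tags[i + 1] == "N":
            tags[i] = "NM"
    return " ".join(tags)


if __name__ == "__main__":
    for name in ("getUserName", "employeeNameList", "max_count"):
        print(name, "->", grammar_pattern(name))
```

Under these assumptions, a method name such as getUserName maps to the pattern V NM N, while an attribute name such as employeeNameList maps to NM NM N. Grouping identifiers by such patterns is one way to compare naming practices across identifier types.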
