When Are Names Similar Or the Same? Introducing the Code Names Matcher Library
Moshe Munk, Dror G. Feitelson

TL;DR
This paper introduces the Code Names Matcher library, a set of functions for comparing the names of code elements in a way that approximates human judgments of similarity, tolerating spelling variations, typing errors, word reordering, and synonyms.
Contribution
The paper presents a novel library of comparison functions tailored to code names, addressing both the variability of developer-chosen names and how humans perceive similarity between them.
Findings
Provides multiple similarity functions for code names
Addresses spelling, reordering, and synonym variations
Facilitates more human-like code name comparison
Abstract
Program code contains functions, variables, and data structures that are represented by names. To promote human understanding, these names should describe the role and use of the code elements they represent. But the names given by developers show high variability, reflecting the tastes of each developer, with different words used for the same meaning or the same words used for different meanings. This makes comparing names hard. A precise comparison should be based on matching identical words, but also take into account possible variations on the words (including spelling and typing errors), reordering of the words, matching between synonyms, and so on. To facilitate this we developed a library of comparison functions specifically targeted to comparing names in code. The different functions calculate the similarity between names in different ways, so a researcher can choose the one…
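The abstract lists what a name comparison should tolerate: identical words, variations of a word (including spelling and typing errors), reordered words, and synonyms. The following is a minimal Python sketch of one way to combine these ideas. It is not the Code Names Matcher library's API. The names `split_name`, `word_similarity`, `name_similarity`, `SYNONYMS` and the `threshold` parameter are hypothetical. The standard library's `difflib.SequenceMatcher.ratio()` stands in for whatever typo-tolerant word measure the library actually uses, and the tiny synonym table is illustrative only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table, for illustration only; a real tool would need a
# much richer source of synonyms and abbreviations.
SYNONYMS = {frozenset({"num", "count"}), frozenset({"len", "length"}),
            frozenset({"idx", "index"})}


def split_name(name: str) -> list[str]:
    """Split a code name into lowercase words on underscores, other
    non-word characters, digits, and camelCase boundaries."""
    words = []
    for part in re.split(r"[_\W\d]+", name):
        # Keep acronyms such as "HTTP" together; split "getValue" into "get", "Value".
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part)
    return [w.lower() for w in words]


def word_similarity(a: str, b: str) -> float:
    """1.0 for identical words or listed synonyms; otherwise a character-level
    ratio in [0, 1] that tolerates small spelling and typing errors."""
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(name1: str, name2: str, threshold: float = 0.75) -> float:
    """Order-insensitive similarity of two names in [0, 1]: each word of name1 is
    greedily paired with its most similar still-unmatched word of name2, pairs
    scoring below `threshold` are ignored, and the summed score is normalized by
    the total word count (a Dice-style measure that penalizes extra words)."""
    w1, w2 = split_name(name1), split_name(name2)
    if not w1 or not w2:
        return 0.0
    unmatched = list(w2)
    total = 0.0
    for word in w1:
        if not unmatched:
            break
        best = max(unmatched, key=lambda other: word_similarity(word, other))
        score = word_similarity(word, best)
        if score >= threshold:
            total += score
            unmatched.remove(best)
    return 2 * total / (len(w1) + len(w2))


if __name__ == "__main__":
    print(name_similarity("maxValue", "value_max"))  # reordered words
    print(name_similarity("getLength", "getLenght"))  # typing error
    print(name_similarity("numItems", "itemCount"))   # synonym plus variant
```

The greedy pairing keeps the sketch short but can miss the optimal word alignment. An assignment algorithm (for example, the Hungarian method) would give the best pairing at a higher cost. The paper's point that different functions suit different research questions applies here too: the threshold, the word measure, and the normalization are all choices a researcher would tune.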
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research
