IdBench: Evaluating Semantic Representations of Identifier Names in Source Code

Yaza Wainakh; Moiz Rauf; Michael Pradel

arXiv:1910.05177 · cs.LG · January 15, 2021

TL;DR

This paper introduces IdBench, a benchmark for evaluating semantic representations of source code identifiers, revealing current techniques' strengths and weaknesses and proposing an ensemble model that improves performance.

Contribution

It presents the first benchmark for assessing identifier semantic embeddings against developer ratings and proposes an ensemble approach to enhance representation quality.

Findings

01. Existing embeddings vary in effectiveness.

02. No current technique fully captures semantic similarity.

03. Ensemble models outperform individual techniques.

Abstract

Identifier names convey useful information about the intended semantics of code. Name-based program analyses use this information, e.g., to detect bugs, to predict types, and to improve the readability of code. At the core of name-based analyses are semantic representations of identifiers, e.g., in the form of learned embeddings. The high-level goal of such a representation is to encode whether two identifiers, e.g., len and size, are semantically similar. Unfortunately, it is currently unclear to what extent semantic representations match the semantic relatedness and similarity perceived by developers. This paper presents IdBench, the first benchmark for evaluating semantic representations against a ground truth created from thousands of ratings by 500 software developers. We use IdBench to study state-of-the-art embedding techniques proposed for natural language, an embedding…
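
The idea of checking a representation against developer ratings can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual procedure: the toy embedding vectors, the toy ratings, the use of cosine similarity as the model score, and the use of Spearman rank correlation as the agreement measure are all assumptions made for this example and are not taken from the text above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 3-dimensional embeddings (made up for illustration).
embeddings = {
    "len": [0.9, 0.1, 0.0],
    "size": [0.8, 0.2, 0.1],
    "index": [0.1, 0.9, 0.2],
    "callback": [0.0, 0.1, 0.9],
}

# Hypothetical averaged developer ratings in [0, 1] (made up for illustration).
human_ratings = {
    ("len", "size"): 0.9,
    ("len", "index"): 0.4,
    ("len", "callback"): 0.05,
    ("index", "callback"): 0.1,
}

pairs = list(human_ratings)
model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
gold_scores = [human_ratings[pair] for pair in pairs]

# Higher agreement means the embedding orders pairs more like developers do.
agreement = spearman(model_scores, gold_scores)
print(f"Rank agreement with developer ratings: {agreement:.2f}")
```

A rank-based measure is used in this sketch because embedding similarities and human ratings live on different scales; only the relative ordering of identifier pairs is compared.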

Code & Models

Repositories

sola-st/IdBench (Official)
