Two Sides of the Same Coin: Exploiting the Impact of Identifiers in Neural Code Comprehension

Shuzheng Gao; Cuiyun Gao; Chaozheng Wang; Jun Sun; David Lo; Yue Yu

arXiv:2207.11104·cs.SE·February 8, 2023

TL;DR

This paper introduces CREAM, a framework that uses causal and counterfactual reasoning to model the dual impact of identifiers in neural code comprehension, improving both robustness and accuracy.

Contribution

The paper proposes a causal, counterfactual reasoning-based framework called CREAM to better exploit identifiers' dual effects in neural code comprehension models.

Findings

01 CREAM outperforms baselines in robustness (+37.9% F1 on function naming)

02 CREAM improves accuracy on original datasets (+0.5% F1)

03 Effective on function naming, defect detection, and code classification

Abstract

Previous studies have demonstrated that neural code comprehension models are vulnerable to identifier naming. By renaming as few as one identifier in the source code, the models would output completely irrelevant results, indicating that identifiers can be misleading for model prediction. However, identifiers are not completely detrimental to code comprehension, since the semantics of identifier names can be related to the program semantics. Well exploiting the two opposite impacts of identifiers is essential for enhancing the robustness and accuracy of neural code comprehension, and still remains under-explored. In this work, we propose to model the impact of identifiers from a novel causal perspective, and propose a counterfactual reasoning-based framework named CREAM. CREAM explicitly captures the misleading information of identifiers through multi-task learning in the training…
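To make the perturbation described in the abstract concrete, the following is a minimal sketch of a semantics-preserving identifier rename, written with Python's standard `ast` module. It is not the authors' attack procedure and not part of CREAM; the `RenameIdentifier` class, the `rename` helper, and the `total_price` sample function are hypothetical names chosen for illustration. The point is only that the renamed program behaves identically, so any change in a model's prediction would come from the identifier text alone.

```python
import ast


class RenameIdentifier(ast.NodeTransformer):
    """Rename every occurrence of one variable or parameter name (illustrative helper)."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Variable reads and writes inside the function body.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Function parameters in the signature.
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    """Return `source` with identifier `old` replaced by `new`."""
    tree = RenameIdentifier(old, new).visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


# Hypothetical sample program; the name `subtotal` hints at what the code does.
original = """
def total_price(prices, tax):
    subtotal = sum(prices)
    return subtotal * (1 + tax)
"""

# Replace a single informative identifier with an uninformative one.
perturbed = rename(original, "subtotal", "x0")


def run(src: str) -> float:
    """Execute the snippet and call total_price on a fixed input."""
    namespace = {}
    exec(src, namespace)
    return namespace["total_price"]([10.0, 20.0], 0.5)


print(perturbed)
print(run(original) == run(perturbed))
```

Both versions compute the same value on the same input, which is the sense in which the rename preserves program semantics while removing the hint carried by the name.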

Code & Models

Repositories

reliablecoding/cream (PyTorch, official)

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Advanced Malware Detection Techniques