Dependency-Aware Code Naturalness
Chen Yang, Junjie Chen, Jiajun Jiang, Yuliang Huang

TL;DR
This paper introduces DAN, a new method that incorporates program dependency graphs to measure code naturalness more accurately, improving bug detection and data cleansing in software engineering tasks.
Contribution
This paper presents the first empirical study of incorporating code dependency into the measurement of code naturalness, and reports improved precision over existing line-based methods.
Findings
DAN better distinguishes natural and unnatural code.
DAN enhances bug detection accuracy.
DAN improves data cleansing for code models.
Abstract
Code naturalness, which captures repetitiveness and predictability in programming languages, has proven valuable for various code-related tasks in software engineering. However, precisely measuring code naturalness remains a fundamental challenge. Existing methods measure code naturalness over individual lines of code while ignoring the deep semantic relations among different lines, e.g., program dependency, which may negatively affect the precision of the measure. In this study, we aim to perform the first empirical study to investigate whether incorporating code dependency, instead of analyzing individual lines, can enhance the precision of measuring code naturalness. To achieve that, we first propose a new method named DAN for measuring code naturalness by incorporating the rich dependency information in the code. Specifically, DAN extracts multiple sequences of code lines by…
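For orientation, the line-level baseline that the abstract contrasts against can be sketched as below. This is not DAN, and it does not attempt to reproduce the dependency-based sequence extraction that the truncated abstract only begins to describe. It assumes naturalness is scored as per-line cross-entropy under a language model, with lower values meaning more natural code. The bigram model, whitespace tokenization, add-one smoothing, and the names `train_bigram` and `line_entropy` are all illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def train_bigram(lines):
    """Collect context and bigram counts over whitespace-tokenized lines."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for line in lines:
        tokens = ["<s>"] + line.split()   # "<s>" marks the start of a line
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])      # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
    # One extra vocabulary slot reserves probability mass for unseen tokens.
    return contexts, bigrams, len(vocab) + 1

def line_entropy(line, contexts, bigrams, vocab_size):
    """Average negative log2 probability per token (add-one smoothing)."""
    tokens = ["<s>"] + line.split()
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total -= math.log2(p)
    return total / max(len(tokens) - 1, 1)

# Hypothetical toy corpus: each line is scored in isolation, which is the
# limitation the abstract attributes to existing line-based methods.
corpus = ["i = i + 1", "j = j + 1", "k = k + 1"]
contexts, bigrams, vocab_size = train_bigram(corpus)
seen = line_entropy("i = i + 1", contexts, bigrams, vocab_size)
unseen = line_entropy("i = j / 0", contexts, bigrams, vocab_size)
print(f"familiar line: {seen:.2f} bits/token, unfamiliar line: {unseen:.2f} bits/token")
```

Because each line is scored on its own, a line that looks ordinary in isolation receives a low entropy even when it is inconsistent with the lines it depends on. That gap is what motivates scoring sequences of dependent lines instead.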
