Do People Prefer "Natural" code?
Casey Casalnuovo, Kevin Lee, Hulin Wang, Prem Devanbu, Emily Morgan

TL;DR
This paper investigates why natural code is highly repetitive, proposing that it is due to human preferences for familiar structures, and demonstrates that language models can predict these preferences, aligning with human judgments.
Contribution
The study introduces a theory that code repetitiveness stems from human preferences for familiar forms and validates it through modeling, transformations, and human experiments.
Findings
Meaning-preserving transformations often produce less common code structures.
Language model scores correlate with human preferences.
Familiarity influences code writing choices.
Abstract
Natural code is known to be very repetitive (much more so than natural language corpora); furthermore, this repetitiveness persists even after accounting for the simpler syntax of code. However, programming languages are very expressive, allowing a great many different ways (all clear and unambiguous) to express even very simple computations. So why is natural code repetitive? We hypothesize that the reasons for this lie in the fact that code is bimodal: it is executed by machines, but also read by humans. This bimodality, we argue, leads developers to write code in certain preferred ways that would be familiar to code readers. To test this theory, we 1) model familiarity using a language model estimated over a large training corpus and 2) run an experiment applying several meaning-preserving transformations to Java and Python expressions in a distinct test corpus to see if forms more…
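The two steps named in the abstract (scoring familiarity with a language model, then comparing an expression against a meaning-preserving rewrite of it) can be illustrated with a toy sketch. This is not the authors' model or transformation set: the add-one-smoothed bigram model, the four-line corpus, and the helper names `train_bigram`, `surprisal`, and `swap_operands` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over tokenized expressions."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = ["<s>"] + tokens  # "<s>" marks the start of an expression
        vocab.update(tokens)
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams, vocab


def surprisal(tokens, contexts, bigrams, vocab):
    """Average bits per token under an add-one-smoothed bigram model.

    Lower values mean the expression looks more familiar to the model.
    """
    v = len(vocab) + 1  # one extra slot reserves mass for unseen tokens
    padded = ["<s>"] + tokens
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total -= math.log2(p)
    return total / len(tokens)


def swap_operands(tokens):
    """Hypothetical meaning-preserving rewrite: `a + b` becomes `b + a`.

    Assumes a three-token expression with a commutative operator.
    """
    a, op, b = tokens
    if op not in {"+", "*"}:
        raise ValueError("operator is not commutative")
    return [b, op, a]


# Toy stand-in for a training corpus (assumption, not data from the paper).
corpus = [["i", "+", "1"], ["j", "+", "1"], ["i", "+", "1"], ["n", "*", "2"]]
contexts, bigrams, vocab = train_bigram(corpus)

original = ["i", "+", "1"]
transformed = swap_operands(original)

s_orig = surprisal(original, contexts, bigrams, vocab)
s_trans = surprisal(transformed, contexts, bigrams, vocab)
print(f"original:    {' '.join(original)}  {s_orig:.2f} bits/token")
print(f"transformed: {' '.join(transformed)}  {s_trans:.2f} bits/token")
```

On this toy corpus the rewritten form `1 + i` receives a higher surprisal than `i + 1`, which is the kind of gap such an experiment would look for. A real study would need a far larger corpus and a stronger model than this sketch assumes.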
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
Methods: Test
