Do People Prefer "Natural" code?

Casey Casalnuovo; Kevin Lee; Hulin Wang; Prem Devanbu; Emily Morgan

arXiv:1910.03704 · cs.CL · October 10, 2019 · 1 citation

TL;DR

This paper asks why natural code is so repetitive. It proposes that developers favor forms that are familiar to human readers, and it reports that language model scores track these preferences and correlate with human judgments.

Contribution

The study introduces a theory that code repetitiveness stems from human preferences for familiar forms and validates it through modeling, transformations, and human experiments.

Findings

01. Transformations often produce less common code structures.
02. Language model scores correlate with human preferences.
03. Familiarity influences code writing choices.

Abstract

Natural code is known to be very repetitive (much more so than natural language corpora); furthermore, this repetitiveness persists even after accounting for the simpler syntax of code. However, programming languages are very expressive, allowing a great many different ways (all clear and unambiguous) to express even very simple computations. So why is natural code repetitive? We hypothesize that the reasons for this lie in the fact that code is bimodal: it is executed by machines, but also read by humans. This bimodality, we argue, leads developers to write code in certain preferred ways that would be familiar to code readers. To test this theory, we 1) model familiarity using a language model estimated over a large training corpus and 2) run an experiment applying several meaning-preserving transformations to Java and Python expressions in a distinct test corpus to see if forms more…
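
The two steps named in the abstract (estimate a language model on a training corpus, then score an expression before and after a meaning-preserving transformation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the add-one smoothed bigram model stands in for whatever language model the authors used, and `TOY_CORPUS`, `BigramModel`, and the operand swap are hypothetical and not taken from the paper. Lower cross-entropy means the model finds the form more familiar.

```python
import math
from collections import Counter


def tokenize(expr):
    # Assumption: tokens are already separated by spaces.
    return expr.split()


class BigramModel:
    """Hypothetical stand-in model: bigram counts with add-one smoothing."""

    def __init__(self, corpus):
        self.context_counts = Counter()  # how often each token appears as a context
        self.bigram_counts = Counter()   # how often each (previous, current) pair appears
        self.vocab = set()
        for line in corpus:
            tokens = ["<s>"] + tokenize(line) + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
        vocab_size = len(self.vocab) + 1
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + vocab_size
        )

    def cross_entropy(self, expr):
        # Average negative log2 probability per predicted token (bits per token).
        tokens = ["<s>"] + tokenize(expr) + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))
        log_prob = sum(math.log2(self.prob(p, c)) for p, c in pairs)
        return -log_prob / len(pairs)


# Hypothetical training corpus in which "variable + 1" is the common form.
TOY_CORPUS = [
    "i + 1",
    "i + 1",
    "n + 1",
    "x + 1",
    "i - 1",
    "1 + i",
]

model = BigramModel(TOY_CORPUS)

original = "i + 1"
transformed = "1 + i"  # operand swap; meaning-preserving for integer addition

h_original = model.cross_entropy(original)
h_transformed = model.cross_entropy(transformed)

print(f"{original!r}: {h_original:.3f} bits/token")
print(f"{transformed!r}: {h_transformed:.3f} bits/token")
```

On this toy corpus the original form scores lower than its swapped counterpart, which is the direction the hypothesis predicts. A realistic study would need a large training corpus, a language-aware tokenizer, and a held-out test corpus, as the abstract describes.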

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques

Methods: Test