Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension
Bin Lin, Gregorio Robles

TL;DR
This study investigates how vocabulary difficulty and code naturalness influence program comprehension, aiming to improve readability prediction by analyzing correlations with source code characteristics.
Contribution
It introduces a novel approach to assess code readability by examining vocabulary difficulty and naturalness, and explores their potential to enhance prediction models.
Findings
Code naturalness correlates with readability scores.
Vocabulary difficulty impacts understandability assessments.
Naturalness and vocabulary metrics can improve prediction accuracy.
Abstract
Context: Developers spend most of their time comprehending source code during software development. Automatically assessing how readable and understandable source code is can provide various benefits in different tasks, such as task triaging and code reviews. While several studies have proposed approaches to predict software readability and understandability, most of them only focus on local characteristics of source code. Besides, the performance of understandability prediction is far from satisfactory.
Objective: In this study, we aim to assess readability and understandability from the perspective of language acquisition. More specifically, we would like to investigate whether code readability and understandability are correlated with the naturalness and vocabulary difficulty of source code.
Method: To assess code naturalness, we adopted the cross-entropy metric, while we use a…
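To make the cross-entropy metric mentioned in the Method concrete, here is a minimal sketch of how the naturalness of a token sequence could be scored. The excerpt does not say which language model the authors used, so the bigram model, the add-one smoothing, and the names `train_bigram` and `cross_entropy` are all assumptions chosen for illustration. Lower cross-entropy means the snippet is more predictable to the model, that is, more "natural".

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-sequence marker


def train_bigram(corpus):
    """Collect bigram statistics from a list of token sequences.

    Returns (contexts, bigrams, vocab), where contexts counts how often
    each token appears as the left-hand side of a bigram.
    """
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = [START] + list(tokens)
        vocab.update(tokens)
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams, vocab


def cross_entropy(tokens, contexts, bigrams, vocab):
    """Average negative log2 probability per token under the bigram model.

    Add-one smoothing (an assumption, not the paper's stated choice)
    reserves one extra vocabulary slot for unseen tokens.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    v = len(vocab) + 1  # +1 slot for unknown tokens
    padded = [START] + list(tokens)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total -= math.log2(p)
    return total / len(tokens)


# Toy training corpus of already-tokenized statements.
corpus = [
    ["int", "i", "=", "0", ";"],
    ["int", "j", "=", "1", ";"],
]
contexts, bigrams, vocab = train_bigram(corpus)

familiar = ["int", "i", "=", "0", ";"]
unfamiliar = ["foo", "bar", "baz", "qux", "quux"]

h_familiar = cross_entropy(familiar, contexts, bigrams, vocab)
h_unfamiliar = cross_entropy(unfamiliar, contexts, bigrams, vocab)
```

In this toy setup the snippet that resembles the training corpus receives a lower score than the one made of unseen tokens. A real study would train on a large code corpus and likely use a stronger model.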
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Engineering Techniques and Practices
