Do Names Echo Semantics? A Large-Scale Study of Identifiers Used in C++'s Named Casts
Constantin Cezar Petrescu, Sam Smith, Rafail Giavrimis, Santanu Kumar Dash

TL;DR
This paper introduces an information-theoretic method for vetting explicit cast operations in C++. Evaluated on casts drawn from a large Chromium dataset, it identifies poor casts with a peak precision of 81% and a recall of 90%.
Contribution
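It proposes an entropy-based approach that flags poorly implemented explicit casts in C++ by measuring how well a cast's source expression accords with its target variable, validated on casts collected from 34 Chromium components (27 MLOC).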
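For reference, the standard definition of the conditional entropy on which the approach rests is given below. Reading S as the source expression of a cast and T as its target variable follows the abstract; how the paper tokenizes names or estimates the probabilities is not stated in this summary, so treat this notation as an assumption.

```latex
H(T \mid S) = -\sum_{s,\,t} p(s, t)\,\log_2 p(t \mid s),
\qquad
p(t \mid s) = \frac{p(s, t)}{p(s)}
```

Under this reading, H(T | S) = 0 when the source always determines the target, and larger values indicate weaker accord between the two.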
Findings
Achieved a peak precision of 81% and a recall of 90% when vetting 271 manually labelled casts.
Identified and fixed notable incorrect casts in Chromium.
Demonstrated effectiveness of entropy-based analysis for code quality.
Abstract
Developers relax restrictions on a type to reuse methods with other types. While type casts are prevalent, in weakly typed languages such as C++ they are also extremely permissive. Assignments in which a source expression is cast into a new type and assigned to a target variable of that type can lead to software bugs if performed without care. In this paper, we propose an information-theoretic approach to identify poor implementations of explicit cast operations. Our approach measures the accord between the source expression and the target variable using conditional entropy. We collect casts from 34 components of the Chromium project, which collectively account for 27 MLOC, and sample this dataset uniformly at random to create a manually labelled dataset of 271 casts. Information-theoretic vetting of these 271 casts achieves a peak precision of 81% and a recall of 90%. We additionally present…
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
