How Do Programmers Express High-Level Concepts using Primitive Data Types?
Yusuke Shinyama, Yoshitaka Arahori, Katsuhiko Gondow

TL;DR
This paper presents a method for identifying expressions that represent high-level concepts using primitive data types. It analyzes API calls and trains a machine-learning classifier, which achieves an 83% F-score.
Contribution
It introduces a novel approach to classify conceptual types in code based on API call patterns, aiding bug detection and documentation.
Findings
The classifier achieved an 83% F-score in predicting the conceptual type of a given expression.
Conceptual types can be inferred reasonably well from source code once enough examples are given.
The approach can be used for bug detection, test case generation, and documentation.
Abstract
We investigated how programmers express high-level concepts such as path names and coordinates using primitive data types. While relying too much on primitive data types is sometimes criticized as a bad smell, it is still a common practice among programmers. We propose a novel way to accurately identify expressions for certain predefined concepts by examining API calls. We defined twelve conceptual types used in the Java Standard API. We then obtained expressions for each conceptual type from 26 open source projects. Based on the expressions obtained, we trained a decision tree-based classifier. It achieved an 83% F-score for correctly predicting the conceptual type of a given expression. Our result indicates that it is possible to infer a conceptual type from source code reasonably well once enough examples are given. The obtained classifier can be used for potential bug detection,…
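The idea of identifying a concept by examining API calls can be illustrated with a minimal sketch. It assumes a hand-written lookup table from a Java Standard API call site (method name plus argument position) to a concept label. The class name, the `SINKS` table, the `conceptFor` helper, and the labels `PATH`, `X_COORDINATE`, and `Y_COORDINATE` are all hypothetical and chosen for illustration; they are not the paper's twelve conceptual types, and the sketch does not reproduce the paper's extraction pipeline or its decision tree-based classifier.

```java
import java.util.Map;

public class ConceptualTypeSketch {
    // Hypothetical table: "method#argumentIndex" -> illustrative concept label.
    // java.io.File(String pathname) and java.awt.Component.setLocation(int x, int y)
    // are real Standard API signatures; the labels are assumptions.
    static final Map<String, String> SINKS = Map.of(
        "java.io.File.<init>#0", "PATH",
        "java.awt.Component.setLocation#0", "X_COORDINATE",
        "java.awt.Component.setLocation#1", "Y_COORDINATE"
    );

    // Returns the concept label for an expression passed as the given
    // argument of the given API method, or "UNKNOWN" if the call site
    // is not in the table.
    static String conceptFor(String method, int argIndex) {
        return SINKS.getOrDefault(method + "#" + argIndex, "UNKNOWN");
    }

    public static void main(String[] args) {
        // A String passed as the first argument of new File(...) would be
        // treated as an expression for a path name.
        System.out.println(conceptFor("java.io.File.<init>", 0));
        // An int passed as the second argument of setLocation(...) would be
        // treated as a y coordinate.
        System.out.println(conceptFor("java.awt.Component.setLocation", 1));
        // A call site outside the table carries no label.
        System.out.println(conceptFor("java.lang.Math.abs", 0));
    }
}
```

Expressions labeled this way could then serve as training examples for a classifier that predicts a label for expressions which never reach such an API call.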
