Grounded Discovery of Coordinate Term Relationships between Software Entities
Dana Movshovitz-Attias, William W. Cohen

TL;DR
This paper introduces a method for detecting coordinate-term relationships between Java classes by combining grounded information about how the classes are used in software with text corpus statistics, improving cross-validation accuracy from around 60% to 88%.
Contribution
It develops a similarity measure for Java classes based on distributional information about how they are used in software, and combines it with corpus statistics on the contexts in which the classes appear in text.
Findings
Cross-validation accuracy improved from around 60% to 88%.
Classifier achieved an F1 score of 86% on top predictions.
Method effectively combines grounded and corpus data for relation discovery.
Abstract
We present an approach for the detection of coordinate-term relationships between entities from the software domain, that refer to Java classes. Usually, relations are found by examining corpus statistics associated with text entities. In some technical domains, however, we have access to additional information about the real-world objects named by the entities, suggesting that coupling information about the "grounded" entities with corpus statistics might lead to improved methods for relation discovery. To this end, we develop a similarity measure for Java classes using distributional information about how they are used in software, which we combine with corpus statistics on the distribution of contexts in which the classes appear in text. Using our approach, cross-validation accuracy on this dataset can be improved dramatically, from around 60% to 88%. Human labeling results show that…
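To make the idea of coupling grounded and corpus-based evidence concrete, the following is a minimal sketch, not the authors' actual measure. It assumes each class is represented by two sparse count vectors: one over methods observed in code usage and one over words from textual contexts. The names `cosine` and `combined_similarity`, the fixed `weight`, and all counts are hypothetical and chosen only for illustration; the abstract does not specify how the two signals are combined.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def combined_similarity(code_a, code_b, text_a, text_b, weight=0.5):
    """Hypothetical combination: weighted average of the code-usage
    similarity and the text-context similarity."""
    return weight * cosine(code_a, code_b) + (1 - weight) * cosine(text_a, text_b)


# Made-up counts of methods called on each class in code (illustrative only).
code_usage = {
    "ArrayList": Counter({"add": 5, "get": 3, "size": 2}),
    "LinkedList": Counter({"add": 4, "get": 1, "size": 2, "removeFirst": 1}),
    "FileReader": Counter({"read": 6, "close": 3}),
}

# Made-up counts of words seen near each class name in text (illustrative only).
text_contexts = {
    "ArrayList": Counter({"list": 4, "elements": 3, "collection": 2}),
    "LinkedList": Counter({"list": 5, "elements": 2, "nodes": 2}),
    "FileReader": Counter({"file": 5, "stream": 3}),
}


def score(x, y):
    """Combined similarity of two class names from the toy data above."""
    return combined_similarity(
        code_usage[x], code_usage[y], text_contexts[x], text_contexts[y]
    )


print(score("ArrayList", "LinkedList"))
print(score("ArrayList", "FileReader"))
```

On this toy data, the two list classes share both usage patterns and textual contexts, so they score higher than the unrelated pair. A learned classifier over such similarity features would be a natural alternative to the fixed weight assumed here.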
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
