Enabling Open-World Specification Mining via Unsupervised Learning
Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps

TL;DR
This paper introduces an unsupervised learning framework for open-world specification mining that automatically identifies usage patterns in complex, noisy API interactions without predefined rules or templates.
Contribution
It presents a miner-agnostic framework that leverages word embeddings to recover meaningful clusters of API interactions in an open-world setting, simplifying subsequent specification mining tasks.
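To make the idea of recovering clusters from embeddings concrete, here is a minimal sketch. It assumes each API name already has a vector and groups names whose vectors are close under cosine similarity. The two-dimensional vectors, the API names, the 0.8 threshold, and the greedy grouping strategy are all hypothetical choices for illustration; none of them comes from the paper, and this is not the authors' pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(embeddings, threshold=0.8):
    """Greedy single pass: a name joins the first cluster whose seed
    (its first member) is at least `threshold` similar; otherwise it
    starts a new cluster."""
    clusters = []
    for name, vec in embeddings.items():
        for members in clusters:
            if cosine(vec, embeddings[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical toy embeddings; real ones would be learned from code.
toy_embeddings = {
    "fopen": (0.9, 0.1),
    "fclose": (0.8, 0.2),
    "malloc": (0.1, 0.9),
    "free": (0.2, 0.8),
}

clusters = cluster(toy_embeddings)
print(clusters)
```

With these toy vectors the file-handling names end up in one group and the memory-management names in another. A downstream miner could then be run on each group separately, which is the sense in which clustering simplifies the later mining step.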
Findings
Unsupervised learning effectively recovers API usage clusters.
Sub-word information improves embedding quality in software engineering.
The framework is evaluated on a benchmark of 71 clusters extracted from five open-source projects.
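The sub-word finding can be illustrated with character n-grams, the kind of sub-word unit used by fastText-style embeddings. The sketch below only measures n-gram overlap between identifiers; it does not train an embedding. The identifier names, the choice of n = 3, and the Jaccard measure are assumptions made for illustration, not details taken from the paper.

```python
def char_ngrams(token, n=3):
    """Set of character n-grams of a token, padded with '<' and '>'
    so that prefixes and suffixes are distinguishable."""
    padded = f"<{token}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subword_similarity(x, y, n=3):
    """Share of character n-grams that two identifiers have in common."""
    return jaccard(char_ngrams(x, n), char_ngrams(y, n))


# Hypothetical identifiers: two related lock functions and an unrelated one.
related = subword_similarity("mutex_lock", "mutex_unlock")
unrelated = subword_similarity("mutex_lock", "file_read")
print(f"related={related:.2f} unrelated={unrelated:.2f}")
```

Identifiers in code often encode their role in shared prefixes and suffixes, so a model that sees n-grams can relate `mutex_lock` to `mutex_unlock` even when one of them is rare in the training data. A whole-word model treats the two as unrelated symbols.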
Abstract
Many programming tasks require using both domain-specific code and well-established patterns (such as routines concerned with file IO). Together, several small patterns combine to create complex interactions. This compounding effect, mixed with domain-specific idiosyncrasies, creates a challenging environment for fully automatic specification inference. Mining specifications in this environment, without the aid of rule templates, user-directed feedback, or predefined API surfaces, is a major challenge. We call this challenge Open-World Specification Mining. In this paper, we present a framework for mining specifications and usage patterns in an Open-World setting. We design this framework to be miner-agnostic and instead focus on disentangling complex and noisy API interactions. To evaluate our framework, we introduce a benchmark of 71 clusters extracted from five open-source…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Advanced Malware Detection Techniques
