Parameter-Free Probabilistic API Mining across GitHub

Jaroslav Fowkes; Charles Sutton

arXiv:1512.05558·cs.SE·November 14, 2016

TL;DR

This paper introduces PAM (Probabilistic API Miner), a near parameter-free probabilistic algorithm for mining API call patterns from GitHub. PAM outperforms MAPO and UPMiner, and the evaluation suggests that hand-written API examples cover only a limited share of real API usages.

Contribution

PAM is a near parameter-free probabilistic algorithm that mines the most interesting API call patterns, avoiding the expensive parameter tuning and the large, highly redundant result sets of prior methods.

Findings

1. PAM achieves 69% test-set precision in API call sequence retrieval.
2. PAM significantly outperforms MAPO and UPMiner.
3. Hand-written API examples have limited coverage of real API usages.

Abstract

Existing API mining algorithms can be difficult to use as they require expensive parameter tuning and the returned set of API calls can be large, highly redundant and difficult to understand. To address this, we present PAM (Probabilistic API Miner), a near parameter-free probabilistic algorithm for mining the most interesting API call patterns. We show that PAM significantly outperforms both MAPO and UPMiner, achieving 69% test-set precision, at retrieving relevant API call sequences from GitHub. Moreover, we focus on libraries for which the developers have explicitly provided code examples, yielding over 300,000 LOC of hand-written API example code from the 967 client projects in the data set. This evaluation suggests that the hand-written examples actually have limited coverage of real API usages.

Code & Models

Repositories

mast-group/api-mining (official)
