TL;DR
This paper systematically evaluates static API-misuse detectors, revealing their limitations in precision and recall, and suggests directions for developing more effective detection methods.
Contribution
It introduces MUC, a classification of API misuses, and MUBenchPipe, an automated benchmark built on the MUBench misuse dataset, and uses both to compare existing detectors along the same dimensions and identify their key limitations.
Findings
Detectors have highly variable capabilities.
Existing detectors suffer from low precision and recall.
Detectors need to go beyond naive deviation detection, which treats any departure from the most frequent usage as a misuse, and need to learn from more usage examples.
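To make the last finding concrete, here is a toy sketch of frequency-based deviation detection. It is not the algorithm of any detector evaluated in the paper. The call names (`hasNext`, `next`, `size`), the helper functions, and the `min_support` threshold are all illustrative assumptions.

```python
from collections import Counter


def mine_frequent_calls(usages, min_support=0.5):
    """Return the calls that occur in at least `min_support` of all usages."""
    counts = Counter(call for usage in usages for call in set(usage))
    return {call for call, n in counts.items() if n / len(usages) >= min_support}


def find_deviations(usages, min_support=0.5):
    """Flag every usage that lacks a call from the mined frequent pattern."""
    pattern = mine_frequent_calls(usages, min_support)
    findings = []
    for index, usage in enumerate(usages):
        missing = sorted(pattern - set(usage))
        if missing:
            findings.append((index, missing))
    return findings


# Hypothetical usage examples, each reduced to the calls it makes.
usages = [
    ["hasNext", "next"],
    ["hasNext", "next"],
    ["hasNext", "next"],
    ["next"],          # actual misuse: next() without any guard
    ["size", "next"],  # correct but uncommon: guarded by a size check
]

findings = find_deviations(usages)
for index, missing in findings:
    print(f"usage {index} deviates: missing {missing}")
```

Both of the last two usages are reported, although only the first of them is a real misuse. A detector that equates "infrequent" with "wrong" cannot tell the two apart, which is one way low precision can arise. With only a few examples to mine from, the correct pattern may never reach the support threshold at all, which hurts recall.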
Abstract
Application Programming Interfaces (APIs) often have usage constraints, such as restrictions on call order or call conditions. API misuses, i.e., violations of these constraints, may lead to software crashes, bugs, and vulnerabilities. Though researchers developed many API-misuse detectors over the last two decades, recent studies show that API misuses are still prevalent. Therefore, we need to understand the capabilities and limitations of existing detectors in order to advance the state of the art. In this paper, we present the first-ever qualitative and quantitative evaluation that compares static API-misuse detectors along the same dimensions, and with original author validation. To accomplish this, we develop MUC, a classification of API misuses, and MUBenchPipe, an automated benchmark for detector comparison, on top of our misuse dataset, MUBench. Our results show that the…
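As a small illustration of the two kinds of usage constraints the abstract names, the sketch below violates a call-order constraint and a call-condition constraint using the Python standard library. The examples and function names are chosen here for illustration and do not come from the paper's dataset.

```python
import io


def misuse_call_order() -> str:
    """Call-order violation: write() is called after close()."""
    buf = io.StringIO()
    buf.close()
    try:
        buf.write("data")  # misuse: the stream is already closed
    except ValueError:
        return "ValueError"
    return "no error"


def misuse_call_condition() -> str:
    """Call-condition violation: next() is called without checking for elements."""
    iterator = iter([])
    try:
        next(iterator)  # misuse: nothing guards against an exhausted iterator
    except StopIteration:
        return "StopIteration"
    return "no error"


def correct_usage() -> list:
    """Respect both constraints: write before close, guard every next() call."""
    buf = io.StringIO()
    buf.write("data")
    content = buf.getvalue()
    buf.close()

    iterator = iter([1, 2])
    sentinel = object()
    items = []
    # next() with a default value never raises on an exhausted iterator.
    while (item := next(iterator, sentinel)) is not sentinel:
        items.append(item)
    return [content, items]


print(misuse_call_order(), misuse_call_condition(), correct_usage())
```

In both misuse cases the program fails only at run time. A static misuse detector aims to report such violations from the source code alone, before the program runs.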
