Deep API Learning Revisited
James Martin, Jin L.C. Guo

TL;DR
This paper compares two deep learning methods for predicting API usage sequences from natural language queries: an RNN Encoder-Decoder and the Transformer-based CodeBERT. It highlights the impact of data cleaning on measured performance and the superior performance of CodeBERT.
Contribution
It reproduces prior RNN-based API sequence prediction results and demonstrates that CodeBERT significantly outperforms previous methods on Python APIs.
Findings
Careful data cleaning lowers the measured performance of the models.
CodeBERT outperforms RNN-based methods.
Pretraining on source code enhances API sequence prediction.
Abstract
Understanding the correct API usage sequences is one of the most important tasks for programmers when they work with unfamiliar libraries. However, programmers often encounter obstacles to finding the appropriate information due to either poor quality of API documentation or ineffective query-based searching strategy. To help solve this issue, researchers have proposed various methods to suggest the sequence of APIs given natural language queries representing the information needs from programmers. Among such efforts, Gu et al. adopted a deep learning method, in particular an RNN Encoder-Decoder architecture, to perform this task and obtained promising results on common APIs in Java. In this work, we aim to reproduce their results and apply the same methods for APIs in Python. Additionally, we compare the performance with a more recent Transformer-based method, i.e., CodeBERT, for the…
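To make the task concrete, the following is a minimal sketch of how query-to-API-sequence training pairs might be represented and cleaned before training a sequence model. It is not the authors' pipeline: the `Pair` type, the `tokenize_query` and `clean` helpers, the cleaning rules (dropping empty entries and case-insensitive duplicates), and the example API names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pair:
    """One training example (hypothetical structure)."""
    query: str               # natural language description of the need
    apis: Tuple[str, ...]    # target API call sequence, in order


def tokenize_query(text: str) -> List[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def clean(pairs: List[Pair]) -> List[Pair]:
    """Illustrative cleaning: drop empty queries, empty API
    sequences, and duplicates (case-insensitive on the query)."""
    seen = set()
    kept = []
    for p in pairs:
        key = (" ".join(tokenize_query(p.query)), p.apis)
        if not key[0] or not p.apis or key in seen:
            continue
        seen.add(key)
        kept.append(p)
    return kept


# Example data; the API names are illustrative only.
raw = [
    Pair("Read a file line by line", ("open", "TextIOWrapper.readlines", "TextIOWrapper.close")),
    Pair("read a file line by line", ("open", "TextIOWrapper.readlines", "TextIOWrapper.close")),
    Pair("", ("os.listdir",)),
    Pair("Parse a JSON string", ("json.loads",)),
]
cleaned = clean(raw)
```

In a setup like this, duplicates that would otherwise appear in both training and test splits are removed, which is one plausible reason why cleaning can lower reported scores.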
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability
