On the Effectiveness of Pretrained Models for API Learning

Mohammad Abdul Hadi; Imam Nur Bani Yusuf; Ferdian Thung; Kien Gia Luong; Jiang Lingxiao; Fatemeh H. Fard; David Lo

arXiv:2204.03498·cs.SE·April 8, 2022

TL;DR

This paper evaluates the effectiveness of pre-trained Transformer models for automatic API sequence generation from natural language queries, showing they outperform previous methods by around 11% on a large GitHub dataset.

Contribution

It is the first to systematically assess pre-trained Transformer models for API learning, demonstrating their superior performance and exploring tokenization strategies to enhance results.

Findings

01. PTMs generate more accurate API sequences than previous methods.

02. PTMs outperform existing approaches by approximately 11%.

03. Tokenization approaches significantly boost PTMs' performance.

Abstract

Developers frequently use APIs to implement common functionality, such as parsing Excel files or reading and writing text files line by line. Automatic generation of API usage sequences from natural language queries can help them build applications faster and more cleanly. Existing approaches either use information retrieval models to search for API sequences that match a query, or use an RNN-based encoder-decoder to generate API sequences. The first approach treats queries and API names as bags of words and therefore lacks a deep comprehension of the semantics of the queries. The second adapts a neural language model to encode a user query into a fixed-length context vector and generates API sequences from that vector. We want to understand the effectiveness of recent Pre-trained Transformer-based Models (PTMs) for the API…
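To make the bag-of-words limitation mentioned in the abstract concrete, here is a minimal Python sketch of retrieval-style matching. It is an illustration only, not the baseline evaluated in the paper: the two-entry corpus, the Java-style API names, and the helper names `bag_of_words`, `cosine`, and `retrieve` are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical corpus of API sequences (illustrative, not from the paper's dataset).
CORPUS = [
    "FileReader.new BufferedReader.new BufferedReader.readLine BufferedReader.close",
    "FileWriter.new BufferedWriter.new BufferedWriter.write BufferedWriter.close",
]


def bag_of_words(text: str) -> Counter:
    """Split text (including camelCase API names) into lowercase word counts."""
    words = re.findall(r"[A-Za-z][a-z]*", text)
    return Counter(word.lower() for word in words)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus=CORPUS) -> str:
    """Return the API sequence whose word counts best overlap the query."""
    q = bag_of_words(query)
    return max(corpus, key=lambda seq: cosine(q, bag_of_words(seq)))


# A query that shares surface words with the API names is matched correctly.
best = retrieve("read a file line by line")

# A paraphrase with no shared words scores zero against the same sequence,
# even though a human would consider the intent identical.
paraphrase_score = cosine(bag_of_words("load text contents"), bag_of_words(CORPUS[0]))
```

In this sketch the match depends entirely on word overlap, so a paraphrased query gets no signal at all. That is the gap the abstract attributes to retrieval-based approaches and part of the motivation for trying generative models.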

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Dropout · Absolute Position Encodings · Label Smoothing · Softmax · Layer Normalization · Adam · Residual Connection · Byte Pair Encoding