When Language Model Meets Private Library
Daoguang Zan, Bei Chen, Zeqi Lin, Bei Guan, Yongji Wang, Jian-Guang Lou

TL;DR
This paper introduces a framework that lets pre-trained language models generate code for private libraries: it first retrieves relevant APIs from the library's documentation, then generates code that uses them. This addresses the problem that models never see private APIs during training.
Contribution
The paper proposes a novel two-module framework, API retrieval followed by code generation, that adapts pre-trained models to private libraries using public data.
Findings
Effective API retrieval system with user interaction
Pre-trained models can be adapted to private libraries
Framework achieves strong performance on new benchmarks
Abstract
With the rapid development of pre-training techniques, a number of language models have been pre-trained on large-scale code corpora and perform well in code generation. In this paper, we investigate how to equip pre-trained language models with the ability of code generation for private libraries. In practice, it is common for programmers to write code using private libraries. However, this is a challenge for language models since they have never seen private APIs during training. Motivated by the fact that private libraries usually come with elaborate API documentation, we propose a novel framework with two modules: the APIRetriever finds useful APIs, and then the APICoder generates code using these APIs. For APIRetriever, we present a dense retrieval system and also design a friendly interaction to involve users. For APICoder, we can directly use off-the-shelf language models, or…
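The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `privlib` API names and descriptions are hypothetical, a bag-of-words cosine similarity stands in for the learned dense retriever, and the final step only builds the prompt that an APICoder-style model would receive rather than calling any model.

```python
import math
import re
from collections import Counter

# Hypothetical private-library documentation: API signature -> description.
API_DOCS = {
    "privlib.load_table(path)": "load a table from a csv file path",
    "privlib.drop_empty(table)": "remove rows with missing values from a table",
    "privlib.plot_hist(table, column)": "plot a histogram of one column",
}

def embed(text):
    """Stand-in encoder: word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve_apis(query, docs, k=2):
    """Retrieval step: rank APIs by similarity of their docs to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda api: cosine(q, embed(docs[api])), reverse=True)
    return ranked[:k]

def build_prompt(query, apis, docs):
    """Generation step input: retrieved API docs placed before the task."""
    lines = ["# Available private APIs:"]
    lines += [f"# {api}: {docs[api]}" for api in apis]
    lines += [f"# Task: {query}", ""]
    return "\n".join(lines)

query = "remove rows with missing values"
top_apis = retrieve_apis(query, API_DOCS, k=2)
prompt = build_prompt(query, top_apis, API_DOCS)
print(prompt)
```

The point of the two-stage design is that the code model never needs to have seen the private library: the information it lacks is supplied in the prompt by the retriever. The user interaction mentioned in the abstract is not modeled here.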
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Web Data Mining and Analysis
Methods: Balanced Selection
