Private-Library-Oriented Code Generation with Large Language Models
Daoguang Zan, Bei Chen, Yongshun Gong, Junzhi Cao, Fengji Zhang, Bingchao Wu, Bei Guan, Yilong Yin, Yongji Wang

TL;DR
This paper introduces a framework for generating code that uses private libraries: it retrieves relevant APIs from documentation and passes them to a large language model, addressing the problem that private APIs are not seen during pre-training.
Contribution
It proposes a two-module framework, APIFinder (API retrieval) and APICoder (code generation), together with a reinforced version, CodeGenAPI, so that LLMs can generate code for private libraries effectively.
Findings
The approach outperforms baseline models on private library benchmarks.
The framework generalizes well from public to private libraries.
Experiments demonstrate significant improvements in private API code generation.
Abstract
Large language models (LLMs), such as Codex and GPT-4, have recently showcased their remarkable code generation abilities, facilitating a significant boost in coding efficiency. This paper will delve into utilizing LLMs for code generation in private libraries, as they are widely employed in everyday programming. Despite their remarkable capabilities, generating such private APIs poses a formidable conundrum for LLMs, as they inherently lack exposure to these private libraries during pre-training. To address this challenge, we propose a novel framework that emulates the process of programmers writing private code. This framework comprises two modules: APIFinder first retrieves potentially useful APIs from API documentation; and APICoder then leverages these retrieved APIs to generate private code. Specifically, APIFinder employs vector retrieval techniques and allows user involvement in…
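The two-stage flow described in the abstract (retrieve candidate APIs from documentation, then give them to a code model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `monkey` library and its API entries are invented, the bag-of-words `embed` function is a toy stand-in for whatever vector encoder APIFinder actually uses, and `build_prompt` only shows one plausible way retrieved APIs might be placed in front of a generation model.

```python
import math
import re
from collections import Counter

# Hypothetical private-library documentation, invented for this sketch.
API_DOCS = {
    "monkey.read_table(path)": "read a csv file from path and load it into a table",
    "monkey.Table.drop_rows(mask)": "drop rows of the table where the mask is true",
    "monkey.Table.to_json()": "serialize the table to a json string",
}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned dense encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_apis(query, docs, top_k=2):
    """Retrieval step: rank API signatures by similarity of their docs to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda sig: cosine(q, embed(docs[sig])), reverse=True)
    return ranked[:top_k]


def build_prompt(query, apis, docs):
    """Generation step input: prepend the retrieved APIs to the task description."""
    lines = ["# Potentially useful APIs:"]
    lines += [f"# {sig}: {docs[sig]}" for sig in apis]
    lines.append(f"# Task: {query}")
    return "\n".join(lines)


if __name__ == "__main__":
    task = "load a csv file into a table"
    retrieved = find_apis(task, API_DOCS)
    print(build_prompt(task, retrieved, API_DOCS))
```

In this sketch the prompt would be handed to a code generation model; the model call is left out because the abstract does not specify its interface.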
Taxonomy
Topics: Software Engineering Research · Advanced Data Storage Technologies · Topic Modeling
