Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Ze Tang; Jidong Ge; Shangqing Liu; Tingwei Zhu; Tongtong Xu; Liguo Huang; Bin Luo

arXiv:2308.09313·cs.SE·September 21, 2023·1 cites

TL;DR

This paper introduces $k$NM-LM, a retrieval-augmented code completion method that adapts a language model to a specific domain without fine-tuning by integrating knowledge drawn from in-domain code.

Contribution

The paper presents a novel retrieval-augmented approach that adapts to different language models and domains without fine-tuning, using Bayesian inference to incorporate domain knowledge.

Findings

01 $k$NM-LM outperforms CodeGPT and UniXcoder in intra-project and intra-scenario tasks.

02 The approach runs with satisfactory speed and storage usage.

03 It integrates with black-box models without requiring access to model parameters.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, because they lack domain-specific knowledge, they may not be optimal for completing code that requires intensive domain knowledge, for example library names. Several works have confirmed the effectiveness of fine-tuning techniques for adapting language models to code completion in specific domains, but they are limited by the need to fine-tune the model repeatedly while the project is in constant iteration. To address this limitation, we propose $k$NM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach automatically adapts to different language models and domains. Specifically, it utilizes the in-domain code to…
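The abstract is cut off before it describes the mechanism, so the following is only a minimal sketch of the generic retrieval-augmented idea it names: a datastore built from in-domain code is queried at completion time, and the retrieved next-token evidence is mixed with the language model's own prediction. Everything here is an assumption made for illustration: the function names, the toy vectors, the distance-based softmax, and above all the fixed mixing weight `lam`. The Contribution summary says the paper combines the two sources via Bayesian inference, which this sketch does not reproduce.

```python
import math

def knn_distribution(query, keys, next_tokens, vocab_size, k=2, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over the vocabulary.

    keys[i] is a (hypothetical) context vector from in-domain code and
    next_tokens[i] is the token id that followed that context.
    """
    # Squared Euclidean distance from the query to every stored context vector.
    dists = [sum((q - x) ** 2 for q, x in zip(query, key)) for key in keys]
    nearest = sorted(range(len(keys)), key=lambda i: dists[i])[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dists[i] / temperature) for i in nearest]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for i, w in zip(nearest, weights):
        probs[next_tokens[i]] += w / total
    return probs

def interpolate(lm_probs, knn_probs, lam):
    """Mix the two distributions with a fixed weight lam (an assumption, see lead-in)."""
    return [lam * kp + (1.0 - lam) * lp for lp, kp in zip(lm_probs, knn_probs)]

# Toy example with a 3-token vocabulary; all numbers are made up.
lm_probs = [0.7, 0.2, 0.1]                    # base model prefers token 0
keys = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]   # stored in-domain contexts
next_tokens = [1, 1, 2]                       # token that followed each context
query = [0.1, 0.1]                            # current context vector

knn_probs = knn_distribution(query, keys, next_tokens, vocab_size=3, k=2)
mixed = interpolate(lm_probs, knn_probs, lam=0.5)
best_token = max(range(len(mixed)), key=lambda i: mixed[i])
```

In this toy setup both nearest neighbours were followed by token 1, so the retrieved evidence shifts the prediction away from the base model's preferred token 0. Note that the base model is only consulted for its output probabilities, which is consistent with the finding that the approach works with black-box models.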

Code & Models

Repositories

zetang94/ase2023_knm-lm
PyTorch · Official

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques