Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes
Ulrich Finkler, Irene Manotas, Wei Zhang, Geert Janssen, Octavian Popescu, Shyam Ramji

TL;DR
This paper introduces an automated method for customizing large language models to enterprise code repositories using semantic scopes, significantly improving code completion accuracy on private repositories.
Contribution
The paper presents a novel approach leveraging semantic scopes for automated LLM customization, enhancing code generation for private repositories without extensive retraining.
Findings
Customized models outperform larger uncustomized models in code completion
Retrieval-Augmented Generation and supervised Fine-Tuning are both effective customization strategies
Improved code accuracy boosts developer productivity
Abstract
Code completion (CC) is a feature developers frequently use when working with LLM-based programming assistants. Despite the improved performance of LLMs on public benchmarks, out-of-the-box LLMs still struggle to generate code that aligns with a private code repository that was not part of the model's training data. Customizing a code LLM to a private repository is one way to improve its performance. In this paper we present our approach for automated LLM customization based on semantic scopes in the code. We evaluate LLMs on real industry cases, using two private enterprise code repositories and two customization strategies: Retrieval-Augmented Generation (RAG) and supervised Fine-Tuning (FT). Our mechanism for ingesting the repository's data and formulating the training data pairs with semantic scopes helps models to learn the underlying patterns…
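The abstract does not specify how training pairs are formed from semantic scopes. As a rough illustration only, assume a "semantic scope" is a Python function definition and that each pair consists of the preceding file context plus the function signature (the prompt) and the function body (the completion target). Under those assumptions, pairs can be extracted with the standard `ast` module. The names `make_scope_pairs`, `TrainingPair`, and `max_context_lines` are hypothetical and do not describe the authors' actual implementation.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingPair:
    scope_name: str  # name of the function that defines the scope
    prompt: str      # file context preceding the body, including the signature
    target: str      # the scope body the model should learn to complete


def make_scope_pairs(source: str, max_context_lines: int = 50) -> List[TrainingPair]:
    """Hypothetical helper: build one (prompt, target) pair per function scope.

    Assumption: a semantic scope is a (possibly nested) function definition.
    Requires Python 3.8+ for ``end_lineno``.
    """
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    funcs = [
        n for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    pairs = []
    for node in sorted(funcs, key=lambda n: n.lineno):
        # Skip one-liners such as `def f(): return 1`, where the body
        # shares a line with the signature and cannot be split cleanly.
        if node.body[0].lineno == node.lineno:
            continue
        body_start = node.body[0].lineno - 1   # 0-based index of first body line
        body_end = node.end_lineno             # 1-based inclusive == 0-based exclusive
        ctx_start = max(0, body_start - max_context_lines)
        pairs.append(TrainingPair(
            scope_name=node.name,
            prompt="".join(lines[ctx_start:body_start]),
            target="".join(lines[body_start:body_end]),
        ))
    return pairs


SAMPLE = '''import math

def area(r):
    """Circle area."""
    return math.pi * r * r

def double(x):
    return 2 * x
'''

pairs = make_scope_pairs(SAMPLE)
for p in pairs:
    print(p.scope_name, repr(p.target))
```

Splitting at the body boundary keeps the signature (and any decorators or preceding code) in the prompt, so each target is a syntactically complete scope. A real pipeline would likely also cover methods, classes, and other languages, and a RAG setup could index the same scopes as retrieval units; both extensions are assumptions here, not details from the abstract.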
Taxonomy
Topics: Software Engineering Research · Model-Driven Software Engineering Techniques · Software Engineering Techniques and Practices
