Exploring different approaches to customize language models for domain-specific text-to-code generation

Luís Freire; Fernanda A. Andaló; Nicki Skafte Detlefsen

arXiv:2603.16526·cs.AI·March 18, 2026

Open Access

TL;DR

This paper compares methods for customizing small language models to generate domain-specific code, finding that fine-tuning with LoRA improves accuracy while prompting methods are more cost-effective.

Contribution

The paper systematically evaluates three customization strategies (few-shot prompting, retrieval-augmented generation, and LoRA fine-tuning) for domain-specific code generation with small models.
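To make the difference between the two prompting strategies concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the example pool, the `###` prompt template, and the word-overlap retriever are all assumptions chosen for illustration, and a real RAG setup would more likely use embedding-based retrieval. Few-shot prompting prepends a fixed set of solved examples, while retrieval-augmented generation picks the examples most similar to the task at hand.

```python
import re

def tokenize(text):
    """Lowercase word tokens as a set, for a crude overlap score."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def retrieve(task, pool, k=2):
    """Return the k pool examples whose task descriptions best overlap the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(task, ex["task"]), reverse=True)
    return ranked[:k]

def build_prompt(task, examples):
    """Prepend solved examples to the new task (hypothetical template)."""
    parts = [
        f"### Task\n{ex['task']}\n### Solution\n{ex['solution']}\n"
        for ex in examples
    ]
    parts.append(f"### Task\n{task}\n### Solution\n")
    return "\n".join(parts)

# Hypothetical example pool covering the three domains named in the abstract.
POOL = [
    {"task": "Train a logistic regression classifier with scikit-learn",
     "solution": "from sklearn.linear_model import LogisticRegression\n"
                 "clf = LogisticRegression().fit(X, y)"},
    {"task": "Convert an image to grayscale with OpenCV",
     "solution": "import cv2\ngray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)"},
    {"task": "Reverse a list in Python",
     "solution": "reversed_items = items[::-1]"},
]

task = "Blur an image with OpenCV"

# Few-shot prompting: the same fixed examples regardless of the task.
few_shot_prompt = build_prompt(task, POOL[:2])

# Retrieval-augmented prompting: examples selected per task.
retrieved = retrieve(task, POOL, k=1)
rag_prompt = build_prompt(task, retrieved)
```

In both cases the model weights stay untouched; only the prompt changes, which is why these methods cost less to set up than fine-tuning.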

Findings

1. LoRA fine-tuning yields higher accuracy and better domain alignment.

2. Prompting methods improve domain relevance but have limited impact on accuracy.

3. Trade-offs exist between cost, flexibility, and performance in model customization.
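One side of the cost trade-off can be illustrated with simple arithmetic. LoRA freezes a weight matrix of shape d_out × d_in and trains only a low-rank update B·A, where B is d_out × r and A is r × d_in. The sketch below counts trainable parameters for a single layer; the layer size and rank are illustrative assumptions, not values reported by the paper.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA update: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)

# Assumed, illustrative sizes (not taken from the paper).
d_out = d_in = 4096
rank = 8

full = full_params(d_out, d_in)        # 16,777,216
lora = lora_params(d_out, d_in, rank)  # 65,536
ratio = lora / full                    # 0.00390625, i.e. 1/256 of the full count

print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.4%}")
```

Under these assumptions the adapter trains well under 1% of the layer's parameters, though it still needs a training run and labeled data, which the prompting methods avoid.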

Abstract

Large language models (LLMs) have demonstrated strong capabilities in generating executable code from natural language descriptions. However, general-purpose models often struggle in specialized programming contexts where domain-specific libraries, APIs, or conventions must be used. Customizing smaller open-source models offers a cost-effective alternative to relying on large proprietary systems. In this work, we investigate how smaller language models can be adapted for domain-specific code generation using synthetic datasets. We construct datasets of programming exercises across three domains within the Python ecosystem: general Python programming, Scikit-learn machine learning workflows, and OpenCV-based computer vision tasks. Using these datasets, we evaluate three customization strategies: few-shot prompting, retrieval-augmented generation (RAG), and parameter-efficient fine-tuning…

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Machine Learning in Materials Science