SLM Finetuning for Natural Language to Domain Specific Code Generation in Production
Renjini R. Nair (Microsoft), Damian K. Kowalczyk (Microsoft), Marco Gaudesi (Microsoft), Chhaya Methani (Microsoft)

TL;DR
This paper demonstrates that fine-tuning small language models for domain-specific code generation improves performance and latency, offering an efficient alternative to large models in production environments.
Contribution
The paper evaluates fine-tuning small models such as Mistral for domain-specific code generation, showing improved accuracy, latency, and adaptability without degrading general performance.
Findings
Fine-tuned small models outperform larger models in test accuracy.
Fine-tuning enables quick adaptation to customer-specific scenarios.
Load testing confirms optimal latency and quality in production.
Abstract
Many applications today use large language models for code generation; however, production systems have strict latency requirements that can be difficult to meet with large models. Small language models with a few billion parameters are resource-efficient but may suffer from limited reasoning, hallucinations, or poor retention of longer context. Fine-tuning improves task-specific accuracy by embedding domain knowledge directly into model weights, reducing reliance on runtime context. We previously implemented a baseline natural-language-to-code generation approach using a retrieval-augmented generation pipeline that dynamically selected few-shot examples to embed domain-specific language context for a large language model. In this study, we evaluate small language models for generating domain-specific language from natural language by fine-tuning variants of Mistral and other models on…
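The baseline described in the abstract, a retrieval-augmented pipeline that dynamically selects few-shot examples, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the example pool, the DSL syntax, the bag-of-words cosine similarity, and the function names `select_few_shots` and `build_prompt` are all hypothetical and are not taken from the paper, which does not specify its retriever or prompt format in the text shown here.

```python
import math
import re
from collections import Counter

# Hypothetical pool of (natural-language request, DSL snippet) pairs.
# The DSL syntax is invented purely for illustration.
EXAMPLES = [
    ("send an email when a file is uploaded",
     "trigger: file_uploaded -> action: send_email"),
    ("post a message to the channel every morning",
     "trigger: schedule_daily -> action: post_message"),
    ("create a task when an email arrives",
     "trigger: email_received -> action: create_task"),
]


def tokenize(text):
    """Lowercase bag-of-words representation of a string."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_few_shots(query, examples, k=2):
    """Return the k examples whose request text is most similar to the query."""
    q = tokenize(query)
    ranked = sorted(examples,
                    key=lambda ex: cosine(q, tokenize(ex[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, examples, k=2):
    """Assemble a few-shot prompt: retrieved examples first, then the query."""
    shots = select_few_shots(query, examples, k)
    parts = [f"Request: {nl}\nCode: {dsl}" for nl, dsl in shots]
    parts.append(f"Request: {query}\nCode:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("send an email when a new file is uploaded", EXAMPLES))
```

In a pipeline of this shape, every request carries the retrieved examples in its prompt at inference time. Fine-tuning, as the abstract notes, instead embeds that domain knowledge in the model weights, so less runtime context is needed.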
