Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

Sivajeet Chand; Kevin Nguyen; Peter Kuntz; Alexander Pretschner

arXiv:2604.24678 · cs.SE · April 28, 2026

TL;DR

This industrial case study at BMW adapts code-oriented large language models to generate and modify multi-file, repository-scale artifacts for an Xtext-based domain-specific language from a single natural-language instruction, covering dataset construction, model adaptation, and evaluation.

Contribution

The paper presents an end-to-end pipeline for multi-file DSL code generation from natural-language instructions, including dataset creation, multi-file task representation, model adaptation, and specialized evaluation metrics.

Findings

01. Fine-tuning significantly improves code generation accuracy and structural fidelity.

02. One-shot in-context learning offers modest but consistent improvements.

03. Encoding folder hierarchies as structured, path-preserving JSON allows a repository-scale change to be generated in a single response and lets the model learn cross-file dependencies.

Abstract

Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning multiple files and folder structures from a single natural-language (NL) instruction. We report an industrial case study at BMW that adapts code-oriented LLMs to generate and modify project-root DSL artifacts for an Xtext-based DSL that drives downstream Java/TypeScript code generation. We develop an end-to-end pipeline for dataset construction, multi-file task representation, model adaptation, and evaluation. We encode DSL folder hierarchies as structured, path-preserving JSON, allowing single-response generation at repository scale and learning cross-file dependencies. We evaluate two instruction-tuned code LLMs (Qwen2.5-Coder and DeepSeek-Coder, 7B)…
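
The path-preserving JSON encoding mentioned in the abstract can be illustrated with a minimal sketch. The paper's actual schema is not given here, so everything below is an assumption made for illustration: the top-level "files" key, the ".dsl" file suffix, and the function names encode_tree and decode_tree are all hypothetical. The idea shown is only that each file is keyed by its path relative to the project root, so a single JSON response can carry a whole folder hierarchy and be written back to disk.

```python
import json
from pathlib import Path, PurePosixPath


def encode_tree(root: Path, suffixes: tuple[str, ...] = (".dsl",)) -> str:
    """Serialize matching files under `root` into one JSON document.

    Hypothetical schema: {"files": {"<relative/posix/path>": "<content>"}}.
    The ".dsl" suffix is a placeholder, not the real extension used at BMW.
    """
    files = {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix in suffixes
    }
    return json.dumps({"files": files}, indent=2, sort_keys=True)


def decode_tree(payload: str, root: Path) -> list[Path]:
    """Write a JSON payload back under `root`, recreating the folders."""
    written = []
    for rel, content in json.loads(payload)["files"].items():
        rel_path = PurePosixPath(rel)
        # Generated output is untrusted: reject absolute paths and "..".
        if rel_path.is_absolute() or ".." in rel_path.parts:
            raise ValueError(f"unsafe path in payload: {rel}")
        target = root.joinpath(*rel_path.parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

In a setup like this, encode_tree would produce the repository snapshot placed in the prompt or training target, and decode_tree would materialize a model response for downstream checks. Keeping paths as JSON keys is one simple way to preserve the hierarchy; the paper may structure its representation differently.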
