Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study
Sivajeet Chand, Kevin Nguyen, Peter Kuntz, Alexander Pretschner

TL;DR
This paper explores adapting large language models (LLMs) to generate and modify multi-file, repository-scale domain-specific language (DSL) artifacts in an industrial setting at BMW, demonstrating effective fine-tuning and evaluation methods.
Contribution
It presents an end-to-end pipeline for multi-file code generation from natural language instructions, including dataset creation, model adaptation, and specialized evaluation metrics.
Findings
Fine-tuning significantly improves code generation accuracy and structural fidelity.
One-shot in-context learning offers modest but consistent improvements.
Encoding DSL folder hierarchies as path-preserving JSON lets the model learn cross-file dependencies.
Abstract
Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning multiple files and folder structures from a single natural-language (NL) instruction. We report an industrial case study at BMW that adapts code-oriented LLMs to generate and modify project-root DSL artifacts for an Xtext-based DSL that drives downstream Java/TypeScript code generation. We develop an end-to-end pipeline for dataset construction, multi-file task representation, model adaptation, and evaluation. We encode DSL folder hierarchies as structured, path-preserving JSON, allowing single-response generation at repository scale and learning cross-file dependencies. We evaluate two instruction-tuned code LLMs (Qwen2.5-Coder and DeepSeek-Coder, 7B)…
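The abstract's central representational idea is to serialize a DSL folder hierarchy as structured, path-preserving JSON so that a model can read, and emit, an entire multi-file change in a single response. The minimal Python sketch below illustrates that idea under stated assumptions. The schema (a single `files` object that maps POSIX-style relative paths to file contents), the function names `encode_project` and `decode_project`, and the `.dsl` file names in the usage example are all hypothetical, because this summary does not give the paper's exact JSON format.

```python
import json
from pathlib import Path, PurePosixPath


def encode_project(root: Path) -> str:
    """Serialize every file under `root` into one JSON object keyed by its
    POSIX-style path relative to `root` (hypothetical schema, not the paper's)."""
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()  # the path is preserved as the key
            files[rel] = path.read_text(encoding="utf-8")
    return json.dumps({"files": files}, indent=2, ensure_ascii=False)


def decode_project(payload: str, out_root: Path) -> list[str]:
    """Write a JSON response back to disk and recreate its folders.
    Reject absolute paths and '..' segments so output stays under out_root."""
    files = json.loads(payload)["files"]
    written = []
    for rel, content in files.items():
        rel_path = PurePosixPath(rel)
        if rel_path.is_absolute() or ".." in rel_path.parts:
            raise ValueError(f"unsafe path in model output: {rel}")
        target = out_root.joinpath(*rel_path.parts)
        target.parent.mkdir(parents=True, exist_ok=True)  # rebuild the folder hierarchy
        target.write_text(content, encoding="utf-8")
        written.append(rel)
    return written


if __name__ == "__main__":
    import tempfile

    # Hypothetical two-file project: one nested DSL file and one at the project root.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        src_root = Path(src)
        (src_root / "model" / "entities").mkdir(parents=True)
        (src_root / "model" / "entities" / "Customer.dsl").write_text(
            "entity Customer {}", encoding="utf-8"
        )
        (src_root / "service.dsl").write_text("service Orders {}", encoding="utf-8")

        payload = encode_project(src_root)  # the text a model would consume or produce
        print(payload)
        print(decode_project(payload, Path(dst)))
```

A flat path-to-content mapping is one way to keep every file's location explicit in a single response. A nested object that mirrors the directory tree would be an equally plausible reading of "structured, path-preserving JSON". The path check in `decode_project` matters in either design, because model output is untrusted before it is written to disk.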
