COBOL-Coder: Domain-Adapted Large Language Models for COBOL Code Generation and Translation
Anh T. V. Dau, Shin Hwei Tan, Jinqiu Yang, Nghi D. Q. Bui, Anh Tuan Nguyen

TL;DR
This paper presents COBOL-Coder, a domain-adapted large language model for COBOL code generation and translation, achieving higher accuracy and reliability than general-purpose models through specialized data curation and fine-tuning.
Contribution
We developed a high-quality COBOL training dataset and fine-tuned a specialized LLM, COBOL-Coder, to improve COBOL code generation and translation performance.
Findings
COBOL-Coder achieves a compilation success rate of up to 73.95%.
COBOL-Coder outperforms GPT-4o and open-source baselines in code generation.
Participants find COBOL-Coder more reliable and aligned with enterprise practices.
Abstract
COBOL remains a critical language for mainframe systems, yet existing large language models (LLMs) struggle to generate and translate COBOL code correctly. This paper reports our experience in developing and evaluating domain-adapted LLMs for COBOL and mainframe software engineering. We introduce (1) an automated data curation pipeline that combines compiler-guided validation with multi-stage similarity-based filtering to construct high-quality COBOL training data, and (2) COBOL-Coder, a COBOL-specialized LLM fine-tuned on the curated COBOL domain data. We evaluate COBOL-Coder on two tasks: code generation (on COBOLEval and COBOLCodeBench) and code translation (on COBOL-JavaTrans, our proposed benchmark for bidirectional COBOL-Java translation). In our experiments, COBOL-Coder achieves up to a 73.95 percent compilation success rate and 49.33 pass@1 on COBOLEval, compared to 41.8 percent…
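The abstract names two curation ingredients, compiler-guided validation and similarity-based filtering, without giving details here. The following is a minimal sketch of how such a filter could look, not the authors' pipeline. Everything in it is an assumption made for illustration: the function names, the use of GnuCOBOL's `cobc -fsyntax-only` as the compile check, a single-stage token-level Jaccard similarity in place of the paper's multi-stage filtering, and the 0.8 threshold.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, FrozenSet, Iterable, List


def compiles_with_cobc(source: str) -> bool:
    """Hypothetical compile check: assumes GnuCOBOL's `cobc` is on PATH.

    Source-format flags (fixed vs. free) are left out and would need to
    match the data being checked.
    """
    if shutil.which("cobc") is None:
        raise RuntimeError("cobc not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.cbl"
        path.write_text(source)
        result = subprocess.run(
            ["cobc", "-fsyntax-only", str(path)], capture_output=True
        )
        return result.returncode == 0


def token_set(source: str) -> FrozenSet[str]:
    """Case-insensitive set of word-like tokens (COBOL names may contain '-')."""
    return frozenset(re.findall(r"[A-Za-z0-9-]+", source.upper()))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def curate(
    samples: Iterable[str],
    compiles: Callable[[str], bool] = compiles_with_cobc,
    max_similarity: float = 0.8,  # illustrative threshold, not from the paper
) -> List[str]:
    """Keep samples that compile and are not near-duplicates of kept ones."""
    kept: List[str] = []
    kept_tokens: List[FrozenSet[str]] = []
    for sample in samples:
        if not compiles(sample):
            continue  # compiler-guided validation: drop code that fails to compile
        tokens = token_set(sample)
        if any(jaccard(tokens, seen) >= max_similarity for seen in kept_tokens):
            continue  # similarity filter: drop near-duplicates
        kept.append(sample)
        kept_tokens.append(tokens)
    return kept
```

Passing the compile check as a parameter keeps the filter testable without a COBOL toolchain; for example, `curate(samples, compiles=lambda s: True)` exercises only the similarity step.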
