Quality Evaluation of COBOL to Java Code Transformation
Shmulik Froimovich, Raviv Gal, Wesam Ibraheem, Avi Ziv

TL;DR
This paper introduces an automated, scalable evaluation system combining analytic checkers and LLM-based judgment to assess COBOL-to-Java code translation quality, supporting continuous integration and large-scale benchmarking.
Contribution
The paper presents an evaluation framework for LLM-based code translation that addresses two main challenges, model opacity and the complexity of judging translation quality, by combining analytic checkers with LLM-as-a-judge (LaaJ) techniques.
Findings
Supports continuous integration workflows
Enables large-scale benchmarking
Reduces manual review reliance
Abstract
We present an automated evaluation system for assessing COBOL-to-Java code translation within IBM's watsonx Code Assistant for Z (WCA4Z). The system addresses key challenges in evaluating LLM-based translators, including model opacity and the complexity of translation quality assessment. Our approach combines analytic checkers with LLM-as-a-judge (LaaJ) techniques to deliver scalable, multi-faceted evaluations. The system supports continuous integration workflows, enables large-scale benchmarking, and reduces reliance on manual review. We describe the system architecture, evaluation strategies, and reporting mechanisms that provide actionable insights for developers and project managers, facilitating the evolution of high-quality, modernized codebases.
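The combination of analytic checkers and LLM-as-a-judge scoring described in the abstract can be illustrated with a minimal sketch. It is not the WCA4Z implementation: every name here (`CheckResult`, `balanced_braces_checker`, `stub_judge`, `evaluate`) is hypothetical, the analytic check is deliberately trivial, the judge is a stub that calls no model, and the unweighted mean is an assumed aggregation rule that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    score: float  # 0.0 (worst) to 1.0 (best)
    detail: str = ""


# A checker takes (cobol_source, java_source) and returns one CheckResult.
Checker = Callable[[str, str], CheckResult]


def balanced_braces_checker(cobol_src: str, java_src: str) -> CheckResult:
    """Hypothetical analytic checker: are the braces in the Java output balanced?"""
    depth = 0
    for ch in java_src:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace with no matching opener
                break
    ok = depth == 0
    return CheckResult("balanced_braces", 1.0 if ok else 0.0,
                       "ok" if ok else "unbalanced braces")


def stub_judge(cobol_src: str, java_src: str) -> CheckResult:
    """Stand-in for an LLM-as-a-judge call; a real judge would prompt a model."""
    return CheckResult("laaj_stub", 0.5, "placeholder score, no model called")


def evaluate(cobol_src: str, java_src: str, checkers: List[Checker]) -> Dict:
    """Run every checker and aggregate with an unweighted mean (an assumption)."""
    results = [check(cobol_src, java_src) for check in checkers]
    overall = sum(r.score for r in results) / len(results) if results else 0.0
    return {"results": results, "overall": overall}


report = evaluate("DISPLAY 'HI'.",
                  "class A { void m() { } }",
                  [balanced_braces_checker, stub_judge])
print(report["overall"])  # 0.75
```

Giving analytic checks and judge calls the same signature means either kind can be added or removed without touching the aggregation step, which is one plausible way to support the multi-faceted reports the abstract mentions.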
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Model-Driven Software Engineering Techniques
