Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation

Colin Diggs, Michael Doyle, Amit Madan, Siggy Scott, Emily Escamilla, Jacob Zimmer, Naveed Nekoo, Paul Ursino, Michael Bartholf, Zachary Robin, Anand Patel, Chris Glasz, William Macke, Paul Kirk, Jasper Phillips, Arun Sridharan, Doug Wendt, Scott Rosen, Nitin Naik, Justin F. Brunelle, Samruddhi Thaker

arXiv:2411.14971·cs.LG·November 25, 2024·3 cites

Open Access

TL;DR

This study explores the potential of large language models to generate documentation for legacy code in outdated languages, evaluating their effectiveness and limitations through human and automated assessments.

Contribution

It introduces a prompting strategy for LLMs to produce line-wise comments on legacy code and evaluates their quality, revealing current metrics' limitations.

Findings

1. LLM-generated comments are generally hallucination-free, complete, readable, and useful.

2. Automated metrics do not strongly correlate with human judgments of comment quality.

3. A significant challenge remains in developing better evaluation metrics for LLM-generated legacy code documentation.
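
As an illustration of what "correlate with human judgments" can mean in practice, the sketch below computes a rank correlation between per-comment human ratings and an automated metric. It is a minimal sketch under stated assumptions, not the paper's analysis: the choice of Spearman's coefficient, the function names, and all of the numbers are hypothetical, since this summary does not say which statistic or data the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one human rating (1-5) and one automated
# metric score per generated comment. Not taken from the paper.
human_usefulness = [4, 3, 5, 2, 4, 1]
metric_score = [0.31, 0.45, 0.40, 0.38, 0.29, 0.35]

rho = spearman(human_usefulness, metric_score)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near zero would indicate that the automated metric ranks comments very differently from human raters, which is the kind of weak agreement the second finding describes. A rank-based coefficient is assumed here because rubric scores are ordinal.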

Abstract

Legacy software systems, written in outdated languages like MUMPS and mainframe assembly, pose challenges in efficiency, maintenance, staffing, and security. While LLMs offer promise for modernizing these systems, their ability to understand legacy languages is largely unknown. This paper investigates the utilization of LLMs to generate documentation for legacy code using two datasets: an electronic health records (EHR) system in MUMPS and open-source applications in IBM mainframe Assembly Language Code (ALC). We propose a prompting strategy for generating line-wise code comments and a rubric to evaluate their completeness, readability, usefulness, and hallucination. Our study assesses the correlation between human evaluations and automated metrics, such as code complexity and reference-based metrics. We find that LLM-generated comments for MUMPS and ALC are generally…

Taxonomy

Topics: Artificial Intelligence in Law · Law, AI, and Intellectual Property · Digital Rights Management and Security