Can LLMs Replace Humans During Code Chunking?
Christopher Glasz, Emily Escamilla, Eric O. Scott, Anand Patel, Jacob Zimmer, Colin Diggs, Michael Doyle, Scott Rosen, Nitin Naik, Justin F. Brunelle, Samruddhi Thaker, Parthav Poudel, Arun Sridharan, Amit Madan, Doug Wendt, William Macke, Thomas Schill

TL;DR
This paper explores the use of large language models to automate code chunking in legacy government software, demonstrating that LLMs can effectively replace humans in partitioning large codebases for documentation and modernization.
Contribution
It introduces novel code-chunking methods tailored for legacy languages and empirically evaluates their effectiveness across multiple LLMs, highlighting improvements in documentation quality.
Findings
LLMs can select partition points similar to human experts
Chunking methods significantly influence documentation quality
LLM-generated partitions improve factual accuracy and usefulness of comments
Abstract
Large language models (LLMs) have become essential tools in computer science, especially for tasks involving code understanding and generation. However, existing work does not address many of the unique challenges presented by code written for government applications. In particular, government enterprise software is often written in legacy languages like MUMPS or assembly language code (ALC) and the overall token lengths of these systems exceed the context window size for current commercially available LLMs. Additionally, LLMs are primarily trained on modern software languages and have undergone limited testing with legacy languages, making their ability to understand legacy languages unknown and, hence, an area for empirical study. This paper examines the application of LLMs in the modernization of legacy government code written in ALC and MUMPS, addressing the challenges of input…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsLLaMA
