Verbatim Data Transcription Failures in LLM Code Generation: A State-Tracking Stress Test
Mohd Ariful Haque, Kishor Datta Gupta, Mohammad Ashiqur Rahman, Roy George

TL;DR
This paper presents a stress-test benchmark that evaluates how reliably large language models transcribe high-precision data into code, highlighting the risk of silent failures in critical software tasks.
Contribution
It introduces a minimal, targeted benchmark for assessing whether LLMs can transcribe data verbatim, emphasizing data integrity in code generation tasks.
Findings
Identifies state-tracking failures in LLM code generation
Provides evaluation protocols for exact-string inclusion
Highlights importance of data fidelity in sensitive applications
Abstract
Many real-world software tasks require exact transcription of provided data into code, such as cryptographic constants, protocol test vectors, allowlists, and calibration tables. These tasks are operationally sensitive because small omissions or alterations can remain silent while producing syntactically valid programs. This paper introduces a deliberately minimal transcription-to-code benchmark to isolate this reliability concern in LLM-based code generation. Given a list of high-precision decimal constants, a model must generate Python code that embeds the constants verbatim and performs a simple aggregate computation. We describe the prompting variants, evaluation protocol based on exact-string inclusion, and analysis framework used to characterize state-tracking and long-horizon generation failures. The benchmark is intended as a compact stress test that complements existing…
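The exact-string inclusion check described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation harness: the function name `check_exact_inclusion`, the three sample constants, and the two "generated" programs are hypothetical, chosen only to show how a single altered digit yields syntactically valid code that still fails the check.

```python
def check_exact_inclusion(constants, generated_code):
    """Report which constants do not appear verbatim in the generated code.

    Hypothetical helper: a plain substring test, assumed here as the
    simplest reading of "exact-string inclusion".
    """
    missing = [c for c in constants if c not in generated_code]
    return {
        "total": len(constants),
        "missing": missing,
        "passed": len(missing) == 0,
    }


# Illustrative high-precision constants, kept as strings so no digits are lost.
CONSTANTS = [
    "3.141592653589793238462643383279",
    "2.718281828459045235360287471352",
    "1.414213562373095048801688724209",
]

# A faithful "model output": every constant embedded verbatim, then summed.
FAITHFUL = "\n".join([
    "from decimal import Decimal",
    "values = [",
    *[f'    Decimal("{c}"),' for c in CONSTANTS],
    "]",
    "print(sum(values))",
])

# A corrupted output: the last two digits of the second constant are swapped.
CORRUPTED = FAITHFUL.replace(
    "2.718281828459045235360287471352",
    "2.718281828459045235360287471325",
)

# Both programs are syntactically valid, so a syntax check alone cannot tell them apart.
compile(FAITHFUL, "<generated>", "exec")
compile(CORRUPTED, "<generated>", "exec")

faithful_report = check_exact_inclusion(CONSTANTS, FAITHFUL)
corrupted_report = check_exact_inclusion(CONSTANTS, CORRUPTED)
print(faithful_report["passed"], corrupted_report["missing"])
```

A plain substring test has a known blind spot: a constant with extra digits appended would still contain the original string. A stricter variant could require that the match is not adjacent to another digit.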
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Advanced Malware Detection Techniques
