Verbatim Data Transcription Failures in LLM Code Generation: A State-Tracking Stress Test

Mohd Ariful Haque; Kishor Datta Gupta; Mohammad Ashiqur Rahman; Roy George

arXiv:2601.03640 · cs.SE · January 8, 2026

TL;DR

This paper presents a stress-test benchmark that evaluates how reliably large language models transcribe high-precision data into code verbatim, highlighting failures that can pass silently in critical software tasks.

Contribution

It introduces a minimal, targeted benchmark for assessing how well LLMs transcribe data verbatim, emphasizing data integrity in code generation tasks.

Findings

01. Characterizes state-tracking and long-horizon generation failures in LLM code generation

02. Provides an evaluation protocol based on exact-string inclusion

03. Highlights the importance of data fidelity in sensitive applications
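The exact-string inclusion criterion can be sketched as a per-constant substring check over the generated source. This is a minimal illustration, not the authors' scorer: the function name `inclusion_report` and its reported fields (`found`, `recall`, `missing`) are hypothetical, and the sketch assumes the expected constants are supplied as decimal strings.

```python
def inclusion_report(constants: list[str], generated_code: str) -> dict:
    """Hypothetical scorer: count expected constants that appear verbatim in the code.

    A constant counts as transcribed only if its exact decimal string occurs
    somewhere in the generated source (plain substring matching).
    """
    missing = [c for c in constants if c not in generated_code]
    found = len(constants) - len(missing)
    return {
        "found": found,
        "total": len(constants),
        # Fraction of constants reproduced verbatim (1.0 for an empty list).
        "recall": found / len(constants) if constants else 1.0,
        "all_present": not missing,
        "missing": missing,
    }


# Usage: the second literal has lost its final digit, yet the code still runs.
constants = ["3.14159265358979323846", "2.71828182845904523536"]
generated = "VALUES = [3.14159265358979323846, 2.7182818284590452353]\nprint(sum(VALUES))"
report = inclusion_report(constants, generated)
print(report["found"], report["missing"])  # prints: 1 ['2.71828182845904523536']
```

Plain substring matching would also credit an expected `3.14` found inside a longer literal such as `3.141`; a stricter variant could additionally require a non-digit character on each side of the match.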

Abstract

Many real-world software tasks require exact transcription of provided data into code, such as cryptographic constants, protocol test vectors, allowlists, and calibration tables. These tasks are operationally sensitive because small omissions or alterations can remain silent while producing syntactically valid programs. This paper introduces a deliberately minimal transcription-to-code benchmark to isolate this reliability concern in LLM-based code generation. Given a list of high-precision decimal constants, a model must generate Python code that embeds the constants verbatim and performs a simple aggregate computation. We describe the prompting variants, evaluation protocol based on exact-string inclusion, and analysis framework used to characterize state-tracking and long-horizon generation failures. The benchmark is intended as a compact stress test that complements existing…
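The task format described in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that the constants are random decimal strings with a fixed number of fractional digits, that the aggregate computation is a sum, and that values are embedded as `decimal.Decimal` string literals; `make_constants`, `render_reference`, and `run_program` are illustrative names, not the benchmark's code. It also shows why an omission can stay silent: a program missing one constant still executes and returns a plausible total.

```python
import random
from decimal import Decimal, getcontext

getcontext().prec = 100  # ample precision so the Decimal sums below are exact


def make_constants(n: int, frac_digits: int, seed: int = 0) -> list[str]:
    """Hypothetical task generator: n decimal strings with frac_digits fractional digits."""
    rng = random.Random(seed)
    constants = []
    for _ in range(n):
        whole = rng.randint(1, 9)
        frac = "".join(rng.choice("0123456789") for _ in range(frac_digits - 1))
        frac += rng.choice("123456789")  # nonzero last digit avoids trailing-zero ambiguity
        constants.append(f"{whole}.{frac}")
    return constants


def render_reference(constants: list[str]) -> str:
    """Render a Python program that embeds each constant verbatim and sums them."""
    body = "\n".join(f'    Decimal("{c}"),' for c in constants)
    return (
        "from decimal import Decimal\n"
        f"VALUES = [\n{body}\n]\n"
        "TOTAL = sum(VALUES)\n"
    )


def run_program(source: str) -> Decimal:
    """Execute a generated program in a fresh namespace and return its TOTAL."""
    namespace: dict = {}
    exec(source, namespace)  # only for trusted, locally generated code
    return namespace["TOTAL"]


constants = make_constants(n=5, frac_digits=30, seed=42)
expected = sum(Decimal(c) for c in constants)

reference = render_reference(constants)
assert run_program(reference) == expected
assert all(c in reference for c in constants)

# A silent failure: one constant is dropped, yet the program still runs.
truncated = render_reference(constants[:-1])
print(run_program(truncated) == expected)  # False: the total is short by the missing value

# Embedding the values as float literals would not preserve the provided digits.
print(any(repr(float(c)) == c for c in constants))  # False
```

Using `Decimal` string literals rather than floats is a design choice in this sketch: a 31-significant-digit value cannot round-trip through a 64-bit float, so a float-based program could still print a plausible total while no longer carrying the provided digits.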

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Advanced Malware Detection Techniques