Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6
Luis Guzmán Lorenzo

TL;DR
This study tests whether poisoned identifier names in an obfuscated JavaScript string table survive deobfuscation by Claude Opus 4.6. The names persisted even when the model described the code's semantics correctly, and persistence varied with how the task was framed.
Contribution
It shows that poisoned identifiers can persist through LLM deobfuscation even when the model's own commentary describes the code's actual behavior correctly.
Findings
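Poisoned names persisted in every baseline run on both artifacts (physics: 8/8; pathfinding: 5/5).
Persistence coexisted with correct semantic commentary: in 15/17 runs the model wrote wrong variable names while correctly describing the actual operation in comments.
Explicit verification prompts had no effect (12/12 across 4 variants), while reframing the task reduced propagation of poisoned identifiers.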
Abstract
When an LLM deobfuscates JavaScript, can poisoned identifier names in the string table survive into the model's reconstructed code, even when the model demonstrably understands the correct semantics? Using Claude Opus 4.6 across 192 inference runs on two code archetypes (force-directed graph simulation, A* pathfinding; 50 conditions, N=3-6), we found three consistent patterns: (1) Poisoned names persisted in every baseline run on both artifacts (physics: 8/8; pathfinding: 5/5). Matched controls showed this extends to terms with zero semantic fit when the string table does not form a coherent alternative domain. (2) Persistence coexisted with correct semantic commentary: in 15/17 runs the model wrote wrong variable names while correctly describing the actual operation in comments. (3) Task framing changed persistence: explicit verification prompts had no effect (12/12 across 4 variants),…
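To make the notion of "persistence" concrete, here is a minimal sketch of one way it could be checked: strip comments from the model's reconstructed JavaScript, then see which poisoned names still appear as identifiers. This is an illustration only, not the paper's measurement pipeline. The sample output, the poisoned name list, and the helper names `strip_js_comments` and `persisted_names` are all hypothetical, and the comment stripping is naive (it does not handle comment markers inside string literals).

```python
import re

# Hypothetical model output: JavaScript reconstructed by an LLM.
# The comment describes a physics operation correctly, while the
# identifiers come from an unrelated (poisoned) finance vocabulary.
RECONSTRUCTED_JS = """
// Computes the repulsive force between two nodes
function computeInterest(principal, rate) {
  const dx = principal.x - rate.x; // distance along x
  return dx * dx;
}
"""

# Hypothetical poisoned names planted in the obfuscated string table.
POISONED_NAMES = ["computeInterest", "principal", "rate", "ledger"]


def strip_js_comments(source: str) -> str:
    """Remove /* */ and // comments (naive: ignores string literals)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)


def persisted_names(source: str, poisoned: list[str]) -> list[str]:
    """Return the poisoned names that still occur as identifiers in code."""
    code_only = strip_js_comments(source)
    identifiers = set(re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", code_only))
    return [name for name in poisoned if name in identifiers]


survivors = persisted_names(RECONSTRUCTED_JS, POISONED_NAMES)
print(survivors)  # ['computeInterest', 'principal', 'rate']
```

Comments are removed before matching so that a name mentioned only in commentary is not counted as surviving in the code itself. In this sample the comments describe the physics correctly while the identifiers remain poisoned, which is the kind of mismatch the abstract's second pattern describes.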
