The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces

Subramanyam Sahoo

arXiv:2512.13821 · cs.LG · February 6, 2026

TL;DR

This paper introduces the Cross-Trace Verification Protocol (CTVP), a verification framework that detects malicious behavior in code-generating language models by checking whether the model's predicted execution traces stay consistent across semantics-preserving program transformations. The approach is supported by theoretical bounds.

Contribution

It proposes a semantic orbit analysis method and the Adversarial Robustness Quotient (ARQ) metric for provably detecting backdoors in untrusted code models, grounded in information theory.
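As a rough illustration of a semantic orbit, the sketch below builds several variants of one program that differ only by consistent variable renaming, which is one transformation that preserves semantics. This is a minimal sketch under stated assumptions: the paper's actual set of transformations is not described on this page, and `semantic_orbit` and `_RenameVariables` are hypothetical names. It uses only Python's standard `ast` module and needs Python 3.9+ for `ast.unparse`.

```python
import ast


class _RenameVariables(ast.NodeTransformer):
    """Hypothetical helper: consistently rename bound names via a mapping.

    Limitations of this sketch: `global`/`nonlocal` statements and keyword
    arguments at call sites are not rewritten.
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Rename both uses (Load) and bindings (Store) of mapped names.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Rename function/lambda parameters.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node


def semantic_orbit(source, k):
    """Hypothetical helper: return the program plus k - 1 renamed variants."""
    tree = ast.parse(source)
    # Only names bound in this program (assignment targets and parameters)
    # are renamed, so free names such as builtins keep their meaning.
    stored = {n.id for n in ast.walk(tree)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    params = {a.arg for a in ast.walk(tree) if isinstance(a, ast.arg)}
    bound = sorted(stored | params)

    orbit = [ast.unparse(tree)]
    for i in range(1, k):
        mapping = {name: f"v{i}_{j}" for j, name in enumerate(bound)}
        # Re-parse so each variant is transformed from a fresh tree.
        renamed = _RenameVariables(mapping).visit(ast.parse(source))
        orbit.append(ast.unparse(renamed))
    return orbit


if __name__ == "__main__":
    src = "def f(x):\n    y = x * 2\n    return y + 1\n"
    for variant in semantic_orbit(src, 3):
        print(variant, end="\n\n")
```

Each variant computes the same function under different identifiers, so a model that truly understands the code should predict the same observable behavior for all of them.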

Findings

01. Exponential growth of verification cost with orbit size

02. Theoretical bounds indicate that adversaries cannot game the verification (non-gamifiability)

03. High false positive rates observed in initial tests

Abstract

Large language models (LLMs) increasingly generate code with minimal human oversight, raising critical concerns about backdoor injection and malicious behavior. We present the Cross-Trace Verification Protocol (CTVP), a novel AI control framework that verifies untrusted code-generating models through semantic orbit analysis. Rather than directly executing potentially malicious code, CTVP leverages the model's own predictions of execution traces across semantically equivalent program transformations. By analyzing consistency patterns in these predicted traces, we detect behavioral anomalies indicative of backdoors. Our approach introduces the Adversarial Robustness Quotient (ARQ), which quantifies the computational cost of verification relative to baseline generation and grows exponentially with orbit size. Theoretical analysis establishes information-theoretic bounds showing…
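The consistency analysis described in the abstract can be pictured with a small sketch: given the model's predicted traces for every member of an orbit, measure how often they agree and flag the model when agreement falls below a threshold. Everything here is assumed for illustration: traces are taken to be sequences of observable values (so renamed variants are directly comparable), majority agreement is only one possible consistency measure, and `trace_consistency`, `flag_anomaly`, and the 0.8 threshold are hypothetical rather than the paper's definitions.

```python
from collections import Counter
from typing import Hashable, Sequence


def trace_consistency(predicted_traces: Sequence[Sequence[Hashable]]) -> float:
    """Hypothetical metric: fraction of orbit members whose predicted trace
    equals the most common predicted trace."""
    if not predicted_traces:
        raise ValueError("need at least one predicted trace")
    # Convert each trace to a tuple so identical traces can be counted.
    counts = Counter(tuple(trace) for trace in predicted_traces)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(predicted_traces)


def flag_anomaly(predicted_traces: Sequence[Sequence[Hashable]],
                 threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag the model when agreement drops below threshold."""
    return trace_consistency(predicted_traces) < threshold


if __name__ == "__main__":
    # Hypothetical predicted traces for a four-member orbit; the last disagrees.
    traces = [[10, 11], [10, 11], [10, 11], [10, 99]]
    print(trace_consistency(traces))  # 0.75
    print(flag_anomaly(traces))       # True
```

Majority agreement keeps the check independent of which orbit member counts as the "original" program; a real verifier would also need a rule for comparing traces whose intermediate states mention renamed identifiers.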

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Software Engineering Research