Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

Victor May; Aaditya Salgarkar; Yishan Wang; Diganta Misra; Huu Nguyen

arXiv:2603.01209 · cs.AI · March 6, 2026

TL;DR

This paper investigates how the persistence of interpreter state during training influences the behavior and efficiency of language model agents in tool-augmented tasks. It finds that aligning training and runtime semantics improves token efficiency and stability, while solution success rates remain statistically similar across conditions.

Contribution

The paper introduces an experimental framework that isolates interpreter persistence as a training-time variable and demonstrates its effect on agent behavior and efficiency.

Findings

01. Persistent-trained models handle multi-turn control flow better.

02. Solution success rates are statistically similar across conditions.

03. Training-runtime alignment significantly affects token efficiency and stability.

Abstract

Tool-augmented LLMs are increasingly deployed as agents that interleave natural-language reasoning with executable Python actions, as in CodeAct-style frameworks. In deployment, these agents rely on runtime state that persists across steps. By contrast, the traces used to post-train these models rarely encode how interpreter state is managed. We ask whether interpreter persistence is merely a runtime scaffold, or a property of the training data that shapes how agents learn to use the interpreter. We isolate state persistence as a training-time variable. We introduce Opaque Knapsack, a procedurally generated family of partially observable optimization tasks designed to prevent one-shot solutions. Item attributes and constraints are hidden behind budgeted tool calls, forcing multi-turn control flow and iterative state revision. Holding task instances, prompts, tools, model, and…
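To make the distinction between persistent and non-persistent interpreter state concrete, here is a minimal Python sketch. It is an illustration only, not the paper's harness: the class names `PersistentInterpreter` and `StatelessInterpreter`, the helper `run_episode`, and the two example steps are all hypothetical. It assumes each agent action is a Python snippet executed with `exec`, and shows that a second step referring to a variable from the first step succeeds only when the namespace survives between steps.

```python
class PersistentInterpreter:
    """Hypothetical runtime: one namespace is kept alive across all steps."""

    def __init__(self):
        self.namespace = {}

    def run(self, code):
        exec(code, self.namespace)
        return self.namespace


class StatelessInterpreter:
    """Hypothetical runtime: every step starts from an empty namespace."""

    def run(self, code):
        namespace = {}
        exec(code, namespace)
        return namespace


def run_episode(interpreter, steps):
    """Execute each step in order and record whether it ran or hit a NameError."""
    outcomes = []
    for code in steps:
        try:
            interpreter.run(code)
            outcomes.append("ok")
        except NameError as err:
            outcomes.append(f"NameError: {err}")
    return outcomes


# Two illustrative agent actions: the second depends on state from the first.
steps = [
    "inspected = {'item_a': 3}",
    "total = sum(inspected.values())",
]

persistent_outcomes = run_episode(PersistentInterpreter(), steps)
stateless_outcomes = run_episode(StatelessInterpreter(), steps)

print(persistent_outcomes)
print(stateless_outcomes)
```

Under the stateless runtime, an agent would have to re-create `inspected` inside every step that uses it. That is one plausible way a mismatch between training traces and runtime semantics could cost extra tokens or cause failures, though this sketch does not reproduce the paper's measurements.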

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning and Data Classification