Sharing State Between Prompts and Programs

Ellie Y. Cheng; Logan Weber; Tian Jin; Michael Carbin

arXiv:2512.14805 · cs.PL · March 18, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces shared program state, a new abstraction that allows natural language prompts to directly access and manipulate program variables, improving interoperability between prompts and code in LLM-based programming systems.

Contribution

It proposes a schema for specifying natural function interfaces, uses that schema to specify shared program state as a natural function interface, and implements shared program state in the Nightjar system, enabling prompts to share Python program state.

Findings

01 · Nightjar achieves 4–19% higher task accuracy than manually written code.

02 · Nightjar reduces lines of code by 39.6% on average.

03 · Nightjar's runtime overhead ranges from 0.4× to 4.3× relative to manual implementations.

Abstract

The rise of large language models (LLMs) has introduced a new type of programming: natural language programming. Users write prompts, which are instructions in natural language, to direct LLMs to perform tasks such as natural language processing, code generation, and reasoning. An emerging area of research enables interoperability between prompts and programs. We present a novel programming abstraction, shared program state, that removes the manual work required to enable interoperability between prompts and program states. With shared program state, programmers can write prompts that directly access program variables, compute with program objects, and implement control flow in the program. We present a schema for specifying natural function interfaces that extend programming systems to support programs with prompts and leverage this schema to specify shared program state as a…
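The abstract's claim that prompts can read and write program variables and steer control flow can be illustrated with a minimal sketch, assuming an effect-handler design: the prompt's execution is modeled as a Python generator that yields requests, and a host-side loop services each request against the program's variables. Everything here is hypothetical: the `Read`, `Write`, and `Break` effects, the `run_prompt` handler, and `scripted_model`, a fixed script standing in for a real LLM call. None of it is Nightjar's actual API. A plain dict stands in for the shared scope, since writes through `locals()` are not guaranteed to update a function's local variables.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generator, List, Union


# Hypothetical effect requests a prompt may issue to the host program.
# These names are illustrative only; they are not Nightjar's interface.
@dataclass
class Read:
    name: str  # variable to read from the shared scope


@dataclass
class Write:
    name: str  # variable to assign in the shared scope
    value: Any


@dataclass
class Break:
    pass  # ask the host to exit its enclosing loop


Effect = Union[Read, Write, Break]


class BreakLoop(Exception):
    """Raised by the handler to transfer control out of the host loop."""


def run_prompt(executor: Generator[Effect, Any, None], scope: Dict[str, Any]) -> None:
    """Service each effect the executor yields until it finishes."""
    result: Any = None
    try:
        while True:
            effect = executor.send(result)  # resume executor with the last result
            if isinstance(effect, Read):
                result = scope[effect.name]
            elif isinstance(effect, Write):
                scope[effect.name] = effect.value
                result = None
            elif isinstance(effect, Break):
                executor.close()
                raise BreakLoop()
            else:
                raise TypeError(f"unknown effect: {effect!r}")
    except StopIteration:
        pass  # executor returned normally


def scripted_model(order_total: float) -> Generator[Effect, Any, None]:
    """Stand-in for an LLM following the prompt: 'Add the order total to
    revenue; stop processing orders once revenue exceeds budget.'"""
    revenue = yield Read("revenue")
    budget = yield Read("budget")
    new_revenue = revenue + order_total
    yield Write("revenue", new_revenue)
    if new_revenue > budget:
        yield Break()


# Host program: a plain dict stands in for the variables shared with the prompt.
scope: Dict[str, Any] = {"revenue": 0.0, "budget": 25.0}
processed: List[float] = []
for total in [10.0, 12.0, 8.0, 30.0]:
    processed.append(total)
    try:
        run_prompt(scripted_model(total), scope)
    except BreakLoop:
        break

print(scope["revenue"], processed)  # 30.0 [10.0, 12.0, 8.0]
```

Raising an exception from the handler is just one way to let a prompt express `break`; the host loop still decides what leaving the loop means, and the program's own variables are only touched through the handler.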

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 3

Strengths

I think the general idea of natural language programming interfaces is intriguing and has a rich history that goes well beyond the current LLM hype, indeed beyond deep learning.

Weaknesses

I am not convinced ICLR is the best venue to publish this work. The empirical study is at best preliminary, and I am not at all convinced that programmers would like to give up control and lose out on execution time to save a few extra lines of code; with LLM-assisted code writing and analysis tools, these come essentially for free. I suggest greatly expanding the scope of the empirical analysis, including evaluations on known public benchmarks, as well as incorporating qualitative feedback, a…

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- **Novel conceptual contribution**: The idea of exposing host-language state and control to an LLM in a principled and programmable way is novel. This moves beyond existing systems that treat LLMs as isolated components that make tool calls using programmer-defined functions.
- **Strong formal framework**: The authors present a formal framework for NFIs, including variable scopes, heap references, and control state.
- **System implementation**: The NIGHTJAR system demonstrates that the abstract…

Weaknesses

- **Limited empirical evaluation**: The evaluation on program pass rates and conciseness, while adequate as a proof of concept, does not deeply explore scalability, robustness, or user experience.
- **Benchmarks are synthetic**: SPSBench appears to consist mainly of small programs adapted from documentation examples. It would be great to have analyses on real-world user code or larger-scale applications.
- **Safety and correctness**: Though acknowledged in the discussion, the implications of allowing…

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- Generalisation of tool use to shared program state (memory, control)
- Competitive results even with what seems like a naive implementation

Weaknesses

- It is unclear how/where serialisation/data marshalling is to be implemented; it is also unclear whether grammar-based sampling is compatible with the approach, or whether any such grammar file should be augmented to enable emission of effect tokens.
- Perhaps out of scope, but the handler loop seems unintuitive: if emitted effects are side-effect free wrt the program state that is observed by the NL code, why not stage an execution plan and delay the interrupts until the last moment possible to reduce ov…
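The reviewer's first question, where serialisation/data marshalling happens, admits several designs. A minimal sketch of one, assuming (not taken from the paper) that JSON-compatible primitives are passed to the prompt by value while other objects are passed as opaque references into a host-side table, so the model can later name them in effect requests. `Heap`, `marshal`, `Customer`, and the `{"$ref": ...}` encoding are all hypothetical.

```python
import json
from typing import Any, Dict


class Heap:
    """Hypothetical table mapping opaque reference strings to host objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}  # reference -> object (keeps it alive)
        self._refs: Dict[int, str] = {}     # id(object) -> reference

    def ref(self, obj: Any) -> str:
        # Identity-based: the same live object always gets the same reference.
        key = id(obj)
        if key not in self._refs:
            name = f"obj_{len(self._objects)}"
            self._refs[key] = name
            self._objects[name] = obj
        return self._refs[key]

    def deref(self, name: str) -> Any:
        return self._objects[name]


PRIMITIVES = (str, int, float, bool, type(None))


def marshal(value: Any, heap: Heap) -> Any:
    """Encode a host value for a prompt: primitives by value, lists and
    string-keyed dicts recursively, and any other object by reference."""
    if isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, list):
        return [marshal(v, heap) for v in value]
    if isinstance(value, dict) and all(isinstance(k, str) for k in value):
        return {k: marshal(v, heap) for k, v in value.items()}
    return {"$ref": heap.ref(value)}


class Customer:
    def __init__(self, name: str) -> None:
        self.name = name


heap = Heap()
alice = Customer("Alice")
payload = marshal({"count": 3, "tags": ["vip", None], "owner": alice}, heap)
print(json.dumps(payload))
# {"count": 3, "tags": ["vip", null], "owner": {"$ref": "obj_0"}}
```

Because lists and dicts are copied here, edits the model makes to a marshalled list would not alias the program's list; a design that needs such writes to be visible would pass mutable containers by reference as well, at the cost of a less readable payload.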

Taxonomy

Topics: Logic, programming, and type systems · Software Engineering Research · Parallel Computing and Optimization Techniques