A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions
Stephen Mell, Botong Zhang, David Mell, Shuo Li, Ramya Ramalingam, Nathan Yu, Steve Zdancewic, Osbert Bastani

TL;DR
The paper introduces Quasar, a new programming language designed for LLM agents to perform code actions more securely, reliably, and efficiently, with automatic parallelization and uncertainty quantification.
Contribution
It presents Quasar, a novel language for code actions that enhances performance, security, and reliability of LLM agents compared to using Python.
Findings
Reduced execution time by 42%
Decreased user approval interactions by 52%
Achieved targeted reliability with conformal prediction
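The reliability finding rests on conformal prediction. As a rough illustration of the general idea, the sketch below shows the standard split-conformal recipe for turning model confidences into a prediction set at a target error rate `alpha`. It is not Quasar's conformal semantics, which this summary does not describe; the function names, calibration scores, and labels are all hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores (assumed setup)."""
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this alpha: accept every label.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(label_probs, threshold):
    """Keep every label whose nonconformity score (1 - probability) is within the threshold."""
    return {label for label, p in label_probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration scores and model output, for illustration only.
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
tau = conformal_threshold(cal_scores, alpha=0.2)
answers = prediction_set({"cat": 0.6, "dog": 0.25, "fox": 0.15}, tau)
```

Under the usual exchangeability assumption, a set built this way contains the true label with probability at least `1 - alpha`, which is the sense in which a reliability level can be targeted.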
Abstract
Modern large language models (LLMs) are often deployed as agents, calling external tools adaptively to solve tasks. Rather than directly calling tools, it can be more effective for LLMs to write code to perform the tool calls, enabling them to automatically generate complex control flow such as conditionals and loops. Such code actions are typically provided as Python code, since LLMs are quite proficient at it; however, Python may not be the ideal language due to limited built-in support for performance, security, and reliability. We propose a novel programming language for code actions, called Quasar, which has several benefits: (1) automated parallelization to improve performance, (2) uncertainty quantification to improve reliability and mitigate hallucinations, and (3) security features enabling the user to validate actions. LLMs can write code in a subset of Python, which is…
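To make the parallelization benefit concrete, here is a minimal Python sketch of the scheduling behavior the abstract describes: external calls with no data dependency between them run concurrently, and a dependent call is dispatched only once its inputs exist. It uses `asyncio` by hand and is not Quasar code or its runtime; the tool names `find_dogs`, `find_cats`, and `summarize` are hypothetical stand-ins.

```python
import asyncio

log = []  # records the order in which calls start and finish


async def external_call(name, delay, value):
    """Hypothetical stand-in for a tool call (e.g. an API or model query)."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)  # simulates network or model latency
    log.append(f"end {name}")
    return value


async def run_action():
    # These two calls share no data dependency, so both are started immediately.
    dogs_task = asyncio.create_task(external_call("find_dogs", 0.05, 2))
    cats_task = asyncio.create_task(external_call("find_cats", 0.05, 3))
    dogs, cats = await asyncio.gather(dogs_task, cats_task)
    # This call needs both results, so it is dispatched only after they arrive.
    return await external_call("summarize", 0.01, dogs + cats)


result = asyncio.run(run_action())
```

In this sketch the programmer writes the concurrency explicitly; the abstract's claim is that Quasar automates that step, so the LLM can write ordinary sequential-looking code.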
Peer Reviews
Decision · Submitted to ICLR 2026
+ The rewrite-rule semantics and external call dispatch mechanism are rigorously formalized.
+ The ability to propagate model uncertainty at the program level is a novel contribution that could inspire future work on trustworthy agent execution.
+ The use of a Python subset and a transpiler ensures backward compatibility with current LLMs, addressing real-world deployability concerns (without performance degradation).
- The paper does not specify how QUASAR manages external call failures, exceptions, or thread-level errors. For example, what happens if an external API call fails, times out, or returns an invalid response? Is the failure propagated, retried, or absorbed?
- While QUASAR executes external calls “as soon as all their arguments are available,” it is not clear whether “futures” or deferred results are explicitly represented in the language. How does the interpreter manage dependencies among pendin
- Designing an LLM-native programming language for code generation action is innovative and promising.
- QUASAR introduces a pure functional core that separates computation from side effects. This separation allows deterministic execution, simplifies formal reasoning, and makes program behavior easier to verify and audit.
- The runtime system can automatically detect independent external calls and execute them concurrently. Experiments show up to 56% reduction in total execution time, demonstra
- **Narrow evaluation scope:** The experiments are confined to small, synthetic benchmarks (GQA and AgentDojo). These tasks are short and prestructured, which limits the external validity of the claims. There is no evaluation in complex or dynamic environments that real LLM agents operate in.
- **Limited language expressiveness:** QUASAR only supports a very restricted subset of Python (functions, variables, simple control flow). It does not handle classes, exceptions, pattern matching, or early
- The paper effectively identifies challenges in LLM-based agents which write Python code to invoke tool APIs, and presents a practical solution through QUASAR.
- The "internal computation - external side effects separation" architecture and the introduction of conformal semantics are novel and offer significant advantages in performance, security, and reliability.
- Experiments on real-world agents like ViperGPT and CaMeL, covering performance, security, and reliability, demonstrate the practic
- Lack of Detailed Technical Explanation: The paper lacks in-depth descriptions of key components like QUASAR’s rewrite rules, Python subset syntax, and transpiler implementation, which could impact reproducibility and understanding.
- Flexibility Concerns in Tool-Calling Scenarios: While QUASAR improves upon Python in certain areas, there is a concern about whether it can maintain the same flexibility as Python in all tool-calling scenarios. Python’s ecosystem is rich with libraries that facili
Taxonomy
Topics: Mobile Agent-Based Network Management · Multi-Agent Systems and Negotiation · Distributed and Parallel Computing Systems
