A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions

Stephen Mell; Botong Zhang; David Mell; Shuo Li; Ramya Ramalingam; Nathan Yu; Steve Zdancewic; Osbert Bastani

arXiv:2506.12202 · cs.PL · June 17, 2025

Open Access 3 Reviews

TL;DR

The paper introduces Quasar, a new programming language designed for LLM agents to perform code actions more securely, reliably, and efficiently, with automatic parallelization and uncertainty quantification.

Contribution

It presents Quasar, a novel language for code actions that enhances performance, security, and reliability of LLM agents compared to using Python.

Findings

01. Reduced execution time by 42%

02. Decreased user approval interactions by 52%

03. Achieved targeted reliability with conformal prediction
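This page does not describe how Quasar's conformal semantics work, so the following is only a generic sketch of split conformal prediction for a classifier, the standard technique behind "targeted reliability" claims. The function names, calibration scores, and probabilities are all hypothetical and are not taken from the paper.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the finite-sample-corrected (1 - alpha) quantile of the
    calibration nonconformity scores (hypothetical helper, not Quasar's API)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: accept every label.
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(label_probs, threshold):
    """Keep every label whose nonconformity score (1 - probability)
    does not exceed the calibrated threshold."""
    return {label for label, p in label_probs.items() if 1 - p <= threshold}

# Made-up calibration scores: 1 - (probability assigned to the true label).
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
threshold = conformal_threshold(cal_scores, alpha=0.25)

confident = prediction_set({"cat": 0.7, "dog": 0.25, "fox": 0.05}, threshold)
ambiguous = prediction_set({"cat": 0.45, "dog": 0.42, "fox": 0.13}, threshold)
```

Under the usual exchangeability assumption, sets built this way contain the true label with probability at least 1 - alpha. A confident model output gives a single-label set, while an ambiguous one gives a larger set that downstream code can treat as uncertain.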

Abstract

Modern large language models (LLMs) are often deployed as agents, calling external tools adaptively to solve tasks. Rather than directly calling tools, it can be more effective for LLMs to write code to perform the tool calls, enabling them to automatically generate complex control flow such as conditionals and loops. Such code actions are typically provided as Python code, since LLMs are quite proficient at it; however, Python may not be the ideal language due to limited built-in support for performance, security, and reliability. We propose a novel programming language for code actions, called Quasar, which has several benefits: (1) automated parallelization to improve performance, (2) uncertainty quantification to improve reliability and mitigate hallucinations, and (3) security features enabling the user to validate actions. LLMs can write code in a subset of Python, which is…
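The performance benefit of running independent external calls concurrently can be pictured with ordinary Python. The sketch below is hand-written `asyncio`, not Quasar: the tool names `find_object` and `describe` are hypothetical, and the sleeps stand in for network or model latency. It only illustrates the effect that the abstract says Quasar obtains automatically.

```python
import asyncio
import time

# Hypothetical external tools; the sleep simulates call latency.
async def find_object(image: str, name: str) -> str:
    await asyncio.sleep(0.1)
    return f"{name}@{image}"

async def describe(region: str) -> str:
    await asyncio.sleep(0.1)
    return f"description of {region}"

async def sequential(image: str) -> list[str]:
    # One call at a time, as in a straightforward Python code action.
    cat = await find_object(image, "cat")
    dog = await find_object(image, "dog")
    return [await describe(cat), await describe(dog)]

async def dataflow(image: str) -> list[str]:
    # Each chain waits only on its own argument, so the two chains overlap.
    async def chain(name: str) -> str:
        return await describe(await find_object(image, name))
    return list(await asyncio.gather(chain("cat"), chain("dog")))

def timed(coro_fn, image: str):
    """Run a coroutine function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(image))
    return result, time.perf_counter() - start
```

Calling `timed(sequential, "img.png")` and `timed(dataflow, "img.png")` returns the same list of descriptions, but the second finishes in roughly half the wall-clock time because the `cat` and `dog` chains do not depend on each other.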

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 2

Strengths

+ The rewrite-rule semantics and external call dispatch mechanism are rigorously formalized.
+ The ability to propagate model uncertainty at the program level is a novel contribution that could inspire future work on trustworthy agent execution.
+ The use of a Python subset and a transpiler ensures backward compatibility with current LLMs, addressing real-world deployability concerns (without performance degradation).

Weaknesses

- The paper does not specify how QUASAR manages external call failures, exceptions, or thread-level errors. For example, what happens if an external API call fails, times out, or returns an invalid response? Is the failure propagated, retried, or absorbed?
- While QUASAR executes external calls “as soon as all their arguments are available,” it is not clear whether “futures” or deferred results are explicitly represented in the language. How does the interpreter manage dependencies among pendin

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- Designing an LLM-native programming language for code generation action is innovative and promising.
- QUASAR introduces a pure functional core that separates computation from side effects. This separation allows deterministic execution, simplifies formal reasoning, and makes program behavior easier to verify and audit.
- The runtime system can automatically detect independent external calls and execute them concurrently. Experiments show up to 56% reduction in total execution time, demonstra

Weaknesses

- **Narrow evaluation scope:** The experiments are confined to small, synthetic benchmarks (GQA and AgentDojo). These tasks are short and prestructured, which limits the external validity of the claims. There is no evaluation in complex or dynamic environments that real LLM agents operate in.
- **Limited language expressiveness:** QUASAR only supports a very restricted subset of Python (functions, variables, simple control flow). It does not handle classes, exceptions, pattern matching, or early

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The paper effectively identifies challenges in LLM-based agents which write Python code to invoke tool APIs, and presents a practical solution through QUASAR.
- The "internal computation - external side effects separation" architecture and the introduction of conformal semantics are novel and offer significant advantages in performance, security, and reliability.
- Experiments on real-world agents like ViperGPT and CaMeL, covering performance, security, and reliability, demonstrate the practic

Weaknesses

- Lack of Detailed Technical Explanation: The paper lacks in-depth descriptions of key components like QUASAR’s rewrite rules, Python subset syntax, and transpiler implementation, which could impact reproducibility and understanding.
- Flexibility Concerns in Tool-Calling Scenarios: While QUASAR improves upon Python in certain areas, there is a concern about whether it can maintain the same flexibility as Python in all tool-calling scenarios. Python’s ecosystem is rich with libraries that facili

Taxonomy

Topics: Mobile Agent-Based Network Management · Multi-Agent Systems and Negotiation · Distributed and Parallel Computing Systems