Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution
Boyang Yan

TL;DR
This paper introduces a fault-tolerant sandboxing framework for AI coding agents that ensures safety through transactional execution and policy-based interception, enabling autonomous operation with minimal latency overhead.
Contribution
It proposes a novel transactional sandboxing approach combining policy interception and filesystem snapshots to enhance safety for autonomous AI agents.
Findings
100% interception of high-risk commands
100% rollback success for failed states
14.5% performance overhead per transaction
Abstract
The transition of Large Language Models (LLMs) from passive code generators to autonomous agents introduces significant safety risks, specifically regarding destructive commands and inconsistent system states. Existing commercial solutions often prioritize interactive user safety, enforcing authentication barriers that break the headless loops required for true autonomy. This paper presents a Fault-Tolerant Sandboxing framework designed to mitigate these risks through a policy-based interception layer and a transactional filesystem snapshot mechanism. We hypothesize that wrapping agent actions in atomic transactions can guarantee safety with acceptable latency, outperforming the heavy initialization overhead of containers or the interactive friction of commercial CLIs. We validated this approach by deploying the Minimind-MoE LLM served via nano-vllm on a custom Proxmox-based testbed…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSecurity and Verification in Computing · Distributed systems and fault tolerance · Blockchain Technology Applications and Security
