GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
Shishir G. Patil, Tianjun Zhang, Vivian Fang, Noppapon C., Roy Huang, Aaron Hao, Martin Casado, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica

TL;DR
This paper introduces GoEX, a runtime system for safer autonomous LLM applications. It relies on post-facto validation, undo features, and damage confinement to reduce the human oversight needed when LLMs act on real-world applications and services.
Contribution
The paper presents the design and implementation of GoEX, an open-source runtime that facilitates safe autonomous LLM actions with minimal human supervision.
Findings
GoEX enables effective undo and damage control for LLM actions.
Post-facto validation simplifies verification of LLM outputs.
GoEX supports autonomous interaction with applications with limited human oversight.
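The undo and post-facto validation ideas in the findings above can be illustrated with a small sketch. This is not GoEX's API: `ReversibleAction`, `run_with_post_facto_validation`, and `make_set_action` are hypothetical names, and a plain dict stands in for a real application. The sketch covers only the undo path, not damage confinement.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReversibleAction:
    """An action paired with the inverse that undoes it (hypothetical type)."""
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]


def run_with_post_facto_validation(
    actions: List[ReversibleAction],
    approve: Callable[[], bool],
) -> bool:
    """Execute all actions first, then keep or revert them based on `approve`."""
    executed: List[ReversibleAction] = []
    try:
        for action in actions:
            action.do()
            executed.append(action)
        # The reviewer inspects the resulting state, not the generated code.
        ok = approve()
    except Exception:
        ok = False
    if not ok:
        # Revert in reverse order so later actions are undone first.
        for action in reversed(executed):
            action.undo()
    return ok


def make_set_action(store: dict, key: str, value: object) -> ReversibleAction:
    """Build a reversible dict write by capturing the prior value at do-time."""
    missing = object()
    previous = [missing]

    def do() -> None:
        previous[0] = store.get(key, missing)
        store[key] = value

    def undo() -> None:
        if previous[0] is missing:
            store.pop(key, None)
        else:
            store[key] = previous[0]

    return ReversibleAction(f"set {key}", do, undo)


if __name__ == "__main__":
    store = {"a": 1}
    actions = [make_set_action(store, "a", 2), make_set_action(store, "b", 3)]
    # The reviewer rejects the outcome, so both writes are rolled back.
    run_with_post_facto_validation(actions, approve=lambda: False)
    print(store)
```

In this sketch the reviewer judges the resulting state rather than the generated code, which is the sense in which post-facto validation is argued to be easier. An action that fails partway through its own `do` is not reverted here; a real runtime would need to handle that case.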
Abstract
Large Language Models (LLMs) are evolving beyond their classical role of providing information within dialogue systems to actively engaging with tools and performing actions on real-world applications and services. Today, humans verify the correctness and appropriateness of the LLM-generated outputs (e.g., code, functions, or actions) before putting them into real-world execution. This poses significant challenges as code comprehension is well known to be notoriously difficult. In this paper, we study how humans can efficiently collaborate with, delegate to, and supervise autonomous LLMs in the future. We argue that in many cases, "post-facto validation" - verifying the correctness of a proposed action after seeing the output - is much easier than the aforementioned "pre-facto validation" setting. The core concept behind enabling a post-facto validation system is the integration of an…
Taxonomy
Topics: Distributed and Parallel Computing Systems
