State and Memory is All You Need for Robust and Reliable AI Agents

Matthew Muhoberac; Atharva Parikh; Nirvi Vakharia; Saniya Virani; Aco Radujevic; Savannah Wood; Meghav Verma; Dimitri Metaxotos; Jeyaraman Soundararajan; Thierry Masquelin; Alexander G. Godfrey; Sean Gardner; Dobrila Rudnicki; Sam Michael; Gaurav Chopra

arXiv:2507.00081 · cs.MA · July 2, 2025

TL;DR

This paper introduces SciBORG, a modular AI agent framework that uses finite-state automata memory for robust, reliable, and scalable autonomous scientific workflows, eliminating manual prompt engineering.

Contribution

The paper presents SciBORG, a novel framework combining LLMs with FSA memory for autonomous, context-aware scientific task execution, enhancing robustness and scalability.

Findings

01. Achieves reliable multi-step scientific task execution.

02. Enables context-aware decision making and recovery from failures.

03. Demonstrates effectiveness in physical and virtual hardware applications.

Abstract

Large language models (LLMs) have enabled powerful advances in natural language understanding and generation. Yet their application to complex, real-world scientific workflows remains limited by challenges in memory, planning, and tool integration. Here, we introduce SciBORG (Scientific Bespoke Artificial Intelligence Agents Optimized for Research Goals), a modular agentic framework that allows LLM-based agents to autonomously plan, reason, and achieve robust and reliable domain-specific task execution. Agents are constructed dynamically from source code documentation and augmented with finite-state automata (FSA) memory, enabling persistent state tracking and context-aware decision-making. This approach eliminates the need for manual prompt engineering and allows for robust, scalable deployment across diverse applications via maintaining context across extended workflows and to recover…
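To make the idea of FSA memory concrete, here is a minimal sketch of how persistent state tracking might constrain an agent's next action. It is an illustration under stated assumptions, not SciBORG's implementation: the class `FSAMemory`, the helper `build_instrument_memory`, and the instrument states and actions (`idle`, `loaded`, `running`, `load_sample`, and so on) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FSAMemory:
    """Hypothetical finite-state memory: a current state plus a transition log."""

    state: str
    # Maps (current_state, action) -> next_state. Pairs not listed are invalid.
    transitions: dict[tuple[str, str], str]
    # Each entry records (previous_state, action, next_state).
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def allowed_actions(self) -> list[str]:
        """Return the actions that are valid from the current state, sorted."""
        return sorted(a for (s, a) in self.transitions if s == self.state)

    def apply(self, action: str) -> str:
        """Apply an action, update the state, and log the transition."""
        key = (self.state, action)
        if key not in self.transitions:
            # Reject the action and leave the state unchanged, so the caller
            # (for example an LLM planner) can pick a valid action instead.
            raise ValueError(
                f"action {action!r} is not valid in state {self.state!r}; "
                f"allowed: {self.allowed_actions()}"
            )
        next_state = self.transitions[key]
        self.history.append((self.state, action, next_state))
        self.state = next_state
        return next_state


def build_instrument_memory() -> FSAMemory:
    """Build a toy automaton for an assumed lab instrument (illustrative only)."""
    return FSAMemory(
        state="idle",
        transitions={
            ("idle", "load_sample"): "loaded",
            ("loaded", "start_run"): "running",
            ("running", "stop_run"): "loaded",
            ("loaded", "unload_sample"): "idle",
        },
    )


if __name__ == "__main__":
    memory = build_instrument_memory()
    memory.apply("load_sample")
    print(memory.state, memory.allowed_actions())
    try:
        memory.apply("stop_run")  # not valid until a run has started
    except ValueError as err:
        print("rejected:", err)
```

In a sketch like this, the list from `allowed_actions()` and the `history` log could be passed to the model at each step, so the agent reasons from a tracked state rather than from a hand-written prompt. A rejected transition leaves the state untouched, which is one simple way to support recovery after a failed or out-of-order step.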
