Long-Term Memory for VLA-based Agents in Open-World Task Execution
Xu Huang, Weixin Mao, Yinhao Li, Hua Chen, Jiabao Zhao

TL;DR
ChemBot is a hierarchical, memory-augmented AI framework that enhances long-term reasoning and task execution in complex chemical laboratory automation using vision-language-action models.
Contribution
The paper introduces ChemBot, a dual-layer, closed-loop framework that combines a dual-layer memory architecture with asynchronous inference to improve VLA-based agent performance on long-horizon tasks.
Findings
ChemBot outperforms existing VLA baselines in safety, precision, and success rates.
The dual-layer memory consolidates successful strategies for future use.
Asynchronous inference mitigates trajectory discontinuities in complex tasks.
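To make the memory-consolidation finding concrete, here is a minimal sketch of one way a two-layer memory could keep only successful trajectories and retrieve them for a similar task later. It is not ChemBot's implementation: the summary does not say what the two layers hold or how retrieval works, so the split into a per-episode working buffer and a long-term store, the class name DualLayerMemory, its methods, and the word-overlap (Jaccard) retrieval are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DualLayerMemory:
    """Hypothetical two-layer memory; an illustration, not ChemBot's actual design."""

    working: list = field(default_factory=list)    # layer 1: steps of the current episode
    long_term: list = field(default_factory=list)  # layer 2: (task, steps) pairs that succeeded

    def record(self, step: str) -> None:
        """Append one executed step to the working layer."""
        self.working.append(step)

    def end_episode(self, task: str, success: bool) -> None:
        """Consolidate the episode only if it succeeded, then clear the working layer."""
        if success:
            self.long_term.append((task, list(self.working)))
        self.working.clear()

    def retrieve(self, task: str):
        """Return the stored steps whose task text overlaps most with `task`, or None."""
        query = set(task.lower().split())
        best, best_score = None, 0.0
        for stored_task, steps in self.long_term:
            words = set(stored_task.lower().split())
            union = query | words
            # Jaccard similarity over words (assumed metric, chosen for simplicity).
            score = len(query & words) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = steps, score
        return best


memory = DualLayerMemory()
memory.record("pick up pipette")
memory.record("aspirate 5 mL")
memory.end_episode("transfer liquid to beaker", success=True)

memory.record("drop flask")
memory.end_episode("weigh solid sample", success=False)  # failed episode is discarded

print(memory.retrieve("transfer liquid to flask"))
# prints ['pick up pipette', 'aspirate 5 mL']
```

A real system would more likely retrieve by embedding similarity than by word overlap; the overlap score is used here only to keep the sketch dependency-free.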
Abstract
Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat planning and execution as decoupled processes, often failing to consolidate successful strategies, which results in inefficient trial-and-error in multi-stage protocols. In this paper, we propose ChemBot, a dual-layer, closed-loop framework that integrates an autonomous AI agent with a progress-aware VLA model (Skill-VLA) for hierarchical task decomposition and execution. ChemBot utilizes a dual-layer memory architecture to consolidate successful trajectories into retrievable assets, while a Model Context Protocol (MCP) server facilitates efficient sub-agent and tool…
