Long-Term Memory for VLA-based Agents in Open-World Task Execution

Xu Huang; Weixin Mao; Yinhao Li; Hua Chen; Jiabao Zhao

arXiv:2604.15671 · cs.RO · April 20, 2026

TL;DR

ChemBot is a hierarchical, memory-augmented AI framework that enhances long-term reasoning and task execution in complex chemical laboratory automation using vision-language-action models.

Contribution

The paper introduces ChemBot, a dual-layer, closed-loop system whose memory architecture and asynchronous inference improve VLA-based agent performance on long-horizon tasks.

Findings

01. ChemBot outperforms existing VLA baselines in safety, precision, and success rates.

02. The dual-layer memory consolidates successful strategies for future use.

03. Asynchronous inference mitigates trajectory discontinuities in complex tasks.

Abstract

Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat planning and execution as decoupled processes, often failing to consolidate successful strategies, which results in inefficient trial-and-error in multi-stage protocols. In this paper, we propose ChemBot, a dual-layer, closed-loop framework that integrates an autonomous AI agent with a progress-aware VLA model (Skill-VLA) for hierarchical task decomposition and execution. ChemBot utilizes a dual-layer memory architecture to consolidate successful trajectories into retrievable assets, while a Model Context Protocol (MCP) server facilitates efficient sub-agent and tool…
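
The idea of consolidating successful trajectories into retrievable assets can be illustrated with a small sketch. This is not ChemBot's implementation, which the abstract does not detail: the class names, the word-overlap retrieval, and the skill names below are all hypothetical, chosen only to show one way a two-layer memory might separate a short-term buffer from a long-term store.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One attempt at a task (hypothetical structure, not from the paper)."""
    task: str            # natural-language task description
    steps: list[str]     # ordered skill names executed for the task
    success: bool = False


@dataclass
class DualLayerMemory:
    """Illustrative two-layer memory; an assumption, not the authors' design."""
    # Layer 1: short-term buffer holding every attempt since the last consolidation.
    working: list[Trajectory] = field(default_factory=list)
    # Layer 2: long-term store holding only consolidated successes.
    long_term: list[Trajectory] = field(default_factory=list)

    def record(self, trajectory: Trajectory) -> None:
        self.working.append(trajectory)

    def consolidate(self) -> int:
        """Copy successful attempts to long-term memory, then clear the buffer."""
        kept = [t for t in self.working if t.success]
        self.long_term.extend(kept)
        self.working.clear()
        return len(kept)

    def retrieve(self, query: str) -> Trajectory | None:
        """Return the stored trajectory whose task shares the most words with the query."""
        query_words = set(query.lower().split())
        best, best_score = None, 0
        for t in self.long_term:
            score = len(query_words & set(t.task.lower().split()))
            if score > best_score:
                best, best_score = t, score
        return best


# Usage with made-up tasks and skill names.
memory = DualLayerMemory()
memory.record(Trajectory("transfer liquid to beaker",
                         ["grasp_pipette", "aspirate", "dispense"], success=True))
memory.record(Trajectory("weigh solid sample", ["open_balance"], success=False))
memory.consolidate()
hit = memory.retrieve("transfer liquid to flask")
```

In this sketch only the successful pipetting attempt survives consolidation, so a later query about a similar transfer can reuse its step sequence instead of starting from scratch. A real system would likely replace the word-overlap score with embedding similarity.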
