VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory

Yuheng Lei; Zhixuan Liang; Hongyuan Zhang; Ping Luo

arXiv:2603.04910 · cs.RO · March 6, 2026

Open Access

TL;DR

VPWEM introduces a non-Markovian visuomotor policy with working and episodic memories, enabling robotic control systems to handle long-term dependencies efficiently and outperform existing methods in memory-intensive tasks.

Contribution

The paper presents VPWEM, a novel policy combining working and episodic memories with a Transformer-based compressor, addressing long-term memory challenges in robotic visuomotor tasks.

Findings

01. Outperforms state-of-the-art baselines by over 20% on manipulation tasks.

02. Achieves 5% improvement on the MoMaRT mobile manipulation benchmark.

03. Operates with nearly constant memory and computation per step.

Abstract

Imitation learning from human demonstrations has achieved significant success in robotic control, yet most visuomotor policies still condition on single-step observations or short-context histories, making them struggle with non-Markovian tasks that require long-term memory. Simply enlarging the context window incurs substantial computational and memory costs and encourages overfitting to spurious correlations, leading to catastrophic failures under distribution shift and violating real-time constraints in robotic systems. By contrast, humans can compress important past experiences into long-term memories and exploit them to solve tasks throughout their lifetime. In this paper, we propose VPWEM, a non-Markovian visuomotor policy equipped with working and episodic memories. VPWEM retains a sliding window of recent observation tokens as short-term working memory, and introduces a…
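
The memory layout described above can be illustrated with a minimal sketch. It assumes that tokens leaving the sliding window are folded into a fixed number of episodic slots, so the policy context never grows. The class name `MemorySketch`, its parameters, and the softmax-weighted blending rule are hypothetical stand-ins chosen for illustration; they are not the paper's Transformer-based compressor.

```python
from collections import deque
import math


class MemorySketch:
    """Hypothetical illustration: sliding-window working memory plus a
    fixed number of episodic slots. Not the authors' implementation."""

    def __init__(self, window=4, slots=2, dim=3):
        self.window = window
        self.dim = dim
        self.working = deque()  # recent observation tokens
        # Arbitrary distinct initial slot values (an assumption for the sketch).
        self.episodic = [[0.1 * (i + 1)] * dim for i in range(slots)]

    def _compress(self, token):
        # Stand-in for a learned compressor: softmax over dot-product scores
        # decides how strongly each slot absorbs the evicted token.
        scores = [sum(s * t for s, t in zip(slot, token)) for slot in self.episodic]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        for slot, w in zip(self.episodic, weights):
            rate = 0.5 * w / total
            for d in range(self.dim):
                slot[d] += rate * (token[d] - slot[d])

    def step(self, token):
        """Add one observation token and return the bounded policy context."""
        self.working.append(list(token))
        if len(self.working) > self.window:
            # The oldest token leaves the window and is folded into the slots.
            self._compress(self.working.popleft())
        return [list(s) for s in self.episodic] + [list(t) for t in self.working]


memory = MemorySketch(window=4, slots=2, dim=3)
for t in range(100):
    context = memory.step([float(t), 1.0, 0.0])
print(len(context))
```

In this sketch the context holds at most `slots + window` tokens regardless of episode length, which is one simple way to obtain constant per-step memory and computation.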

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications