Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks
Sanjay Haresh, Daniel Dijkman, Apratim Bhattacharyya, Roland Memisevic

TL;DR
This paper introduces a language scratchpad to augment vision-language-action models with spatial and temporal memory, significantly improving their ability to handle memory-dependent manipulation tasks in robotics.
Contribution
It proposes a novel memory-augmentation method for VLAs based on a language scratchpad, enabling better handling of non-Markovian, memory-dependent tasks.
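The core idea can be illustrated with a minimal Python sketch of how threading a text scratchpad through a rollout gives an otherwise stateless policy memory across steps. It assumes a hypothetical `policy` callable that maps (observation, instruction, scratchpad) to (action, updated scratchpad). The names `rollout`, `toy_policy`, `stateless`, and the toy "cube gets hidden" task are illustrative inventions, not the paper's model interface or implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical policy signature: (observation, instruction, scratchpad) -> (action, updated scratchpad).
Policy = Callable[[str, str, str], Tuple[str, str]]


def rollout(policy: Policy, observations: List[str], instruction: str) -> List[str]:
    """Run the policy over a sequence of observations, passing the scratchpad text from step to step."""
    scratchpad = ""  # starts empty; the policy decides what to write into it
    actions = []
    for obs in observations:
        action, scratchpad = policy(obs, instruction, scratchpad)
        actions.append(action)
    return actions


def toy_policy(obs: str, instruction: str, scratchpad: str) -> Tuple[str, str]:
    """Toy stand-in for a VLA: remembers where the cube was seen before it was covered."""
    if obs.startswith("cube at "):
        # Write the observed position into the scratchpad (spatial memory).
        return "wait", obs
    if obs == "cube hidden":
        if scratchpad.startswith("cube at "):
            return "pick " + scratchpad[len("cube at "):], scratchpad
        return "explore", scratchpad  # nothing remembered, so the position is lost
    return "noop", scratchpad


def stateless(policy: Policy) -> Policy:
    """Wrap a policy so it always sees an empty scratchpad, i.e. a purely Markovian baseline."""
    def wrapped(obs: str, instruction: str, _scratchpad: str) -> Tuple[str, str]:
        action, _ = policy(obs, instruction, "")
        return action, ""
    return wrapped


obs_seq = ["cube at left", "cube hidden"]
print(rollout(toy_policy, obs_seq, "pick the cube"))             # ['wait', 'pick left']
print(rollout(stateless(toy_policy), obs_seq, "pick the cube"))  # ['wait', 'explore']
```

The only difference between the two runs is whether the scratchpad survives between steps: with it, the policy can act on information that is no longer visible; without it, the task becomes unsolvable from the current observation alone.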
Findings
Enhanced generalization on memory-dependent tasks
Improved performance in real-world pick-and-place tasks
Effective memory retention in both non-recurrent and recurrent models
Abstract
Many dexterous manipulation tasks are non-Markovian in nature, yet little attention has been paid to this fact in the recent upsurge of the vision-language-action (VLA) paradigm. Although they are successful in bringing internet-scale semantic understanding to robotics, existing VLAs are primarily "stateless" and struggle with memory-dependent long-horizon tasks. In this work, we explore a way to impart both spatial and temporal memory to a VLA by incorporating a language scratchpad. The scratchpad makes it possible to memorize task-specific information, such as object positions, and it allows the model to keep track of a plan and progress towards subgoals within that plan. We evaluate this approach on a split of memory-dependent tasks from the ClevrSkills environment, on MemoryBench, as well as on a challenging real-world pick-and-place task. We show that incorporating a language…
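The abstract names two kinds of content the scratchpad holds: task-specific facts such as object positions, and a plan together with progress toward its subgoals. The sketch below shows one hypothetical way to serialize both into plain text that a model could read and rewrite at every step. The line-based format and the `Scratchpad` class are assumptions for illustration only, not the format used in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scratchpad:
    """Hypothetical structured view of a language scratchpad."""
    objects: Dict[str, str] = field(default_factory=dict)  # object name -> remembered position (spatial memory)
    plan: List[str] = field(default_factory=list)          # ordered subgoals (temporal memory)
    done: int = 0                                          # number of completed subgoals

    def to_text(self) -> str:
        """Serialize to the plain text that would be fed back to the model."""
        lines = [f"object {name}: {pos}" for name, pos in self.objects.items()]
        lines += [f"plan {i}: {step}" for i, step in enumerate(self.plan)]
        lines.append(f"done: {self.done}")
        return "\n".join(lines)

    @classmethod
    def from_text(cls, text: str) -> "Scratchpad":
        """Parse text written in the same format (assumes names contain no ': ')."""
        pad = cls()
        for line in text.splitlines():
            key, _, value = line.partition(": ")
            if key.startswith("object "):
                pad.objects[key[len("object "):]] = value
            elif key.startswith("plan "):
                pad.plan.append(value)  # plan lines are written in order, so appending preserves it
            elif key == "done":
                pad.done = int(value)
        return pad

    def current_subgoal(self) -> str:
        """Return the next unfinished subgoal, or an empty string when the plan is complete."""
        return self.plan[self.done] if self.done < len(self.plan) else ""


pad = Scratchpad(objects={"red cube": "left shelf"},
                 plan=["pick red cube", "place red cube in bin B"])
pad.done = 1  # first subgoal finished
restored = Scratchpad.from_text(pad.to_text())
print(restored == pad)              # True
print(restored.current_subgoal())   # place red cube in bin B
```

Keeping the scratchpad as plain text means it can be produced and consumed by the same language-conditioned model; the structured class here only makes the round trip easy to inspect.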
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
