SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox, Ranjay Krishna, Jiafei Duan

TL;DR
SAM2Act combines visual representations from a large-scale foundation model with a multi-view transformer-based policy to improve multitask robotic manipulation. SAM2Act+ extends it with a memory architecture for spatial-memory tasks and outperforms existing methods.
Contribution
The paper introduces SAM2Act, a multi-view transformer policy that leverages a visual foundation model, and SAM2Act+, a memory-augmented extension of it. It also introduces MemoryBench, a new benchmark for memory-dependent manipulation tasks.
Findings
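Achieves a state-of-the-art average success rate of 86.8% across 18 RLBench tasks.
Generalizes robustly on The Colosseum benchmark, with only a 4.3% performance gap under diverse environmental perturbations.
SAM2Act+ attains a 94.3% success rate on MemoryBench, a benchmark of memory-dependent tasks.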
Abstract
Robotic manipulation systems operating in diverse, dynamic environments must exhibit three critical abilities: multitask interaction, generalization to unseen scenarios, and spatial memory. While significant progress has been made in robotic manipulation, existing approaches often fall short in generalizing to complex environmental variations and in addressing memory-dependent tasks. To bridge this gap, we introduce SAM2Act, a multi-view robotic transformer-based policy that leverages multi-resolution upsampling with visual representations from a large-scale foundation model. SAM2Act achieves a state-of-the-art average success rate of 86.8% across 18 tasks in the RLBench benchmark and demonstrates robust generalization on The Colosseum benchmark, with only a 4.3% performance gap under diverse environmental perturbations. Building on this foundation, we propose SAM2Act+, a memory-based…
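The abstract mentions multi-resolution upsampling over foundation-model features without giving details. As a generic illustration only, and not SAM2Act's actual implementation (which presumably uses learned layers and multi-channel embeddings), the sketch below shows the basic coarse-to-fine fusion idea: repeatedly upsample the coarser map by 2x and add the next finer-resolution map. All function names are hypothetical, and single-channel nested lists stand in for feature tensors.

```python
from typing import List

Grid = List[List[float]]  # hypothetical stand-in for a single-channel feature map


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: each cell becomes a 2x2 block."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))  # duplicate each row
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "shape mismatch"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def multires_upsample(levels: List[Grid]) -> Grid:
    """Fuse maps ordered coarse-to-fine; each level has twice the
    resolution of the previous one (an assumption of this sketch)."""
    fused = levels[0]
    for finer in levels[1:]:
        fused = add(upsample2x(fused), finer)
    return fused


if __name__ == "__main__":
    coarse = [[1.0]]                      # 1x1
    mid = [[0.0, 1.0], [2.0, 3.0]]        # 2x2
    fine = [[0.0] * 4 for _ in range(4)]  # 4x4
    for row in multires_upsample([coarse, mid, fine]):
        print(row)
```

Adding rather than concatenating levels keeps the sketch short; a real model would more likely concatenate channels and apply learned convolutions after each upsampling step.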
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Automated Systems · Robotic Path Planning Algorithms
