CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation
Shangning Xia, Hongjie Fang, Cewu Lu, Hao-Shu Fang

TL;DR
CAGE introduces a causal attention-based robotic manipulation policy that, with minimal demonstrations, generalizes effectively across diverse environments, outperforming existing methods in real-world manipulation tasks.
Contribution
The paper presents CAGE, a novel manipulation policy integrating causal attention, vision foundation models, and diffusion-based action prediction for improved generalization with limited data.
Findings
Achieves a 42% increase in task completion rate over state-of-the-art methods.
Attains a 43% completion rate and a 51% success rate in unseen environments.
Outperforms existing RGB/RGB-D approaches under large distribution shifts.
Abstract
Generalization in robotic manipulation remains a critical challenge, particularly when scaling to new environments with limited demonstrations. This paper introduces CAGE, a novel robotic manipulation policy designed to overcome these generalization barriers by integrating a causal attention mechanism. CAGE utilizes the powerful feature extraction capabilities of the vision foundation model DINOv2, combined with LoRA fine-tuning for robust environment understanding. The policy further employs a causal Perceiver for effective token compression and a diffusion-based action prediction head with attention mechanisms to enhance task-specific fine-grained conditioning. With as few as 50 demonstrations from a single training environment, CAGE achieves robust generalization across diverse visual changes in objects, backgrounds, and viewpoints. Extensive experiments validate that CAGE…
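The abstract names causal attention as the core mechanism but does not spell out how it is wired into the policy. As a rough illustration only, the sketch below shows the generic idea of causal masking in single-head self-attention: each token may attend to itself and to earlier tokens, never to later ones. It is not CAGE's architecture. The function name `causal_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the toy sizes `T` and `D` are all assumptions made for this example.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Generic single-head self-attention with a causal mask.

    Illustrative only (not the paper's implementation): token t may
    attend to tokens 0..t and never to tokens after t.
    x has shape (T, D); each projection matrix has shape (D, D).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) scaled dot products

    # True strictly above the diagonal, i.e. at "future" positions.
    t = x.shape[0]
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over each row; masked entries become 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy input: T = 4 tokens of width D = 8 (hypothetical sizes).
rng = np.random.default_rng(0)
T, D = 4, 8
x = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))

out, weights = causal_self_attention(x, w_q, w_k, w_v)

# Changing the last token must leave every earlier output untouched.
x_perturbed = x.copy()
x_perturbed[-1] += 10.0
out_perturbed, _ = causal_self_attention(x_perturbed, w_q, w_k, w_v)
```

Because of the mask, the first `T - 1` rows of `out` and `out_perturbed` are identical. That property is what "causal" refers to here: earlier positions carry no information from later ones.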
Taxonomy
Topics: Robot Manipulation and Learning · Machine Learning and Algorithms · Computability, Logic, AI Algorithms
