CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation

Shangning Xia; Hongjie Fang; Cewu Lu; Hao-Shu Fang

arXiv:2410.14974 · cs.RO · December 9, 2024 · 2 cites

Open Access

TL;DR

CAGE is a causal-attention-based robotic manipulation policy that, trained on as few as 50 demonstrations from a single environment, generalizes across changes in objects, backgrounds, and viewpoints and outperforms existing RGB/RGB-D methods in real-world manipulation tasks.

Contribution

The paper presents CAGE, a manipulation policy that combines a LoRA-fine-tuned DINOv2 vision backbone, a causal Perceiver for token compression, and an attention-based diffusion action head to improve generalization from limited data.

Findings

01

Achieves 42% increase in task completion rate over state-of-the-art methods.

02

Attains 43% completion and 51% success rates in unseen environments.

03

Outperforms existing RGB/RGB-D approaches under large distribution shifts.

Abstract

Generalization in robotic manipulation remains a critical challenge, particularly when scaling to new environments with limited demonstrations. This paper introduces CAGE, a novel robotic manipulation policy designed to overcome these generalization barriers by integrating a causal attention mechanism. CAGE utilizes the powerful feature extraction capabilities of the vision foundation model DINOv2, combined with LoRA fine-tuning for robust environment understanding. The policy further employs a causal Perceiver for effective token compression and a diffusion-based action prediction head with attention mechanisms to enhance task-specific fine-grained conditioning. With as few as 50 demonstrations from a single training environment, CAGE achieves robust generalization across diverse visual changes in objects, backgrounds, and viewpoints. Extensive experiments validate that CAGE…
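The abstract names causal attention as the central mechanism but does not spell it out here. As a rough illustration only, the sketch below shows generic causally masked scaled dot-product attention in NumPy, where each token may attend only to itself and earlier tokens. The function name `causal_attention`, the single-head layout, and the `(T, d)` shapes are assumptions made for this example; this is not the paper's implementation, which applies causal attention inside a Perceiver and a diffusion action head.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    Hypothetical helper for illustration; q, k, v have shape (T, d),
    where T is the number of time-ordered tokens.
    """
    T, d = q.shape
    # Pairwise similarity between queries and keys, scaled by sqrt(d).
    scores = q @ k.T / np.sqrt(d)
    # Lower-triangular mask: token i may attend to tokens 0..i only.
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of current and past values.
    return weights @ v, weights
```

With this mask, the first token can attend only to itself, so its output equals its own value vector, and no attention weight is ever placed on a future token.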

Taxonomy

Topics: Robot Manipulation and Learning · Machine Learning and Algorithms · Computability, Logic, AI Algorithms