TL;DR
SeedPolicy adds a new temporal module, Self-Evolving Gated Attention (SEGA), to diffusion policies for robot manipulation, improving long-horizon task performance at moderate computational overhead.
Contribution
The paper presents Self-Evolving Gated Attention (SEGA), a new temporal module integrated into diffusion policies to extend their effective temporal horizon in robotic manipulation tasks.
Findings
SeedPolicy outperforms Diffusion Policy (DP) and other IL baselines on the RoboTwin 2.0 benchmark.
It achieves a 36.8% relative improvement in clean settings.
It is parameter-efficient, using fewer parameters than vision-language models.
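A relative improvement is a ratio against the baseline, not a difference in percentage points. The short sketch below shows the arithmetic. The function name and the success rates are made up for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


# Hypothetical success rates (percent), NOT taken from the paper.
baseline_rate = 40.0
improved_rate = 50.0

absolute_gain = improved_rate - baseline_rate                        # 10 percentage points
relative_gain = relative_improvement(baseline_rate, improved_rate)   # 25.0 percent
print(absolute_gain, relative_gain)
```

With these made-up numbers, a gain of 10 percentage points is a 25% relative improvement. The same absolute gain therefore reads as a larger relative figure when the baseline is low.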
Abstract
Imitation Learning (IL) enables robots to acquire manipulation skills from expert demonstrations. Diffusion Policy (DP) models multi-modal expert behaviors but degrades when the stacked observation horizon is naively increased, limiting long-horizon manipulation. We propose Self-Evolving Gated Attention (SEGA), a temporal module that maintains a time-evolving latent state via gated attention, enabling efficient recurrent updates that accumulate long-term context into a compact latent representation while filtering irrelevant temporal information. Integrating SEGA into DP yields Self-Evolving Diffusion Policy (SeedPolicy), which resolves the temporal modeling bottleneck and extends the effective temporal horizon with moderate overhead. On the RoboTwin 2.0 benchmark with 50 manipulation tasks, SeedPolicy outperforms DP and other IL baselines. Averaged across both CNN and Transformer backbones,…
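The idea of a time-evolving latent state updated through gated attention can be illustrated with a small NumPy sketch. This is a generic gated cross-attention recurrence written under our own assumptions, not the authors' SEGA implementation. The class name, the slot count, the sigmoid gate form, and the random weights are all hypothetical. A fixed-size latent state attends to the current observation tokens, and a gate decides how much of the attended summary overwrites the old state.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedAttentionState:
    """Hypothetical gated-attention recurrence (illustration only, not the paper's SEGA)."""

    def __init__(self, num_slots: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        # Random projections stand in for learned weights.
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_g = rng.normal(0.0, scale, (2 * dim, dim))
        self.b_g = np.zeros(dim)
        # Compact latent state: its size does not grow with the number of timesteps.
        self.state = rng.normal(0.0, scale, (num_slots, dim))

    def update(self, obs_tokens: np.ndarray) -> np.ndarray:
        """Fold one timestep of observation tokens, shape (n, dim), into the state."""
        q = self.state @ self.w_q                       # (slots, dim)
        k = obs_tokens @ self.w_k                       # (n, dim)
        v = obs_tokens @ self.w_v                       # (n, dim)
        attn = softmax(q @ k.T / np.sqrt(self.dim))     # (slots, n), rows sum to 1
        candidate = attn @ v                            # attended summary of this timestep
        gate_in = np.concatenate([self.state, candidate], axis=-1)
        gate = sigmoid(gate_in @ self.w_g + self.b_g)   # values in (0, 1)
        # Gate near 0 keeps the old context; gate near 1 overwrites it.
        self.state = (1.0 - gate) * self.state + gate * candidate
        return self.state


# Usage: stream 20 timesteps of 4 tokens each through a 2-slot state.
rng = np.random.default_rng(1)
memory = GatedAttentionState(num_slots=2, dim=8)
for _ in range(20):
    latent = memory.update(rng.normal(size=(4, 8)))
print(latent.shape)
```

In this sketch the cost per step depends only on the number of slots and the tokens in the current timestep, not on how many timesteps have been processed. That is the usual motivation for a recurrent latent state over stacking a longer observation window.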
