TL;DR
STAR introduces a novel framework for learning diverse robot skill abstractions by addressing codebook collapse and modeling skill dependencies, leading to improved performance in manipulation tasks.
Contribution
It proposes rotation-augmented residual skill quantization and a causal skill transformer to enhance skill diversity and causal understanding in robotic manipulation.
Findings
Achieves roughly a 12% improvement over baselines on the LIBERO benchmark.
Effectively prevents codebook collapse with rotation-augmented residual skill quantization.
Models skill dependencies explicitly through a causal skill transformer.
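Codebook collapse, mentioned in the findings above, means that only a few codes end up being selected. A minimal sketch of how one might quantify it, assuming the code assignments are available as integer indices: the function name `codebook_usage` and the toy assignments are hypothetical and are not taken from the paper.

```python
import numpy as np

def codebook_usage(indices, codebook_size):
    """Return (fraction of codes used, perplexity of the code distribution)."""
    counts = np.bincount(indices, minlength=codebook_size)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]                      # skip unused codes to avoid log(0)
    entropy = -np.sum(nonzero * np.log(nonzero))
    fraction_used = np.count_nonzero(counts) / codebook_size
    return fraction_used, float(np.exp(entropy))

# Hypothetical assignments for a codebook of 8 codes (illustrative only).
collapsed = np.array([0, 0, 0, 0, 1, 0, 0, 0])     # nearly everything maps to code 0
healthy = np.arange(8)                              # every code used exactly once

collapsed_usage, collapsed_ppl = codebook_usage(collapsed, 8)
healthy_usage, healthy_ppl = codebook_usage(healthy, 8)
```

A perplexity close to the codebook size suggests the codes are used evenly, while a value near 1 suggests collapse onto a single code.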
Abstract
Transforming complex actions into discrete skill abstractions has demonstrated strong potential for robotic manipulation. Existing approaches mainly leverage latent variable models, e.g., VQ-VAE, to learn skill abstractions through learned vectors (codebooks), but they suffer from codebook collapse and struggle to model the causal relationships between learned skills. To address these limitations, we present Skill Training with Augmented Rotation (STAR), a framework that advances both skill learning and composition to complete complex behaviors. Specifically, to prevent codebook collapse, we devise rotation-augmented residual skill quantization (RaRSQ). It encodes relative angles between encoder outputs into the gradient flow through a rotation-based gradient mechanism. Points within the same skill code are forced to be either pushed apart or pulled closer…
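The two ingredients named in the abstract, residual quantization and a rotation between an encoder output and its code, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The helper names, the toy codebooks, and the Householder-style rotation formula are assumptions chosen for illustration, and the abstract as quoted here does not specify how STAR combines the two. The sketch shows forward-pass geometry only; the gradient behaviour the abstract refers to would additionally require an autodiff framework.

```python
import numpy as np

def nearest_code(x, codebook):
    """Index of the codebook row closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def residual_quantize(z, codebooks):
    """Quantize z level by level; each level encodes what earlier levels missed."""
    residual = z.copy()
    quantized = np.zeros_like(z)
    indices = []
    for codebook in codebooks:
        k = nearest_code(residual, codebook)
        indices.append(k)
        quantized = quantized + codebook[k]
        residual = residual - codebook[k]
    return indices, quantized

def rotation_between(e, q):
    """Rotation matrix mapping the direction of e onto the direction of q.

    Built as a product of two Householder reflections (an assumed construction);
    it is undefined when e and q point in exactly opposite directions.
    """
    e_hat = e / np.linalg.norm(e)
    q_hat = q / np.linalg.norm(q)
    r = (e_hat + q_hat) / np.linalg.norm(e_hat + q_hat)
    identity = np.eye(e.shape[0])
    return identity - 2.0 * np.outer(r, r) + 2.0 * np.outer(q_hat, e_hat)

# Toy two-level codebooks (hypothetical values, coarse then fine).
codebooks = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.25, 0.0], [0.0, 0.25]]),
]
z = np.array([0.9, 0.3])                      # stand-in for an encoder output

indices, z_q = residual_quantize(z, codebooks)
R = rotation_between(z, z_q)
scale = np.linalg.norm(z_q) / np.linalg.norm(z)
z_rotated = scale * (R @ z)                   # rotating and rescaling z recovers z_q
```

Here the first level picks the coarse code and the second level quantizes the leftover residual, so the sum of the selected codes approximates `z`. Because `R` depends on the angle between `z` and `z_q`, treating it as a constant during backpropagation is one way angular information could enter the gradient.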