TL;DR
Stellar VLA introduces a parameter-efficient continual learning framework for vision-language-action models, enabling knowledge accumulation and task specialization without increasing model size, validated on benchmarks and real-world platforms.
Contribution
It proposes a knowledge-driven continual imitation learning approach with two variants, the flat task-centric T-Stellar and the hierarchical task-skill TS-Stellar, aimed at improving scalability and knowledge transfer in VLA models.
Findings
Achieves strong performance on the LIBERO benchmark with only 1% data replay.
Enables effective knowledge transfer in real-world dual-arm manipulation.
Visualizations show robust knowledge retention and task discovery.
Abstract
Vision-language-action (VLA) models show promising knowledge accumulation ability from pretraining, yet continual learning in VLA remains challenging, especially for efficient adaptation. Existing continual imitation learning (CIL) methods often rely on additional parameters or external modules, limiting scalability for large VLA models. We propose Stellar VLA, a knowledge-driven CIL framework that does not increase the number of network parameters. Two progressively extended variants are designed: T-Stellar for flat task-centric modeling and TS-Stellar for hierarchical task-skill structure. Stellar VLA enables self-evolving knowledge learning by jointly optimizing task representations and a learned knowledge space. We propose a knowledge-guided expert routing mechanism conditioned on knowledge relation and Top-K semantic embeddings, enabling task specialization without increasing model size. Experiments…
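To make the routing idea in the abstract more concrete, here is a minimal sketch of what routing over a fixed pool of experts from a knowledge relation and Top-K embeddings could look like. It is an illustration under stated assumptions, not the paper's implementation: the function `route`, the dot-product similarity, the relation-weighted averaging, and the linear gate are all hypothetical choices, and the toy vectors are made up.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def route(task_emb, knowledge, gate_weights, top_k=2):
    """Hypothetical router: mixing weights over a FIXED set of experts.

    task_emb     : embedding of the current task (list of floats)
    knowledge    : entries of an assumed knowledge space (list of embeddings)
    gate_weights : one gate vector per expert; the expert count never grows
    """
    # Assumed "knowledge relation": softmax similarity to every knowledge entry.
    relation = softmax([dot(task_emb, k) for k in knowledge])
    # Indices of the Top-K most related knowledge entries.
    top = sorted(range(len(knowledge)), key=lambda i: relation[i], reverse=True)[:top_k]
    # Relation-weighted average of the Top-K embeddings, renormalised to sum to 1.
    norm = sum(relation[i] for i in top)
    dim = len(task_emb)
    context = [sum(relation[i] * knowledge[i][d] for i in top) / norm for d in range(dim)]
    # Gate scores from the context vector, turned into expert mixing weights.
    return softmax([dot(w, context) for w in gate_weights]), top

# Toy usage with made-up 2-D embeddings and two experts.
knowledge = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gate_weights = [[2.0, 0.0], [0.0, 2.0]]
weights, top = route([1.0, 0.1], knowledge, gate_weights, top_k=2)
print(top, weights)
```

In this sketch a new task changes only the embeddings the router is conditioned on, while the number of experts and gate vectors stays constant, which is one way to read "task specialization without increasing model size".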
