XSkill: Continual Learning from Experience and Skills in Multimodal Agents
Guanyu Jiang, Zhaochen Su, Xiaoye Qu, Yi R. Fung

TL;DR
XSkill introduces a dual-stream continual learning framework for multimodal agents that leverages experiences and skills grounded in visual observations to improve tool use and reasoning without parameter updates.
Contribution
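It proposes a dual-stream framework for continual learning in multimodal agents, in which reusable experiences (action-level) and skills (task-level) are extracted from past trajectories and retrieved based on visual observations, improving reasoning and generalization without parameter updates.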
Findings
XSkill outperforms baselines across five benchmarks.
The two knowledge streams complement each other in reasoning.
XSkill shows superior zero-shot generalization.
Abstract
Multimodal agents can now tackle complex reasoning tasks with diverse tools, yet they still suffer from inefficient tool use and inflexible orchestration in open-ended settings. A central challenge is enabling such agents to continually improve without parameter updates by learning from past trajectories. We identify two complementary forms of reusable knowledge essential for this goal: experiences, providing concise action-level guidance for tool selection and decision making, and skills, providing structured task-level guidance for planning and tool use. To this end, we propose XSkill, a dual-stream framework for continual learning from experience and skills in multimodal agents. XSkill grounds both knowledge extraction and retrieval in visual observations. During accumulation, XSkill distills and consolidates experiences and skills from multi-path rollouts via visually grounded…
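The abstract describes XSkill only at a high level, and it is truncated here. The following is therefore a minimal, hypothetical Python sketch of the general dual-stream idea, not the authors' implementation. It keeps two separate stores: experiences (action-level tips) and skills (task-level plans). It retrieves from both stores using the current visual observation, without updating any model parameters. Everything in the sketch is an assumption: the names Entry, DualStreamMemory, and build_context; the premise that each item is keyed by a precomputed embedding of the visual observation it came from; plain cosine-similarity ranking; and exact-text deduplication as a crude stand-in for "consolidation".

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str          # distilled guidance (an experience tip or a skill description)
    key: list[float]   # hypothetical embedding of the visual observation it was drawn from


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class DualStreamMemory:
    experiences: list[Entry] = field(default_factory=list)  # action-level guidance
    skills: list[Entry] = field(default_factory=list)       # task-level guidance

    @staticmethod
    def _add(store: list[Entry], text: str, key: list[float]) -> None:
        # Crude stand-in for consolidation: skip exact-duplicate guidance text.
        if all(e.text != text for e in store):
            store.append(Entry(text, key))

    def add_experience(self, text: str, key: list[float]) -> None:
        self._add(self.experiences, text, key)

    def add_skill(self, text: str, key: list[float]) -> None:
        self._add(self.skills, text, key)

    @staticmethod
    def _top_k(store: list[Entry], query: list[float], k: int) -> list[str]:
        # Rank entries by similarity between their stored key and the query.
        ranked = sorted(store, key=lambda e: cosine(e.key, query), reverse=True)
        return [e.text for e in ranked[:k]]

    def retrieve(self, query: list[float], k_exp: int = 2, k_skill: int = 1) -> dict[str, list[str]]:
        # Query both streams with the embedding of the current visual observation.
        return {
            "experiences": self._top_k(self.experiences, query, k_exp),
            "skills": self._top_k(self.skills, query, k_skill),
        }


def build_context(retrieved: dict[str, list[str]]) -> str:
    """Format retrieved knowledge as plain text to prepend to the agent's prompt."""
    lines = ["Relevant skills:"] + [f"- {s}" for s in retrieved["skills"]]
    lines += ["Relevant experiences:"] + [f"- {e}" for e in retrieved["experiences"]]
    return "\n".join(lines)


if __name__ == "__main__":
    mem = DualStreamMemory()
    mem.add_skill("Chart QA: locate the legend, crop the relevant region, then read values", [1.0, 0.0])
    mem.add_experience("Zoom in before running OCR on small text", [0.9, 0.1])
    mem.add_experience("Call web search only after reading the image", [0.0, 1.0])
    print(build_context(mem.retrieve([1.0, 0.0], k_exp=1, k_skill=1)))
```

Keeping the two streams in separate stores lets a caller request a different number of task-level plans and action-level tips for each query. This loosely mirrors the complementary roles the abstract assigns to skills and experiences. A real system would replace the toy vectors with embeddings from a visual encoder.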
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics
