ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents
Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang

TL;DR
ELITE is a framework that enables embodied agents to learn from their environment and transfer knowledge to similar tasks, significantly improving their performance in complex, real-world scenarios.
Contribution
The paper introduces ELITE, a novel embodied agent framework that combines experiential learning and intent-aware transfer to enhance task execution and generalization.
Findings
ELITE improves performance by 9% on the EB-ALFRED benchmark.
ELITE achieves a 5% higher success rate on the EB-Habitat benchmark.
ELITE generalizes well to unseen task categories.
Abstract
Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical interaction for embodied tasks. VLMs can learn rich semantic knowledge from static data but lack the ability to interact with the world. To address this issue, we introduce ELITE, an embodied agent framework with Experiential Learning and Intent-aware Transfer that enables agents to continuously learn from their own environment interaction experiences, and transfer acquired knowledge to procedurally similar tasks. ELITE operates through two synergistic mechanisms, i.e., self-reflective knowledge construction and intent-aware retrieval. Specifically,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
