TL;DR
Ella is a novel embodied social agent with a structured, lifelong multimodal memory system that enables autonomous learning, social interaction, and decision-making in a complex 3D open world, where it can influence, lead, and cooperate with other agents.
Contribution
Introducing Ella, the first embodied agent integrating structured long-term memory with foundation models for lifelong learning and social interaction in a 3D environment.
Findings
Ella effectively learns from visual observations and social interactions.
Ella can influence, lead, and cooperate with other agents.
The memory system enhances decision-making and autonomous evolution.
Abstract
We introduce Ella, an embodied social agent capable of lifelong learning within a community in a 3D open world, where agents accumulate experiences and acquire knowledge through everyday visual observations and social interactions. At the core of Ella's capabilities is a structured, long-term multimodal memory system that stores, updates, and retrieves information effectively. It consists of a name-centric semantic memory for organizing acquired knowledge and a spatiotemporal episodic memory for capturing multimodal experiences. By integrating this lifelong memory system with foundation models, Ella retrieves relevant information for decision-making, plans daily activities, builds social relationships, and evolves autonomously while coexisting with other intelligent beings in the open world. We conduct capability-oriented evaluations in a dynamic 3D open world where 15 agents engage in…
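The abstract describes two memory stores: a name-centric semantic memory that organizes knowledge around named entities, and a spatiotemporal episodic memory that records experiences by time and place. The following is a minimal sketch of how such stores might be organized, not the authors' implementation. It assumes that semantic facts are simple key/value attributes per name, that episodes are retrieved by a time window plus a spatial radius, and that the multimodal part reduces to a text caption with an optional image reference. All class names, fields, and example entities (`SemanticMemory`, `EpisodicMemory`, `Episode`, "Maya") are hypothetical.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Optional


@dataclass
class SemanticMemory:
    """Hypothetical name-centric store: each named entity maps to its known facts."""
    facts: dict[str, dict[str, str]] = field(default_factory=dict)

    def update(self, name: str, **attrs: str) -> None:
        # Newer observations overwrite stale values for the same attribute.
        self.facts.setdefault(name, {}).update(attrs)

    def lookup(self, name: str) -> dict[str, str]:
        # Unknown names return an empty record instead of raising.
        return self.facts.get(name, {})


@dataclass
class Episode:
    """Hypothetical episodic record: one multimodal experience at a time and place."""
    time: float                          # simulation timestamp (assumed unit: hours)
    position: tuple[float, float, float]  # 3D location in the world
    description: str                     # text caption of the experience
    image_ref: Optional[str] = None      # placeholder for a stored visual observation


@dataclass
class EpisodicMemory:
    episodes: list[Episode] = field(default_factory=list)

    def store(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, near: tuple[float, float, float], radius: float,
                 since: float, until: float) -> list[Episode]:
        # Keep only episodes inside both the time window and the spatial radius.
        return [
            e for e in self.episodes
            if since <= e.time <= until and dist(e.position, near) <= radius
        ]


# Usage with hypothetical data.
semantic = SemanticMemory()
semantic.update("Maya", occupation="baker", workplace="bakery")
semantic.update("Maya", workplace="cafe")  # later observation replaces the old workplace

episodic = EpisodicMemory()
episodic.store(Episode(9.0, (0.0, 0.0, 0.0), "Met Maya at the cafe"))
episodic.store(Episode(15.0, (50.0, 0.0, 0.0), "Saw a poster for a party"))

hits = episodic.retrieve(near=(0.0, 0.0, 0.0), radius=10.0, since=8.0, until=12.0)
print(semantic.lookup("Maya"))        # {'occupation': 'baker', 'workplace': 'cafe'}
print([e.description for e in hits])  # ['Met Maya at the cafe']
```

Keying semantic facts by name makes updates cheap when an agent learns something new about a person or place, while filtering episodes by time and location mirrors the spatiotemporal organization the abstract describes. A real system would likely add embedding-based relevance ranking on top of such filters before passing retrieved memories to a foundation model.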
