Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization
Taeyoon Kwon, Dongwook Choi, Hyojun Kim, Sunghwan Kim, Seungjun Moon, Beong-woo Kwak, Kuan-Hao Huang, Jinyoung Yeo

TL;DR
This paper examines the challenges of personalizing embodied agents through memory, identifies key bottlenecks, and proposes a hierarchical memory architecture that improves performance on personalized tasks.
Contribution
It introduces MEMENTO, a comprehensive evaluation framework, and proposes a hierarchical memory module to enhance personalization in embodied agents.
Findings
Agents recall object semantics well but struggle with user routines.
Information overload and coordination failures are key bottlenecks.
Hierarchical memory improves personalization and task performance.
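The contribution and findings above mention a hierarchical memory module, but this page does not describe its design. What follows is a minimal sketch of the general coarse-to-fine idea only, not the authors' module. It assumes a two-level store in which hypothetical `MemoryNode` summaries group raw episodes, and retrieval first picks the best-matching summary and then ranks the episodes under it. Word overlap stands in for a learned embedding, and the two-level tree simplifies the graph-based structure one reviewer mentions.

```python
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased bag of words; a crude stand-in for a real embedding model (assumption)."""
    return set(text.lower().split())


def overlap(query: str, text: str) -> int:
    """Number of words shared by the query and a memory text."""
    return len(tokens(query) & tokens(text))


@dataclass
class MemoryNode:
    summary: str                                       # hypothetical high-level label, e.g. "morning routine"
    episodes: list[str] = field(default_factory=list)  # low-level records from past interactions


@dataclass
class HierarchicalMemory:
    nodes: list[MemoryNode] = field(default_factory=list)

    def add(self, summary: str, episode: str) -> None:
        """File an episode under an existing summary node, creating the node if needed."""
        for node in self.nodes:
            if node.summary == summary:
                node.episodes.append(episode)
                return
        self.nodes.append(MemoryNode(summary, [episode]))

    def retrieve(self, query: str, n_nodes: int = 1, n_episodes: int = 2) -> list[str]:
        """Coarse-to-fine lookup: best summary nodes first, then best episodes inside them.

        sorted() is stable, so ties keep insertion order (useful for routine steps).
        """
        top_nodes = sorted(self.nodes, key=lambda n: overlap(query, n.summary), reverse=True)[:n_nodes]
        candidates = [ep for node in top_nodes for ep in node.episodes]
        return sorted(candidates, key=lambda ep: overlap(query, ep), reverse=True)[:n_episodes]


# Usage with made-up memories covering both dimensions (object semantics, user patterns).
mem = HierarchicalMemory()
mem.add("objects with personal meaning", "the blue mug is the user's favorite mug")
mem.add("objects with personal meaning", "the red book is a gift from the user's sister")
mem.add("morning routine", "first the user moves the kettle to the counter")
mem.add("morning routine", "then the user puts the blue mug next to the kettle")
print(mem.retrieve("set up my morning routine"))
# → ['first the user moves the kettle to the counter', 'then the user puts the blue mug next to the kettle']
```

Because only the episodes under the selected summary reach the planner, the prompt stays small even as total memory grows; the trade-off in this sketch is that a poorly matched summary hides relevant episodes entirely.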
Abstract
LLM-powered embodied agents have shown success on conventional object-rearrangement tasks, but providing personalized assistance that leverages user-specific knowledge from past interactions presents new challenges. We investigate these challenges through the lens of agents' memory utilization along two critical dimensions: object semantics (identifying objects based on personal meaning) and user patterns (recalling sequences from behavioral routines). To assess these capabilities, we construct MEMENTO, an end-to-end two-stage evaluation framework comprising single-memory and joint-memory tasks. Our experiments reveal that current agents can recall simple object semantics but struggle to apply sequential user patterns to planning. Through in-depth analysis, we identify two critical bottlenecks: information overload and coordination failures when handling multiple memories. Based on…
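The two-stage evaluation described in the abstract can be illustrated with a small scoring sketch. As an assumption, stage 1 (memory acquisition) states the personal information directly in the instruction, while stage 2 (memory utilization) pursues the same goal with an instruction that only refers to it. Under that assumption, the per-task-type drop in success rate approximates how much is lost when the agent must rely on memory. `EpisodePair`, `success_rates`, and the outcomes below are hypothetical and made up; they are not the paper's metrics or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EpisodePair:
    goal_id: str
    task_type: str        # hypothetical labels: "single-memory" or "joint-memory"
    acquisition_ok: bool  # stage 1: personal information given explicitly in the instruction
    utilization_ok: bool  # stage 2: same goal, information must be recalled from memory


def success_rates(pairs: list[EpisodePair]) -> dict[str, tuple[float, float, float]]:
    """Per task type: (stage-1 success rate, stage-2 success rate, drop between stages)."""
    grouped: dict[str, list[EpisodePair]] = defaultdict(list)
    for p in pairs:
        grouped[p.task_type].append(p)
    report = {}
    for task_type, items in grouped.items():
        n = len(items)
        sr_acq = sum(p.acquisition_ok for p in items) / n
        sr_util = sum(p.utilization_ok for p in items) / n
        report[task_type] = (sr_acq, sr_util, sr_acq - sr_util)
    return report


# Made-up outcomes, for illustration only.
pairs = [
    EpisodePair("g1", "single-memory", True, True),
    EpisodePair("g2", "single-memory", True, False),
    EpisodePair("g3", "joint-memory", True, False),
    EpisodePair("g4", "joint-memory", False, False),
]
print(success_rates(pairs))
# → {'single-memory': (1.0, 0.5, 0.5), 'joint-memory': (0.5, 0.0, 0.5)}
```

Pairing episodes by goal keeps everything except the instruction fixed, so the drop can be attributed to memory use rather than to other planning capabilities.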
Peer Reviews
Decision · ICLR 2026 Poster
- The two-stage design (memory acquisition vs. utilization) provides a principled way to isolate memory effects from other capabilities: goals are identical across stages while instructions vary, enabling precise measurement of memory utilization through metrics.
- The dataset construction process is rigorous, with multiple quality controls.
- The experimental analysis is comprehensive, with insightful qualitative analysis.

- The paper formulates the personalized object rearrangement task as a Partially Observable Markov Decision Process (POMDP) in Section 3.1. The actual agent implementation, however, is a hierarchical controller that uses an LLM as a high-level policy planner in a ReAct-style prompting format. The POMDP formalism and its associated equations feel disconnected from the practical, prompt-based system that is actually built and evaluated; the formalism is not explicitly used to derive the agent architecture.

- The paper provides a systematic analysis of how LLM-powered agents utilize memory in personalized scenarios, including scenarios with multiple personal preferences.
- The finding that agents underperform even when given the top-k most relevant memories for a specific personalization is interesting.
- The quantitative comparisons cover diverse open- and closed-source LLMs.

- It is unclear why the problem addressed in this paper is specific to embodied agents. Since the issues (Sec. 3) and underperformance (Secs. 4 and 5) stem from limitations of LLMs, the problem looks more related to LLMs than to embodied agents.
- Because the descriptions of objects and routines are given explicitly, the problem addressed in this paper is really about using well-provided context, which need not be personalization specifically. In this sense,

- The work addresses an interesting problem: the personalization of embodied agents. Household agents will most likely operate around a specific user and will need to specialize to that user's needs. Memory is an interesting and useful component, and the analysis in this work can serve as useful guidance for future work.
- The paper is very well written, making the results and analysis easy to follow and understand.
- I liked the idea of the graph-based hierarchical memory.

- Section 5.1 presents an ablation over k, the number of retrieved memories. It appears to show that the performance of various agents decreases as k increases, a decline attributed to information overload.
- However, consider this: the overload is also indicative of the LLM's limited handling of longer contexts. Once the context goes beyond a certain number of memories, the LLM is unable to grasp and summarize this information, and hence resorts to general semantic knowledge,
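The k ablation discussed in the last review can be made concrete with a minimal top-k retrieval sketch. It assumes each memory is stored as a (text, embedding) pair and ranked by cosine similarity to the query; the 2-D embeddings are toy values, and `retrieve_top_k` and `build_prompt` are hypothetical helpers, not the paper's retriever. Raising k only appends lower-ranked memories to the prompt, which is the setting in which the reviews describe performance dropping.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: list[float], memories: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the texts of the k memories whose embeddings are most similar to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(instruction: str, retrieved: list[str]) -> str:
    """Prepend retrieved memories to the instruction; a larger k yields a longer context."""
    lines = [f"Memory {i + 1}: {text}" for i, text in enumerate(retrieved)]
    return "\n".join(lines + [f"Instruction: {instruction}"])


# Toy memories with hand-picked 2-D embeddings (assumption, not real model outputs).
memories = [
    ("the blue mug is the user's favorite", [1.0, 0.0]),
    ("the user waters plants after breakfast", [0.0, 1.0]),
    ("the red book was a gift", [0.7, 0.7]),
]
query = [1.0, 0.1]
for k in (1, 2, 3):
    print(k, retrieve_top_k(query, memories, k))
```

With k = 1 only the closely matching memory reaches the planner; each increase in k adds a less relevant memory to the same prompt.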
Taxonomy
Topics: AI-based Problem Solving and Planning · Social Robot Interaction and HRI · Multimodal Machine Learning Applications
