O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
Piaohong Wang, Motong Tian, Jiaxian Li, Yuan Liang, Yuqing Wang, Qianben Chen, Tiannan Wang, Zhicong Lu, Jiawei Ma, Yuchen Eleanor Jiang, Wangchunshu Zhou

TL;DR
O-Mem is a novel memory system for personalized, long-term AI agents that dynamically updates user profiles and context, significantly improving response coherence and efficiency in complex interactions.
Contribution
This paper introduces O-Mem, a hierarchical, active user profiling memory framework that enhances personalization and contextual consistency in long-horizon AI interactions.
Findings
Achieves 51.67% on LoCoMo benchmark, 3% above state-of-the-art.
Attains 62.99% on PERSONAMEM, 3.5% higher than previous best.
Improves token and response time efficiency over prior memory systems.
Abstract
Recent advancements in LLM-powered agents have demonstrated significant potential in generating human-like responses; however, they continue to face challenges in maintaining long-term interactions within complex environments, primarily due to limitations in contextual consistency and dynamic personalization. Existing memory systems often depend on semantic grouping prior to retrieval, which can overlook semantically irrelevant yet critical user information and introduce retrieval noise. In this report, we propose the initial design of O-Mem, a novel memory framework based on active user profiling that dynamically extracts and updates user characteristics and event records from their proactive interactions with agents. O-Mem supports hierarchical retrieval of persona attributes and topic-related context, enabling more adaptive and coherent personalized responses. O-Mem achieves 51.67%…
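As a rough illustration of the idea in the abstract, the sketch below keeps persona attributes and topic-grouped event records in one store and retrieves them in two levels. It is not the authors' implementation: the class `ProfileMemory`, its `update` and `retrieve` methods, the overwrite-on-update policy, and the naive keyword match are all hypothetical choices made for this example. In O-Mem, extraction and updating are driven by the interaction itself, whereas here the attributes and events are supplied by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProfileMemory:
    """Toy store (hypothetical): persona attributes plus event records by topic."""

    persona: dict[str, str] = field(default_factory=dict)
    events: dict[str, list[str]] = field(default_factory=dict)

    def update(self, attributes: dict[str, str], topic: str, event: str) -> None:
        # Assumed policy: a newer attribute value overwrites the older one.
        self.persona.update(attributes)
        # Event records are appended under their topic.
        self.events.setdefault(topic, []).append(event)

    def retrieve(self, query: str) -> dict[str, object]:
        # Level 1: persona attributes are always returned.
        # Level 2: events whose topic appears as a word in the query
        # (a naive keyword match standing in for real retrieval).
        words = set(query.lower().split())
        context = [
            event
            for topic, topic_events in self.events.items()
            if topic.lower() in words
            for event in topic_events
        ]
        return {"persona": dict(self.persona), "context": context}


if __name__ == "__main__":
    memory = ProfileMemory()
    memory.update({"diet": "vegetarian"}, "travel", "Booked a trip to Kyoto in May")
    memory.update({"diet": "vegan"}, "work", "Started a new job")
    print(memory.retrieve("Any travel ideas?"))
```

The point of the two-level lookup is that persona attributes reach the response even when they are not semantically close to the query, which is the failure mode the abstract attributes to retrieval based on semantic grouping.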
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The combination of memory systems with dynamic user profiling is an interesting and promising direction.
2. The paper is well-structured and clearly written, making it easy to follow the technical concepts and their practical applications.
1. The main innovation in the paper appears to be the creation of a dedicated persona memory system that retrieves information based on user attributes. This seems like an incremental improvement over existing systems, with limited novel insights or breakthroughs. The contribution could be seen as a modification.
2. The paper primarily evaluates the framework on only two models, GPT-4.1 and GPT-4o-mini. Expanding the evaluation to include other model families (both open-source and closed-source
• A new dataset is introduced.
• The memory system separates users' past histories into three types, which aligns with the goal of personalization.
• The presentation of this paper needs improvement: the tables in the experiment sections are sparse; much of the content, such as lines 209-214 in Section 3, should be moved to the appendix; the formulas in Section 3.2 and Section 3.3 are not necessary (they can either be moved to the appendix or written inline with the text) and lack organized descriptions; lines 34-41 would fit better in related work (the intro would better contain more conclusion-like statements to explain current solutions to t
1. The motivation of the paper is clear: it clearly illustrates why existing chunk-based or semantic retrieval memories fail at dynamic personalization and positions O-Mem as a principled solution.
2. The integration of persona, episodic, and working memories into a unified retrieval pipeline (Eqs. 8–12) with active LLM-driven updates is interesting and reasonable.
3. Experiments across three datasets (LoCoMo, PERSONAMEM, Personalized Deep Research Bench) show consistent gains.
1. Several metrics rely on “LLM-as-a-Judge,” which introduces bias; no human validation or inter-rater reliability checks are reported.
2. The efficiency and ablation results are strong, but in-depth qualitative analyses of what the model remembers or how errors occur are missing.
3. The paper does not thoroughly discuss how often the persona-update operation (Op(ai) / Op(ei)) introduces noise or incorrect updates, or how error accumulation is mitigated.
- Clear motivation and good problem definition.
- The tri-memory structure (working, episodic, persona) is easy to understand.
- Empirical results are consistent across several benchmarks and show clear efficiency improvements (up to 80% lower latency and 94% fewer tokens).
- The token-controlled ablation is a nice touch to show that gains are not simply due to longer context.
- Well-written and logically structured paper; figures and tables support the narrative.
- Episodic retrieval is defined by selecting a single clue word, but the paper does not specify how it behaves for unseen or multi-word clues.
- The retrieval hyperparameters (top-k, thresholds, similarity cutoffs) are not clearly stated. Since efficiency is a main claim, this missing detail makes replication difficult.
- The new Personalized Deep Research Bench dataset is relatively small and not publicly available, which limits its impact on the community.
- Reproducibility: while the pap
Taxonomy
Topics: Persona Design and Applications · Social Robot Interaction and HRI · Artificial Intelligence in Games
