TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation
Xinliang Frederick Zhang, Lu Wang

TL;DR
TSUBASA enhances long-horizon personalization in large language models through evolving memory and self-learning with context distillation, outperforming existing methods in both quality and efficiency on long-horizon benchmarks.
Contribution
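Introduces a novel approach that combines dynamic memory evolution (for memory writing) with self-learning under a context distillation objective (for memory reading) to improve long-horizon personalization in personalized large language models (PLLMs).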
Findings
TSUBASA surpasses Mem0 and Memory-R1 on long-horizon benchmarks.
Achieves Pareto improvements in quality and efficiency.
Reduces the token budget required for high-fidelity personalization.
Abstract
Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with individual users' needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, and RAG paradigms are trapped in a quality-efficiency tradeoff. Meanwhile, parametric adaptation is bottlenecked by a train-inference gap caused by the scarcity of labeled data. To enhance the long-horizon capabilities of PLLMs, we introduce TSUBASA, a two-pronged approach that improves memory writing via dynamic memory evolution and memory reading via self-learning with a context distillation objective to internalize user experiences. Extensive evaluations on long-horizon benchmarks using the Qwen-3 model family (4B to 32B) validate the…
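Context distillation, in its commonly used form, trains a model that does not see some context (the student) to match the output distribution of the same model when it does see that context (the teacher). The sketch below assumes this generic formulation: the teacher is conditioned on the user's history, the student sees only the query, and the loss is the mean per-token KL divergence KL(teacher || student). The abstract does not specify TSUBASA's exact objective, so the function names `softmax` and `context_distillation_loss` and the toy logits are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over one token position's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def context_distillation_loss(
    teacher_logits: List[List[float]],
    student_logits: List[List[float]],
) -> float:
    """Mean per-token KL(teacher || student).

    Hypothetical sketch of a generic context-distillation objective:
    teacher_logits come from the model WITH the user history in its prompt,
    student_logits from the model WITHOUT it. In training, the teacher side
    would be treated as a fixed target and only the student updated.
    """
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("need the same non-zero number of token positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)  # teacher distribution (target)
        q = softmax(s_row)  # student distribution (being trained)
        # KL(p || q) = sum_i p_i * (log p_i - log q_i); skip zero-probability terms.
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)


if __name__ == "__main__":
    # Toy example: one token position, two-word vocabulary.
    teacher = [[0.0, 0.0]]            # teacher: uniform [0.5, 0.5]
    student = [[0.0, math.log(3.0)]]  # student: [0.25, 0.75]
    print(context_distillation_loss(teacher, student))  # equals 0.5 * ln(4/3)
```

The loss is zero exactly when the student already reproduces the teacher's predictions without seeing the history, which is the sense in which such an objective "internalizes" context into the parameters.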
