Latent Preference Modeling for Cross-Session Personalized Tool Calling
Yejin Yoon, Minseo Kim, Taeuk Kim

TL;DR
This paper introduces MPT, a benchmark for personalized tool calling in multi-session dialogues, and proposes PRefine, a memory-augmented method that enhances tool accuracy by modeling user preferences as evolving hypotheses.
Contribution
The paper presents MPT, a benchmark for personalized tool calling in multi-session dialogues, and PRefine, a test-time memory-augmented approach to the same task.
Findings
PRefine improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting.
The MPT benchmark covers three key challenges: Preference Recall, Preference Induction, and Preference Transfer.
Memory that captures the reasons behind user preferences enhances personalization in agentic systems.
Abstract
Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures…
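To make the idea of preferences as evolving hypotheses more concrete, the following is a minimal toy sketch of a generate-verify-refine loop that fills missing tool arguments from a compact memory. It is not the authors' PRefine implementation: the abstract does not specify how each step works, so the rule-based steps, the `Hypothesis` class, the function names, and the example slots (`cuisine`, `party_size`, `city`) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical memory entry: one guessed preference for one tool argument."""
    slot: str           # tool argument name, e.g. "cuisine"
    value: str          # currently hypothesized preferred value
    support: int = 0    # sessions that agreed with the hypothesis
    conflicts: int = 0  # sessions that contradicted it


def generate(session, memory):
    # Generate: propose a hypothesis for every slot that has none yet.
    for slot, value in session.items():
        memory.setdefault(slot, Hypothesis(slot, value))


def verify(session, memory):
    # Verify: check each hypothesis against what the user actually chose.
    for slot, value in session.items():
        hyp = memory[slot]
        if hyp.value == value:
            hyp.support += 1
        else:
            hyp.conflicts += 1


def refine(session, memory):
    # Refine: replace a hypothesis once contradictions outnumber support.
    for slot, value in session.items():
        hyp = memory[slot]
        if hyp.conflicts > hyp.support:
            memory[slot] = Hypothesis(slot, value, support=1)


def update_memory(memory, session):
    # One pass of the (assumed) generate-verify-refine loop over a session.
    generate(session, memory)
    verify(session, memory)
    refine(session, memory)


def fill_arguments(args, required, memory):
    # Complete an under-specified tool call from memory, never overriding
    # arguments the user stated explicitly in the current request.
    filled = dict(args)
    for name in required:
        if name not in filled and name in memory:
            filled[name] = memory[name].value
    return filled


# Toy history: the arguments observed in each past session (made-up data).
history = [
    {"cuisine": "thai", "party_size": "2"},
    {"cuisine": "vegan"},
    {"cuisine": "vegan"},
]

memory = {}
for past_session in history:
    update_memory(memory, past_session)

# The current request only names a city; the rest comes from memory.
call = fill_arguments({"city": "Seoul"}, ["city", "cuisine", "party_size"], memory)
print(call)
```

The point of the sketch is the shape of the memory: a few hypotheses per tool argument rather than the full dialogue history, which is the kind of compaction that would let a method use a small fraction of the tokens of full-history prompting. A real system would presumably use an LLM for each step and store richer constraints than single values.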
