TL;DR
This paper introduces SiPeR, a framework for reasoning about dynamic and implicit user preferences in situated conversational recommendation, improving accuracy and response quality by integrating scene transition estimation and Bayesian inference.
Contribution
The paper presents SiPeR, a novel framework that enhances situated conversational recommendation by using multimodal large language models to explicitly model user preferences and whether the current scene satisfies the user's needs.
Findings
SiPeR outperforms baselines in recommendation accuracy.
SiPeR improves response generation quality.
Experiments validate the effectiveness of scene transition estimation and Bayesian inference.
Abstract
Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional recommendations, SCR requires a deeper understanding of dynamic and implicit user preferences, as the surrounding scene often influences users' underlying interests, while both may evolve across conversations. This complexity significantly impacts the timing and relevance of recommendations. To address this, we propose situated preference reasoning (SiPeR), a novel framework that integrates two core mechanisms: (1) Scene transition estimation, which estimates whether the current scene satisfies user needs, and guides the user toward a more suitable scene when necessary; and…
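The two ideas named above, a Bayesian update of user preferences and a check on whether the current scene still satisfies the user, can be illustrated with a toy sketch. This is not the paper's implementation: the abstract excerpt is truncated and does not specify the model. The function names (`bayes_update`, `should_switch_scene`), the item names, the likelihood values, and the 0.5 threshold are all hypothetical, chosen only to show the general shape of such a computation.

```python
from typing import Dict, Set


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return the posterior over candidate items: prior * likelihood, renormalized.

    `likelihood[item]` is an assumed score for how well the latest user
    utterance fits that item (hypothetical; e.g. it could come from a model).
    """
    unnormalized = {item: prior[item] * likelihood.get(item, 0.0) for item in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The evidence rules out every item; fall back to the prior unchanged.
        return dict(prior)
    return {item: p / total for item, p in unnormalized.items()}


def should_switch_scene(posterior: Dict[str, float],
                        in_scene_items: Set[str],
                        threshold: float = 0.5) -> bool:
    """Suggest a scene change when too little preference mass lies in the current scene.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    mass_in_scene = sum(p for item, p in posterior.items() if item in in_scene_items)
    return mass_in_scene < threshold


# Toy example: the user is in a furniture scene but asks for something warm to wear.
prior = {"sofa": 0.25, "lamp": 0.25, "jacket": 0.25, "shoes": 0.25}
likelihood = {"sofa": 0.1, "lamp": 0.1, "jacket": 0.6, "shoes": 0.2}

posterior = bayes_update(prior, likelihood)
switch = should_switch_scene(posterior, in_scene_items={"sofa", "lamp"})
print(posterior, switch)
```

In this toy run most of the posterior mass moves to items outside the furniture scene, so the sketch would suggest guiding the user to a different scene before recommending.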
