Leveraging the Power of Conversations: Optimal Key Term Selection in Conversational Contextual Bandits
Maoli Liu, Zhuohua Li, Xiangxiang Dai, John C.S. Lui

TL;DR
This paper introduces three novel algorithms for conversational contextual bandits that improve how key terms are explored and when conversations are initiated, achieving near-optimal regret bounds and significant empirical performance gains.
Contribution
The paper proposes CLiSK, CLiME, and CLiSK-ME algorithms with theoretical regret bounds and practical improvements for conversational bandit systems.
Findings
Achieve a regret upper bound of $O(\sqrt{dT \log T})$, tighter than previous methods.
Establish a matching lower bound of $\Omega(\sqrt{dT})$, showing near-minimax optimality.
Improve cumulative regret by at least 14.6% on real-world datasets.
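To make "near-minimax" concrete, the two bounds above can be compared directly. This is a rough illustration only: it ignores the constants hidden in the asymptotic notation, and it assumes the usual contextual-bandit reading in which $d$ is the feature dimension and $T$ is the number of rounds (neither is defined in this summary).

$$
\frac{\sqrt{dT \log T}}{\sqrt{dT}} = \sqrt{\log T}
$$

The gap between the upper and lower bounds is therefore a $\sqrt{\log T}$ factor that does not depend on $d$. Taking the natural logarithm, $T = 10^4$ gives $\sqrt{\ln 10^4} \approx 3.03$, so the bounds differ by only a small multiplicative factor even over long horizons.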
Abstract
Conversational recommender systems proactively query users with relevant "key terms" and leverage the feedback to elicit users' preferences for personalized recommendations. Conversational contextual bandits, a prevalent approach in this domain, aim to optimize preference learning by balancing exploitation and exploration. However, several limitations hinder their effectiveness in real-world scenarios. First, existing algorithms employ key term selection strategies with insufficient exploration, often failing to thoroughly probe users' preferences and resulting in suboptimal preference estimation. Second, current algorithms typically rely on deterministic rules to initiate conversations, causing unnecessary interactions when preferences are well-understood and missed opportunities when preferences are uncertain. To address these limitations, we propose three novel algorithms: CLiSK,…
