Conversational Contextual Bandit: Algorithm and Application
Xiaoying Zhang, Hong Xie, Hang Li, John C.S. Lui

TL;DR
This paper introduces the conversational contextual bandit framework and the ConUCB algorithm, which accelerates learning by combining feedback on arms with occasional conversational feedback on key-terms, and which achieves a lower regret bound than LinUCB.
Contribution
The paper generalizes traditional contextual bandits to incorporate conversational feedback on key-terms and proposes ConUCB, an algorithm that is proven to achieve a smaller regret bound than LinUCB.
Findings
ConUCB achieves a smaller regret upper bound than LinUCB.
Experiments show ConUCB outperforms traditional algorithms on real datasets.
Incorporating conversational feedback accelerates learning in recommender systems.
Abstract
Contextual bandit algorithms provide principled online learning solutions to balance the exploitation-exploration trade-off in various applications such as recommender systems. However, the learning speed of traditional contextual bandit algorithms is often slow due to the need for extensive exploration. This poses a critical issue in applications like recommender systems, since users may need to provide feedback on many items they are not interested in. To accelerate learning, we generalize the contextual bandit to the conversational contextual bandit. The conversational contextual bandit leverages not only behavioral feedback on arms (e.g., articles in news recommendation), but also occasional conversational feedback on key-terms from the user. Here, a key-term can relate to a subset of arms, for example, a category of articles in news recommendation. We then design the Conversational UCB…
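The interaction described in the abstract can be illustrated with a small sketch. It is not the authors' ConUCB algorithm: it is a plain LinUCB-style learner in which, as a simplifying assumption, feedback on a key-term is folded into the same ridge-regression estimate used for arms. The class name `LinUCBWithKeyTerms`, the helper `key_term_vector`, and the parameters `alpha` and `lam` are hypothetical names chosen for illustration, and representing a key-term as a weighted average of its related arms' feature vectors is likewise an assumption.

```python
import numpy as np


class LinUCBWithKeyTerms:
    """Illustrative LinUCB-style learner (a simplification, not ConUCB).

    Arm feedback and key-term feedback both update one shared
    ridge-regression estimate of the user's preference vector.
    """

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha             # exploration strength (assumed name)
        self.M = lam * np.eye(dim)     # regularized Gram matrix
        self.b = np.zeros(dim)         # sum of reward-weighted feature vectors

    def theta(self):
        # Ridge-regression estimate of the preference vector.
        return np.linalg.solve(self.M, self.b)

    def select_arm(self, arm_features):
        # arm_features has shape (n_arms, dim).
        M_inv = np.linalg.inv(self.M)
        means = arm_features @ self.theta()
        # Confidence width sqrt(x^T M^{-1} x) for each arm x.
        widths = np.sqrt(
            np.einsum("ij,jk,ik->i", arm_features, M_inv, arm_features)
        )
        return int(np.argmax(means + self.alpha * widths))

    def update(self, x, reward):
        # Same rank-one update for an arm vector or a key-term vector.
        self.M += np.outer(x, x)
        self.b += reward * x


def key_term_vector(arm_features, weights):
    # Assumed representation: a key-term is the weighted average
    # of the feature vectors of the arms it relates to.
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ arm_features


# Usage: two arms, one key-term related to both arms equally.
arms = np.array([[1.0, 0.0], [0.0, 1.0]])
agent = LinUCBWithKeyTerms(dim=2)
chosen = agent.select_arm(arms)
agent.update(arms[chosen], reward=1.0)                       # behavioral feedback on an arm
agent.update(key_term_vector(arms, [1.0, 1.0]), reward=0.5)  # conversational feedback
```

Sharing one estimate keeps the sketch short. The intuition it tries to capture is that a single answer about a key-term shrinks uncertainty along a direction shared by several arms at once, which is why occasional conversations may reduce the amount of arm-level exploration needed.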
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Recommender Systems and Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
