Conversational Contextual Bandit: Algorithm and Application

Xiaoying Zhang; Hong Xie; Hang Li; John C.S. Lui

arXiv:1906.01219 · cs.LG · January 28, 2020 · 5 cites

TL;DR

This paper introduces the conversational contextual bandit framework and the ConUCB algorithm, which speeds up learning by combining behavioral feedback on arms with occasional conversational feedback on key-terms. In the reported experiments, ConUCB outperforms traditional bandit algorithms.

Contribution

It generalizes traditional contextual bandits to incorporate conversational feedback and proposes ConUCB, an algorithm with a lower regret bound than LinUCB.

Findings

01  ConUCB achieves lower regret bounds than LinUCB.

02  Experiments show ConUCB outperforms traditional algorithms on real datasets.

03  Incorporating conversational feedback accelerates learning in recommender systems.

Abstract

Contextual bandit algorithms provide principled online learning solutions to balance the exploration-exploitation trade-off in applications such as recommender systems. However, traditional contextual bandit algorithms often learn slowly because they require extensive exploration. This is a critical issue in applications like recommender systems, since users may need to provide feedback on many items they are not interested in. To accelerate learning, we generalize the contextual bandit to the conversational contextual bandit, which leverages not only behavioral feedback on arms (e.g., articles in news recommendation) but also occasional conversational feedback on key-terms from the user. Here, a key-term can relate to a subset of arms, for example, a category of articles in news recommendation. We then design the Conversational UCB…
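The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' ConUCB algorithm, whose details are truncated here. It is a deliberately simplified stand-in in which a single LinUCB-style ridge estimator is updated from both arm-level rewards and occasional key-term rewards. Everything in it is an assumption made for illustration: the names `SharedLinearUCB`, `key_term_feature`, and `run`, the choice to represent a key-term by the mean feature vector of its related arms, the fixed conversation frequency `converse_every`, and the synthetic user preference vector.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class SharedLinearUCB:
    """Hypothetical simplification: one ridge estimator fed by both feedback types."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        # A_inv is the inverse of (lam * I + sum of x x^T), kept up to date incrementally.
        self.A_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def theta(self):
        return mat_vec(self.A_inv, self.b)

    def ucb(self, x):
        # Estimated reward plus an exploration bonus proportional to uncertainty.
        width = math.sqrt(max(dot(x, mat_vec(self.A_inv, x)), 0.0))
        return dot(self.theta(), x) + self.alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse after adding x x^T.
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += reward * x[i]


def key_term_feature(arm_features, related_arms):
    # Assumption: a key-term is summarized by the mean feature of its related arms.
    dim = len(arm_features[0])
    n = len(related_arms)
    return [sum(arm_features[a][k] for a in related_arms) / n for k in range(dim)]


def run(rounds=200, converse_every=5, seed=0):
    rng = random.Random(seed)
    dim = 3
    true_theta = [0.8, -0.2, 0.1]  # assumed hidden user preference (synthetic)
    arms = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(12)]
    key_terms = {"k0": [0, 1, 2, 3], "k1": [4, 5, 6, 7], "k2": [8, 9, 10, 11]}
    model = SharedLinearUCB(dim)
    best = max(dot(true_theta, x) for x in arms)
    regret = 0.0
    for t in range(1, rounds + 1):
        if converse_every and t % converse_every == 0:
            # Occasional conversation: query the key-term with the highest UCB.
            feats = {k: key_term_feature(arms, ids) for k, ids in key_terms.items()}
            name = max(feats, key=lambda k: model.ucb(feats[k]))
            x = feats[name]
            model.update(x, dot(true_theta, x) + rng.gauss(0, 0.1))
        # Behavioral feedback: recommend the arm with the highest UCB.
        a = max(range(len(arms)), key=lambda i: model.ucb(arms[i]))
        model.update(arms[a], dot(true_theta, arms[a]) + rng.gauss(0, 0.1))
        regret += best - dot(true_theta, arms[a])
    return regret, model.theta()


if __name__ == "__main__":
    print("with conversations:   ", run(converse_every=5)[0])
    print("without conversations:", run(converse_every=0)[0])
```

Setting `converse_every=0` disables the key-term queries, so the two printed cumulative regrets give a rough, synthetic comparison between the plain and conversational variants of this toy model. The result depends on the seed and on the assumptions above, and it says nothing about the paper's reported experiments.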


Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Recommender Systems and Techniques
