TL;DR
This paper introduces ConDuel, a novel conversational dueling bandit algorithm for recommendation systems that incorporates relative feedback and generalized linear models, addressing limitations of existing methods.
Contribution
It proposes a new algorithm integrating dueling bandits with GLMs for better handling of relative feedback in conversational recommendations.
Findings
Theoretical analysis shows that ConDuel attains a low regret bound.
Empirical results show ConDuel outperforms existing methods.
Extension to multinomial logit bandits is feasible and effective.
Abstract
Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended items. Such systems use a multi-armed bandit framework to learn user preferences online and have achieved great success in recent years. However, existing conversational bandit methods have several limitations. First, they only let users give explicit binary feedback on the recommended items or categories, which leads to ambiguity in interpretation. In practice, users usually face more than one choice. Relative feedback, known for its informativeness, has become increasingly popular in recommendation system design. Moreover, current contextual bandit methods mainly work under linear reward assumptions, ignoring the non-linear reward structures that generalized linear models capture in practice. Therefore, in this paper, we introduce…
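To make the notion of relative feedback under a generalized linear model concrete, here is a minimal sketch of a simulated user who compares two items. It is not the ConDuel algorithm itself: the logistic link, the feature vectors, and the names `preference_prob` and `simulate_duel` are assumptions chosen for illustration, since the truncated abstract does not specify them.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic link function (an assumed choice of GLM link)."""
    return 1.0 / (1.0 + math.exp(-z))


def preference_prob(theta, x_a, x_b):
    """Probability that the user prefers item a over item b.

    Assumes the preference depends on the feature difference x_a - x_b
    through a logistic link applied to a linear score.
    """
    score = sum(t * (a - b) for t, a, b in zip(theta, x_a, x_b))
    return sigmoid(score)


def simulate_duel(theta, x_a, x_b, rng):
    """Draw one relative-feedback outcome: 1 if a is chosen, else 0."""
    return 1 if rng.random() < preference_prob(theta, x_a, x_b) else 0


# Hypothetical user preference vector and item features.
theta = [1.0, -0.5]
x_a = [0.8, 0.1]
x_b = [0.2, 0.4]

rng = random.Random(0)
p_a_over_b = preference_prob(theta, x_a, x_b)
outcome = simulate_duel(theta, x_a, x_b, rng)
```

In this toy model the two preference probabilities for a pair always sum to one, which is what separates relative (pairwise) feedback from independent binary feedback on each item.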
