Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits

Kaushik Roy; Qi Zhang; Manas Gaur; Amit Sheth

arXiv:2106.13895·cs.LG·June 29, 2021


TL;DR

This paper introduces a novel approach combining knowledge infusion with policy gradients and upper confidence bounds to improve exploration efficiency in relational contextual bandits, especially in complex, real-world scenarios.

Contribution

The paper proposes Knowledge Infused Policy Gradients with UCB, an algorithm for relational contextual bandits that uses expert knowledge to guide exploration in large, complex context spaces.
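To make the combination of policy gradients, UCB-style exploration and expert knowledge concrete, here is a minimal Python sketch on a plain multi-armed bandit. It is not the algorithm proposed in the paper. It assumes a softmax policy over learned arm preferences, an exploration bonus of the form c·sqrt(ln(t+1)/(n_a+1)) added to each arm's score, and expert knowledge expressed as a fixed per-arm bias. The names `run_pg_ucb`, `knowledge_bias`, `lr` and `c` are hypothetical, and context (relational or otherwise) is omitted entirely.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def run_pg_ucb(true_means, knowledge_bias, steps=2000, lr=0.1, c=1.0, seed=0):
    """Hypothetical softmax policy-gradient bandit with a UCB-style bonus and a knowledge prior.

    true_means: hypothetical Bernoulli reward probability of each arm.
    knowledge_bias: hypothetical expert prior, added to each arm's score.
    Returns (pull counts per arm, expected cumulative regret).
    """
    rng = random.Random(seed)
    k = len(true_means)
    best = max(true_means)
    prefs = [0.0] * k      # learned preferences H(a)
    counts = [0] * k       # number of pulls per arm
    baseline = 0.0         # running mean reward, used to reduce gradient variance
    regret = 0.0
    for t in range(1, steps + 1):
        # Score = learned preference + expert prior + bonus that shrinks as an arm is pulled.
        scores = [
            prefs[a] + knowledge_bias[a]
            + c * math.sqrt(math.log(t + 1) / (counts[a] + 1))
            for a in range(k)
        ]
        probs = softmax(scores)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        regret += best - true_means[arm]
        # REINFORCE update: d log pi(arm) / d H(a) = 1[a == arm] - probs[a].
        advantage = reward - baseline
        for a in range(k):
            indicator = 1.0 if a == arm else 0.0
            prefs[a] += lr * advantage * (indicator - probs[a])
        # Update the baseline after the gradient step.
        baseline += (reward - baseline) / t
    return counts, regret
```

For example, `run_pg_ucb([0.2, 0.5, 0.8], knowledge_bias=[0.0, 0.0, 1.0])` starts out favoring the third arm because of the prior. A misleading prior such as `[1.0, 0.0, 0.0]` has to be unlearned through the gradient updates, which typically costs extra regret; this is the intuition behind infusing knowledge, not a reproduction of the paper's results.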

Findings

01. Expert knowledge reduces total regret in real datasets.

02. The method outperforms traditional bandit algorithms in relational settings.

03. Relational context modeling improves recommendation accuracy.

Abstract

Contextual bandits have important use cases in many real-life scenarios, such as online advertising, recommendation systems, and healthcare. However, most algorithms represent context as flat feature vectors, whereas in the real world the context contains a varying number of objects and relations among them. For example, in a music recommendation system, the user context includes the music they listen to, the artists who create that music, those artists' albums, and so on. Richer relational context representations also introduce a much larger context space, which makes the exploration-exploitation trade-off harder. To make exploration-exploitation more efficient, knowledge about the context can be infused to guide the exploration-exploitation strategy. Because relational context representations are descriptive, they give humans a natural way to specify such knowledge. We…
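The contrast between a varying number of related objects and a flat feature vector can be illustrated with propositionalization, one common way to turn relational facts into a fixed-length vector that a flat-feature bandit could consume. This is an illustrative assumption, not the paper's representation: the predicates `listens`, `created_by` and `album_of`, the facts themselves, and the helper `relational_features` are all hypothetical.

```python
# Hypothetical relational facts for the music example: (predicate, arg1, arg2).
facts = [
    ("listens", "u1", "song_a"),
    ("listens", "u1", "song_b"),
    ("listens", "u1", "song_c"),
    ("created_by", "song_a", "artist_x"),
    ("created_by", "song_b", "artist_x"),
    ("created_by", "song_c", "artist_y"),
    ("album_of", "album_1", "artist_x"),
    ("album_of", "album_2", "artist_x"),
    ("album_of", "album_3", "artist_y"),
]


def relational_features(facts, user, artist):
    """Count groundings of two hypothetical relational patterns for a (user, artist) pair."""
    listened = {s for p, u, s in facts if p == "listens" and u == user}
    creator = {s: a for p, s, a in facts if p == "created_by"}
    # Pattern 1: listens(user, S) AND created_by(S, artist)
    plays_by_artist = sum(1 for s in listened if creator.get(s) == artist)
    # Pattern 2: album_of(A, artist)
    albums = sum(1 for p, _, a in facts if p == "album_of" and a == artist)
    return [plays_by_artist, albums]
```

Here `relational_features(facts, "u1", "artist_x")` returns `[2, 2]` and `relational_features(facts, "u1", "artist_y")` returns `[1, 1]`. The vector stays two entries long no matter how many songs, artists or albums appear in a user's context, at the cost of discarding any relational structure not captured by the chosen patterns.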
