Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits
Kaushik Roy, Qi Zhang, Manas Gaur, Amit Sheth

TL;DR
This paper introduces a novel approach combining knowledge infusion with policy gradients and upper confidence bounds to improve exploration efficiency in relational contextual bandits, especially in complex, real-world scenarios.
Contribution
The paper proposes a new Knowledge Infused Policy Gradients algorithm with an Upper Confidence Bound (UCB) exploration term for relational bandits, using expert knowledge to guide exploration in complex contexts.
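The contribution combines three ingredients: a policy-gradient learner, a UCB-style exploration bonus, and expert knowledge. The paper's own algorithm is not reproduced on this page, so the following is only a generic sketch under stated assumptions. It uses a softmax gradient bandit (in the style of Sutton and Barto) whose sampling scores add a count-based UCB-style bonus, and it models expert knowledge as an initial preference vector `prior`. The class `KnowledgeUCBGradientBandit`, its parameters `lr`, `c` and `prior`, and the toy arm means are all hypothetical. The sketch also ignores relational context entirely.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KnowledgeUCBGradientBandit:
    """Hypothetical sketch, not the paper's algorithm: a softmax gradient bandit
    whose sampling scores add a UCB-style bonus, with expert knowledge
    supplied as an (assumed) initial preference vector."""

    def __init__(self, n_arms, prior=None, lr=0.1, c=1.0, seed=0):
        # Assumption: expert knowledge enters only as initial arm preferences.
        self.prefs = list(prior) if prior is not None else [0.0] * n_arms
        self.n_arms = n_arms
        self.counts = [0] * n_arms
        self.lr = lr          # step size for the policy-gradient update
        self.c = c            # weight of the UCB-style exploration bonus
        self.t = 0
        self.baseline = 0.0   # running mean reward, used to reduce variance
        self.rng = random.Random(seed)
        self._last_probs = None

    def _scores(self):
        # Preference plus a count-based bonus; the +1 avoids division by zero.
        log_t = math.log(self.t + 1)
        return [
            self.prefs[a] + self.c * math.sqrt(log_t / (self.counts[a] + 1))
            for a in range(self.n_arms)
        ]

    def select(self):
        probs = softmax(self._scores())
        self._last_probs = probs
        return self.rng.choices(range(self.n_arms), weights=probs)[0]

    def update(self, arm, reward):
        # Gradient of log softmax w.r.t. the preferences is (1[a == arm] - pi(a));
        # the bonus does not depend on the preferences, so it drops out.
        advantage = reward - self.baseline
        for a in range(self.n_arms):
            indicator = 1.0 if a == arm else 0.0
            self.prefs[a] += self.lr * advantage * (indicator - self._last_probs[a])
        self.t += 1
        self.counts[arm] += 1
        self.baseline += (reward - self.baseline) / self.t


def run_demo(steps=2000):
    # Toy Bernoulli arms; the hypothetical "expert" prior mildly favours arm 2.
    means = [0.2, 0.5, 0.8]
    env = random.Random(1)
    agent = KnowledgeUCBGradientBandit(3, prior=[0.0, 0.0, 1.0])
    for _ in range(steps):
        arm = agent.select()
        reward = 1.0 if env.random() < means[arm] else 0.0
        agent.update(arm, reward)
    return agent.counts


counts = run_demo()
print(counts)
```

Because the bonus shrinks as an arm's pull count grows, rarely tried arms get a temporary boost in sampling probability, while the policy-gradient update shifts preferences toward arms with above-baseline reward. A more faithful version would replace the fixed `prior` with whatever knowledge-infusion mechanism the paper specifies.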
Findings
Expert knowledge reduces total regret on real datasets.
The method outperforms traditional bandit algorithms in relational settings.
Relational context modeling improves recommendation accuracy.
Abstract
Contextual bandits find important use cases in various real-life scenarios such as online advertising, recommendation systems, and healthcare. However, most algorithms use flat feature vectors to represent context, whereas in the real world the context contains a varying number of objects and relations among them. For example, in a music recommendation system, the user context contains the music they listen to, the artists who create this music, the artists' albums, etc. Adding richer relational context representations also introduces a much larger context space, making exploration-exploitation harder. To improve the efficiency of exploration-exploitation, knowledge about the context can be infused to guide the exploration-exploitation strategy. Relational context representations allow a natural way for humans to specify knowledge owing to their descriptive nature. We…
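The abstract's music example can be made concrete. In this illustration (not the paper's representation), the context is a variable-length list of `(relation, subject, object)` facts rather than a fixed-length vector, and one piece of expert knowledge is written directly over the relations: a user who listens to an artist's songs is likely to enjoy that artist's albums. All entity names, relation names, and the `knowledge_score` rule are hypothetical.

```python
from collections import defaultdict

# Hypothetical relational context for one user, as (relation, subject, object)
# facts. The entity names are invented for illustration.
FACTS = [
    ("listens_to", "user1", "song_a"),
    ("listens_to", "user1", "song_b"),
    ("listens_to", "user1", "song_c"),
    ("created_by", "song_a", "artist_x"),
    ("created_by", "song_b", "artist_x"),
    ("created_by", "song_c", "artist_y"),
    ("album_of", "album_1", "artist_x"),
    ("album_of", "album_2", "artist_y"),
    ("album_of", "album_3", "artist_z"),
]


def index_by_relation(facts):
    """Group (subject, object) pairs by relation name."""
    by_rel = defaultdict(list)
    for rel, subj, obj in facts:
        by_rel[rel].append((subj, obj))
    return by_rel


def knowledge_score(user, album, by_rel):
    """Hypothetical expert rule: count the user's songs whose artist made the album."""
    album_artists = {artist for alb, artist in by_rel["album_of"] if alb == album}
    song_artist = dict(by_rel["created_by"])
    songs = {song for u, song in by_rel["listens_to"] if u == user}
    return sum(1 for song in songs if song_artist.get(song) in album_artists)


by_rel = index_by_relation(FACTS)
albums = sorted({alb for alb, _ in by_rel["album_of"]})
scores = {alb: knowledge_score("user1", alb, by_rel) for alb in albums}
print(scores)  # {'album_1': 2, 'album_2': 1, 'album_3': 0}
```

Adding another listened-to song only appends a fact, whereas a flat encoding would need a fixed slot for every possible song and artist. Under an assumed design, scores like these could bias exploration toward the albums the rule favours.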
