Provably Efficient Interactive-Grounded Learning with Personalized Reward

Mengxiao Zhang; Yuheng Zhang; Haipeng Luo; Paul Mineiro

arXiv:2405.20677·cs.LG·June 3, 2024

TL;DR

This paper introduces provably efficient algorithms for Interactive-Grounded Learning with personalized, context-dependent rewards, featuring a novel Lipschitz reward estimator that ensures sublinear regret and improved generalization.

Contribution

It provides the first theoretical guarantees for personalized IGL algorithms using a new Lipschitz reward estimator and explores practical applications with image and text feedback.

Findings

01. The Lipschitz reward estimator outperforms previous step-function estimators.

02. Algorithms achieve sublinear regret under realizability assumptions.

03. Experimental results confirm the effectiveness of the proposed methods.

Abstract

Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims to maximize unobservable rewards by interacting with an environment and observing reward-dependent feedback on the actions it takes. To deal with personalized rewards, which are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator that underestimates the true reward and enjoys favorable generalization performance. Building on…
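
The contrast between a step-function estimator and a Lipschitz one can be illustrated with a toy sketch. This is not the estimator defined in the paper: the names `step_estimate` and `lipschitz_estimate`, the scalar `score` input, and the `threshold` and `width` parameters are all hypothetical, chosen only to show why a hard threshold is sensitive to small estimation errors while a clipped linear ramp is not, and how such a ramp can sit below the step everywhere.

```python
# Toy illustration only; the functions and parameters below are assumptions,
# not the construction used in the paper.

def step_estimate(score: float, threshold: float = 0.5) -> float:
    """Hard-threshold reward estimate: jumps from 0 to 1 at `threshold`."""
    return 1.0 if score >= threshold else 0.0


def lipschitz_estimate(score: float, threshold: float = 0.5, width: float = 0.25) -> float:
    """Clipped linear ramp: 0 up to `threshold`, rising to 1 at `threshold + width`.

    The ramp is Lipschitz with constant 1 / width and never exceeds the step
    estimate, so it underestimates the step everywhere.
    """
    return min(1.0, max(0.0, (score - threshold) / width))


if __name__ == "__main__":
    # A small perturbation of the score around the threshold, standing in
    # for finite-sample error in a learned score.
    low, high = 0.495, 0.505
    print("step change:     ", step_estimate(high) - step_estimate(low))
    print("Lipschitz change:", lipschitz_estimate(high) - lipschitz_estimate(low))
```

In this sketch a perturbation of 0.01 in the score flips the step estimate from 0 to 1, whereas the ramp can change by at most 0.01 / `width`. The price is a conservative estimate: the ramp reports less than the step for scores just above the threshold.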

Videos

Provably Efficient Interactive-Grounded Learning with Personalized Reward · SlidesLive

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Domain Adaptation and Few-Shot Learning · Online Learning and Analytics