Provably Efficient Interactive-Grounded Learning with Personalized Reward
Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

TL;DR
This paper introduces provably efficient algorithms for Interactive-Grounded Learning with personalized, context-dependent rewards, featuring a novel Lipschitz reward estimator that ensures sublinear regret and improved generalization.
Contribution
It provides the first theoretical guarantees for personalized IGL algorithms using a new Lipschitz reward estimator and explores practical applications with image and text feedback.
Findings
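The step-function reward estimator of prior work can deviate uncontrollably due to finite-sample effects; the proposed Lipschitz reward estimator underestimates the true reward and generalizes favorably.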
Algorithms achieve sublinear regret under realizability assumptions.
Experiments with image and text feedback demonstrate the effectiveness of the proposed algorithms.
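To give a feel for why a Lipschitz estimator can be better behaved than a step function, here is a minimal toy sketch. It is not the paper's estimator, which this summary does not specify. The names `step_estimate`, `ramp_estimate`, `max_jump`, `threshold`, and `width` are hypothetical and chosen only for illustration. The sketch assumes a scalar score in [0, 1] that is turned into a reward estimate either by a hard threshold or by a clipped linear ramp that never exceeds the threshold version.

```python
def step_estimate(score: float, threshold: float = 0.5) -> float:
    """Hard threshold (illustrative): 1 if score >= threshold, else 0."""
    return 1.0 if score >= threshold else 0.0


def ramp_estimate(score: float, threshold: float = 0.5, width: float = 0.25) -> float:
    """Clipped linear ramp (illustrative): 0 at `threshold`, 1 at `threshold + width`.

    The ramp is (1 / width)-Lipschitz and lies at or below step_estimate
    everywhere, so it is a pessimistic surrogate for the hard threshold.
    """
    return min(1.0, max(0.0, (score - threshold) / width))


def max_jump(estimator, lo: float = 0.0, hi: float = 1.0, steps: int = 1000) -> float:
    """Largest change of `estimator` between neighbouring points of a uniform grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    values = [estimator(s) for s in grid]
    return max(abs(b - a) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # A perturbation of 0.001 in the score can flip the step output entirely,
    # while the ramp output moves by at most 0.001 / width.
    print("largest jump, step:", max_jump(step_estimate))
    print("largest jump, ramp:", max_jump(ramp_estimate))
```

In this toy setting, a small estimation error in the score near the threshold changes the step output by a full unit, whereas the ramp output changes by at most the error divided by `width`. That bounded sensitivity is the kind of property the word "Lipschitz" refers to; how the paper constructs and analyzes its estimator is not shown here.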
Abstract
Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator which underestimates the true reward and enjoys favorable generalization performance. Building on…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Domain Adaptation and Few-Shot Learning · Online Learning and Analytics
