Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback
Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

TL;DR
This paper introduces a new algorithm for Interaction-Grounded Learning in sequential decision-making that handles personalized feedback in contextual episodic MDPs, with provable regret guarantees and practical effectiveness.
Contribution
It extends IGL to multi-step MDPs with personalized feedback, providing a computationally efficient algorithm with regret guarantees for realistic sequential tasks.
Findings
The proposed algorithm achieves sublinear regret in episodic MDPs.
It is effective at learning personalized objectives from multi-turn interactions.
It is validated on synthetic and real-world datasets.
Abstract
In this paper, we study Interaction-Grounded Learning (IGL) [Xie et al., 2021], a paradigm designed for realistic scenarios where the learner receives indirect feedback generated by an unknown mechanism, rather than explicit numerical rewards. While prior work on IGL provides efficient algorithms with provable guarantees, those results are confined to single-step settings, restricting their applicability to modern sequential decision-making systems such as multi-turn Large Language Model (LLM) deployments. To bridge this gap, we propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback. Technically, we extend the reward-estimator construction of Zhang et al. [2024a] from the single-step to the multi-step setting, addressing the unique challenges of decoding latent rewards…
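For context, "sublinear regret" can be read against the standard definition used for episodic problems. The notation below is an assumption for illustration and is not taken from the paper, whose exact definition may differ: $T$ is the number of episodes, $x_t$ the context of episode $t$, $\pi_t$ the learner's policy in that episode, $\pi^\star$ an optimal policy, and $V^{\pi}(x)$ the expected cumulative latent reward of policy $\pi$ under context $x$.

```latex
% Assumed notation (illustrative only, not the paper's own definition)
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} \left( V^{\pi^\star}(x_t) - V^{\pi_t}(x_t) \right),
\qquad
\lim_{T \to \infty} \frac{\mathrm{Reg}(T)}{T} \;=\; 0 .
```

Under this reading, the learner's average per-episode loss relative to the optimal policy vanishes as the number of episodes grows.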
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Domain Adaptation and Few-Shot Learning
