Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback

Mengxiao Zhang; Yuheng Zhang; Haipeng Luo; Paul Mineiro

arXiv:2602.08307 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper proposes a computationally efficient Interaction-Grounded Learning (IGL) algorithm for contextual episodic Markov Decision Processes (MDPs) with personalized feedback, with a sublinear regret guarantee.

Contribution

The paper extends IGL from the single-step setting to multi-step contextual episodic MDPs with personalized feedback, providing a computationally efficient algorithm with a sublinear regret guarantee.

Findings

01

The proposed algorithm achieves sublinear regret in episodic MDPs.

02

The algorithm is effective at learning personalized objectives from multi-turn interactions.

03

The algorithm is validated on synthetic and real-world datasets.

Abstract

In this paper, we study Interaction-Grounded Learning (IGL) [Xie et al., 2021], a paradigm designed for realistic scenarios where the learner receives indirect feedback generated by an unknown mechanism, rather than explicit numerical rewards. While prior work on IGL provides efficient algorithms with provable guarantees, those results are confined to single-step settings, restricting their applicability to modern sequential decision-making systems such as multi-turn Large Language Model (LLM) deployments. To bridge this gap, we propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback. Technically, we extend the reward-estimator construction of Zhang et al. [2024a] from the single-step to the multi-step setting, addressing the unique challenges of decoding latent rewards…
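
As a reading aid for the phrase "sublinear regret guarantee", the block below gives the generic textbook form of regret over episodes. It is a sketch under assumed notation, not the paper's own statement: the symbols $T$ (number of episodes), $x_t$ (context of episode $t$), $\pi_t$ (policy played in episode $t$), $\pi^\star$ (optimal policy), and $V^{\pi}(x)$ (expected cumulative latent reward of $\pi$ under context $x$) are introduced here for illustration only.

```latex
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} \Bigl( V^{\pi^\star}(x_t) - V^{\pi_t}(x_t) \Bigr),
\qquad
\text{sublinear regret:}\quad \mathrm{Reg}(T) = o(T),
\ \text{i.e.}\ \lim_{T \to \infty} \frac{\mathrm{Reg}(T)}{T} = 0 .
```

Under this reading, a sublinear bound means the learner's average per-episode loss relative to the optimal policy vanishes as the number of episodes grows. The precise rate and the exact definition used by the authors are not given in the excerpt above.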

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Domain Adaptation and Few-Shot Learning