Towards Learning Reward Functions from User Interactions
Ziming Li, Julia Kiseleva, Maarten de Rijke, Artem Grotov

TL;DR
This paper introduces a novel method to learn individual user reward functions from interaction data using inverse reinforcement learning, enabling personalized evaluation and system optimization.
Contribution
It presents a dynamic approach to uncover user reward functions directly from interaction logs, incorporating user features and demonstrating feasibility with real-world data.
Findings
Successfully uncovered different reward functions for diverse user groups.
Demonstrated the approach's feasibility with interaction logs from a cultural heritage institution.
Highlighted the importance of modeling user rewards for personalized system evaluation.
Abstract
In the physical world, people have dynamic preferences, e.g., the same situation can lead to satisfaction for some humans and to frustration for others. Personalization is called for. The same observation holds for online behavior with interactive systems. It is natural to represent the behavior of users who are engaging with interactive systems, such as a search engine or a recommender system, as a sequence of actions where each next action depends on the current situation and the user reward of taking a particular action. By and large, current online evaluation metrics for interactive systems, such as search engines or recommender systems, are static and do not reflect differences in user behavior. They rarely capture or model the reward experienced by a user while interacting with an interactive system. We argue that knowing a user's reward function is essential for an interactive…
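To make the idea of recovering a reward function from logged interactions more concrete, here is a minimal sketch in Python. It is not the paper's method: it assumes a much simpler one-step choice model in which a user in state `s` picks action `a` with probability proportional to `exp(reward[s][a])`, and it ignores the long-term effects that full inverse reinforcement learning accounts for. The function names (`softmax`, `fit_rewards`) and the toy log are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def softmax(values):
    """Turn a list of scores into choice probabilities."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fit_rewards(logs, n_states, n_actions, lr=0.5, steps=500):
    """Estimate reward[s][a] from (state, action) pairs.

    Assumption (not from the paper): users choose actions according to
    a softmax over per-state rewards. We fit the rewards by gradient
    ascent on the average log-likelihood of the logged actions.
    """
    pair_counts = Counter(logs)
    state_visits = Counter(s for s, _ in logs)
    reward = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(steps):
        for s in range(n_states):
            if state_visits[s] == 0:
                continue  # no evidence for this state, keep zeros
            probs = softmax(reward[s])
            for a in range(n_actions):
                # Gradient: observed choice frequency minus model probability.
                observed = pair_counts[(s, a)] / state_visits[s]
                reward[s][a] += lr * (observed - probs[a])
    return reward


# Hypothetical toy log: two states, two actions.
logs = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 3
reward = fit_rewards(logs, n_states=2, n_actions=2)
choice_probs = [softmax(r) for r in reward]
```

Under this assumed model, rewards are only identifiable up to an additive constant per state, so only the differences between actions in the same state are meaningful. Fitting separate reward tables for different user groups would be one simple way to compare their preferences, though the paper's actual procedure may differ.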
