Learning Utilities from Demonstrations in Markov Decision Processes
Filippo Lazzati, Alberto Maria Metelli

TL;DR
This paper introduces a new approach to learning the risk attitudes of agents from demonstrations in Markov Decision Processes (MDPs). It models behavior with utility functions, addressing the risk-neutrality assumption made by most Inverse Reinforcement Learning (IRL) models.
Contribution
It proposes a novel utility-based model for behavior in MDPs, defines the utility learning problem, and develops efficient algorithms with theoretical analysis and empirical validation.
Findings
Successfully infers agents' risk attitudes from demonstrations.
Provides provably efficient algorithms with sample complexity analysis.
Empirically validates the utility learning framework.
Abstract
Our goal is to extract useful knowledge from demonstrations of behavior in sequential decision-making problems. Although it is well-known that humans commonly engage in risk-sensitive behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) models assume a risk-neutral agent. Beyond introducing model misspecification, these models do not directly capture the risk attitude of the observed agent, which can be crucial in many applications. In this paper, we propose a novel model of behavior in Markov Decision Processes (MDPs) that explicitly represents the agent's risk attitude through a utility function. We then define the Utility Learning (UL) problem as the task of inferring the observed agent's risk attitude, encoded via a utility function, from demonstrations in MDPs, and we analyze the partial identifiability of the agent's utility. Furthermore, we devise…
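To make the role of the utility function concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes that the agent ranks policies by the expected utility of the cumulative reward, and every name and number in it (`expected_utility`, `safe`, `risky`, the square-root utility) is hypothetical and chosen only for illustration. It shows how two agents facing the same return distributions can prefer different policies depending on the shape of their utility.

```python
import math

def expected_utility(outcomes, utility):
    """Expected utility of a return distribution.

    outcomes: list of (probability, cumulative_reward) pairs.
    utility:  function mapping a cumulative reward to a real number.
    """
    return sum(p * utility(g) for p, g in outcomes)

# Hypothetical return distributions induced by two policies in a toy MDP.
safe = [(1.0, 4.0)]                 # always collects a return of 4
risky = [(0.5, 0.0), (0.5, 10.0)]   # return of 0 or 10 with equal probability

# Illustrative utilities: linear (risk-neutral) and concave (risk-averse).
def risk_neutral(g):
    return g

def risk_averse(g):
    return math.sqrt(g)

def preferred(utility):
    """Name of the policy with the higher expected utility."""
    if expected_utility(risky, utility) > expected_utility(safe, utility):
        return "risky"
    return "safe"

print(preferred(risk_neutral))  # risky: expected return 5 beats 4
print(preferred(risk_averse))   # safe: sqrt(4) = 2 beats 0.5 * sqrt(10)
```

Under these assumptions, observing which policy an agent demonstrates carries information about its utility: choosing the safe policy here is inconsistent with a linear utility, which is the kind of evidence a utility learning method can exploit.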
Taxonomy
Topics: Data Quality and Management · Data Stream Mining Techniques · Bayesian Modeling and Causal Inference
