Reasoning about Counterfactuals to Improve Human Inverse Reinforcement Learning
Michael S. Lee, Henny Admoni, Reid Simmons

TL;DR
This paper introduces a method for robots to select demonstrations by reasoning about the human's current beliefs and the counterfactuals the human expects, so that each demonstration improves the human's understanding of the robot's decision making.
Contribution
It proposes incorporating a model of the learner's current beliefs into the model of human inverse reinforcement learning, so that the robot can select the demonstrations that most improve human comprehension.
Findings
The difficulty measure correlates with human performance and confidence.
Considering human beliefs improves performance on difficult tests.
Counterfactual reasoning can decrease performance on easy tests.
Abstract
To collaborate well with robots, we must be able to understand their decision making. Humans naturally infer other agents' beliefs and desires by reasoning about their observable behavior in a way that resembles inverse reinforcement learning (IRL). Thus, robots can convey their beliefs and desires by providing demonstrations that are informative for a human learner's IRL. An informative demonstration is one that differs strongly from the learner's expectations of what the robot will do given their current understanding of the robot's decision making. However, standard IRL does not model the learner's existing expectations, and thus cannot do this counterfactual reasoning. We propose to incorporate the learner's current understanding of the robot's decision making into our model of human IRL, so that a robot can select demonstrations that maximize the human's understanding. We also…
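The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear reward over trajectory features, the representation of the learner's belief as a set of sampled reward weights, the disagreement-based score, and every name below (`expected_trajectory`, `informativeness`, `select_environment`, the toy environments) are assumptions made for illustration only.

```python
import numpy as np

def expected_trajectory(weights, candidates):
    """Index of the trajectory with the highest linear reward under `weights`."""
    return int(np.argmax(candidates @ weights))

def informativeness(demo_idx, candidates, belief_samples):
    """Fraction of belief samples whose predicted (counterfactual)
    trajectory differs from the demonstrated one."""
    predictions = [expected_trajectory(w, candidates) for w in belief_samples]
    return float(np.mean([p != demo_idx for p in predictions]))

def select_environment(envs, true_weights, belief_samples):
    """Pick the environment whose optimal demonstration most contradicts
    what the learner currently expects the robot to do."""
    best_name, best_score = None, -1.0
    for name, candidates in envs.items():
        demo_idx = expected_trajectory(true_weights, candidates)
        score = informativeness(demo_idx, candidates, belief_samples)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical toy data: each row is one candidate trajectory's feature counts.
envs = {
    "A": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "B": np.array([[2.0, 3.0], [1.0, 0.0]]),
}
true_weights = np.array([1.0, -0.5])    # robot's actual reward weights (assumed)
belief_samples = np.array([             # learner's current belief (assumed)
    [1.0, -0.1],
    [1.0, -0.2],
    [1.0, -0.9],
])

chosen, score = select_environment(envs, true_weights, belief_samples)
```

In this toy setup every belief sample already predicts the robot's behavior in environment A, so a demonstration there teaches nothing new, whereas two of the three samples predict the wrong trajectory in environment B, which makes B the more informative choice under this assumed score.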
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
Methods: Counterfactual Explanations
