Admissible Policy Teaching through Reward Design
Kiarash Banihashem, Adish Singla, Jiarui Gan, Goran Radanovic

TL;DR
This paper studies how to modify the reward function in reinforcement learning so that an agent is incentivized to adopt an admissible policy. It shows that the problem is computationally hard and proposes an approximation approach with a practical local search algorithm.
Contribution
The paper introduces a novel reward design framework for admissible policy teaching, analyzes its computational complexity, and develops an approximation approach based on a surrogate problem and a local search algorithm.
Findings
The reward design problem is NP-hard to solve optimally.
A surrogate problem formulation enables practical approximation.
The proposed local search algorithm effectively incentivizes admissible policies.
Abstract
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible and performs well under the original reward function. This problem can be viewed as a dual to the problem of optimal reward poisoning attacks: instead of forcing an agent to adopt a specific policy, the reward designer incentivizes an agent to avoid taking actions that are inadmissible in certain states. Perhaps surprisingly, and in contrast to the problem of optimal reward poisoning attacks, we first show that the reward design problem for admissible policy teaching is computationally challenging, and it is NP-hard to find an approximately optimal…
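To make the design condition in the abstract concrete, here is a minimal brute-force sketch for a tiny tabular MDP. It is not the paper's method (the paper proves the problem NP-hard and proposes a surrogate formulation with local search); it only checks whether a given reward modification satisfies the condition. The following are assumptions for illustration: a policy is scored by its expected discounted return from a start distribution mu, "approximately optimal" means within eps of the best deterministic policy, admissibility is given per state as a set of allowed actions, and the modification cost is the L1 distance between reward tables. All function names are hypothetical.

```python
from itertools import product

# Hypothetical helpers for a tiny tabular MDP (illustration only).
# P[s][a] is a list of next-state probabilities, R[s][a] a scalar reward,
# mu the start-state distribution, gamma the discount factor.

def policy_value(P, R, pi, mu, gamma, tol=1e-10):
    """Expected discounted return of deterministic policy pi (one action per state)
    from start distribution mu, via iterative policy evaluation."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [R[s][pi[s]] + gamma * sum(p * V[t] for t, p in enumerate(P[s][pi[s]]))
                 for s in range(n)]
        done = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    return sum(m * v for m, v in zip(mu, V))

def eps_optimal_policies(P, R, mu, gamma, eps):
    """All deterministic policies whose value is within eps of the best one (brute force)."""
    n_states, n_actions = len(R), len(R[0])
    values = {pi: policy_value(P, R, pi, mu, gamma)
              for pi in product(range(n_actions), repeat=n_states)}
    best = max(values.values())
    return [pi for pi, v in values.items() if v >= best - eps]

def is_admissible(pi, admissible):
    """A policy is admissible if it picks an allowed action in every state."""
    return all(pi[s] in admissible[s] for s in range(len(pi)))

def is_valid_design(P, R_new, mu, gamma, eps, admissible):
    """True if every eps-optimal deterministic policy under R_new is admissible."""
    return all(is_admissible(pi, admissible)
               for pi in eps_optimal_policies(P, R_new, mu, gamma, eps))

def modification_cost(R, R_new):
    """Assumed cost: L1 distance between the original and modified reward tables."""
    return sum(abs(a - b) for row, row_new in zip(R, R_new) for a, b in zip(row, row_new))

def worst_case_original_value(P, R, R_new, mu, gamma, eps):
    """Lowest original-reward value among the eps-optimal policies under R_new."""
    return min(policy_value(P, R, pi, mu, gamma)
               for pi in eps_optimal_policies(P, R_new, mu, gamma, eps))

# Toy instance: from either state, every action moves to state 1.
# Action 1 in state 0 is inadmissible.
P = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.0, 1.0], [0.0, 1.0]]
mu, gamma, eps = [1.0, 0.0], 0.9, 0.5
admissible = {0: {0}, 1: {0, 1}}

print(is_valid_design(P, R, mu, gamma, eps, admissible))      # False: (1, 1) is optimal
R_new = [[0.0, -1.0], [0.0, 1.0]]                             # penalize the inadmissible action
print(is_valid_design(P, R_new, mu, gamma, eps, admissible))  # True
print(modification_cost(R, R_new))                            # 2.0
print(worst_case_original_value(P, R, R_new, mu, gamma, eps))
```

Enumerating all |A|^|S| deterministic policies only works for toy instances like this one; the sketch is meant to show what a valid design must guarantee, not how to find a cheap one.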
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
