UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations
Huy Hoang, Tien Mai, Pradeep Varakantham

TL;DR
This paper introduces UNIQ, a novel offline inverse Q-learning method designed to learn policies that avoid undesirable behaviors by maximizing the distance from undesirable demonstrations, effectively leveraging unlabeled data.
Contribution
The paper proposes a new inverse Q-learning framework for avoiding undesirable demonstrations, with a novel training objective and algorithm that outperform existing methods.
Findings
Outperforms state-of-the-art baselines on benchmark environments
Effectively leverages unlabeled data for training
Provides a new approach to avoiding undesirable behaviors in offline learning
Abstract
We address the problem of offline learning a policy that avoids undesirable demonstrations. Unlike conventional offline imitation learning approaches that aim to imitate expert or near-optimal demonstrations, our setting involves avoiding undesirable behavior (specified using undesirable demonstrations). To tackle this problem, unlike standard imitation learning where the aim is to minimize the distance between learning policy and expert demonstrations, we formulate the learning task as maximizing a statistical distance, in the space of state-action stationary distributions, between the learning policy and the undesirable policy. This significantly different approach results in a novel training objective that necessitates a new algorithm to address it. Our algorithm, UNIQ, tackles these challenges by building on the inverse Q-learning framework, framing the learning problem as a…
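The contrast the abstract draws (imitation minimizes a distance to expert behavior, whereas this setting maximizes a distance from undesirable behavior, measured between state-action stationary distributions) can be illustrated on a toy problem. The sketch below is not UNIQ: it uses no inverse Q-learning and no offline data, it assumes the dynamics are known, and it simply enumerates a few hand-written policies. The two-state MDP, the discount factor, the candidate policies, and the choice of total variation as the statistical distance are all assumptions made for illustration; the excerpt does not say which distance the paper uses.

```python
# Toy illustration only: a hypothetical 2-state, 2-action MDP with known dynamics.
# Every number and name here is an assumption, not taken from the paper.
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2

# P[s][a][s2] = probability of moving to state s2 after taking action a in state s.
P = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.1, 0.9]],
]
INIT = [1.0, 0.0]  # the agent always starts in state 0


def occupancy(policy, iters=2000):
    """Discounted state-action occupancy:
    rho(s, a) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s, a_t = a)."""
    state_dist = INIT[:]
    rho = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    weight = 1.0 - GAMMA
    for _ in range(iters):
        next_dist = [0.0] * N_STATES
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                p_sa = state_dist[s] * policy[s][a]
                rho[s][a] += weight * p_sa
                for s2 in range(N_STATES):
                    next_dist[s2] += p_sa * P[s][a][s2]
        state_dist = next_dist
        weight *= GAMMA
    return rho


def total_variation(rho_a, rho_b):
    """Total variation distance between two occupancy measures (assumed metric)."""
    return 0.5 * sum(
        abs(rho_a[s][a] - rho_b[s][a])
        for s in range(N_STATES)
        for a in range(N_ACTIONS)
    )


# policy[s][a] = probability of choosing action a in state s.
undesirable = [[0.0, 1.0], [0.0, 1.0]]  # always takes action 1
candidates = {
    "copy": [[0.0, 1.0], [0.0, 1.0]],     # behaves exactly like the undesirable policy
    "uniform": [[0.5, 0.5], [0.5, 0.5]],  # picks actions at random
    "avoid": [[1.0, 0.0], [1.0, 0.0]],    # never takes the undesirable action
}

rho_undesirable = occupancy(undesirable)
distances = {
    name: total_variation(occupancy(pi), rho_undesirable)
    for name, pi in candidates.items()
}

# An imitation-style objective would pick the closest candidate;
# the avoidance objective described in the abstract picks the farthest one.
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
print(distances, closest, farthest)
```

In the offline setting the abstract describes, neither the dynamics nor a set of candidate policies is available, so the distance has to be estimated from demonstrations; the enumeration here only makes the direction of the objective concrete.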
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Q-Learning
