UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations

Huy Hoang; Tien Mai; Pradeep Varakantham

arXiv:2410.08307 · cs.LG · October 14, 2024

Open Access

TL;DR

This paper introduces UNIQ, an offline inverse Q-learning method that learns policies that avoid undesirable behavior. It does so by maximizing a statistical distance, in the space of state-action stationary distributions, between the learned policy and the undesirable policy, while also making use of unlabeled data.

Contribution

The paper proposes a new inverse Q-learning framework for avoiding undesirable demonstrations, with a novel training objective and algorithm that outperform existing methods.

Findings

01. Outperforms state-of-the-art baselines on benchmark environments

02. Effectively leverages unlabeled data for training

03. Provides a new approach to avoid undesirable behaviors in offline learning

Abstract

We address the problem of offline learning a policy that avoids undesirable demonstrations. Unlike conventional offline imitation learning approaches that aim to imitate expert or near-optimal demonstrations, our setting involves avoiding undesirable behavior (specified using undesirable demonstrations). To tackle this problem, unlike standard imitation learning where the aim is to minimize the distance between learning policy and expert demonstrations, we formulate the learning task as maximizing a statistical distance, in the space of state-action stationary distributions, between the learning policy and the undesirable policy. This significantly different approach results in a novel training objective that necessitates a new algorithm to address it. Our algorithm, UNIQ, tackles these challenges by building on the inverse Q-learning framework, framing the learning problem as a…
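The core idea in the abstract, scoring a policy by how far its state-action stationary distribution lies from that of the undesirable policy, can be illustrated on a toy problem. The sketch below is not UNIQ and does not reproduce the paper's objective: it assumes a tabular MDP with known dynamics, uses total variation as a stand-in for the (unspecified here) statistical distance, and every name in it (`occupancy`, `total_variation`, the two-state MDP) is hypothetical.

```python
import numpy as np

def occupancy(P, pi, mu0, gamma=0.9):
    """Normalized discounted state-action occupancy rho(s, a) of policy pi.

    P:   transitions, shape (S, A, S), P[s, a, t] = Pr(t | s, a)
    pi:  policy, shape (S, A), pi[s, a] = Pr(a | s)
    mu0: initial state distribution, shape (S,)
    """
    S = P.shape[0]
    # State-to-state transitions under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t]
    P_pi = np.einsum("sa,sat->st", pi, P)
    # State occupancy d solves d = (1 - gamma) * mu0 + gamma * P_pi^T d
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d[:, None] * pi

def total_variation(rho_a, rho_b):
    """Total variation distance between two occupancy measures (assumed stand-in)."""
    return 0.5 * np.abs(rho_a - rho_b).sum()

# Hypothetical 2-state, 2-action MDP: action a always moves to state a.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
mu0 = np.array([1.0, 0.0])

pi_undesirable = np.array([[0.0, 1.0], [0.0, 1.0]])  # always takes action 1
pi_avoid = np.array([[1.0, 0.0], [1.0, 0.0]])        # always takes action 0
pi_uniform = np.full((2, 2), 0.5)                    # picks actions at random

rho_un = occupancy(P, pi_undesirable, mu0)
tv_avoid = total_variation(occupancy(P, pi_avoid, mu0), rho_un)
tv_uniform = total_variation(occupancy(P, pi_uniform, mu0), rho_un)
tv_self = total_variation(rho_un, rho_un)

print(f"avoid: {tv_avoid:.3f}  uniform: {tv_uniform:.3f}  undesirable: {tv_self:.3f}")
```

Under this toy criterion, the policy that never takes the undesirable action scores highest and the undesirable policy itself scores zero. In the offline setting the abstract describes, the dynamics are not known and the distributions must be estimated from demonstrations, which this sketch does not attempt.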

Taxonomy

Topics: Anomaly Detection Techniques and Applications

Methods: Q-Learning