Interactively Learning Preference Constraints in Linear Bandits

David Lindner; Sebastian Tschiatschek; Katja Hofmann; Andreas Krause

arXiv:2206.05255·cs.LG·June 13, 2022·1 cites

TL;DR

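This paper introduces Adaptive Constraint Learning (ACOL), an algorithm for efficiently learning human preference constraints in decision-making tasks modeled as constrained linear bandits. Its sample complexity matches the lower bound in the worst case, and it outperforms a range of baselines in synthetic experiments.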

Contribution

The paper formalizes constrained linear best-arm identification, proposes the ACOL algorithm, whose sample complexity matches an instance-dependent lower bound in the worst case, and demonstrates its effectiveness in synthetic and real driving preference scenarios.

Findings

1. ACOL matches the lower bound on sample complexity in the worst case.

2. ACOL outperforms baseline methods in synthetic experiments.

3. Learning constraints is more robust than directly encoding preferences in rewards.

Abstract

We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem, which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst case. In the average case, ACOL's sample complexity bound is still significantly tighter than bounds of simpler approaches. In synthetic experiments, ACOL performs on par with an oracle solution and outperforms a range of baselines. As an application, we…
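To make the problem setting concrete, here is a minimal Python sketch of constrained linear best-arm identification. It is not ACOL: it uses a naive uniform-sampling baseline with regularized least squares. It assumes that each arm is a feature vector `x`, that the reward `theta · x` is known, and that an arm is feasible when `phi · x <= 0` for an unknown vector `phi` that can only be observed through noisy queries. The sign convention, the toy arms, the noise level, and all function names are illustrative assumptions, not taken from the paper or its repository.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def uniform_constraint_learning(arms, theta, query, rounds, reg=1e-3):
    """Naive baseline (NOT ACOL): query arms round-robin, fit phi by
    regularized least squares, return the best-reward arm estimated feasible."""
    d = len(theta)
    A = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for t in range(rounds):
        x = arms[t % len(arms)]
        y = query(x)  # noisy observation of the constraint value phi . x
        for i in range(d):
            b[i] += x[i] * y
            for j in range(d):
                A[i][j] += x[i] * x[j]
    phi_hat = solve(A, b)
    # Assumed convention: an arm is feasible when phi . x <= 0.
    feasible = [x for x in arms if dot(phi_hat, x) <= 0.0]
    best = max(feasible, key=lambda x: dot(theta, x)) if feasible else None
    return best, phi_hat

# Toy instance (hypothetical values chosen for illustration only).
rng = random.Random(0)
theta = [1.0, 0.5]      # known reward parameter
phi_true = [1.0, -1.0]  # unknown constraint parameter, hidden from the learner
arms = [[1.0, 0.0], [0.0, 1.0], [0.5, 1.0], [1.0, 0.5]]

def noisy_query(x):
    return dot(phi_true, x) + rng.gauss(0.0, 0.1)

best_arm, phi_hat = uniform_constraint_learning(arms, theta, noisy_query, rounds=400)
```

In this toy instance the arm with the highest reward, `[1.0, 0.5]`, violates the constraint, so the learner has to estimate `phi` well enough to rule it out. An adaptive method such as ACOL aims to reach that decision with fewer constraint queries than uniform sampling.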

Code & Models

Repositories

lasgroup/adaptive-constraint-learning (official)

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Receptor Mechanisms and Signaling