Interactively Learning Preference Constraints in Linear Bandits
David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

TL;DR
This paper introduces ACOL, an algorithm for efficiently learning human preference constraints in decision-making tasks modeled as constrained linear bandits, with theoretical guarantees and practical benefits.
Contribution
The paper formalizes constrained linear best-arm identification, proposes the ACOL algorithm, whose sample complexity matches an instance-dependent lower bound in the worst case, and demonstrates its effectiveness in synthetic experiments and a driving preference scenario.
Findings
ACOL matches the lower bound on sample complexity in worst-case scenarios.
ACOL performs on par with an oracle solution and outperforms a range of baselines in synthetic experiments.
Learning constraints is more robust than directly encoding preferences in rewards.
Abstract
We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem, which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst case. In the average case, ACOL's sample complexity bound is still significantly tighter than the bounds of simpler approaches. In synthetic experiments, ACOL performs on par with an oracle solution and outperforms a range of baselines. As an application, we…
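To make the problem setting concrete, here is a minimal Python sketch of constrained linear best-arm identification. It is not ACOL. It assumes (these details are not stated in the abstract) that each arm is a feature vector x, the known reward is theta·x, the unknown constraint is phi·x <= threshold, and each query returns a noisy value of phi·x. The function names `best_feasible_arm` and `uniform_baseline` are hypothetical, and the baseline simply queries arms uniformly at random and fits phi by least squares.

```python
import numpy as np


def best_feasible_arm(arms, theta, phi, threshold=0.0):
    """Return the index of the highest-reward arm with phi @ x <= threshold.

    Returns None when no arm satisfies the constraint.
    """
    rewards = arms @ theta
    feasible = arms @ phi <= threshold
    if not feasible.any():
        return None
    # Mask infeasible arms with -inf so argmax only considers feasible ones.
    return int(np.argmax(np.where(feasible, rewards, -np.inf)))


def uniform_baseline(arms, theta, phi_true, n_queries,
                     noise_std=0.1, threshold=0.0, seed=0):
    """Naive baseline (an assumption for illustration, not ACOL).

    Queries arms uniformly at random, observes noisy constraint values,
    estimates phi by ordinary least squares, and recommends the best arm
    that is feasible under the estimate.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(arms), size=n_queries)
    X = arms[idx]
    # Simulated constraint feedback; in practice this would come from a human.
    y = X @ phi_true + noise_std * rng.normal(size=n_queries)
    phi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return best_feasible_arm(arms, theta, phi_hat, threshold)


if __name__ == "__main__":
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    theta = np.array([1.0, 2.0])      # known reward parameter
    phi_true = np.array([1.0, -2.0])  # unknown constraint parameter
    print("oracle:", best_feasible_arm(arms, theta, phi_true))
    print("baseline:", uniform_baseline(arms, theta, phi_true, n_queries=200))
```

The oracle function has access to the true constraint parameter, which a learner never does; an adaptive method would aim to reach the same recommendation with as few constraint queries as possible, rather than spending them uniformly across arms.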
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Receptor Mechanisms and Signaling
