Blending Controllers via Multi-Objective Bandits

Parham Gohari; Franck Djeumou; Abraham P. Vinod; Ufuk Topcu

arXiv:2007.15755 · eess.SY · August 3, 2020 · 1 citation

TL;DR

This paper introduces a multi-objective bandit algorithm that blends a safe controller with a performant one, yielding a single controller that is safer than the performant controller and accumulates more reward than the safe controller in sequential decision-making tasks.

Contribution

The paper proposes a blending algorithm, built on contextual multi-armed multi-objective bandits, that balances safety and performance without additional computational complexity.

Findings

01

The blended controller outperforms the safe controller in total reward.

02

The blended controller is safer than the performant controller.

03

The algorithm achieves sublinear Pareto regret.
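
For readers unfamiliar with the term, the block below writes out one common definition of Pareto regret from the multi-objective bandit literature. The notation is an assumption made here for illustration, and the paper's own definition (which is contextual, so the mean vectors would also depend on the observed context) may differ in its details.

```latex
% Assumed notation, not taken from the paper:
%   \mu_a \in \mathbb{R}^d   mean objective vector of arm a (higher is better)
%   a_t                      arm chosen at stage t
%   \mathbf{1}               all-ones vector in \mathbb{R}^d
\Delta_a = \inf\bigl\{\epsilon \ge 0 : \mu_a + \epsilon\mathbf{1}
           \text{ is not Pareto-dominated by any } \mu_{a'}\bigr\},
\qquad
R(T) = \sum_{t=1}^{T} \Delta_{a_t},
\qquad
\lim_{T\to\infty} \frac{R(T)}{T} = 0 .
```

Under this definition, "sublinear" means the average per-stage gap to the Pareto front vanishes as the horizon T grows.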

Abstract

Safety and performance are often two competing objectives in sequential decision-making problems. Existing performant controllers, such as controllers derived from reinforcement learning algorithms, often fall short of safety guarantees. Conversely, controllers that guarantee safety, such as those derived from classical control theory, require restrictive assumptions and are often conservative in performance. Our goal is to blend a performant and a safe controller to generate a single controller that is safer than the performant controller and accumulates higher rewards than the safe controller. To this end, we propose a blending algorithm using the framework of contextual multi-armed multi-objective bandits. At each stage, the algorithm observes the environment's current context alongside an immediate reward and a cost, where the cost is the underlying safety measure. The algorithm then decides which…
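
The observe-then-choose loop described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it swaps in a simple epsilon-greedy rule over discretized contexts, and the names `BlendingBandit`, `ARMS`, `context_bin`, and `epsilon` are all hypothetical. It only shows how a two-armed bandit might track a (reward, cost) pair per controller and pick among the arms that are not Pareto-dominated.

```python
import random
from collections import defaultdict

ARMS = ("safe", "performant")  # hypothetical labels for the two controllers


def dominates(u, v):
    """True if objective vector u Pareto-dominates v (higher is better)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


class BlendingBandit:
    """Toy epsilon-greedy blender; an illustrative assumption, not the paper's method."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (context_bin, arm) -> [pull count, mean reward, mean cost]
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])

    def select(self, context_bin):
        """Choose which controller to run in the given (discretized) context."""
        untried = [a for a in ARMS if self.stats[(context_bin, a)][0] == 0]
        if untried:
            return untried[0]  # try every arm once per context bin
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ARMS)  # occasional uniform exploration
        # Objective vector per arm: (mean reward, negated mean cost), both maximized.
        vecs = {}
        for a in ARMS:
            _, mean_reward, mean_cost = self.stats[(context_bin, a)]
            vecs[a] = (mean_reward, -mean_cost)
        # Keep the arms whose estimates no other arm dominates.
        front = [a for a in ARMS
                 if not any(dominates(vecs[b], vecs[a]) for b in ARMS if b != a)]
        return self.rng.choice(front)

    def update(self, context_bin, arm, reward, cost):
        """Fold one observed (reward, cost) pair into the running means."""
        s = self.stats[(context_bin, arm)]
        s[0] += 1
        s[1] += (reward - s[1]) / s[0]
        s[2] += (cost - s[2]) / s[0]
```

When neither controller dominates the other in a context (one earns more reward, the other incurs less cost), both stay on the empirical Pareto front and the sketch chooses between them uniformly at random, which is one simple way to realize "blending".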

Code & Models

Repositories

parhamgohari/blending-controllers (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)