Blending Controllers via Multi-Objective Bandits
Parham Gohari, Franck Djeumou, Abraham P. Vinod, Ufuk Topcu

TL;DR
This paper introduces a multi-objective bandit algorithm that blends a safe controller with a performant one, yielding a single controller that is safer than the performant controller and earns more reward than the safe controller in sequential decision-making tasks.
Contribution
The paper proposes a blending algorithm, built on contextual multi-armed multi-objective bandits, that balances safety and performance without additional computational complexity.
Findings
The blended controller outperforms the safe controller in total reward.
The blended controller is safer than the performant controller.
The algorithm achieves sublinear Pareto regret.
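
As a hedged illustration of what "sublinear Pareto regret" means, the block below states one common definition from the multi-objective bandit literature. The symbols ($\mu_a$ for the mean reward vector of arm $a$, $\mathcal{O}^*$ for the Pareto front, $a_t$ for the arm chosen at stage $t$) are assumptions introduced here, and the paper's exact contextual definition may differ.

```latex
% Pareto suboptimality gap of arm a: the smallest uniform shift that makes
% its mean vector no longer dominated by any arm on the Pareto front.
\Delta_a = \min\bigl\{\epsilon \ge 0 \;:\; \mu_a + \epsilon\mathbf{1}
           \text{ is not dominated by } \mu_{a^*} \text{ for any } a^* \in \mathcal{O}^* \bigr\}

% Pareto regret after T stages, and what "sublinear" requires of it.
R(T) = \sum_{t=1}^{T} \mathbb{E}\bigl[\Delta_{a_t}\bigr],
\qquad
\lim_{T \to \infty} \frac{R(T)}{T} = 0
```

Under this definition, arms on the Pareto front have zero gap, so sublinear regret means the fraction of stages spent on dominated choices vanishes as $T$ grows.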
Abstract
Safety and performance are often two competing objectives in sequential decision-making problems. Existing performant controllers, such as controllers derived from reinforcement learning algorithms, often fall short of safety guarantees. In contrast, controllers that guarantee safety, such as those derived from classical control theory, require restrictive assumptions and are often conservative in performance. Our goal is to blend a performant and a safe controller to generate a single controller that is safer than the performant controller and accumulates higher rewards than the safe controller. To this end, we propose a blending algorithm using the framework of contextual multi-armed multi-objective bandits. At each stage, the algorithm observes the environment's current context alongside an immediate reward and cost, which is the underlying safety measure. The algorithm then decides which…
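
The per-stage loop the abstract describes (observe a reward and a cost, then pick a controller) can be sketched as a two-armed, two-objective bandit. This is a minimal illustration under stated assumptions, not the authors' algorithm: it ignores the context, uses plain running means with no confidence bonus, and picks uniformly at random among arms on the estimated Pareto front. The names `BlendingBandit`, `dominates`, and `pareto_front` are hypothetical.

```python
import random


def dominates(u, v):
    """True if vector u Pareto-dominates v: >= in every objective, > in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(estimates):
    """Indices of arms whose estimate is not dominated by any other arm's estimate."""
    return [
        i for i, u in enumerate(estimates)
        if not any(dominates(v, u) for j, v in enumerate(estimates) if j != i)
    ]


class BlendingBandit:
    """Hypothetical context-free sketch: arm 0 = safe controller, arm 1 = performant."""

    def __init__(self, n_arms=2, seed=0):
        self.counts = [0] * n_arms
        # Each arm tracks a mean vector (reward, -cost) so both objectives are maximized.
        self.means = [[0.0, 0.0] for _ in range(n_arms)]
        self.rng = random.Random(seed)

    def select(self):
        # Try every arm once before trusting the estimates.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        # Then choose uniformly among arms on the estimated Pareto front.
        return self.rng.choice(pareto_front(self.means))

    def update(self, arm, reward, cost):
        # Incremental running mean of the observed (reward, -cost) vector.
        self.counts[arm] += 1
        n = self.counts[arm]
        observed = (reward, -cost)
        self.means[arm] = [m + (x - m) / n for m, x in zip(self.means[arm], observed)]


if __name__ == "__main__":
    bandit = BlendingBandit()
    for _ in range(10):
        arm = bandit.select()
        # Made-up toy feedback: the safe arm is low-reward and zero-cost,
        # the performant arm is high-reward and costly.
        reward, cost = (0.2, 0.0) if arm == 0 else (1.0, 0.5)
        bandit.update(arm, reward, cost)
    print(bandit.counts, bandit.means)
```

In the toy feedback neither arm dominates the other, so both stay on the front and the sketch keeps mixing them. A faithful implementation would condition the estimates on the context and add an exploration term.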
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
