Learning to Act Safely with Limited Exposure and Almost Sure Certainty
Agustin Castellano, Hancheng Min, Juan Bazerque, Enrique Mallada

TL;DR
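This paper introduces methods for learning which actions are safe in unknown environments, with probability-one guarantees and only a finite amount of exploration, by balancing optimality, exposure to unsafe events, and detection time. The methods apply to multi-armed bandit problems and Markov decision processes (MDPs).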
Contribution
It proposes algorithms that are guaranteed to detect unsafe actions within a finite expected number of steps, and it reveals trade-offs between safety exposure, exploration, and learning speed.
Findings
Algorithms detect unsafe actions in a finite expected number of rounds
Trade-offs between safety exposure and detection speed
Safety constraints can accelerate learning
Abstract
This paper puts forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials. This is indeed possible, provided that one is willing to navigate trade-offs between optimality, level of exposure to unsafe events, and the maximum detection time of unsafe actions. We illustrate this concept in two complementary settings. We first focus on the canonical multi-armed bandit problem and study the intrinsic trade-offs of learning safety in the presence of uncertainty. Under mild assumptions on sufficient exploration, we provide an algorithm that provably detects all unsafe machines in an (expected) finite number of rounds. The analysis also unveils a trade-off between the number of rounds needed to secure the environment and the probability of discarding safe…
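The abstract does not spell out the detection rule, so the sketch below is NOT the authors' algorithm. It is a hypothetical one-sided boundary-crossing test that reproduces the qualitative trade-off the abstract describes, under assumptions introduced here for illustration: each pull of a machine independently produces an unsafe event with probability p, a machine counts as unsafe when p > tau, and a machine is discarded at the first round t where S_t = (number of unsafe events) - tau * t reaches a threshold b. For an unsafe machine, S_t has positive drift p - tau, so it crosses b almost surely, and Wald's identity bounds the expected detection time by E[T] < (b + 1 - tau) / (p - tau). For a safe machine (p < tau), the drift is negative and the probability of ever crossing b, which here means wrongly discarding a safe machine, decays exponentially in b. Raising b therefore buys fewer false discards at the cost of more rounds. The names `first_alarm`, `simulate`, `tau`, and `b` and all parameter values are hypothetical.

```python
import random
from statistics import mean


def first_alarm(p, tau, b, horizon, rng):
    """Pull one machine until the hypothetical boundary test fires.

    Statistic: S_t = (# unsafe events in t pulls) - tau * t.
    Returns the first t with S_t >= b, or None if that does not
    happen within `horizon` pulls.
    """
    unsafe_count = 0
    for t in range(1, horizon + 1):
        if rng.random() < p:  # assumed i.i.d. unsafe event with probability p
            unsafe_count += 1
        if unsafe_count - tau * t >= b:
            return t
    return None


def simulate(b, p_safe=0.1, p_unsafe=0.5, tau=0.3,
             trials=1000, horizon=500, seed=0):
    """Estimate detection time and false-discard rate for threshold b."""
    rng = random.Random(seed)
    # Unsafe machine (p_unsafe > tau): positive drift, so the test fires a.s.
    times = [first_alarm(p_unsafe, tau, b, horizon, rng) for _ in range(trials)]
    detected = [t for t in times if t is not None]
    # Safe machine (p_safe < tau): negative drift, so any alarm is a
    # wrongly discarded safe machine; alarms become rarer as b grows.
    false_alarms = sum(
        first_alarm(p_safe, tau, b, horizon, rng) is not None
        for _ in range(trials)
    )
    return {
        "mean_detection_time": mean(detected),
        "missed": trials - len(detected),
        "false_discard_rate": false_alarms / trials,
    }


if __name__ == "__main__":
    for b in (0.5, 1.0, 2.0):
        print(b, simulate(b))
```

Running the script prints one summary per threshold. As b increases, the mean detection time of the unsafe machine should grow while the fraction of safe machines wrongly discarded should shrink. The finite `horizon` only truncates the safe-machine runs, whose negative drift makes a late crossing very unlikely.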
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Data Stream Mining Techniques
