Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation
Samuel Ainsworth, Matt Barnes, Siddhartha Srinivasa

TL;DR
This paper introduces emergency stop mechanisms (e-stops), which exploit the fact that many tasks require only a small subset of the full state space. E-stops reduce the amount of exploration needed and accelerate reinforcement learning, at the cost of a small asymptotic sub-optimality gap.
Contribution
The paper proposes an e-stop technique that improves sample complexity and convergence speed in reinforcement learning by exploiting state-space structure, supported by a regret analysis and by experiments in discrete and continuous settings.
Findings
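E-stops significantly improve sample complexity by reducing the amount of required exploration.
In discrete and continuous settings, the reset mechanism provides order-of-magnitude speedups on top of existing reinforcement learning methods.
A performance bound is retained that trades off the rate of convergence against a small asymptotic sub-optimality gap.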
Abstract
In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
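The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy environment `ChainEnv`, the wrapper `EStopWrapper`, the helper `support_from_demonstrations`, and the `stop_reward` parameter are all hypothetical names, and the sketch assumes that the relevant state subset is taken to be the set of states visited in some observed trajectories. The wrapper ends the episode as soon as the agent leaves that subset, so any learner trained on top of it never spends samples exploring the rest of the state space.

```python
from typing import Hashable, Iterable, Set, Tuple


class ChainEnv:
    """Hypothetical toy task: states 0..n-1 on a line, goal at the right end."""

    def __init__(self, n: int = 11) -> None:
        self.n = n
        self.state = n // 2

    def reset(self) -> int:
        # Every episode starts in the middle of the chain.
        self.state = self.n // 2
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action is -1 (move left) or +1 (move right); moves are clipped at the ends.
        self.state = min(max(self.state + action, 0), self.n - 1)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def support_from_demonstrations(
    trajectories: Iterable[Iterable[Hashable]],
) -> Set[Hashable]:
    """Assumed construction: the allowed set is every state seen in the observations."""
    return {state for trajectory in trajectories for state in trajectory}


class EStopWrapper:
    """Terminates the episode whenever the wrapped env leaves the allowed set."""

    def __init__(self, env, allowed_states: Iterable[Hashable],
                 stop_reward: float = 0.0) -> None:
        self.env = env
        self.allowed = set(allowed_states)
        self.stop_reward = stop_reward  # reward returned when the e-stop fires

    def reset(self):
        return self.env.reset()

    def step(self, action):
        state, reward, done = self.env.step(action)
        if state not in self.allowed:
            # E-stop: cut the episode short instead of letting the agent wander.
            return state, self.stop_reward, True
        return state, reward, done
```

In this toy setting, a demonstration that walks from the start state to the goal yields an allowed set covering only the right half of the chain; a single step to the left triggers the e-stop, while the path to the goal is unaffected. The trade-off described in the abstract appears here as well: if the observed states exclude part of an optimal path, the restricted agent can only be as good as the best policy inside the allowed set.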
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Reinforcement Learning in Robotics
