Online Nonstochastic Model-Free Reinforcement Learning
Udaya Ghai, Arushi Gupta, Wenhan Xia, Karan Singh, Elad Hazan

TL;DR
This paper introduces disturbance-based policies for robust, model-free reinforcement learning in dynamic or adversarial environments, providing algorithms with provable regret guarantees and demonstrating improved robustness on benchmarks.
Contribution
It proposes a novel class of disturbance-centered policies, along with efficient algorithms whose regret guarantees improve on previous methods for linear dynamical systems.
Findings
Algorithms achieve provable regret bounds with no dependence on state dimension.
Methods improve robustness in adversarial and dynamic environments.
Experimental results show enhanced performance on standard RL benchmarks.
Abstract
We investigate robust model-free reinforcement learning algorithms designed for environments that may be dynamic or even adversarial. Traditional state-based policies often struggle with the challenges posed by unmodeled disturbances in such settings. Moreover, linear state-based policies are an obstacle to efficient optimization: they lead to nonconvex objectives, even in benign environments such as linear dynamical systems. Drawing inspiration from recent advances in model-based control, we introduce a novel class of policies centered on disturbance signals. We define several categories of these signals, which we term pseudo-disturbances, and develop corresponding policy classes based on them. We provide efficient and practical algorithms for optimizing these policies. Next, we examine the task of online adaptation of reinforcement learning…
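To illustrate why disturbance-centered policies help, here is a minimal sketch of the model-based idea the abstract draws on, not the paper's pseudo-disturbance method. It assumes a known linear system x_{t+1} = A x_t + B u_t + w_t, a fixed gain K, a quadratic cost x'x + u'u, and a policy u_t = -K x_t + sum_{i=1..h} M_i w_{t-i}. The controller recovers each disturbance as w_t = x_{t+1} - A x_t - B u_t, which requires A and B. The paper's pseudo-disturbances replace w_t with signals that do not need a model; those signals are not reproduced here. The function name `rollout_dac` and all parameter choices are hypothetical.

```python
import numpy as np


def rollout_dac(A, B, K, M, w, x0):
    """Simulate a disturbance-action policy on a known linear system (illustrative sketch).

    Dynamics (assumed): x_{t+1} = A x_t + B u_t + w_t.
    Policy (assumed):   u_t = -K x_t + sum_{i=1}^{h} M[i-1] @ w_{t-i},
                        with w_{t-i} treated as 0 when t - i < 0.
    Cost (assumed):     sum_t x_t'x_t + u_t'u_t.
    Returns the state trajectory, the recovered disturbances, and the total cost.
    """
    h = len(M)
    T = w.shape[0]
    x = np.asarray(x0, dtype=float)
    xs = [x]
    recovered = []  # disturbances reconstructed online from observed transitions
    cost = 0.0
    for t in range(T):
        # Fixed state feedback plus a learned linear map of past disturbances.
        u = -K @ x
        for i in range(1, h + 1):
            if t - i >= 0:
                u = u + M[i - 1] @ recovered[t - i]
        x_next = A @ x + B @ u + w[t]
        # Recovering w_t needs the model (A, B); pseudo-disturbances avoid this.
        recovered.append(x_next - A @ x - B @ u)
        cost += float(x @ x + u @ u)
        x = x_next
        xs.append(x)
    return np.array(xs), np.array(recovered), cost
```

Because the recovered disturbances do not depend on M, every state is an affine function of M, so the quadratic cost is convex in M. By contrast, the cost as a function of a state-feedback gain K is generally nonconvex. A midpoint check such as `rollout_dac(A, B, K, 0.5 * M0 + 0.5 * M1, w, x0)` therefore never costs more than the average of the two endpoint costs.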
Taxonomy
Topics: Advanced Bandit Algorithms Research · Receptor Mechanisms and Signaling · Reinforcement Learning in Robotics
