Swift-Sarsa: Fast and Robust Linear Control
Khurram Javed, Richard S. Sutton

TL;DR
Swift-Sarsa is a new on-policy reinforcement learning algorithm that extends SwiftTD to control tasks, demonstrating robustness and effectiveness in a challenging, noisy environment with minimal prior knowledge.
Contribution
It introduces Swift-Sarsa, combining SwiftTD's step-size optimization with True Online Sarsa(λ), and proposes a novel operant conditioning benchmark for linear control.
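The base algorithm named above, True Online Sarsa(λ), can be sketched as follows. This is a minimal, hedged sketch of the linear version with a single fixed step size and dutch eligibility traces, following the textbook form of the update in Sutton and Barto's Reinforcement Learning: An Introduction (2nd ed.). It is not Swift-Sarsa: it omits SwiftTD's step-size optimization, effective-learning-rate bound, and step-size decay. The class name, parameter names, and the toy two-step chain are illustrative assumptions. Action selection (for example ε-greedy over the q estimates) is left to the caller, who passes in the features of the chosen state-action pairs.

```python
class TrueOnlineSarsaLambda:
    """Linear True Online Sarsa(lambda) with one fixed scalar step size.

    Sketch of the base algorithm only: no step-size optimization, no bound on
    the effective learning rate, and no step-size decay (so this is NOT Swift-Sarsa).
    """

    def __init__(self, num_features, alpha=0.1, gamma=0.9, lam=0.8):
        self.w = [0.0] * num_features
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.reset_episode()

    def reset_episode(self):
        # Traces and the stored previous estimate are cleared at episode start.
        self.z = [0.0] * len(self.w)
        self.q_old = 0.0

    def value(self, x):
        """Linear action-value estimate q(s, a) = w . x(s, a)."""
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, reward, x_next):
        """Apply one transition and return the TD error.

        x      -- features of the current pair (S, A)
        x_next -- features of (S', A'), with A' chosen by the caller's policy;
                  pass all zeros when S' is terminal.
        """
        a, g, lam = self.alpha, self.gamma, self.lam
        q = self.value(x)
        q_next = self.value(x_next)
        delta = reward + g * q_next - q
        # Dutch eligibility trace.
        z_dot_x = sum(zi * xi for zi, xi in zip(self.z, x))
        self.z = [g * lam * zi + (1.0 - a * g * lam * z_dot_x) * xi
                  for zi, xi in zip(self.z, x)]
        # True online weight update, using the freshly updated trace.
        self.w = [wi + a * (delta + q - self.q_old) * zi - a * (q - self.q_old) * xi
                  for wi, zi, xi in zip(self.w, self.z, x)]
        self.q_old = q_next
        return delta


def run_chain_demo(episodes=1000):
    """Toy check: deterministic chain A -> B -> terminal, rewards 0 then 1,
    a single action, one-hot features. True values are q(B) = 1, q(A) = gamma."""
    xa, xb, terminal = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
    agent = TrueOnlineSarsaLambda(num_features=2, alpha=0.1, gamma=0.9, lam=0.8)
    for _ in range(episodes):
        agent.reset_episode()
        agent.update(xa, 0.0, xb)
        agent.update(xb, 1.0, terminal)
    return agent.w


if __name__ == "__main__":
    print(run_chain_demo())  # values close to gamma = 0.9 and 1.0
```

In the toy chain there is only one action, so the sketch reduces to policy evaluation. In a real control loop the caller would pick A' from S' with its current policy before calling `update`; bootstrapping from the action actually taken is what makes this on-policy Sarsa rather than Q-learning.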
Findings
Swift-Sarsa effectively learns to identify relevant signals amidst noise.
The algorithm demonstrates robustness to hyper-parameter choices.
It enables learning representations from large feature sets without performance loss.
Abstract
Javed, Sharifnassab, and Sutton (2024) introduced a new algorithm for TD learning, SwiftTD, that augments True Online TD(λ) with step-size optimization, a bound on the effective learning rate, and step-size decay. In their experiments, SwiftTD outperformed True Online TD(λ) and TD(λ) on a variety of prediction tasks derived from Atari games, and its performance was robust to the choice of hyper-parameters. In this extended abstract, we extend SwiftTD to work for control problems. We combine the key ideas behind SwiftTD with True Online Sarsa(λ) to develop an on-policy reinforcement learning algorithm called Swift-Sarsa. We propose a simple benchmark for linear on-policy control called the operant conditioning benchmark. The key challenge in the operant conditioning benchmark is that a very small subset of input signals are…
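The idea of an effective learning rate can be illustrated in the plain linear setting. With per-feature step sizes α_i, the update w_i ← w_i + α_i δ x_i changes the prediction on the same input x by δ Σ_i α_i x_i², so Σ_i α_i x_i² is the fraction of the current error corrected in one step, and values above 1 overshoot the target. The sketch below caps that quantity at a threshold `eta` by uniformly rescaling the step sizes. The rescaling rule and the names `effective_learning_rate`, `bounded_step_sizes`, `predict`, and `lms_step` are illustrative assumptions for a supervised least-mean-squares update; SwiftTD's actual bound, which must also account for eligibility traces, may differ.

```python
def effective_learning_rate(alphas, x):
    """Fraction of the current error removed by one linear update on input x."""
    return sum(a * xi * xi for a, xi in zip(alphas, x))


def bounded_step_sizes(alphas, x, eta=1.0):
    """Uniformly rescale per-feature step sizes so the effective rate is <= eta.

    Illustrative rule only (assumption); not necessarily SwiftTD's exact bound.
    """
    rate = effective_learning_rate(alphas, x)
    if rate <= eta:
        return list(alphas)
    scale = eta / rate
    return [a * scale for a in alphas]


def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def lms_step(w, alphas, x, target):
    """One least-mean-squares update with per-feature step sizes."""
    error = target - predict(w, x)
    return [wi + a * error * xi for wi, a, xi in zip(w, alphas, x)]


if __name__ == "__main__":
    x, target = [1.0, 1.0], 3.0
    raw = [1.0, 1.0]
    print(predict(lms_step([0.0, 0.0], raw, x, target), x))   # 6.0 (overshoots)
    safe = bounded_step_sizes(raw, x, eta=1.0)
    print(predict(lms_step([0.0, 0.0], safe, x, target), x))  # 3.0 (hits target)
```

With step sizes [1.0, 1.0] and x = [1, 1], the unbounded update corrects 200% of the error and overshoots to 6.0. After rescaling to [0.5, 0.5], a single step lands exactly on the target.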
