Swift-Sarsa: Fast and Robust Linear Control

Khurram Javed; Richard S. Sutton

arXiv:2507.19539·cs.LG·July 29, 2025

TL;DR

Swift-Sarsa is a new on-policy reinforcement learning algorithm that extends SwiftTD from prediction to control. It is robust and effective in a challenging noisy environment and needs minimal prior knowledge.

Contribution

It introduces Swift-Sarsa, combining SwiftTD's step-size optimization with True Online Sarsa($\lambda$), and proposes a novel operant conditioning benchmark for linear control.
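For orientation, here is a minimal sketch of the base algorithm that Swift-Sarsa builds on: linear True Online Sarsa(λ) as it appears in the standard literature, with a single fixed step size. It is not Swift-Sarsa itself. The step-size optimization, the bound on the effective learning rate, and the step-size decay inherited from SwiftTD are not shown, because this page does not give their details. The class name, method names, and default hyper-parameter values are illustrative assumptions.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


class TrueOnlineSarsaLambda:
    """Linear True Online Sarsa(lambda) with a fixed scalar step size.

    Hypothetical illustration only: Swift-Sarsa replaces the fixed
    `alpha` with learned step sizes, which is NOT implemented here.
    """

    def __init__(self, n_features: int, alpha: float = 0.1,
                 gamma: float = 0.99, lam: float = 0.9) -> None:
        self.alpha = alpha      # fixed step size (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.lam = lam          # trace-decay parameter (assumed value)
        self.w: List[float] = [0.0] * n_features  # weight vector
        self.z: List[float] = [0.0] * n_features  # dutch eligibility trace
        self.q_old = 0.0        # previous estimate of the next action value

    def start_episode(self) -> None:
        """Reset the trace and the stored value at an episode boundary."""
        self.z = [0.0] * len(self.w)
        self.q_old = 0.0

    def update(self, x: Sequence[float], reward: float,
               x_next: Sequence[float], terminal: bool = False) -> float:
        """Apply one update for the transition (x, reward, x_next).

        `x` and `x_next` are feature vectors of the current and the next
        state-action pair, where the next action was already chosen by
        the behaviour policy (on-policy). Returns the TD error.
        """
        q = dot(self.w, x)
        q_next = 0.0 if terminal else dot(self.w, x_next)
        delta = reward + self.gamma * q_next - q

        # Dutch trace: z <- gamma*lam*z + (1 - alpha*gamma*lam*z.x) x
        gl = self.gamma * self.lam
        zx = dot(self.z, x)
        self.z = [gl * zi + (1.0 - self.alpha * gl * zx) * xi
                  for zi, xi in zip(self.z, x)]

        # w <- w + alpha*(delta + q - q_old) z - alpha*(q - q_old) x
        dq = q - self.q_old
        self.w = [wi + self.alpha * (delta + dq) * zi - self.alpha * dq * xi
                  for wi, zi, xi in zip(self.w, self.z, x)]

        self.q_old = q_next
        return delta
```

In use, an agent would call `start_episode()` at each episode boundary, pick actions from the current action-value estimates (for example, epsilon-greedily), and call `update` once per transition with the features of the chosen state-action pairs.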

Findings

01

Swift-Sarsa effectively learns to identify relevant signals amidst noise.

02

The algorithm demonstrates robustness to hyper-parameter choices.

03

It enables learning representations from large feature sets without performance loss.

Abstract

Javed, Sharifnassab, and Sutton (2024) introduced a new algorithm for TD learning -- SwiftTD -- that augments True Online TD($\lambda$) with step-size optimization, a bound on the effective learning rate, and step-size decay. In their experiments SwiftTD outperformed True Online TD($\lambda$) and TD($\lambda$) on a variety of prediction tasks derived from Atari games, and its performance was robust to the choice of hyper-parameters. In this extended abstract we extend SwiftTD to work for control problems. We combine the key ideas behind SwiftTD with True Online Sarsa($\lambda$) to develop an on-policy reinforcement learning algorithm called Swift-Sarsa. We propose a simple benchmark for linear on-policy control called the operant conditioning benchmark. The key challenge in the operant conditioning benchmark is that a very small subset of input signals are…
