Segmenting Action-Value Functions Over Time-Scales in SARSA via TD($\Delta$)
Mahammad Humayoo

TL;DR
This paper introduces SARSA(Δ), a novel method that decomposes action-value functions over multiple time scales to improve learning efficiency and the bias-variance trade-off in reinforcement learning, especially for long-horizon tasks.
Contribution
The paper extends TD(Δ) to SARSA, creating SARSA(Δ), which enhances learning across different time scales and reduces bias compared to traditional SARSA algorithms.
Findings
SARSA(Δ) reduces bias in value updates.
Speeds up convergence in deterministic and stochastic environments.
Outperforms existing TD methods in benchmark tests.
Abstract
In numerous episodic reinforcement learning (RL) environments, SARSA-based methodologies are employed to improve policies aimed at maximizing returns over long horizons. Traditional SARSA algorithms face challenges in achieving an optimal balance between bias and variance, primarily due to their dependence on a single, constant discount factor (γ). This investigation extends the temporal difference decomposition method, TD(Δ), by applying it to the SARSA algorithm, now designated as SARSA(Δ). SARSA is a widely used on-policy RL method that learns action-value functions via temporal difference updates. By decomposing the action-value function into components that are linked to specific discount factors, SARSA(Δ) makes learning easier across a range of time scales. This analysis makes learning more effective and ensures consistency, particularly in…
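The decomposition described in the abstract can be illustrated with a minimal tabular sketch. It assumes the standard TD(Δ) construction carried over to action values: for increasing discount factors γ_0 < γ_1 < … < γ_Z, the component W_0 estimates Q for γ_0 and each later W_z estimates the difference between Q for γ_z and Q for γ_{z-1}, so the components sum to the action value at the largest discount. The function names, the nested-list table layout, and the step size are hypothetical choices for illustration; the paper's exact update rules are not given in the text above and may differ.

```python
def sarsa_delta_update(W, gammas, s, a, r, s_next, a_next, alpha=0.1, done=False):
    """One on-policy update of every delta component for (s, a, r, s', a').

    W[z][s][a] is the z-th component; gammas must be strictly increasing.
    Assumed update form (TD(Delta)-style, not confirmed by the source text):
      z = 0: target = r + gamma_0 * W_0(s', a')
      z > 0: target = (gamma_z - gamma_{z-1}) * Q_{z-1}(s', a') + gamma_z * W_z(s', a')
    where Q_{z-1} is the sum of components 0..z-1.
    """
    targets = []
    q_prev_next = 0.0  # running sum of lower components at (s', a')
    for z, gamma in enumerate(gammas):
        # Bootstrapped values are zero after a terminal transition.
        w_next = 0.0 if done else W[z][s_next][a_next]
        if z == 0:
            targets.append(r + gamma * w_next)
        else:
            targets.append((gamma - gammas[z - 1]) * q_prev_next + gamma * w_next)
        q_prev_next += w_next
    # Apply all updates only after every target is computed from the old table.
    for z, target in enumerate(targets):
        W[z][s][a] += alpha * (target - W[z][s][a])
    return W


def q_value(W, s, a):
    """Action value at the largest discount factor: the sum of all components."""
    return sum(component[s][a] for component in W)


# Toy check: one state, one action, reward 1 on every step, never terminates.
gammas = [0.5, 0.9]
W = [[[0.0]] for _ in gammas]  # shape: components x states x actions
for _ in range(2000):
    sarsa_delta_update(W, gammas, s=0, a=0, r=1.0, s_next=0, a_next=0, alpha=0.5)
```

In this toy problem the true action value for discount γ is 1 / (1 - γ), so the first component should settle near 2 and the sum of both components near 10. The short-horizon component converges faster than the long-horizon one, which is the kind of time-scale separation the abstract refers to.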
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: SARSA
