Time-Scale Separation in Q-Learning: Extending TD($\triangle$) for Action-Value Function Decomposition

Mahammad Humayoo

arXiv:2411.14019·cs.LG·November 22, 2024

TL;DR

This paper introduces Q($\triangle$)-Learning, an extension of TD($\triangle$), which decomposes Q-functions across multiple time scales to improve stability, scalability, and convergence in long-term reinforcement learning tasks.

Contribution

It extends TD($\triangle$) to Q-Learning, enabling efficient multi-scale learning and better handling of long-term rewards in RL.

Findings

01

Q($\triangle$)-Learning outperforms traditional Q-Learning in benchmarks.

02

It achieves faster convergence on shorter time scales.

03

It demonstrates improved stability in deep RL environments.

Abstract

Q-Learning is a fundamental off-policy reinforcement learning (RL) algorithm that approximates action-value functions in order to learn optimal policies. However, it struggles to balance bias against variance, particularly for long-term rewards. This paper introduces Q($\Delta$)-Learning, an extension of TD($\Delta$) to the Q-Learning framework. TD($\Delta$) facilitates efficient learning over several time scales by decomposing the Q($\Delta$)-function into components associated with distinct discount factors. This approach offers improved learning stability and scalability, especially for long-term tasks where discounting bias may impede convergence. Our method ensures that each component of the Q($\Delta$)-function is learned separately, which speeds up convergence on shorter time scales and improves learning on longer time scales. We…
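
The decomposition described in the abstract can be illustrated with a small tabular sketch. This is not the paper's algorithm: the three-state toy MDP, the discount schedule GAMMAS, the update rule in train, and every function name are assumptions chosen for illustration. The sketch stores one component W_z per discount factor, treats the Q-function at the z-th time scale as the sum of components 0..z, and updates each component toward its own target.

```python
# Illustrative sketch only: the MDP, discount schedule, and update rule
# below are assumptions, not taken from the paper.

GAMMAS = [0.5, 0.9, 0.99]                      # assumed increasing discount factors
NEXT = [[0, 1], [0, 2], [2, 2]]                # NEXT[s][a]: deterministic next state
REWARD = [[0.0, 0.0], [0.1, 0.0], [0.0, 1.0]]  # REWARD[s][a]: immediate reward
N_STATES, N_ACTIONS = 3, 2


def q_from_components(W, z, s, a):
    """Q at discount GAMMAS[z], recovered as the sum of components 0..z."""
    return sum(W[i][s][a] for i in range(z + 1))


def max_q(W, z, s):
    """Greedy value of state s at time scale z."""
    return max(q_from_components(W, z, s, a) for a in range(N_ACTIONS))


def train(sweeps=5000, alpha=0.5):
    """Sweep every (state, action) pair and update each component separately."""
    W = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in GAMMAS]
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                s2, r = NEXT[s][a], REWARD[s][a]
                for z, gamma in enumerate(GAMMAS):
                    if z == 0:
                        # Shortest time scale: ordinary Q-learning target.
                        target = r + gamma * max_q(W, 0, s2)
                    else:
                        # Assumed target for the difference between the
                        # Q-functions at two adjacent discount factors.
                        target = (gamma * max_q(W, z, s2)
                                  - GAMMAS[z - 1] * max_q(W, z - 1, s2))
                    W[z][s][a] += alpha * (target - W[z][s][a])
    return W


def q_value_iteration(gamma, sweeps=5000):
    """Reference optimal Q-values for a single discount factor."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                Q[s][a] = REWARD[s][a] + gamma * max(Q[NEXT[s][a]])
    return Q
```

Under these assumptions, calling train() and summing components 0..z with q_from_components should reproduce the single-discount Q-values from q_value_iteration(GAMMAS[z]) on this toy problem, which is one way to check that the components add up as intended.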

Taxonomy

Topics: Neural Networks and Applications

Methods: Q-Learning