Predicting Periodicity with Temporal Difference Learning
Kristopher De Asis, Brendan Bennett, Richard S. Sutton

TL;DR
This paper introduces complex-valued discount rates into temporal difference (TD) learning, enabling online estimation of the discrete Fourier transform (DFT) and improving the detection of periodic patterns in reinforcement learning environments.
Contribution
It presents a new perspective on discounting in TD learning using complex numbers, extending value functions to capture periodicity in signals.
Findings
Complex discounting enables DFT estimation with TD learning.
Value functions can now represent periodic effects.
Method improves detection of periodic patterns in reward sequences.
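To make the first finding concrete, the following is a minimal sketch (not the authors' code) of why a complex discount rate relates to the DFT. It assumes a finite reward sequence of length N and a unit-modulus discount gamma = exp(-2*pi*i*k/N); under those assumptions the discounted sum of rewards coincides with DFT bin k. The function name `complex_discounted_return` and the toy reward signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def complex_discounted_return(rewards, gamma):
    """Finite-horizon discounted sum: sum over t of gamma**t * rewards[t]."""
    rewards = np.asarray(rewards, dtype=complex)
    powers = gamma ** np.arange(len(rewards))
    return np.sum(powers * rewards)

N = 8
t = np.arange(N)
# Toy periodic reward signal (assumption): a cosine with period 4, i.e. bin 2.
rewards = np.cos(2 * np.pi * 2 * t / N)

# One complex discount rate per frequency bin k.
returns = np.array([
    complex_discounted_return(rewards, np.exp(-2j * np.pi * k / N))
    for k in range(N)
])

# Reference DFT, which uses the same exp(-2*pi*i*k*t/N) convention.
dft = np.fft.fft(rewards)
```

In this toy case `returns` matches `np.fft.fft(rewards)` up to floating-point error, and its magnitude is largest at bins 2 and 6, the frequencies of the period-4 signal.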
Abstract
Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of TD learning is that it is learning predictive knowledge about the environment in the form of value functions, from which it can derive its behavior to address long-term sequential decision making problems. The agent's horizon of interest, that is, how immediate or long-term a TD learning agent predicts into the future, is adjusted through a discount rate parameter. In this paper, we introduce an alternative view on the discount rate, with insight from digital signal processing, to include complex-valued discounting. Our results show that setting the discount rate to appropriately chosen complex numbers allows for online and incremental estimation of…
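As a rough illustration of the online, incremental estimation the abstract describes, here is a hedged sketch of tabular TD(0) with a complex-valued discount rate. Everything specific in it is an assumption made for the example rather than the paper's setup: the deterministic N-state cycle, the per-state rewards, the step size `alpha`, and the choice of a discount with modulus 0.9 (below 1 so that the infinite-horizon value is well defined).

```python
import numpy as np

N = 8                                   # states on a deterministic cycle (assumption)
k = 2                                   # frequency bin of interest (assumption)
rho = 0.9                               # modulus < 1 keeps the infinite sum finite
gamma = rho * np.exp(-2j * np.pi * k / N)
alpha = 0.5                             # step size (assumption)

# Reward received when leaving state s: a period-4 cosine (assumption).
r = np.cos(2 * np.pi * 2 * np.arange(N) / N)

# Tabular TD(0) with complex-valued value estimates.
V = np.zeros(N, dtype=complex)
s = 0
for _ in range(2000 * N):
    s_next = (s + 1) % N
    td_error = r[s] + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    s = s_next

# Closed-form fixed point on the cycle, used only to check the estimate:
# v(s) = (sum_{t<N} gamma**t * r[(s+t) % N]) / (1 - gamma**N)
powers = gamma ** np.arange(N)
v_true = np.array([
    np.sum(powers * r[(s0 + np.arange(N)) % N]) / (1 - gamma ** N)
    for s0 in range(N)
])
```

The update rule is the standard TD(0) rule; the only change is that `gamma` and `V` are complex. On this cycle the learned `V` approaches `v_true`, whose numerator at state 0 is a damped version of DFT bin `k` of the reward sequence.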
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics · Sports Analytics and Performance
