Weber-Fechner Law in Temporal Difference learning derived from Control as Inference
Keiichiro Takahashi, Taisuke Kobayashi, Tomoya Yamanokuchi, and Takamitsu Matsubara

TL;DR
This paper introduces a nonlinear update rule for reinforcement learning, inspired by the Weber-Fechner law and related biological findings, that biases learning so as to accelerate reward acquisition and suppress punishments.
Contribution
It derives a novel nonlinear TD-error update rule from control as inference, incorporating the Weber-Fechner law to improve RL performance and biological plausibility.
Findings
Accelerates reward acquisition in RL tasks
Suppresses punishments effectively during learning
Validated through simulations and robot experiments
Abstract
This paper investigates a novel nonlinear update rule based on temporal difference (TD) errors in reinforcement learning (RL). In standard RL, the degree of update is linearly proportional to the TD error, so all rewards are treated equally, without any bias. Recent biological studies, on the other hand, have revealed nonlinearities between the TD error and the degree of update, which bias policies toward optimism or pessimism. Such learning biases due to nonlinearities are expected to be useful features that were intentionally left in biological learning. Therefore, this research explores a theoretical framework that can leverage the nonlinearity between the degree of update and the TD error. To this end, we focus on the control as inference framework, since it is known as a generalized formulation encompassing various RL and optimal control methods. In…
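To make the contrast in the abstract concrete, here is a minimal Python sketch of the standard TD(0) update, in which the size of the update is linearly proportional to the TD error. The function names, the step size `alpha`, the discount `gamma`, and the `log_compress` transform are all assumptions made for illustration. In particular, `log_compress` is only a generic logarithmic compression in the spirit of the Weber-Fechner law; it is not the update rule derived in the paper.

```python
import math


def td_error(reward, value_s, value_next, gamma=0.99):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


def update_value(value_s, delta, alpha=0.1, transform=None):
    """Move V(s) by alpha * f(delta).

    With transform=None, f is the identity, which gives the standard
    linear rule: the update is proportional to the TD error.
    """
    f = transform if transform is not None else (lambda d: d)
    return value_s + alpha * f(delta)


def log_compress(delta):
    """Hypothetical nonlinearity (an assumption, not the paper's rule):
    sign(delta) * log(1 + |delta|), so large errors are compressed."""
    return math.copysign(math.log1p(abs(delta)), delta)


if __name__ == "__main__":
    delta = td_error(reward=1.0, value_s=0.5, value_next=0.0, gamma=0.9)
    print("TD error:", delta)
    print("linear update:", update_value(0.5, delta))
    print("log-compressed update:", update_value(0.5, delta, transform=log_compress))
```

Under the linear rule, doubling the TD error doubles the change in the value estimate. Under the illustrative logarithmic transform, it does not.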
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus
