Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework

Taisuke Kobayashi

arXiv:2106.01516·cs.LG·June 4, 2021

TL;DR

This paper introduces a reinforcement learning method that combines hyperbolic discounting with a reward-punishment framework. In simulations it outperforms standard reinforcement learning, and the discount factors it learns for reward and punishment differ, resembling the sign effect observed in animal behavior.

Contribution

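It presents a scheme that applies hyperbolic discounting recursively through a new temporal difference error and combines it with a reward-punishment framework. In simulations the scheme outperforms standard reinforcement learning, although the result depends on how reward and punishment are designed.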

Findings

01 Outperforms standard reinforcement learning in simulations

02 Discount factors for reward and punishment differ, resembling animal behavior

03 Performance depends on reward and punishment design

Abstract

This paper proposes a new reinforcement learning method with hyperbolic discounting. By combining a new temporal difference error, which applies hyperbolic discounting in a recursive manner, with a reward-punishment framework, a new scheme for learning the optimal policy is derived. In simulations, the proposal outperforms standard reinforcement learning, although its performance depends on the design of reward and punishment. In addition, the average discount factors with respect to reward and punishment differ from each other, resembling the sign effect in animal behavior.
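For readers unfamiliar with the term, the sketch below contrasts the textbook exponential discount curve, gamma^t, with the textbook hyperbolic curve, 1 / (1 + k t). It is not the paper's recursive temporal difference scheme, which is not specified on this page; the function names and the values gamma = 0.9 and k = 0.1 are assumptions chosen only for illustration.

```python
# Illustrative comparison of discount curves (not the paper's algorithm).
# gamma and k are hypothetical values picked for this example.

def exponential_discount(t, gamma=0.9):
    """Weight given to an outcome t steps ahead under exponential discounting."""
    return gamma ** t

def hyperbolic_discount(t, k=0.1):
    """Weight given to an outcome t steps ahead under hyperbolic discounting."""
    return 1.0 / (1.0 + k * t)

def implied_step_factor(t, k=0.1):
    """Per-step factor implied by the hyperbolic curve between delays t and t + 1."""
    return hyperbolic_discount(t + 1, k) / hyperbolic_discount(t, k)

if __name__ == "__main__":
    for t in (0, 1, 5, 10, 20):
        print(
            f"t={t:2d}  exponential={exponential_discount(t):.3f}  "
            f"hyperbolic={hyperbolic_discount(t):.3f}  "
            f"implied step factor={implied_step_factor(t):.3f}"
        )
```

Under exponential discounting the ratio between consecutive weights is always gamma, whereas under the hyperbolic curve the implied per-step factor grows with the delay, so near-term delays are discounted more steeply than distant ones. Giving reward and punishment different values of k would be one simple way to picture discounting that depends on the sign of the outcome, though that is only an analogy to the finding reported in the abstract.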

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications