Reinforcement Learning by Comparing Immediate Reward

Punit Pandey; Deepshikha Pandey; Shishir Kumar

arXiv:1009.2566 · cs.LG · September 15, 2010 · 5 cites

TL;DR

This paper proposes a modified Q-Learning algorithm that compares immediate rewards to improve learning efficiency and reduce episodes needed to reach optimal Q-values in reinforcement learning tasks.

Contribution

The paper introduces a relative-reward-based Q-Learning method in which the agent selects only actions whose immediate reward is higher than that of its previous move, with the aim of reducing the number of training episodes.

Findings

1. Faster convergence to optimal Q-values in grid world simulations

2. Improved performance over standard Q-Learning in deterministic environments

3. Reduced episodes required for learning optimal policies

Abstract

This paper introduces an approach to reinforcement learning that compares immediate rewards, using a variation of the Q-Learning algorithm. Unlike conventional Q-Learning, the proposed algorithm compares the current reward with the immediate reward of the previous move and acts accordingly. Relative-reward-based Q-Learning is an approach to interactive learning. Q-Learning is a model-free reinforcement learning method used to train agents. It is observed that, under normal circumstances, the algorithm takes more episodes to reach the optimal Q-value because of its normal or sometimes negative rewards. In this new form of the algorithm, agents select only those actions that have a higher immediate reward signal than the previous one. The contribution of this article is the presentation of a new Q-Learning algorithm in order to maximize the performance of the algorithm and reduce the…
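The abstract does not give the exact update rule, so the following is only a minimal sketch of one possible reading of the idea, not the authors' algorithm. It assumes a hypothetical five-state corridor rather than the paper's grid world, a made-up shaped reward, and a deterministic environment in which the agent can preview the immediate reward of each action. The names `step`, `train`, and `compare_rewards` are illustrative. The only change from standard tabular Q-Learning is the filter that keeps actions whose immediate reward is higher than the reward of the previous move.

```python
import random

N_STATES = 5             # hypothetical corridor: states 0..4 (not the paper's grid world)
GOAL = N_STATES - 1
MOVES = (-1, +1)         # action 0 = left, action 1 = right


def step(state, action):
    """Deterministic transition with an assumed shaped reward."""
    next_state = min(max(state + MOVES[action], 0), GOAL)
    if next_state == GOAL:
        return next_state, 10.0                   # assumed goal bonus
    return next_state, -float(GOAL - next_state)  # assumed penalty: distance to goal


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2,
          compare_rewards=True, max_steps=1000, seed=0):
    """Tabular Q-Learning; optionally filter actions by comparing immediate rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    steps_per_episode = []
    for _ in range(episodes):
        state, prev_reward, steps = 0, float("-inf"), 0
        while state != GOAL and steps < max_steps:
            actions = [0, 1]
            if compare_rewards:
                # Assumed rule: keep actions whose immediate reward beats the
                # previous move's reward; fall back to all actions if none do.
                better = [a for a in actions if step(state, a)[1] > prev_reward]
                actions = better or actions
            if rng.random() < epsilon:
                action = rng.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action)
            # Standard Q-Learning update toward reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, prev_reward, steps = next_state, reward, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `train()` and `train(compare_rewards=False)` returns the Q-table and the number of steps taken in each episode, so the two variants can be compared on this toy problem. Previewing rewards through `step` relies on the simulator being available, which is a simplification. Any difference seen here says nothing about the results reported in the paper.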

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Evolutionary Algorithms and Applications