Reinforcement Learning by Comparing Immediate Reward
Punit Pandey, Deepshikha Pandey, Shishir Kumar

TL;DR
This paper proposes a modified Q-Learning algorithm that compares immediate rewards to improve learning efficiency and reduce the number of episodes needed to reach optimal Q-values in reinforcement learning tasks.
Contribution
It introduces a relative-reward-based Q-Learning method that improves performance by selecting actions whose immediate reward exceeds that of the previous move, reducing the number of training episodes.
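The comparison rule described above, which keeps only actions whose immediate reward beats the reward of the previous move, can be sketched as an action filter. This is a hypothetical reading, not the authors' code: the function names, the assumption that the immediate reward of each candidate action is known before acting (plausible in the deterministic environments the paper targets), and the fallback to all actions when none improves are illustrative choices.

```python
from typing import Dict, Hashable, List


def relative_reward_actions(
    immediate_rewards: Dict[Hashable, float],
    previous_reward: float,
) -> List[Hashable]:
    """Return the actions whose immediate reward beats the previous move's.

    Hypothetical interpretation: if no action improves on the previous
    reward, every action stays eligible so the agent is never stuck.
    """
    better = [a for a, r in immediate_rewards.items() if r > previous_reward]
    return better if better else list(immediate_rewards)


def select_action(
    q_row: Dict[Hashable, float],
    immediate_rewards: Dict[Hashable, float],
    previous_reward: float,
) -> Hashable:
    # Among the eligible actions, pick the one with the highest learned
    # Q-value (unseen actions default to 0.0).
    candidates = relative_reward_actions(immediate_rewards, previous_reward)
    return max(candidates, key=lambda a: q_row.get(a, 0.0))
```

For example, with immediate rewards `{"up": 1.0, "right": 1.0, "down": -1.0}` and a previous reward of `0.0`, only `up` and `right` remain eligible, and the Q-table decides between them.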
Findings
Faster convergence to optimal Q-values in grid world simulations
Improved performance over standard Q-Learning in deterministic environments
Reduced episodes required for learning optimal policies
Abstract
This paper introduces a reinforcement learning approach that compares immediate rewards, using a variation of the Q-Learning algorithm. Unlike conventional Q-Learning, the proposed algorithm compares the current reward with the immediate reward of the previous move and acts accordingly. Relative-reward-based Q-Learning is an approach to interactive learning. Q-Learning is a model-free reinforcement learning method used to train agents. It is observed that, under normal circumstances, the algorithm takes more episodes to reach the optimal Q-values because the rewards it receives are ordinary or sometimes negative. In the new form of the algorithm, agents select only those actions whose immediate reward signal is higher than the previous one. The contribution of this article is a new Q-Learning algorithm intended to maximize the algorithm's performance and reduce the…
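The abstract measures the proposal against conventional Q-Learning. As a baseline reference, the sketch below shows standard tabular Q-Learning with the usual update Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)] on a small deterministic grid world. Everything concrete here is an illustrative assumption, not a setting from the paper: the 4x4 grid, the step cost of -1, the goal reward of +10, and the values of α, γ, and ε.

```python
import random

# Hypothetical 4x4 deterministic grid world: start at (0, 0), goal at (3, 3).
# Every ordinary step costs -1; reaching the goal gives +10 and ends the episode.
SIZE = 4
START = (0, 0)
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic transition; moving into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        r, c = state
    nxt = (r, c)
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -1.0, False


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Standard tabular Q-Learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> value; missing entries count as 0.0

    def qv(s, a):
        return q.get((s, a), 0.0)

    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                action = max(ACTIONS, key=lambda a: qv(state, a))  # exploit
            nxt, reward, done = step(state, action)
            # Target r + gamma * max_a' Q(s', a'); no bootstrap at the goal.
            if done:
                target = reward
            else:
                target = reward + gamma * max(qv(nxt, a) for a in ACTIONS)
            q[(state, action)] = qv(state, action) + alpha * (target - qv(state, action))
            state = nxt
    return q


def greedy_path_length(q, limit=50):
    """Follow the greedy policy from START; return the steps taken (capped at limit)."""
    state, steps = START, 0
    while state != GOAL and steps < limit:
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, _, _ = step(state, action)
        steps += 1
    return steps
```

A shortest path from (0, 0) to (3, 3) takes 6 moves, so `greedy_path_length(q_learning())` gives a quick check of whether the learned greedy policy is optimal. Counting training episodes until this check first succeeds is one simple way to compare a baseline like this one with a modified variant.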
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Evolutionary Algorithms and Applications
